ANALYTICS PROJECTS
Bioinformatics Study - Assessing Druggability in Drug Discovery
-
Language: SAS
This study used in vitro absorption, distribution, metabolism, and elimination (ADME) variables to predict the druggability (good vs. bad) of 1279 small molecules from the DrugBank 3.0 database.
The multivariate analyses performed in this study included:
1. Principal Component Analysis (PCA) with a reduced dimension of 3 components




2. Discriminant Analysis on 2 groups of molecules




A discriminant analysis using the quadratic discriminant function (derived from within-group covariance matrices) was performed.
Resubstitution results: about 93.75% of actual non-violator molecules were correctly classified as non-violators, and about 64.67% of actual violator molecules were correctly classified as violators. The total error count estimate was about 14.97%.
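The original analysis was run in SAS; the following is only a minimal Python sketch of the same idea, a quadratic discriminant rule built from within-group covariance matrices and scored by resubstitution (classifying the same rows used to fit the rule). The two-feature data, the helper names (fit_group, quadratic_score, classify), and the choice of priors proportional to group size are all assumptions made for illustration, not the study's data or settings.

```python
import math

# Hypothetical two-feature data standing in for the ADME variables (not from the study).
groups = {
    "non-violator": [(1.0, 2.0), (1.5, 1.8), (2.0, 2.6), (1.2, 2.4), (1.8, 2.1)],
    "violator": [(4.0, 5.5), (5.0, 4.0), (6.0, 6.5), (4.5, 6.0), (5.5, 4.8)],
}


def fit_group(rows):
    """Return the mean and the 2x2 sample covariance (sxx, sxy, syy) of one group."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def quadratic_score(x, mean, cov, prior):
    """Quadratic discriminant score: larger means the group is more plausible."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * maha + math.log(prior)


total = sum(len(rows) for rows in groups.values())
# Assumption: priors proportional to group size.
params = {name: (*fit_group(rows), len(rows) / total) for name, rows in groups.items()}


def classify(x):
    """Assign x to the group with the highest quadratic discriminant score."""
    return max(params, key=lambda name: quadratic_score(x, *params[name]))


# Resubstitution: classify the same rows that were used to fit the rule.
confusion = {actual: {pred: 0 for pred in groups} for actual in groups}
for actual, rows in groups.items():
    for x in rows:
        confusion[actual][classify(x)] += 1

errors = sum(n for actual, row in confusion.items()
             for pred, n in row.items() if pred != actual)
error_rate = errors / total
print(confusion, error_rate)
```

Because the rule is scored on its own training rows, a resubstitution error rate tends to be optimistic compared with an estimate from held-out data.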
BLE RSSI - Indoor Localisation
-
Language: Python
The research goal was to find a classification model that could predict a location from the RSSI readings of beacons.

Data visualisations helped to understand the distribution of the locations and the relationship between beacons.

Two classifiers, k-nearest neighbours (KNN) and a decision tree, were built and compared.
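As a rough illustration of the KNN side of that comparison, here is a minimal standard-library sketch that predicts a location from a vector of RSSI readings by majority vote among the nearest training points. The three-beacon readings, the location names, and the knn_predict helper are hypothetical and are not taken from the project's dataset or code; the decision tree is not covered.

```python
import math
from collections import Counter

# Hypothetical training data: (RSSI in dBm from three beacons, location label).
train = [
    ((-60, -85, -90), "kitchen"),
    ((-62, -88, -92), "kitchen"),
    ((-58, -83, -95), "kitchen"),
    ((-90, -61, -80), "hallway"),
    ((-88, -63, -82), "hallway"),
    ((-92, -59, -78), "hallway"),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(train, (-61, -86, -91)))
print(knn_predict(train, (-89, -60, -81)))
```

Euclidean distance is used here only for simplicity; the choice of distance and of k would normally be tuned on the real data.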


Both the KNN and decision tree models performed reasonably well in predicting the locations from RSSI signals, scoring around 80%, but KNN is preferred because it does not overfit the way the decision tree does. The number of instances per location is not balanced, which causes a problem for cross-validation when a class has fewer samples than the minimum required. Because many RSSI readings were out of range, the author should identify the possible reasons, or consider repositioning the beacons.
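The cross-validation issue can be checked before fitting anything. The sketch below, using made-up location labels and a hypothetical helper named classes_too_small, lists the classes that have fewer samples than the number of folds, which is the situation where a stratified split cannot place that class in every fold.

```python
from collections import Counter


def classes_too_small(labels, n_splits):
    """Return {label: count} for classes with fewer samples than n_splits."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < n_splits}


# Hypothetical, deliberately unbalanced location labels.
labels = ["A"] * 12 + ["B"] * 7 + ["C"] * 2

too_small = classes_too_small(labels, n_splits=5)
print(too_small)
```

Possible responses include using fewer folds, merging or dropping the rare locations, or collecting more readings for them; which one is appropriate depends on the data.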
Investigating the Relationship Between Total Sugar and Energy in Australian Breakfast Cereals
-
Language: R
Hypothesis: There is a weak positive linear relationship between total sugar and energy levels in breakfast cereals.
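The project itself was done in R; as a small Python sketch of how the strength and direction of such a linear relationship is usually quantified, the Pearson correlation coefficient can be computed as below. The sugar and energy values are invented for illustration and are not the project's data, and pearson_r is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: total sugar (g per 100 g) and energy (kJ per 100 g).
sugar = [3.0, 8.5, 12.0, 20.1, 25.4, 30.0]
energy = [1480, 1610, 1540, 1590, 1700, 1620]

r = pearson_r(sugar, energy)
print(round(r, 3))
```

A value of r near 0 indicates a weak linear relationship and a value near 1 a strong positive one; the cut-off for "weak" varies by convention.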



