ANALYTICS PROJECTS
The Influence of Demographic & Socio-Economic Factors on Transport Mode to School in Melbourne
-
Language: R
There are many ways in which students travel to school, and many factors can affect these travel choices.
The aim of this project was to explore whether socio-economic (e.g. home region, household income, no. of vehicles/bicycles) and demographic (e.g. age, gender) factors influence the type of transport that Melbourne primary and high school students use to travel to school.
A multinomial logistic regression model (with variable selection) was fitted to determine which factors were significant. The predictive performance of the model was also tested.
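As a rough illustration of what a fitted multinomial logistic regression computes, the sketch below turns a student's features into a probability for each transport mode using the softmax function. It is written in Python rather than the project's R purely for illustration. The mode names, the feature layout (intercept, age, number of vehicles) and all coefficient values are hypothetical and are not the project's fitted results.

```python
import math

def mode_probabilities(features, coefficients):
    """Softmax over linear scores, with one coefficient vector per transport mode."""
    scores = {
        mode: sum(w * x for w, x in zip(weights, features))
        for mode, weights in coefficients.items()
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    exp_scores = {mode: math.exp(s - top) for mode, s in scores.items()}
    total = sum(exp_scores.values())
    return {mode: e / total for mode, e in exp_scores.items()}

# Hypothetical coefficients: [intercept, age, number of vehicles].
# "car" is the reference category, so its coefficients are all zero.
coefficients = {
    "car": [0.0, 0.0, 0.0],
    "walk": [0.5, 0.0, -0.8],
    "bicycle": [-1.0, 0.1, -0.3],
}

# Hypothetical student: intercept term 1.0, age 12, household with 2 vehicles.
probs = mode_probabilities([1.0, 12.0, 2.0], coefficients)
most_likely = max(probs, key=probs.get)
```

With these made-up numbers the negative vehicle coefficients pull the walking and cycling scores below the reference, so the car is the most likely mode. In a real fit, the sign and significance of each coefficient is what indicates whether a factor matters.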






Twitter Posts - Language Identification
-
Language: Python
Classifier: Linear Support Vector Machine
Training Text Sources: JRC-Acquis, Debian, Wikipedia, Twitter
Two features, the text (vectorised) and the location (one-hot encoded), were used to calculate interaction terms between them.
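One common way to build such interactions is to multiply every text feature by every location indicator, so the classifier can learn location-specific weights for each text feature. The sketch below shows this under that assumption. The source does not state how the interactions were actually computed, and the location categories and vector values here are hypothetical.

```python
def one_hot(value, categories):
    """Return a one-hot list with 1.0 at the position of value."""
    return [1.0 if value == category else 0.0 for category in categories]

def interaction_features(text_vec, location_vec):
    """Flattened outer product: one copy of text_vec per location slot."""
    return [t * loc for loc in location_vec for t in text_vec]

locations = ["AU", "US", "Other"]   # hypothetical location categories
text_vec = [0.5, 0.0, 2.0]          # hypothetical vectorised text
location_vec = one_hot("US", locations)

crossed = interaction_features(text_vec, location_vec)
```

Because the location vector is one-hot, only the block of `crossed` that corresponds to the tweet's location is non-zero. Every other block is zeroed out.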
Character n-grams of length 1 to 4 were considered for the corpus dictionary.
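To make the n-gram range concrete, here is a minimal standard-library sketch that counts every character n-gram of length 1 to 4 in a string. The function name `char_ngrams` is illustrative and not taken from the project.

```python
from collections import Counter

def char_ngrams(text, n_min=1, n_max=4):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return counts

features = char_ngrams("hola")
# 4 unigrams + 3 bigrams + 2 trigrams + 1 four-gram = 10 n-grams
```

If the project used scikit-learn (an assumption; the source only names a linear SVM), `TfidfVectorizer(analyzer="char", ngram_range=(1, 4))` would build a comparable dictionary over the whole corpus.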
A base threshold of at least 35% had to be met for the predicted language probability. Separate, higher thresholds were added for common languages: English (85%), Italian (78%), Spanish (80%), Dutch (75%), etc.
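The thresholding rule can be sketched as below. It assumes that a common language must clear its own threshold while every other language must clear the base threshold, and that a rejected prediction falls back to an "unknown" label. The fallback label `unk` and the language codes are hypothetical, since the source does not say what happens when a threshold is missed.

```python
BASE_THRESHOLD = 0.35
# Per-language thresholds quoted in the description above.
COMMON_THRESHOLDS = {"en": 0.85, "it": 0.78, "es": 0.80, "nl": 0.75}
UNKNOWN = "unk"  # hypothetical fallback label

def apply_thresholds(language, probability):
    """Keep the predicted language only if its probability clears the threshold."""
    required = max(BASE_THRESHOLD, COMMON_THRESHOLDS.get(language, BASE_THRESHOLD))
    return language if probability >= required else UNKNOWN

confident_english = apply_thresholds("en", 0.90)   # kept
doubtful_english = apply_thresholds("en", 0.50)    # rejected: below 85%
rare_language = apply_thresholds("fr", 0.40)       # kept: clears the 35% base
```

Raising the bar for frequent languages trades some recall on those languages for fewer cases where a rare language is mislabelled as a common one.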
Accuracy was 92.722% on the development set and 92.665% on the Kaggle test set. The close agreement between the two suggests that the trained model generalised well rather than overfitting the development data.
Evaluating Trained Classifier





Preprocessing Text


Predicting Languages on Test Set

