
ANALYTICS PROJECTS

The Influence of Demographic & Socio-Economic Factors on Transport Mode to School in Melbourne

  • Language: R

There are many ways in which students travel to school, and many factors can affect these travel choices.

The aim of this project was to explore whether socio-economic factors (e.g. home region, household income, number of vehicles/bicycles) and demographic factors (e.g. age, gender) influence the type of transport that Melbourne primary and high school students use to travel to school.

A multinomial logistic regression model (with variable selection) was fitted to determine which factors were significant. The predictive performance of the model was also tested.
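
As a rough illustration of how a multinomial logistic model turns a predictor into transport-mode probabilities, here is a minimal Python sketch (the project itself was written in R). The mode names, the single predictor (age) and every coefficient below are made up for illustration; they are not the fitted values from this analysis.

```python
import math

# Hypothetical coefficients (intercept, age slope) for illustration only.
# These are NOT the fitted values from the project.
# "car" is treated as the reference category, so its coefficients are zero.
COEFS = {
    "car": (0.0, 0.0),
    "walk": (1.0, -0.1),
    "public_transport": (-3.0, 0.25),
}


def mode_probabilities(age):
    """Apply the softmax function to the linear score of each transport mode."""
    scores = {mode: b0 + b1 * age for mode, (b0, b1) in COEFS.items()}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {mode: math.exp(s - max_score) for mode, s in scores.items()}
    total = sum(exps.values())
    return {mode: e / total for mode, e in exps.items()}


def predict_mode(age):
    """Return the transport mode with the highest predicted probability."""
    probs = mode_probabilities(age)
    return max(probs, key=probs.get)
```

Calling `mode_probabilities` over a range of ages gives one probability curve per mode, and comparing `predict_mode` outputs against observed modes is the basis of a confusion matrix.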

Transport Analysis - Age Probability.png
Transport Analysis - Confusion Matrix.png

Twitter Posts - Language Identification

  • Language: Python

Classifier: Linear Support Vector Machine

Training Text Sources: JRC-Acquis, Debian, Wikipedia, Twitter

Two features, text (vectorised) and location (one-hot encoded), were used to calculate the interactions between them.

Character n-grams of length 1 to 4 were considered for the corpus dictionary.
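
To make the n-gram idea concrete, here is a minimal sketch of extracting character n-grams of length 1 to 4 from a string. The function name `char_ngrams` is hypothetical and this is not necessarily how the project built its dictionary; in practice a library vectoriser (for example scikit-learn's `CountVectorizer` with `analyzer="char"` and `ngram_range=(1, 4)`) could do the same job.

```python
from collections import Counter


def char_ngrams(text, min_n=1, max_n=4):
    """Count every character n-gram in text with min_n <= n <= max_n."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the text.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Example: the word "hola" yields h, o, l, a, ho, ol, la, hol, ola, hola.
counts = char_ngrams("hola")
```

Short n-grams capture characteristic letters and letter pairs of a language, while longer ones capture common word fragments.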

 

The predicted language probability needed to meet a base threshold of at least 35%. Separate, higher thresholds were added for common languages (English 85%, Italian 78%, Spanish 80%, Dutch 75%, etc.).
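
The thresholding rule can be sketched as follows. The threshold values come from the description above, but the language codes, the `"unk"` fallback label and the way the two thresholds are combined are assumptions made for illustration, not details confirmed by the project.

```python
BASE_THRESHOLD = 0.35

# Per-language thresholds from the description; the language codes are assumed.
LANGUAGE_THRESHOLDS = {"en": 0.85, "it": 0.78, "es": 0.80, "nl": 0.75}

UNKNOWN = "unk"  # hypothetical label used when no threshold is met


def apply_thresholds(language, probability):
    """Keep the predicted language only if its probability is high enough."""
    # Common languages must clear their own threshold as well as the base one.
    required = max(BASE_THRESHOLD, LANGUAGE_THRESHOLDS.get(language, BASE_THRESHOLD))
    return language if probability >= required else UNKNOWN
```

For example, under these assumptions an English prediction with probability 0.80 would be rejected, while a less common language predicted with probability 0.40 would be kept.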

 

Accuracy was 92.722% on the development set and 92.665% on the Kaggle test set. The close agreement between the two suggests that the trained model fitted well without overfitting.

Evaluating Trained Classifier
Twitter Language - Document Frequency by Language and Source.png
Preprocessing Text
Twitter Language - JRC-Acquis - Original Text.png
Twitter Language - JRC-Acquis - Pre-processed Text.png
Predicting Languages on Test Set
Twitter Language - Test Set - Preview.png
Twitter Language -  Test Set Predicted.png

Website by Tiffany Ng
