PORTFOLIO PROJECTS
2016-2022 Vancouver Crime Data Exploration and Modelling
In this project, I applied data analytics and machine learning methods for the Vancouver Police Department (VPD) to predict hourly theft crimes across neighbourhoods in Vancouver, BC. Several additional data sources were incorporated into the original VPD crime dataset for data exploration and feature engineering. The objective of the initial analysis was to identify key crime patterns that could guide further analysis, dashboard building and model development. I examined the overall trends in reported crime cases and analyzed time-related and geographic patterns. Theft accounts for most reported crime in Vancouver, so I narrowed the focus to theft when building the machine learning models. The ultimate goal was to support predictive policing that helps reduce crime while mitigating risk to law enforcement officers.
Preparing datasets by cleaning, transforming, joining and aggregating in SQL and Python
Visualizing time-related and geographic patterns in Tableau
Building a binary classification model to classify theft activity at a given location and hour as high risk or low risk
The model captured 81% of unseen instances in which actual theft exceeded 3 cases per neighbourhood per hour
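The high-risk definition and the "captured" figure above can be illustrated with a small sketch. It assumes that "high risk" means more than 3 theft cases per neighbourhood per hour and that the reported percentage is recall on the high-risk class; both are readings of the description, not confirmed details of the project. The function names, counts and predictions are made up for illustration.

```python
# Illustrative only: the counts and predictions below are toy values,
# not data or results from the project.
HIGH_RISK_THRESHOLD = 3  # assumed: "greater than 3 cases per neighbourhood per hour"


def label_high_risk(theft_counts, threshold=HIGH_RISK_THRESHOLD):
    """Turn hourly theft counts into binary labels: 1 = high risk, 0 = low risk."""
    return [1 if count > threshold else 0 for count in theft_counts]


def recall(y_true, y_pred):
    """Share of truly high-risk rows that were also predicted as high risk."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical theft counts for one neighbourhood over eight hours.
counts = [0, 5, 2, 4, 7, 1, 3, 6]
y_true = label_high_risk(counts)    # [0, 1, 0, 1, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]   # hypothetical model output

print(recall(y_true, y_pred))  # 0.75
```

In this toy example the model flags 3 of the 4 truly high-risk hours, so recall is 0.75.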
New York Taxi Analysis
In this project, I applied the data science pipeline end to end, exploring and using deterministic features to predict how much a cab driver can earn per hour in different areas of New York. Knowing which areas earn more or less on any given day and hour allows the union to distribute cab assignments more equitably across areas and to rotate drivers between higher-income and lower-income areas on a daily basis. This way, no single driver dominates a high-income area while another is continually relegated to a low-income one.
Using Python (pandas, numpy, matplotlib) for data exploration, data cleaning and data preparation
Identifying trends and visualizing time series data of taxi trips
Building machine learning models to predict taxi fares in New York
Documenting each step in detail in a Jupyter notebook
The Python packages I utilized: pandas|numpy|matplotlib|scikit-learn
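As a rough illustration of the "earnings per hour by area" idea, here is a minimal sketch. It assumes that hourly earnings are total fares divided by total time spent on trips, which the project description does not specify. The zones, fares and the `earnings_per_hour` helper are hypothetical, and plain Python is used instead of pandas only to keep the example dependency-free.

```python
from collections import defaultdict

# Hypothetical trip records: (pickup_zone, pickup_hour, fare_usd, trip_minutes).
trips = [
    ("Midtown", 8, 12.0, 15),
    ("Midtown", 8, 18.0, 15),
    ("Harlem", 8, 9.0, 20),
    ("Harlem", 8, 6.0, 10),
]


def earnings_per_hour(trips):
    """Return {(zone, hour): fares earned per hour of trip time}."""
    fare_total = defaultdict(float)
    minutes_total = defaultdict(float)
    for zone, hour, fare, minutes in trips:
        fare_total[(zone, hour)] += fare
        minutes_total[(zone, hour)] += minutes
    # Convert accumulated minutes to hours and skip groups with no trip time.
    return {
        key: fare_total[key] / (minutes_total[key] / 60.0)
        for key in fare_total
        if minutes_total[key] > 0
    }


rates = earnings_per_hour(trips)
print(rates[("Midtown", 8)])  # 60.0
print(rates[("Harlem", 8)])   # 30.0
```

A table of such rates per zone and hour is the kind of target a fare or earnings model could be trained to predict.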
COVID-19 Data Exploration
Identifying trends and visualizing time series data of global, continental, country cases and deaths
Examining how different vaccine manufacturers contributed to case reduction
Examining the adequacy of policy responses to the pandemic by analyzing the stringency index against case counts
The tools I utilized: SQL|Tableau
Population Growth and Environmental Destruction
How has the planet been adversely affected since the population boom of the past 100 years?
What are the links between climate change, resource depletion, natural disasters and overpopulation, and what are the implications for humanity?
Why is the "Great Green Goal" obsolete, and why will our existing problems get worse? What role do celebrities, politicians, or even activists play? Have they really accomplished anything?
The tools I utilized: Excel|Tableau|Canva
Air Transport Database Design
Before the pandemic, air travel had reached an all-time peak. At the same time, there were many disturbing incidents such as long delays and overbooked flights. Passengers were even dragged off planes, and airlines paid large fines for extreme delays. In our BCIT course project, my group and I therefore examined the problem from the perspective of IATA to see how data could be used to analyze inefficiencies in airline operations. The database is modelled using the 7Ws dimensional modelling technique.
Designing the database to measure passenger counts and delay figures for each flight
Developing a star schema ER diagram in MySQL Workbench
Running SQL queries to narrow down problems and answer core questions
Transforming query results into actionable insights to tackle flight overbooking and delays
The project includes 3 sections: Project Overview, Database Design Process, SQL Query
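To show what a star schema for flight delays and passenger counts might look like, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table names, column names and rows are all hypothetical and are not taken from the project's actual design.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical).
CREATE TABLE dim_airline (airline_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, flight_date TEXT);

-- Fact table: one row per flight, holding the measures.
CREATE TABLE fact_flight (
    flight_id INTEGER PRIMARY KEY,
    airline_id INTEGER REFERENCES dim_airline(airline_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    passenger_count INTEGER,
    delay_minutes INTEGER
);

-- Made-up sample rows.
INSERT INTO dim_airline VALUES (1, 'Airline A'), (2, 'Airline B');
INSERT INTO dim_date VALUES (1, '2019-07-01');
INSERT INTO fact_flight VALUES
    (1, 1, 1, 180, 30),
    (2, 1, 1, 150, 90),
    (3, 2, 1, 200, 10);
""")

# Average delay and total passengers per airline, worst delays first.
rows = conn.execute("""
SELECT a.name,
       AVG(f.delay_minutes) AS avg_delay,
       SUM(f.passenger_count) AS passengers
FROM fact_flight AS f
JOIN dim_airline AS a ON a.airline_id = f.airline_id
GROUP BY a.name
ORDER BY avg_delay DESC
""").fetchall()

print(rows)  # [('Airline A', 60.0, 330), ('Airline B', 10.0, 200)]
```

The fact table holds the measures (passenger counts and delay minutes), while each dimension table describes one way of slicing them, which is the general shape of a star schema.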