![](https://images.squarespace-cdn.com/content/v1/6126e55f8a151570c718b269/9bf1e470-3f70-4ab2-b081-9a4392f3cddc/unsplash-image-DHzJKAfeKFM.jpg)
JONATHAN AU
Data Analyst | Financial Analyst
Hi, I am Jonathan
I am a financial data analyst based in Vancouver, Canada.
I’ve been working as a financial analyst for the past 3 years and have a passion for sharing my learnings in data analytics. I obtained my commerce degree specializing in Finance from UBC Vancouver.
I love conducting industry research, analyzing data, trends and results, and generating insights. I am transitioning into a career as a data and BI analyst, which is why I created this site to share my thoughts and learnings.
My mission is to transform data into valuable, comprehensive insights that improve results and support data-driven decisions. My work is not limited to driving business growth; it also includes discovering the truth about our environment and society. I care about the future of nature and the well-being of humanity because these issues will always be with us and deserve a lifetime of attention.
Skillsets
Investment Research
As a financial analyst, I have developed a keen sense for interpreting financial data by calculating different metrics. My standard operating procedure (SOP) for investment research is to understand the industry, the company, and its valuation. This experience allows me to quickly understand the nature of a company's business, its profitability, and its competitive edge.
Dashboards
Interactive design is an inspiring way to present data. By clicking and drilling down, stakeholders can examine trends and patterns themselves. Conveying data insights through storytelling makes the message and its meaning easier for everyone involved in the project to absorb than presenting the same message purely as facts and figures.
Data Analysis
Turning raw data into meaningful insights requires analytical tools and contextual information. I follow SOPs to guide my data analysis process, with the goal of combining quantitative and qualitative insights to discover causal relationships and make better decisions. I can approach data problems with either a data analytics approach or a data science approach, depending on the problem statement and context. The two approaches share some knowledge, but they go about solving a problem differently, as visualized below.
Recent Projects
2016-2022 Vancouver Crime Data Exploration and Modelling
In this project, I applied data analytics and machine learning methodologies for the Vancouver Police Department (VPD) to predict hourly theft crimes across different neighbourhoods in Vancouver, BC. Multiple data sources were incorporated into the original VPD crime dataset for data exploration and feature engineering. The objective of the initial data analysis was to identify key crime patterns that could provide direction for further analysis, dashboard building and model development. I examined the overall trends in reported crime cases and analyzed time-related and geographical patterns. Theft accounts for most of the crimes in Vancouver, so I narrowed the focus to theft when building machine learning models. The ultimate goal was to implement predictive policing to help reduce crime while mitigating risk to law enforcement officers.
Preparing datasets by cleaning, transforming, joining and aggregating in SQL and Python
Visualizing time-related and geographical patterns in Tableau
Building a binary classification model to classify high and low risk of theft activity at a given location and hour
The model successfully captured 81% of unseen instances in which the actual number of thefts was greater than 3 cases per neighbourhood per hour
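To make the last two points concrete, here is a minimal sketch of how an hourly theft count could be turned into a binary label and how a "captured" rate could be measured. It assumes the 81% figure is the recall on the high-risk class, which is only one reading of the sentence above. The function names and toy numbers are hypothetical and are not taken from the project; the only value from the text is the threshold of 3 cases.

```python
from typing import List, Sequence

# From the text: high risk means more than 3 thefts per neighbourhood per hour.
HIGH_RISK_THRESHOLD = 3


def label_high_risk(theft_counts: Sequence[int],
                    threshold: int = HIGH_RISK_THRESHOLD) -> List[int]:
    """Return 1 for hours with more than `threshold` thefts, otherwise 0."""
    return [1 if count > threshold else 0 for count in theft_counts]


def recall(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Share of truly high-risk hours that the model also flagged as high risk."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    actual_positives = sum(actual)
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no high-risk hours")
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return true_positives / actual_positives


# Hypothetical toy data: theft counts for five neighbourhood-hours.
toy_counts = [5, 1, 4, 3, 7]
toy_actual = label_high_risk(toy_counts)      # exactly 3 thefts is NOT high risk
toy_predicted = [1, 0, 0, 0, 1]               # made-up model output
toy_recall = recall(toy_actual, toy_predicted)
```

Measuring recall rather than accuracy fits this kind of problem because high-risk hours are rare, and missing one is usually the costlier error.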
New York Taxi Analysis
In this project, I applied and demonstrated the data science pipeline to explore and use deterministic features to predict how much a cab driver can earn per hour in different areas of New York. Knowing which areas earn more or less on any given day and at any given hour allows the union to distribute cab assignments more equitably across areas and to rotate cab drivers between higher-income and lower-income areas on a daily basis. This way, no cab driver dominates a high-income area while another is continually relegated to a low-income area.
Using pandas, NumPy and Matplotlib in Python for data exploration, data cleaning and data preparation
Identifying trends and visualizing time series data of taxi trips
Building machine learning algorithms to predict taxi fares in New York
Documenting each step in detail in a Jupyter notebook
The Python packages I utilized: pandas|numpy|matplotlib|scikit-learn
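The "earnings per hour by area" target described above can be sketched as a simple aggregation. This is an illustration under stated assumptions, not the project's code: the field names (`pickup_zone`, `pickup_time`, `fare`, `duration_min`) and the toy trips are hypothetical, the rate counts only time spent with a passenger, and it uses the standard library instead of pandas so that it runs without any installs.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def hourly_earnings_by_zone(trips: Iterable[dict]) -> Dict[Tuple[str, int], float]:
    """Fare earned per hour of driving, grouped by (pickup zone, hour of day)."""
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [total fare, total minutes]
    for trip in trips:
        key = (trip["pickup_zone"], trip["pickup_time"].hour)
        totals[key][0] += trip["fare"]
        totals[key][1] += trip["duration_min"]
    # Skip groups with no recorded driving time to avoid dividing by zero.
    return {
        key: fare / (minutes / 60.0)
        for key, (fare, minutes) in totals.items()
        if minutes > 0
    }


# Hypothetical toy trips, not real taxi records.
toy_trips = [
    {"pickup_zone": "Midtown", "pickup_time": datetime(2016, 1, 4, 8, 5),
     "fare": 12.0, "duration_min": 15.0},
    {"pickup_zone": "Midtown", "pickup_time": datetime(2016, 1, 4, 8, 40),
     "fare": 18.0, "duration_min": 15.0},
    {"pickup_zone": "Harlem", "pickup_time": datetime(2016, 1, 4, 8, 10),
     "fare": 10.0, "duration_min": 30.0},
]
toy_rates = hourly_earnings_by_zone(toy_trips)
```

A table like `toy_rates` is the kind of summary that would let assignments be compared across areas and hours before any model is trained.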
COVID-19 DATA EXPLORATIONS
Identifying trends and visualizing time series data of global, continental, country cases and deaths
Examining how different vaccine manufacturers contributed to case reduction
Examining the adequacy of policy responses to the pandemic by analyzing the stringency index and case counts
The tools I utilized: SQL|Tableau
POPULATION GROWTH AND ENVIRONMENTAL DESTRUCTION
How has the planet been adversely affected since the population boom of the past 100 years?
What are the links between climate change, resource depletion, natural disasters and overpopulation, and what are the implications for humanity?
Why is the "Great Green Goal" obsolete, and why will our existing problems grow worse? What is the role of celebrities, politicians, or even activists? Have they really accomplished anything?
The tools I utilized: Excel|Tableau|Canva
Air Transport Database Design Project
Before the pandemic, air travel had reached an all-time peak. At the same time, there were many disturbing incidents such as long delays and overbooked flights. Passengers were even dragged off planes, and airlines paid large fines for extreme delays. In our BCIT course project, my group and I therefore looked at the problem from the perspective of IATA to see how we could use data to analyze the inefficiency of airline operations. The database is modelled using the 7Ws dimensional modelling technique.
The goal of the database is to record passenger counts and delay figures for each flight
Developing Star Schema ER diagram in MySQL Workbench
Running SQL queries to narrow down problems and answer core questions
Transforming query results into actionable insights to tackle flight overbooking and delays
The project includes 3 sections: Project Overview, Database Design Process, SQL Query
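As a rough illustration of the star schema idea, the sketch below builds one fact table surrounded by two dimension tables and runs a single summary query. Everything in it is an assumption made for illustration: the table and column names, the sample rows, and the use of an in-memory SQLite database from Python in place of the MySQL setup the project used. It does not reproduce the project's actual 7Ws model.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
SCHEMA = """
CREATE TABLE dim_airline (
    airline_id   INTEGER PRIMARY KEY,
    airline_name TEXT NOT NULL
);
CREATE TABLE dim_airport (
    airport_id   INTEGER PRIMARY KEY,
    airport_code TEXT NOT NULL
);
CREATE TABLE fact_flight (
    flight_id           INTEGER PRIMARY KEY,
    airline_id          INTEGER NOT NULL REFERENCES dim_airline(airline_id),
    origin_airport_id   INTEGER NOT NULL REFERENCES dim_airport(airport_id),
    seats_available     INTEGER NOT NULL,
    passengers_booked   INTEGER NOT NULL,
    departure_delay_min INTEGER NOT NULL
);
"""

# Average delay and number of overbooked flights per airline.
QUERY = """
SELECT a.airline_name,
       AVG(f.departure_delay_min) AS avg_delay_min,
       SUM(CASE WHEN f.passengers_booked > f.seats_available THEN 1 ELSE 0 END)
           AS overbooked_flights
FROM fact_flight AS f
JOIN dim_airline AS a ON a.airline_id = f.airline_id
GROUP BY a.airline_name
ORDER BY a.airline_name;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample rows, not real flight data.
conn.executemany("INSERT INTO dim_airline VALUES (?, ?)",
                 [(1, "Airline A"), (2, "Airline B")])
conn.executemany("INSERT INTO dim_airport VALUES (?, ?)",
                 [(1, "YVR"), (2, "JFK")])
conn.executemany("INSERT INTO fact_flight VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 180, 185, 45),   # overbooked by 5 seats, 45 minutes late
    (2, 1, 2, 180, 170, 15),
    (3, 2, 1, 150, 150, 0),
])

summary = conn.execute(QUERY).fetchall()
conn.close()
```

Keeping the measures (passenger counts, delay minutes) in the fact table and the descriptive attributes in the dimensions is what lets one query answer both the overbooking and the delay question.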