Devin's Portfolio

About Me

I am a data scientist currently gaining industry experience working as an Associate Machine Learning Developer at AltaML's Applied AI Lab. I am also enrolled in the Data Science and Analytics program at the University of Calgary, where I expect to graduate with my Master's degree at the end of this year. My undergraduate degree is also from the University of Calgary, where I graduated in 2019 with a Bachelor of Science in Mathematics.

I am fascinated by the way data science and machine learning apply mathematical and statistical principles to the real world in order to solve complex problems. I have experience working on data science and machine learning projects in both academic and industry settings. In my personal life, I am an avid record collector and music and film enthusiast, and I have played drums both in the studio and live on stage around Calgary.

Feel free to reach out! For a quick response, you can contact me by email at devinnorris@live.ca or on LinkedIn. Thanks for checking out my site!

Technical Aptitudes

Programming Languages and Tools
  • Python
  • Jupyter Notebooks
  • R
  • SQL
  • Tableau
  • Git
  • Google Cloud Platform

Data Science Packages and Libraries
  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • Plotly
  • Scikit-learn
  • TensorFlow
  • Keras

Project Experience

A collection of professional, personal and academic projects demonstrating my proficiency and interest in areas including machine learning, statistical modelling, and data analysis. Click the photo to access the GitHub repository or article for each project.

Professional Projects

AltaML Applied AI Lab

  • Worked to ideate, develop, and communicate data-driven business insight for industry partner ATB Financial.
  • Gained technical experience in project ideation and research, data processing, cloud computing, version control, ML model development, and software development.
  • Collaborated with a diverse team of data scientists to communicate our results to ATB in weekly client-facing presentations.
  • Contributed to the development of multiple ML projects, including supervised learning for value prediction and unsupervised learning for high-value customer segmentation.

Personal Projects

Exploratory Data Analysis and Classification Algorithms on Insurance Claim Data

  • Performed cleaning and feature engineering on car insurance claim data using NumPy and Pandas.
  • Used Matplotlib and Seaborn to create visualizations and explore trends in the data. Trained multiple classification models via Scikit-learn and tuned parameters to improve performance.
  • The final product is an ML model that takes customer attributes as input and predicts whether they will make more than one insurance claim.

A Data Driven Exploration of Video Games

  • A data story in which I explore global sales of video games and their Metacritic ratings using data scraped from the web.
  • Generated insight about video game sales over time, the most popular video games and consoles, and how critic and user scores compare to a game's popularity.
  • Back-end data processing and visualizations were done using Tableau and the Python libraries Pandas and Seaborn.
  • Published in Analytics Vidhya on Medium.

Academic Projects

Movie Genre Classifier/NLP

  • Built a classification model trained on text data that predicts a movie's genre based on its plot summary.
  • Feature encoding and text preprocessing were done using Scikit-learn with Count and TF-IDF vectorizers.
  • Both classical machine learning and deep learning methods were used, including Logistic Regression, Random Forest, Multinomial Naive Bayes, and LSTM networks.

Multiple Regression Analysis of Canadian COVID-19 Data

  • Created a statistical model in R using COVID-19 and provincial health measures datasets.
  • Model predicts per capita COVID-19 infection rates using predictor variables like rural population percentage and prevalence of COPD and mood disorders.
  • Worked collaboratively with three teammates to formulate the project and present our findings.

Exploratory Data Analysis of Calgary's Traffic Data

  • Analyzed traffic trends in Calgary using data visualizations such as bar charts, treemaps, and geospatial visualizations, built with Pandas, Matplotlib, GeoPandas, and Plotly.
  • Results could be meaningful for traffic control, policing, insurance, urban planning, municipal budgeting, driver education, and medical and emergency services.
  • Worked collaboratively with two teammates to formulate the project and present our findings.

Data Science Coursework

Fall 2020

DATA 601 - A+
Working with Data and Visualization
Fundamental data science concepts including data organization, data collection, and data cleaning in Python. Includes a review of programming concepts in Python, as well as an introduction to the fundamentals of data visualization and critical thinking with data. Also provides an introduction to data ethics, security, and privacy.
DATA 602 - A
Statistical Data Analysis
The foundations of statistical inference including the application of probability models to data, as well as an introduction to simulation-based and classical statistical inference, and the creation of statistical models with R.
DATA 603 - A+
Statistical Modelling with Data
The creation of complex statistical models, including exposure to multivariate model selection, prediction, the statistical design of experiments and analysis of data in R.
DATA 604 - A+
Big Data Management
Data storage and manipulation at both desktop and cloud scales. Introduces core database concepts and provides a practical introduction to both SQL and NoSQL systems. Also introduces parallel and distributed computing concepts including distributed storage and large scale parallel data processing using MapReduce. Design and implementation of new data visualizations to aid analysis, with emphasis on the practical and ethical implications of design and analysis decisions.

Winter 2021

DATA 605 - A
Actionable Visualization and Analytics
Deeper tools, skills, and techniques for collecting, manipulating, visualizing, analyzing, and presenting a number of different common types of data. With a data life-cycle perspective, looks into data elicitation and preparation as well as the actual usage of data in a decision-making context. Introduces techniques for visualizing and supporting the interactive analysis and decision making on large complex datasets. Focus on critical thinking and good analysis practices to avoid cognitive biases when designing, thinking, analyzing, and making decisions based on data.
DATA 606 - A
Statistical Methods in Data Science
Design of surveys and data collection, bias and efficiency of surveys. Sampling weights and variance estimation. Multi-way contingency tables and introduction to generalized linear models with emphasis on applications.
DATA 607 - A
Statistical and Machine Learning
Advancement of the linear statistical model including introduction to data transformation methods, classification, model assessment and selection. Exposure to both supervised learning and unsupervised learning.
DATA 608 - A+
Developing Big Data Applications
Provides advanced coverage of tools and techniques for big data management and for processing, mining, and building applications that leverage large datasets. Addresses database and distributed storage design for both SQL and NoSQL systems, and focuses on the application of distributed computing tools to perform data integration, apply machine learning, and build applications that leverage big data.