CMPT 353 Lecture Notes

  1. Course Introduction
    1. This Course
    2. Offering Strategy
    3. Grades
    4. Exercises
    5. Project
    6. Quizzes
    7. Us
    8. Lectures and Labs
    9. References
    10. Programming
    11. Expectations
    12. Computational Data Science?
    13. Data Science?
    14. Why Data Science?
    15. Topics (1)
  2. Data Analysis Pipeline
    1. Your Question
    2. Getting Data
    3. Preparing Data
    4. Analyzing Data
    5. Presenting Results
    6. Creating a Pipeline
    7. Manual Pipeline Steps
    8. The Pipeline
  3. Data In Python
    1. Built-In Data Structures
    2. NumPy
    3. Operating on Arrays
    4. Pandas
    5. Working With Pandas
  4. Getting Data
    1. Where Data Comes From
    2. Data from Files
    3. Databases
    4. Web APIs
    5. Scraping HTML
    6. File Formats
    7. CSV
    8. JSON
    9. XML
    10. Others
  5. Extract-Transform-Load
    1. Extract
    2. Transform
    3. Load
    4. Summary
  6. Noise Filtering
    1. Noise
    2. LOESS Smoothing
    3. LOESS in Python
    4. Kalman Filtering
    5. Probability Distributions
    6. Kalman Operation
    7. Kalman Predictions
    8. Kalman Variances
    9. pykalman
    10. Kalman Example
    11. Kalman Parameters
    12. Kalman Summary
    13. Kalman Links
    14. Other Filtering
  7. Cleaning Data
    1. Validity
    2. Outliers
    3. Finding Outliers
    4. Handling Outliers
    5. Imputation
    6. Noise Filtering
    7. Entity Resolution
    8. Regular Expressions
    9. Python re
    10. Regex Summary
  8. Stats Review
    1. Context
    2. Types of Data
    3. Population and Samples
    4. Probability Distributions
    5. Central Tendency
    6. Dispersion
    7. Relationships
    8. Plotting Data
    9. Specific Distributions
    10. Normal Distribution
  9. Inferential Stats
    1. Hypotheses
    2. T-Test
    3. p-values
    4. Failure to Reject
    5. Test Assumptions
    6. Testing Normality
    7. Equal Variance Test
    8. Transforming Data
  10. Statistical Tests
    1. Multiple Groups
    2. ANOVA
    3. Post Hoc Analysis
    4. One- vs Two-Tailed Tests
    5. Hacking p-values
    6. Central Limit Theorem
    7. It's Probably Okay
    8. Mann–Whitney U-test
    9. Chi-Square
    10. Regression
    11. Stats Summary
  11. Machine Learning
  12. ML: Classification
  13. ML: Other Techniques
  14. Big Data and Spark
  15. How Spark Calculates
  16. Working With Spark
  17. Other DataFrame Tools
  18. Data Warehouses
  19. Aside: NumPy/Pandas Speed
  20. Communicating
  21. More Data Science


Schedule, Summer 2025

Week  Deliverables (*)      Lecture Hour  Lecture Date
1                           1             May 13
                            2             May 13
                            3             May 16
2     Exercise 1            4             May 20
                            5             May 20
                            6             May 23
3     Exercise 2            7             May 27
                            8             May 27
                            9             May 30
4     Exercise 3            10            Jun 3
                            11            Jun 3
                            12            Jun 6
5     Exercise 4            13            Jun 10
                            14            Jun 10
                            15            Jun 13
6     Exercise 5, Quiz 1    16            Jun 17
                            17            Jun 17
                            18            Jun 20
7     Exercise 6            19            Jun 24
                            20            Jun 24
                            21            Jun 27
8     Exercise 7            22            Jul 1
                            23            Jul 1
                            24            Jul 4
9     Exercise 8            25            Jul 8
                            26            Jul 8
                            27            Jul 11
10    Exercise 9, Quiz 2    28            Jul 15
                            29            Jul 15
                            30            Jul 18
11    Exercise 10           31            Jul 22
                            32            Jul 22
                            33            Jul 25
12    Exercise 11           34            Jul 29
                            35            Jul 29
                            36            Aug 1
13    Exercise 12, Quiz 3   37            Aug 5
                            38            Aug 5
                            39            Aug 8
14    Project

* Check CourSys for the actual due dates and times.