CMPT 353 Lecture Notes

  1. Course Introduction [“Course Introduction” slides]
    1. This Course [This Course slides]
    2. Offering Strategy [Offering Strategy slides]
    3. Grades [Grades slides]
    4. Exercises [Exercises slides]
    5. Project [Project slides]
    6. Quizzes/Exam [Quizzes/Exam slides]
    7. Us [Us slides]
    8. Lectures and Labs [Lectures and Labs slides]
    9. References [References slides]
    10. Programming [Programming slides]
    11. Expectations [Expectations slides]
    12. Computational Data Science? [Computational Data Science? slides]
    13. Data Science? [Data Science? slides]
    14. Why Data Science? [Why Data Science? slides]
    15. Topics (1) [Topics (1) slides]
  2. Data Analysis Pipeline [“Data Analysis Pipeline” slides]
    1. Your Question [Your Question slides]
    2. Getting Data [Getting Data slides]
    3. Preparing Data [Preparing Data slides]
    4. Analyzing Data [Analyzing Data slides]
    5. Presenting Results [Presenting Results slides]
    6. Creating a Pipeline [Creating a Pipeline slides]
    7. Manual Pipeline Steps [Manual Pipeline Steps slides]
    8. The Pipeline [The Pipeline slides]
  3. Data In Python [“Data In Python” slides]
    1. Built-In Data Structures [Built-In Data Structures slides]
    2. NumPy [NumPy slides]
    3. Operating on Arrays [Operating on Arrays slides]
    4. Pandas [Pandas slides]
    5. Working With Pandas [Working With Pandas slides]
  4. Getting Data [“Getting Data” slides]
    1. Where Data Comes From [Where Data Comes From slides]
    2. Data from Files [Data from Files slides]
    3. Databases [Databases slides]
    4. Web APIs [Web APIs slides]
    5. Scraping HTML [Scraping HTML slides]
    6. File Formats [File Formats slides]
    7. CSV [CSV slides]
    8. JSON [JSON slides]
    9. XML [XML slides]
    10. Others [Others slides]
  5. Extract-Transform-Load [“Extract-Transform-Load” slides]
    1. Extract [Extract slides]
    2. Transform [Transform slides]
    3. Load [Load slides]
    4. Summary [Summary slides]
  6. Noise Filtering [“Noise Filtering” slides]
    1. Noise [Noise slides]
    2. LOESS Smoothing [LOESS Smoothing slides]
    3. LOESS in Python [LOESS in Python slides]
    4. Kalman Filtering [Kalman Filtering slides]
    5. Probability Distributions [Probability Distributions slides]
    6. Kalman Operation [Kalman Operation slides]
    7. Kalman Predictions [Kalman Predictions slides]
    8. Kalman Variances [Kalman Variances slides]
    9. pykalman [pykalman slides]
    10. Kalman Example [Kalman Example slides]
    11. Kalman Parameters [Kalman Parameters slides]
    12. Kalman Summary [Kalman Summary slides]
    13. Kalman Links [Kalman Links slides]
    14. Other Filtering [Other Filtering slides]
  7. Cleaning Data [“Cleaning Data” slides]
    1. Validity [Validity slides]
    2. Outliers [Outliers slides]
    3. Finding Outliers [Finding Outliers slides]
    4. Handling Outliers [Handling Outliers slides]
    5. Imputation [Imputation slides]
    6. Noise Filtering [Noise Filtering slides]
    7. Entity Resolution [Entity Resolution slides]
    8. Regular Expressions [Regular Expressions slides]
    9. Python re [Python re slides]
    10. Regex Summary [Regex Summary slides]
  8. Stats Review [“Stats Review” slides]
    1. Context [Context slides]
    2. Types of Data [Types of Data slides]
    3. Population and Samples [Population and Samples slides]
    4. Probability Distributions [Probability Distributions slides]
    5. Central Tendency [Central Tendency slides]
    6. Dispersion [Dispersion slides]
    7. Relationships [Relationships slides]
    8. Plotting Data [Plotting Data slides]
    9. Specific Distributions [Specific Distributions slides]
    10. Normal Distribution [Normal Distribution slides]
  9. Inferential Stats [“Inferential Stats” slides]
    1. Hypotheses [Hypotheses slides]
    2. T-Test [T-Test slides]
    3. p-values [p-values slides]
    4. Failure to Reject [Failure to Reject slides]
    5. Test Assumptions [Test Assumptions slides]
    6. Testing Normality [Testing Normality slides]
    7. Equal Variance Test [Equal Variance Test slides]
    8. Transforming Data [Transforming Data slides]
  10. Statistical Tests [“Statistical Tests” slides]
    1. Multiple Groups [Multiple Groups slides]
    2. ANOVA [ANOVA slides]
    3. Post Hoc Analysis [Post Hoc Analysis slides]
    4. One- vs Two-Tailed Tests [One- vs Two-Tailed Tests slides]
    5. Hacking p-values [Hacking p-values slides]
    6. Central Limit Theorem [Central Limit Theorem slides]
    7. It's Probably Okay [It's Probably Okay slides]
    8. Mann–Whitney U-test [Mann–Whitney U-test slides]
    9. Chi-Square [Chi-Square slides]
    10. Regression [Regression slides]
    11. Stats Summary [Stats Summary slides]
  11. Machine Learning
  12. ML: Classification
  13. ML: Other Techniques
  14. Big Data and Spark
  15. How Spark Calculates
  16. Working With Spark
  17. Aside: Dask
  18. Aside: NumPy/Pandas Speed
  19. Communicating
  20. More Data Science

Course home page.

Schedule, Summer 2023

Week  Deliverables (*)     Lecture Hour  Lecture Date  First Slide  Video Link
1                          1             May 9
                           2             May 9
                           3             May 12
2     Exer 1               4             May 16
                           5             May 16
                           6             May 19
3     Exer 2               7             May 23
                           8             May 23
                           9             May 26
4     Exer 3               10            May 30
                           11            May 30
                           12            Jun 2
5     Exer 4               13            Jun 6
                           14            Jun 6
                           15            Jun 9
6     Exer 5               16            Jun 13
                           17            Jun 13
                           18            Jun 16
7     Exer 6, Quiz 1       19            Jun 20
                           20            Jun 20
                           21            Jun 23
8     Exer 7               22            Jun 27
                           23            Jun 27
                           24            Jun 30
9     Exer 8               25            Jul 4
                           26            Jul 4
                           27            Jul 7
10    Exer 9               28            Jul 11
                           29            Jul 11
                           30            Jul 14
11    Exer 10, Quiz 2      31            Jul 18
                           32            Jul 18
                           33            Jul 21
12    Exer 11              34            Jul 25
                           35            Jul 25
                           36            Jul 28
13    Exer 12              37            Aug 1
                           38            Aug 1
                           39            Aug 4
14+   Project, Final Quiz

* Check CourSys for the actual due dates and times.