CMPT 732 Lecture Notes

  1. Course Introduction
    1. Us
    2. Welcome
    3. This Course
    4. What is Big Data?
    5. How big is “Big Data”?
    6. “Big data” isn't always big.
    7. None
    8. Clusters
    9. Hadoop
    10. Our Environment
    11. Things you will do
    12. Lecture and Labs
    13. Course Topics
    14. Expectations
  2. Hadoop Concepts
    1. Our Cluster
    2. Hadoop Pieces
    3. HDFS
    4. YARN
    5. (Simplified) Cluster Overview
    6. Work on Hadoop
    7. MapReduce
    8. MapReduce Stages
    9. Example: word count
    10. Whiteboard: Fall 2019
    11. MapReduce Anatomy
    12. Hadoop MapReduce Details
    13. Summary Output
    14. MapReduce Parallelism
    15. Writables
    16. Example: word count
    17. About MapReduce
    18. MapReduce: One more way
    19. MapReduce Data Flow
  3. Python Preliminaries
    1. About Python
    2. Data Types
    3. Unpacking Tuples
    4. First-Class Functions
    5. Lambda Functions
    6. Iterators and Generators
    7. Imperative vs declarative
  4. Spark Concepts
    1. Spark
    2. An Example
    3. RDDs
    4. RDD Operations
    5. Operations and Partitions
    6. Partitions
    7. Lazy Evaluation
    8. Chaining Calculations
    9. Combining Calculations
    10. Shuffle Operations
    11. Drivers & Executors
    12. Controlling Executors
    13. Spark Web Frontend
    14. Spark vs MapReduce
    15. Spark DAG
    16. Stages
    17. Job, Stages, Tasks
    18. RDD Methods
  5. Spark DataFrames Concepts
  6. NoSQL & Cassandra Concepts
  7. Data Management
  8. Small Data
  9. NumPy/Pandas Speed
  10. Spark Streaming
  11. Spark Machine Learning
  12. Why MapReduce?
  13. Hadoop/Spark Config
  14. Other Big Data Tools
  15. Deploying Hadoop

Course home page.

Schedule

Week  Date     Starting Point
1     Sept 2   Labour Day Holiday 👷
2     Sept 9   Intro
3     Sept 16  Hadoop MapReduce Details
4     Sept 23
5     Sept 30
6     Oct 7
7     Oct 14   Thanksgiving holiday 🦃
8     Oct 21
9     Oct 28
10    Nov 4
11    Nov 11   Remembrance Day holiday 🎖
12    Nov 18
13    Nov 25
14    Dec 2