Computational Data Science. We'll come back to what that is.
Course web site: in CourSys.
Instructor: Greg Baker <firstname.lastname@example.org>.
Office hours: Tuesday 11:00–12:00 and Friday 10:30–12:20 in TASC1 9229.
TAs: Logan and Ipsita.
Office hours: details later.
Computational Data Science: data science, but with computation as the focus.
But what is data science?
According to Wikipedia:
an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms…
According to Pat Hanrahan, Tableau Software:
[The combination of] business knowledge, analytical skills, and computer science.
According to Daniel Tunkelang, LinkedIn:
[The ability to] obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning.
According to Joel Grus:
There's a joke that says a data scientist is someone who knows more statistics than a computer scientist and more computer science than a statistician.… We'll say that a data scientist is someone who extracts insights from messy data.
According to Drew Conway, Alluvium:
Why is data science suddenly so popular?
There's more data being collected: web access logs, purchase history, click-through rates, location history, sensor data, ….
Sometimes the volume of data is big: too big to manage easily. That's where big data starts.
People want answers/insights from that data: Is the marketing campaign working? Is the UI actually usable? What if we did X instead of Y?
New techniques: Machine learning lets us attack questions that were previously unanswerable. Computer scientists are realizing that statistics is important; statisticians are realizing that computer science is important.
where do we find data?
it turns out that stats course was useful.
it's like AI, except it works.
Due Wednesdays. My goal: make sure you actually try out the things we have talked about and see the reality of applying them.
Will contain some short problems to get you used to the tools, expanding to something more interesting.
In the lectures/exercises, I intend to explore what I consider the core of data science.
The project will let you explore an idea on the edges of that, depending on what interests you.
I will (hope to?) post project topic options related to topics like…
Or if you have something else in mind, we can discuss it.
A few details:
Quizzes: in lecture time on…
Final Exam: December 10, 12:00–15:00.
Python 3 will be the primary programming language used in the course. If you aren't comfortable with it, you need to be (very) soon.
StackExchange Data Science tags (as of May 2019):
This will be a programming-heavy course. If you don't really like programming, this might not be the course for you.
The programming style will be very library-heavy, which is realistic in the modern world. We will use many libraries: NumPy, Pandas, matplotlib, scikit-learn, statsmodels, ….
That means you'll spend a lot of time reading the docs and fighting to make the tools do what you want them to, and less implementing the logic yourself. That's also realistic.
The code you would have written would almost certainly have been slower and worse.
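A small illustration of that point, as a sketch only: the function name and the sample data below are made up for this example and are not course material. It computes the same mean twice, once with a hand-written loop and once with NumPy's built-in `mean`.

```python
import numpy as np


def mean_by_hand(values):
    """Arithmetic mean computed with an explicit Python loop."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count


# Hypothetical sample data: the numbers 1..100000 as floats.
data = np.arange(1, 100_001, dtype=np.float64)

hand = mean_by_hand(data)   # logic implemented ourselves
lib = float(data.mean())    # the library call does the same job in one line

print(hand, lib)
```

Both calls should agree on the answer. The library version is shorter, and on large arrays it is typically much faster because the loop runs in compiled code rather than in the Python interpreter.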
I will feel free to increase the amount of assignment work a little from my usual level because…
Thursday: lectures as expected.
Tuesday: Usually no lectures. The TAs and I will be in CSIL, ASB 9840 (and overflow to 9804). Informal lab time. Because of the class size, we'll have a little extra time: 1:30–3:00.
… but no promises: there may be lectures on some Tuesdays. (Default will be no Tuesday lecture unless otherwise announced.)
There will be a lecture today, obviously.
Possible reference material:
Possible reference material (continued):
To get credit for this course, I expect you to demonstrate that you know how to use programming techniques to manipulate and analyse data. That means:
Failure to do these may result in failing the course.
Academic Honesty: it's important, as always.
If you're using an online source, leave a comment.
def this_function(p1, p2):
    # adapted from http://stackoverflow.com/a/21623206/1236542
    ...
If you work with another student, we shouldn't be able to tell from the results.
More details on course web site.