Clever data analysis isn't useful if…
Communicating about both requirements and results is a big part of the data science job.
From some data science job ads, requirements about asking the right questions:
"Energy domain expertise is highly desired; a passion for the energy domain is essential." (EnerNOC)
"Liaising with credit risk strategy managers, leaders, and other stakeholders to identify requirements by capturing the distinct problems and the expected outcomes which impact critical business processes and/or decisions." (BMO Financial Group)
"… applying advanced analytics to tackle complex and non-routine business problems to drive to actionable business insights." (REHUMAN Inc)
"Gather and refine specifications and requirements based on business needs." (MasterCard)
Requirements about communicating results:
"Engage with stakeholders to ensure that data insights are effectively communicated through the most appropriate data visualization and navigation tools." (REHUMAN Inc)
"We're looking for people who are constantly trying to improve not only their technical skills but their communication and interpersonal skills as well." (Best Buy Canada)
• "Provide timely, relevant, coherent results (reports, data analyses, etc.) designed to meet the client's specific needs, and tailored to specific audiences;"
• "Transfer technology and knowledge through reports, handbooks, workshops, and presentations to members, clients and general conferences."
(FPInnovations)
It's hard to say much here.
Real data science questions are going to be about whatever field/industry they come from. You may be asked about finance, or marketing campaigns, or customer behaviour, or forestry site productivity, or …
You have to be able to communicate with people who understand the problem at hand, and make sure you know what is needed. Ask questions as necessary.
Nobody is going to expect the co-op student or new hire to be a domain expert.
Remember: the goal is to get the information that is needed. That may be only loosely related to what was requested.
My experience: the question people ask always sounds perfectly reasonable.
The question they meant is sometimes trivial, sometimes reasonable, or sometimes impossible.
It's best to find out which early.
Communicating data science results is a lot like communicating in general.
Hopefully your W courses (CMPT 376 or similar) point you in the right direction.
When explaining your results, make sure you are clear and honest about what you found.
Resist the urge to make your results sound cooler than they actually are. If the results aren't very definitive, then say so.
Also, don't be afraid to state the limitations of your analysis.
If there isn't enough data, or the right data, or a technique to find the answer you're seeking, then you should be able to explain that clearly.
Being honest might include technical details: assumptions about data, \(p\)-values, possible artifacts of the method.
You should probably address those (depending on the context and audience). Do your best to explain them in a way your audience can understand.
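As a rough illustration of explaining technical details in plain language, here is a minimal sketch of a helper that turns a two-group comparison into a short, hedged summary. The function name `summarize_test`, its parameters, the default threshold of 0.05, and the "fewer than 30 observations" caveat are all assumptions made for this example, not rules from the course.

```python
def summarize_test(p_value, n_a, n_b, mean_a, mean_b, alpha=0.05):
    """Return a plain-language summary of a two-group comparison.

    Hypothetical helper: the wording, the alpha default, and the
    small-sample cutoff of 30 are illustrative assumptions.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")

    difference = mean_b - mean_a
    lines = [
        f"Group A (n={n_a}) averaged {mean_a:.2f}; "
        f"group B (n={n_b}) averaged {mean_b:.2f}.",
        f"Observed difference: {difference:+.2f} (p = {p_value:.3f}).",
    ]

    if p_value < alpha:
        # Small p-value: the data would be surprising if there were no real difference.
        lines.append(
            "A difference this large would be unusual "
            "if the groups were really the same."
        )
    else:
        # Be explicit that a non-significant result is not proof of no effect.
        lines.append(
            "We cannot conclude that the groups differ; "
            "more data might change this."
        )

    if min(n_a, n_b) < 30:
        # Rule-of-thumb cutoff, assumed here only to show how a caveat can be surfaced.
        lines.append(
            "Caveat: at least one group is small, so treat this as preliminary."
        )

    return "\n".join(lines)
```

Calling `print(summarize_test(p_value=0.21, n_a=18, n_b=22, mean_a=4.10, mean_b=4.35))` with these made-up numbers produces a summary that reports the difference, says no conclusion can be drawn, and flags the small sample. How much of this detail to show still depends on the audience.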
Of course, visualizations (charts, graphs, etc.) are a frequently useful way to present data.
We have used matplotlib several times in the course. Maybe also look at Seaborn. For more, have a look at the "Visualization with Matplotlib" chapter in the Python Data Science Handbook.
When creating visualizations, make sure you display your data so that the reality of the data is easy to see. The goal should be to help readers understand what is happening.
Choose a visualization that makes the interesting differences clear.
Make sure you label what's going on and make sure formatting of the data makes it readable.
The same plot after:
    import seaborn
    seaborn.set()
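The advice above about labelling and readable formatting can be sketched with matplotlib roughly as follows. The data, the series names, and the function name `make_labelled_plot` are invented for illustration; this is not the plot from the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, made up purely for this example.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales_2022 = [10.2, 11.0, 12.5, 12.1, 13.4, 14.0]
sales_2023 = [11.1, 12.3, 13.0, 14.2, 15.1, 16.3]


def make_labelled_plot():
    """Build a line chart with a title, axis labels (with units), and a legend."""
    fig, ax = plt.subplots(figsize=(6, 4))

    # Both series share one pair of axes so the year-over-year gap is easy to see.
    ax.plot(months, sales_2022, marker="o", label="2022")
    ax.plot(months, sales_2023, marker="s", label="2023")

    # Say what is being shown, and in what units.
    ax.set_title("Monthly sales by year")
    ax.set_xlabel("Month")
    ax.set_ylabel("Sales (thousands of dollars)")
    ax.legend(title="Year")

    fig.tight_layout()  # keep labels from being clipped
    return fig, ax
```

To view the result, call `fig, ax = make_labelled_plot()` and then `fig.savefig("labelled_plot.png")` (the filename is arbitrary). Calling `seaborn.set()` before building the figure should apply Seaborn's default styling to the same plot.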