On becoming a data scientist...
Disclaimer: this article is about the Microsoft Professional Program certificate in Data Science. I am employed by Microsoft, but the opinions expressed herein are my own.
Hal Varian, Chief Economist at Google, is often quoted as having said in 2009 that "the sexy job in the next ten years will be [data scientists]." (The actual word he used was "statisticians", but I'm sticking with the current buzzword.) Hal went on to say:
The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades...
While I don't personally claim the title of "data scientist" and certainly don't qualify on standards of sexiness, I was recently part of the first graduating cohort of the Microsoft Professional Program in Data Science (MPP), a 10-course series that covers various data-related skills and ends with a challenging capstone competition. The program uses both open-source and Microsoft technologies in a digestible, easy-to-use way. If you're looking to up your data game, or just want to get more familiar with data science in general, read on.
Program Structure:
The MPP is delivered on the edX website. Like many massive open online courses (MOOCs), most of the courses in the program can be audited for free (but you have to pay the registration fees if you want to earn the official certificate). The program consists of nine courses and a capstone data competition. Topics covered include working with databases, analyzing data visually, a foundation of key statistical concepts, data exploration, data science / machine learning fundamentals, manipulating data with Python or R, and practical applications.
Check out the webpage to see the specific program structure: https://www.edx.org/microsoft-professional-program-certficate-data-science
Why this is a great course:
I've taken my fair share of online courses, including other data science mini-degrees. Here's what I like about the MPP:
- Grounded in business use cases: almost all of the examples, labs, and final projects use business-related data as their subject matter. This makes the material accessible to a wider audience than other courses, which sometimes use highly technical, medical, or irrelevant data (Iris, anyone?).
- Gradual progression from simple to complex: I think that many students new to data science get discouraged because the "intro course" plunges them into a cold pool of esoteric words and concepts. Not the MPP. The very first course starts at a conversational level and carefully builds on each concept. The sequence of follow-on courses is logical.
- Good balance of theory / practice: the typical format of an MPP course pairs theory and discussion with hands-on labs that guide you through tasks and then ask you to apply what you've learned. This may just be my personal learning style, but it worked nicely. There is hands-on work all along the way, with a capstone project to tie everything together.
Tools Used:
Being a Microsoft program, it's not surprising that the toolset has a Microsoft flavor:
What does surprise some people, however, is that portions of the program are taught in the R or Python programming languages. Microsoft is embracing these open-source technologies in a big way. And, because R and Python skills are very desirable to prospective employers, that's good news for aspiring data scientists.
Words of Caution:
- Plan ahead to make sure you are staying on track with each course. The concepts build on each other, so missing something early on could cause trouble down the road.
- Some of the courses may be more challenging if you do not have any prior experience. For example, if you've never worked with a database before, you may find some of the concepts in "Querying Data with Transact-SQL" a challenge. You won't get totally left behind, but give yourself some time to figure things out.
- For the statistical foundations course, "Statistical Thinking for Data Science and Analytics", Microsoft has contracted with Columbia University to use one of their courses. For me, this was one of the more challenging experiences -- longer videos, more formal presentation, and less accessible content. The information is all very good and important, but it's delivered in more of a classic stats format that I found to be less engaging and digestible than the Microsoft content.
- The capstone project is a data science competition, which was challenging and VERY addictive. I found myself up late several nights in a row experimenting and trying to improve the accuracy of my submission. More fun than bingeing Netflix, but similarly time-consuming.
Conclusion:
In summary, if you are a business-focused, data-interested person, or if you'd like to increase your overall level of professional sexiness, I would highly recommend checking out the Microsoft Professional Program in Data Science.
Dear Brady, Congratulations. Can I get a good job after taking this certification?
Congrats...
Just completed my capstone project. Waiting on results to get posted to Microsoft's dashboard. You are right, it WAS very addictive trying to shoot for just a bit more accuracy on the predictive model. :)
Great write-up. As I mentioned to you the other day I will be starting this program soon.
Interesting read, I am definitely gonna give this MPP a try.