What are the key skills of a data scientist?
Answer by A Quora admin:
My two cents on the skills needed in a real data scientist’s working life. This job needs some ‘beyond-textbook’ skills and some ‘non-technical’ skills.
Update (01-07-2014): thanks for the 12k+ views and 140+ upvotes. My LinkedIn friend Swami Chandrasekaran has written some good articles and made a roadmap / metromap, which has recently become well known, that systematically reviews data science skills. Please have a look. The map covers 10 skill sets; you can start with them one at a time or all at the same time, and keep moving forward.
Update (09-11-2013): an interesting article that gives a comprehensive introduction to a data scientist’s daily work:
This is what a data scientist on a development team typically does every day at some Silicon Valley companies:
1. Check emails, put important requests on the to-do list and reply ‘I will do it’, and ignore the ‘Ganglia status is not stable, and Hadoop has some dead nodes’ emails. Reply to and discuss some interesting topics.
2. Check the to-do list and pick the important functions to implement today.
3. Start planning and evaluating possible ways to implement these functions (smart ways if possible, though about 50% of the time they are not).
4. Start programming.
5. Take a break. Re-think.
6. Back to programming.
7. Lunch. Random chat with other team members about many things.
8. After lunch break. Random chat continued.
9. Another team member or the manager suddenly drops by the cube with an urgent request for help, usually a statistical report on 100TB of data for her/his presentation tomorrow.
10. Help that person.
11. Back to implementing the functions, which could have been done 2 hours ago.
12. Meeting with managers and other team members. Present some cool discoveries, new features, and progress on last week’s work. Make sure the managers are happy, and then brainstorm some new ideas. Discuss what to do next.
13. Already 6:00 PM. Again? My brain is full of ideas; I need to put some into the notebook and some onto my to-do list for tomorrow.
14. Continue working until 6:30 or 6:45 to avoid the bad traffic on US-101/85/I-280/Lawrence Expressway/San Tomas. Think of edge cases and other practical cases that may break the new functions.
15. Drive home and get stuck in traffic anyway. Think of some machine learning predictive models to solve the traffic problem.
And please insert ‘checking facebook/twitter/weibo/blogs’ between each pair of consecutive index numbers.
The job involves many interesting skills.
0. Machine learning: people have mentioned this quite a lot. A data scientist can be a hard-core machine learning guy or a beginner to machine learning; both are OK. One suggestion: follow the trends in machine learning and keep learning (human learning, I mean). For example, deep learning is hot, so we need to know what it is.
1. Programming: it is everyday work! A good data scientist can program well in one or more programming languages and their libraries, and has a good understanding of algorithms. I suggest future scientists focus on:
- Hands-on programming experience: implement a function using some algorithm quickly and neatly. Trust me: of the so-called ‘data scientists’ I know who list programming skills on their resumes, about 90% cannot write code to sort an array. Please warm up your programming skills.
- Algorithms: many people have taken an algorithms class and/or read the book ‘Introduction to Algorithms’. In real life, a scientist needs to know which algorithms or methods can solve the problem at hand during development. For example: I have 1 million products in the store; can you give me a real-time (~1s) top-20 popularity ranking? Under which conditions can your algorithm fail? Easy and intermediate-level problems from ACM-ICPC programming contests and Project Euler give very good practice. Please practice your understanding of algorithms.
2. Intuition for data: how do you use data to prove an idea for a business decision? For example, the boss says: we need to boost the sales of products A, B and C, please give me some ideas. The scientist can check what types of people like to buy A, what kinds of people who bought A also bought B, and why C is always among the top sellers. Then he/she can build a machine learning classifier or use some other technique to boost the sales. These kinds of ‘checks’ need intuition for data; no one can tell a scientist what to check.
3. Understanding of data: scientists are not ‘histogram makers’. We are storytellers: what do these two curves mean for the business decision, what ideas do we want to prove using these two curves, and if we boost curve 1, how much more money can we earn? And, more importantly: we make the story.
4. Team work: A good scientist should talk and help.
- Host a discussion for the whole team, either in emails or in the room.
- Daily discussion: in the cube, at the coffee table, in the restaurant, etc.
- Presentations for team members, managers and the boss.
- Help other people with the technical part as much as you can (but limit how much you help if you are short on time).
- Help bring up new ideas for other people’s work.
5. Keep thinking. If thinking makes you happy, just keep doing it.
6. Self-motivated working: as I mentioned, no one can tell a scientist what to check, and no one can assign a scientist’s everyday work. In a company with a good manager, scientists usually get the least monitoring. A scientist should know how to set the pace of the work and coordinate that pace with other people.
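To make the ‘top 20 out of 1 million products’ question from the programming skill above a bit more concrete, here is a minimal sketch of one possible approach, not the only answer. It assumes each product already has a numeric popularity score held in memory in a dictionary; the function name top_k_popular and the made-up scores are hypothetical and only for illustration.

```python
import heapq
import random


def top_k_popular(popularity, k=20):
    """Return the k (product_id, score) pairs with the highest scores.

    heapq.nlargest keeps only k candidates while scanning, so the cost is
    roughly O(n log k) rather than the O(n log n) of sorting everything.
    """
    return heapq.nlargest(k, popularity.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Assumption: 1 million products with random, made-up popularity scores.
    random.seed(0)
    popularity = {
        f"product-{i}": random.randint(0, 10_000) for i in range(1_000_000)
    }
    for product_id, score in top_k_popular(popularity)[:3]:
        print(product_id, score)
```

The questions about failure cases still apply to this sketch: it says nothing about how ties should be ranked, and it rescans every product on each call, which may not be good enough if the scores change many times per second.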
In short, a data scientist is an intelligence analyst + programmer + algorithm guru + big data guy + presenter + helper.
Cover photo courtesy – Flickr user Jason Goto
This article was published by Maslowed.Me – a career portal