Data Science Team Roles
Data science teams need to have six different roles covered. Several roles can be covered by a single person, or even by an external service, but every role must be handled somehow.
- Business Owner
- Problem Domain Expert
- Data Engineer
- Data Scientist
- Machine Learning Engineer
- Product Engineer
1. Business Owner
Understands the business and the benefits gained outside the team.
Drives the discussion about what the goals of data science are.
Has some understanding of machine learning.
Handles all data-access-related issues or delegates that work.
2. Problem Domain Expert
Understands what we are predicting and why.
Knows what data is available and what the data communicates in fine detail.
Has some understanding of machine learning.
Explains to the data scientist (4) what the different data points mean and provides detailed, domain-specific knowledge.
3. Data Engineer
Gathers data from given sources.
Helps the data scientist (4) process large quantities of data.
Explains to fellow data engineers (3) and the data scientist (4) how the whole pipeline works.
Knows how to plan maintainable data pipelines and manages how the data is tested.
Is good at programming, so as to be able to automate infrastructure.
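As an illustration of what "managing how the data is tested" can look like, here is a minimal sketch in plain Python. The ColumnRule class, the validate_rows function and the column names are hypothetical, invented for this example; a real pipeline would more likely rely on a dedicated data-validation tool.

```python
from dataclasses import dataclass


@dataclass
class ColumnRule:
    """Hypothetical rule: a column name, its expected type, and nullability."""
    name: str
    dtype: type
    nullable: bool = False


def validate_rows(rows, rules):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    for i, row in enumerate(rows):
        for rule in rules:
            if rule.name not in row:
                problems.append(f"row {i}: missing column '{rule.name}'")
                continue
            value = row[rule.name]
            if value is None:
                if not rule.nullable:
                    problems.append(f"row {i}: '{rule.name}' is null")
                continue
            if not isinstance(value, rule.dtype):
                problems.append(
                    f"row {i}: '{rule.name}' should be {rule.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


# Example rules for an assumed data source (column names are made up).
rules = [
    ColumnRule("sensor_id", str),
    ColumnRule("reading", float),
    ColumnRule("comment", str, nullable=True),
]
rows = [{"sensor_id": "A1", "reading": 12.5, "comment": None}]
print(validate_rows(rows, rules))
```

A check like this can run at each stage of the pipeline, so that a broken source is caught before the data reaches the data scientist (4).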
4. Data Scientist
Explores and tries to understand correlations in the gathered data.
Defines what kind of predictive models the machine learning engineers (5) will build, usually working in Jupyter Notebooks or a similar environment.
Has a solid understanding of how statistics and machine learning work.
Knows some programming.
5. Machine Learning Engineer
Wraps models created by the data scientist (4) to be used in production.
Writes unit tests for the code created and manages how the models are tested.
Has a good understanding of how machine learning works.
Is excellent at writing maintainable code, as the work is pure software development at this point.
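The wrapping and unit testing described above could be sketched roughly as follows. MeanModel is a hypothetical stand-in for whatever the data scientist (4) hands over, and ModelService, predict_one and n_features are names invented for this example; the only assumption about the real model is that it exposes a predict method taking a list of rows.

```python
import unittest


class MeanModel:
    """Hypothetical stand-in for a model created by the data scientist."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


class ModelService:
    """Thin production wrapper: validates input before calling the model."""

    def __init__(self, model, n_features):
        self._model = model
        self._n_features = n_features

    def predict_one(self, features):
        if len(features) != self._n_features:
            raise ValueError(
                f"expected {self._n_features} features, got {len(features)}"
            )
        values = [float(x) for x in features]
        return float(self._model.predict([values])[0])


class ModelServiceTest(unittest.TestCase):
    def setUp(self):
        self.service = ModelService(MeanModel(), n_features=3)

    def test_returns_model_prediction(self):
        self.assertAlmostEqual(self.service.predict_one([1, 2, 3]), 2.0)

    def test_rejects_wrong_feature_count(self):
        with self.assertRaises(ValueError):
            self.service.predict_one([1, 2])
```

Saved as a module, the tests can be run with python -m unittest. Keeping the wrapper separate from the model lets the input checks be tested without retraining anything.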
6. Product Engineer
Takes a machine learning model and runs it in production.
"Production" can mean various things: deploying to servers, apps, phones, ships, cars, etc.
Coordinates with the machine learning engineer (5) on how to deploy the production models.
Coordinates with the data engineer (3) on how to gather the data the machine learning system requires.