What’s the difference between a Data Engineer and a Data Scientist?

Comparison between the pizza base and the toppings of the data meal

GoustoTech
Gousto Engineering & Data


This is the first in a series where we compare professions. We interviewed Carmela Brook (data engineering) and Rita Figueiredio (data science) to find out more!

What made you want to get into data science / data engineering?

Rita: I studied Biomedical Engineering at university, a super broad degree covering everything from physics and biology to maths and programming! So by the time I was choosing my MSc, I was still a bit clueless about what I wanted to become. When I started working on projects like predicting melanomas from skin lesions, or predicting diabetic retinopathy from diabetic patients’ medical history, I realised that was what I loved: solving problems using past data. When I started working, I moved away from the health sector (causing my parents a bit of heartbreak) to industries where data science was growing more steadily, but using large amounts of data to predict the future remained the common denominator!

Carmela: After graduating from university, I had no idea what I wanted to do. I took an interest in data from reading blog posts online and stumbled upon data engineering. At the time, I was more familiar with what a data analyst or data scientist role entailed, but I hadn’t heard of data engineering. I liked that it was concerned with the storage, movement and transformation of data, enabling others to harness it, rather than with using data to make predictions or extract insight. This suited my chemical engineering background, which deals with exactly the same concerns, only for chemicals instead of data. I knew immediately that my skills would be transferable.

What does your typical day look like?

Carmela: We have a large data platform team, and we operate in small feature teams that form each quarter, focused on delivering a set of OKRs. This may involve building new engineering improvements, adding features to our existing data platform, ingesting a new data source, or building data pipelines so that stakeholders can access the data in a useful format. We might also be on call to provide operational support to stakeholders, resolve production incidents in our platform and fix any identified bugs. We aim to pair with other engineers when working on new tickets, and we frequently hold design sessions to collaborate with the entire team on architectural decisions.

Rita: Our data scientists are embedded in cross-functional squads. I work in the Haricots squad within the Care Tribe, whose goal is to make sure customers feel valued and have their problems solved quickly, with a smile! I am currently working on optimising the compensation we give customers when something is wrong with their box. My day depends on what stage of product development we are at. In the discovery phase, we brainstorm with the relevant stakeholders; in the exploratory phase, we discuss approaches with other data scientists and explore the data; and towards the end of the development cycle, we collaborate with our squad’s software engineers to productionise models.

What skills/tools are most crucial for your job?

Rita: If we’re gathering data from our data lake, we use Spark/SQL. If we’re doing data exploration, we use Python in a Databricks notebook. When we move to the product development phase, we work in an IDE and use CircleCI, CloudFormation, AWS and Databricks to deploy our products. Beyond the tools, having product vision, understanding the domain and building good relationships with stakeholders are the most crucial skills for a good data scientist!

Carmela: Python, SQL, Spark, Linux, Databricks, Delta Lake, AWS, Docker, CircleCI (CI/CD) and CloudFormation (IaC). It is also important to be able to break complex problems down into small, manageable pieces. Good communication matters too: you need to adapt your technical language to suit your audience and ask the right questions. Finally, teamwork and an overall eagerness to learn.

Who are your key stakeholders?

Carmela: Our key stakeholders are the rest of the data team, which includes data analysts, data scientists and data governance. This list is expanding as we build more use cases for serving data to operational systems, e.g. CRM, which brings in new non-technical teams such as marketing.

Rita: As a data scientist embedded in our Customer Care Tribe, my main stakeholders are my squad’s Product Manager, as well as the Director and Head of Customer Care.

How do your two professions interact?

Carmela: There is a lot of interaction between data scientists and data engineers: we may get questions about data access or data quality, or requests for new data. Data scientists typically use more of Databricks’ features to iterate through the machine learning life cycle, so we receive more queries relating to Databricks, such as productionising ML models, orchestrating jobs and managing dependencies. Demand for this kind of support has grown so large that it’s becoming its own role: MLOps.

Rita: Couldn’t have said it better myself!

What advice would you give to someone coming into your profession?

Rita: Besides investing in constant learning and development, what will make you stand out as a data scientist is being passionate about the product you’re working on and having a vision of what it could become in the future!

Carmela: Keep learning and stay in tune with what’s happening in the data community, as the landscape is forever changing!

Check out more stories from Gousto and make sure to follow us here so you catch the next instalment in our “What’s the difference between…” series.

While you’re waiting for that you can check out this great example of data science & data engineering coming together.


The official account for the Gousto Technology Team, a London-based, technology-driven recipe-box company.