Deandra Cutajar

Data Analyst, Scientist or Engineer

Many posts online are saying that:

the Data Scientist role is obsolete, or can be reclassified as either Analyst or Engineer.

Some claim that this role was overhyped and that data scientists would ultimately move on to specialise in analysis or engineering.



Design by Geoff Sence.


Those who say this have missed the one word that explains the role of a data scientist:

Science

Science is a word that has been largely misunderstood. Britannica defines Science as the

knowledge about or study of the natural world based on facts learned through experiments and observation.

It isn't specific to an industry or a career but is broadly defined as the study of the world based on facts.


Over the years, however, we drifted away from this broad definition and began associating the word science with specific career titles and job functions. Medical professionals and engineers are widely accepted as doing science, but they are not the only ones.


Going back to the word's core, science is a process in which an individual learns facts through experiments and observation. For example, a chemist does science, but so does a car mechanic. When someone comes to your home to fix a plumbing issue, they also experiment to find the problem, locate it through observation, and act on it. A chef does science by experimenting with ingredients. What do chefs, plumbers and data scientists have in common? Science.


The role of a data scientist has been misunderstood and 'mythified' due to one word we do not fully comprehend. So, let's define the role of a Data Analyst, Data Scientist, and Data Engineer, which essentially means focusing on the words Analysis, Science, and Engineering.


Title 1: Data Analyst


Britannica defines Analysis as:

a careful study of something to learn about its parts, what they do, and how they are related to each other.

Analysis is a process in which an individual looks at something as it is and tries to understand it. Why does a car move forward? Why isn't a light bulb working? Through analysis, we identify an insight, that 'Eureka' moment of understanding.


Data Analysts do the same. These data professionals use data to build valuable insights that help the business understand its different functions and how they complement each other. For example, how does marketing content lead to successful conversion rates? Why do on-site professionals refuse to fill out a form? The aim of an analysis is to explain, and provide a deep understanding of, the current facts about how the business works.


For this reason, the value of Data Analysts is well understood: business leaders know what to expect from these experts. The outcome is usually a report or a dashboard that shows, "Due to a brand re-launch where we adopted technique A, we saw a rise in inbound leads coming from the channels in which the campaign was launched." Business leaders can link the outcome to their business function and act on it. An outcome is always expected and delivered. Alongside a source of trustworthy data, a data analyst should always be among the first hires of a data team.
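To make the example report concrete, here is a minimal sketch of the kind of count an analyst might run behind such a statement. It assumes each inbound lead is recorded as a hypothetical `(channel, period)` pair, where period is "before" or "after" the relaunch; the function name `lead_change_by_channel` and the sample data are illustrative, not taken from any real dataset.

```python
from collections import Counter


def lead_change_by_channel(leads):
    """Count inbound leads per channel before and after a relaunch.

    `leads` is an iterable of (channel, period) pairs, where period is "before" or "after".
    Returns {channel: (before_count, after_count)}, ordered by channel name.
    """
    leads = list(leads)  # allow generators: the data is scanned twice below
    before = Counter(ch for ch, period in leads if period == "before")
    after = Counter(ch for ch, period in leads if period == "after")
    # A Counter returns 0 for channels that are absent from one of the periods.
    return {ch: (before[ch], after[ch]) for ch in sorted(set(before) | set(after))}


# Hypothetical sample: the relaunch campaign ran on the "social" channel only.
SAMPLE_LEADS = [
    ("email", "before"), ("social", "before"), ("social", "before"), ("web", "before"),
    ("email", "after"), ("social", "after"), ("social", "after"), ("social", "after"),
    ("social", "after"), ("web", "after"),
]

for channel, (b, a) in lead_change_by_channel(SAMPLE_LEADS).items():
    print(f"{channel}: {b} -> {a} ({a - b:+d})")
```

On this toy data, only the campaign channel changes (`social: 2 -> 4 (+2)`), which is the shape of finding a business leader can link directly to a decision.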


Today, many analytical tools provide predictive capabilities. Equally, many data analysts have or are acquiring predictive skills such as Bayesian statistics, Machine Learning, and Conditional Probability. But when that happens, the analysis process progresses into experimentation, i.e. Science.



Title 2: Data Engineer


Britannica defines Engineering as:

the application of science to the optimum conversion of the resources of nature to the uses of humankind.

I placed the definition as is, but in simpler words, engineers are professionals who figure out how to convert natural resources, such as coal, wind and water, into something useful for humankind.


Nowadays, data is a resource owned by humans that can be used to serve humankind.

Data Engineers define the process by which data is collected from humans (a.k.a. data subjects) and converted into a format that is easy to read for data processing. These professionals define the roads by which the data leaves its owner and joins the other data. Data engineers also ensure that the data is of good quality (together with analysts and scientists) by placing checkpoints along those roads to assert that it is in the right shape and format. Businesses understand that the success of a data engineering team is measured by whether, once the data arrives at its destination, it can continue down the pipeline without any concern. Like the data analyst, a data engineer should be among the first hires of a data team.
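A "checkpoint on the road" can be as simple as validating each record against an expected schema before letting it continue. The sketch below assumes a hypothetical lead record with fields `lead_id`, `channel` and `signup_date`; in practice, teams often use dedicated validation or data-quality tools, but the idea is the same.

```python
from datetime import date

# Hypothetical schema for an incoming "lead" record; field names are illustrative only.
EXPECTED_SCHEMA = {"lead_id": int, "channel": str, "signup_date": str}


def check_record(record: dict) -> list:
    """Return a list of problems found in one record (an empty list means it passes)."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {type(record[field]).__name__}"
            )
    # Format check: the date string must parse as an ISO date (YYYY-MM-DD).
    if isinstance(record.get("signup_date"), str):
        try:
            date.fromisoformat(record["signup_date"])
        except ValueError:
            problems.append("signup_date: not an ISO date")
    return problems


def checkpoint(records):
    """Split records into those allowed down the pipeline and those quarantined with reasons."""
    passed, quarantined = [], []
    for record in records:
        issues = check_record(record)
        if issues:
            quarantined.append((record, issues))
        else:
            passed.append(record)
    return passed, quarantined


# Hypothetical sample: one clean record, one with a wrong type, one with a wrong date format.
SAMPLE_RECORDS = [
    {"lead_id": 1, "channel": "email", "signup_date": "2024-03-01"},
    {"lead_id": "2", "channel": "social", "signup_date": "2024-03-02"},
    {"lead_id": 3, "channel": "web", "signup_date": "03/04/2024"},
]
passed, quarantined = checkpoint(SAMPLE_RECORDS)
```

Quarantining bad records with a reason, rather than silently dropping them, lets analysts and scientists see what failed and why.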


When setting up a data team, a data engineer will get your data into the right shape and format for your storage tool, such as a data lake or a database, and a data analyst can then start providing insights to the business.


So, where does data science fit into all of this?


Title 3: Data Scientist


The role of a data scientist is to get the data and transform it to help the business gain knowledge about its domain based on facts learned through experiments and observation. Beyond understanding the present, data scientists build models that explain the observable phenomenon before taking a peek into the future.

Experiments are conducted to understand the workings of a phenomenon, but they need to be repeatable and produce consistent results. Only when this is achieved can a business use the model for forecasting.
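One practical ingredient of repeatability is controlling randomness. The sketch below uses a bootstrap estimate of a mean (one common resampling technique, chosen here only for illustration) with an explicitly seeded random number generator, so re-running the experiment on the same data gives exactly the same interval. The sample conversion rates are hypothetical.

```python
import random


def bootstrap_mean(values, n_resamples=1000, seed=42):
    """Bootstrap estimate of the mean with a 95% percentile interval.

    A dedicated, explicitly seeded random.Random instance makes the experiment
    repeatable: the same inputs and seed always give the same interval.
    """
    rng = random.Random(seed)
    n = len(values)
    resample_means = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in range(n)]  # sample with replacement
        resample_means.append(sum(resample) / n)
    resample_means.sort()
    lower = resample_means[n_resamples * 25 // 1000]
    upper = resample_means[n_resamples * 975 // 1000 - 1]
    return sum(values) / n, (lower, upper)


# Hypothetical daily conversion rates (%) from an experiment.
SAMPLE_RATES = [2.1, 2.4, 1.9, 2.8, 2.2, 2.6, 2.0, 2.5, 2.3, 2.7]

first = bootstrap_mean(SAMPLE_RATES)
second = bootstrap_mean(SAMPLE_RATES)
print(first == second)  # same data and seed, same result
```

Fixing the seed does not make a result true; it only makes it reproducible, so that others can re-run the experiment and check it before the model is trusted for forecasting.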


Most of the time, a data scientist cannot predict what to expect from a model, which is why businesses find it difficult to grasp the purpose of a data science team. Data science models can be great and generate revenue (if built meticulously and following the proper procedure), but they can also cost money if not done properly.


The value of a data science team is innovation and the ability to think outside the box. Some ideas lead nowhere, whereas others become a gold mine. For this reason, companies prefer the titles of analyst and engineer because they link them to an output, a result that a data science team cannot always promise. I would add that this uncertainty is closely tied to data quality: any model will run and produce an output, so whether that output is useful depends largely on the quality of the data behind it.


Data Project


To conclude this article, I want to define the data project lifecycle.


Figure 1: Data Project Lifecycle


A data project starts like every other project: a need or requirement arises from the business, and discussions begin towards a solution.


The stage "Requirement from Business" is the same regardless of the role. A data professional must always understand the four questions I refer to in The Scientific Method: "why, how, what, who".


At the "Research" stage, we see some slight deviations. A data engineer will research tools and solutions for the requirements. A data analyst will research different ways to provide insights and more information about the nature of the requirements. A data scientist will research the same information as a data analyst plus potential statistical, machine learning or AI models. A data analyst and a data scientist conduct similar research but not for the same purpose.


The next phase of a data project is the collection of "Data". Again, this is similar for every role. A data engineer, analyst, and scientist will work out similar processes to gather the necessary data to fit the requirements.


The roles diverge at the "Exploratory Data Analysis" stage. A data engineer explores the data by checking its format, type, schema, volume, storage and lineage. They might also add some data quality checks. On the other hand, a data analyst and a data scientist explore patterns within the data, such as missing information, outliers, and correlation. However, a data analyst explores the data in light of the insights the business requires, whereas a data scientist has the model and its desired output in mind. If there is a correlation, a data analyst may find it helpful to visualise it for the business and investigate whether it reflects a direct or indirect causal link. A data scientist may instead drop one of the correlated variables to improve model performance.
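The three patterns named above (missing information, outliers and correlation) can each be checked in a few lines. This sketch assumes a small, hypothetical column-oriented dataset with `None` marking missing values; it uses Tukey's 1.5 × IQR rule for outliers and Pearson correlation, which are common choices but not the only ones. The correlation step follows the data scientist's use described above: it flags one column of each highly correlated pair (the 0.9 threshold is an arbitrary assumption) as a candidate to drop.

```python
import statistics
from itertools import combinations


def missing_counts(columns):
    """Number of missing (None) entries per column."""
    return {name: sum(v is None for v in values) for name, values in columns.items()}


def iqr_outliers(values):
    """Values outside 1.5 * IQR beyond the quartiles (Tukey's rule), ignoring missing entries."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = 1.5 * (q3 - q1)
    return [v for v in present if v < q1 - spread or v > q3 + spread]


def pearson(xs, ys):
    """Pearson correlation computed over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x, _ in pairs) ** 0.5
    sy = sum((y - my) ** 2 for _, y in pairs) ** 0.5
    return cov / (sx * sy)


def redundant_features(columns, threshold=0.9):
    """For each highly correlated pair of columns, flag the second one as a candidate to drop."""
    to_drop = set()
    for a, b in combinations(columns, 2):
        if a in to_drop or b in to_drop:
            continue
        if abs(pearson(columns[a], columns[b])) >= threshold:
            to_drop.add(b)
    return to_drop


# Hypothetical data: impressions track ad spend closely, and one day has an unusual lead count.
SAMPLE_COLUMNS = {
    "ad_spend": [100, 120, 130, 150, 170, 200, None],
    "impressions": [1000, 1190, 1310, 1500, 1720, 1980, 2100],
    "leads": [10, 12, 11, 15, 16, 90, 14],
}

print(missing_counts(SAMPLE_COLUMNS))
print(iqr_outliers(SAMPLE_COLUMNS["leads"]))
print(redundant_features(SAMPLE_COLUMNS))
```

The same outputs serve both roles differently: an analyst might chart the ad spend and impressions relationship for the business, while a scientist might keep only one of the two as a model input.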


The distinct purpose established during the exploratory phase determines how the roles differ during "Data Modelling". A data engineer models the data to ensure that it remains accurate and does not lose quality as a data point journeys from point A to B. An analyst models the data using visualisation and statistical metrics to deliver insights to the business. Lastly, a scientist builds a model, whether off the shelf or bespoke, to explain current observations and predict future behaviour. A scientist's role is not solely to look back and explain but also to take a peek into the future with an acceptable prediction error.
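"An acceptable prediction error" is something a scientist measures rather than assumes. As a minimal sketch, assuming hypothetical monthly spend and lead figures, the code below fits an ordinary least-squares line on the earlier months, forecasts the held-out later months, and reports the mean absolute error. A straight line is only the simplest possible model and stands in here for whatever off-the-shelf or bespoke model a real project would use.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and forecast values."""
    return sum(abs(obs - pred) for obs, pred in zip(actual, predicted)) / len(actual)


# Hypothetical monthly figures: marketing spend (thousands) and leads generated.
SPEND = [10, 12, 15, 18, 20, 22, 25, 28]
LEADS = [105, 118, 150, 176, 199, 221, 246, 281]

# Fit on the first six months and hold out the last two to measure forecast error.
a, b = fit_line(SPEND[:6], LEADS[:6])
forecast = [a + b * x for x in SPEND[6:]]
error = mean_absolute_error(LEADS[6:], forecast)
print(f"slope={b:.2f}, intercept={a:.2f}, holdout MAE={error:.2f}")
```

Holding out the most recent months, rather than a random subset, mimics how the model would actually be used: trained on the past and judged on a future it has not seen.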


Consequently, each role will "Present Model to the Business" based on the scope and requirements.


Ultimately, and this is where it gets interesting, all roles converge when their solution is ready to be "Pushed to Production/Front end". Most data engineers can do this autonomously but require the analyst's and scientist's feedback on the visualisation and application of such data. Nowadays, data analysts and scientists also have the skills to complete such tasks. Still, it is wise to consult engineers and get their support to keep the production environment sustainable, clean and free of errors.


The role of a data expert depends greatly on the purpose they are filling and on the company's understanding of each role. In today's world, where buzzwords attract more attention than facts, role titles are often misleading, and this has come to be widely accepted. But if we compared data role titles with those in medicine, would we accept the same blurry lines and definitions there? I think not.


A data project has a similar (not identical) lifecycle for all data roles. However, each project's purpose requires a different skill set, which different data profiles fulfil. Some may argue that one day, one data role will do everything. To those, I say, "Perhaps, but one cannot possibly be an expert in all things Data. The market changes too quickly to allow for this, and it keeps requiring different skills."

