The Scientific Method - Part 4: Communication

Deandra Cutajar
Sep 28, 2021
6 min read

Some would say that the Scientific Method ends with results. However, most would describe the method as cyclical rather than directional. Nonetheless, the series on The Scientific Method shall conclude with Communication.

This may be a presentation, a report of results or an interactive dashboard with different graphs and statistics. Essentially, they all represent one thing: translating the findings into a language that is understood by whoever is your customer.

I love science: talking about it, learning, conversing. Admittedly, coding for long hours can do my head in (especially during debugging), but when I see the final result, it is worth it. When communicating science to non-science audiences there are, in my opinion, some factors to keep in mind:

your audience cannot read your mind
your audience cannot read your mind
your audience cannot read your mind

I wrote the same sentence three times because it is important to step out of your mind, look at your results and ask:

"If I didn't work on it, how would I interpret this?"

It is useful to take a break between finalising the results and presenting them. A bit of distance can show a different perspective, believe me. When a scientist has been working on a problem for a long time, they can miss the bigger picture. So whilst a scientist should aim for the quickest delivery, they must also ensure optimality and efficiency, and not just for the model. In order to do so, they need to allocate time for communication.

What I am trying to say is that if a data scientist goes on to present their results without creating a story, they will end up confusing their audience at the very point when they want to sell their work. Again, I think all scientists have their own way of glamorising their presentation, but I always opt for VISUALISATIONS. Even better are interactive visualisations, so that during the discussions one can play around with the data. Remember, the scientist is the one driving the meeting, so why not show how quickly they can zoom in and out, left and right, on whatever they want to focus on?

Data science or not, it is not always intuitive to understand someone else's analysis or work if you didn't work on the problem. That is the key to communication. Whatever the tool utilised, the results have to be almost intuitive. What is highest, lowest, risky and so on. It is not always straightforward how to do it but you just spent days if not weeks building a machine learning model to ingest rows of data, learn and predict. Surely, choosing a graph is next to nothing. Or so I would think.

Throughout my articles, I used a Python package called Plotly, which is my favourite package when presenting. However, other software or platforms can prove to be equally professional and easy to manoeuvre. I particularly like the design of the plots but that is beside the point, or is it?

I shall not bother you with the different plots that can be used, but rather discuss the different ways one can use the plots, specifically when Use Cases are involved.

One of the most important things I learned throughout my experience is the power of a USE CASE. It is nothing but an example that would either be taken from the data itself or simulated to represent different scenarios. It could also be a case that the business suggests just to understand the model better in a real scenario. Frankly, it should not matter where the use case comes from. I like using actual data from the dataset but realistically the dataset may not include all scenarios for whatever reason.

In the following, I shall share my three most common ways of communication, presented as three cases with four graphs in total, using public data easily available online. For the first case, I generated random data much like the data shown in previous articles. Each case has a description of what the data is about, but I invite you to look at the graph first and then read the description, and see whether your reading matches it. Either way, please send me a message, as I would love to learn how different audiences interpret different graphs.

Case 1

It is common for data scientists to compare the predicted output of a model against the actual value. In the graph below I compare the predictions against the true values. If the model were 100% accurate, all the blue dots (the data) would lie on the red line. However, this is highly unlikely, and it would be worrisome if it happened. Instead, a chart like the one below is expected. One can choose to plot without the straight line or annotations, and then explain the graph during the meeting. Nonetheless, I find that drawing the straight line is already a more intuitive approach, since it segments the data in two. Additionally, one may opt to include annotations for more explanation according to the business problem. In this case, I am showing the sections where the model predicted a higher value than the true one (overpredict) or a lower value (underpredict).

Case 2

Another case is visualising how a variable changes across time. It doesn't necessarily have to be time, nor does it have to follow this data. In the graph below I show the money transactions (in and out of an account) across 100 minutes. This is an example, and I hope people are more careful than the person behind this data. The red line now represents a value of 0 €, which translates to "no money spent or earned". In contrast, the purple line shows the earnings and expenditure over time. Shading the difference between the purple line and the red line helps to make the graph, again, more intuitive. I added the annotations for the article.

In another instance, the data may not be continuous but rather categorical. Say a person wants to compare the number of giraffes, orangutans and monkeys between two zoos. Plotting the data as bars next to each other, with one labelled colour per zoo, makes it clear that SF Zoo has more giraffes than LA Zoo, but that LA Zoo has more monkeys and orangutans. Can you conclude which zoo has more animals? Remember that hovering over the chart gives the number in each bar.

Case 3

This is my favourite plot when preparing for a business meeting. In every machine learning or analysis project, the business will ask for Use Cases, which means they want to see how the model performs for certain clients, and how the model prediction relates to behaviour. I love the dropdown function within a Plotly chart. I adore it, to the point that I no longer do without it.

The data aims to show the number of tasks (vertical axis) that two users (Christophe and Ferrante) executed on a particular date (horizontal axis) for either an AI or a RANDOM reason (second button). I invite the reader to hover over the buttons and choose a particular value from the menu. Note that right now, the graph is showing some general statistics. Only when selecting one of the two options from the first menu will the data make sense.

Whatever the case, it makes it easier to visualise and share analytics. It's like a dashboard in Python, without having to code a lot. Plotly also has a package for building dashboards, called Dash, which I invite you to check out.

Remember that when a data scientist begins to explain the model, a lot of people already brace themselves that they won't understand. That is not what science is about.

Science is about collaboration, learning from each other. One person speaks business, another speaks engineering, others speak testing and a scientist speaks science.

I know exactly how I sound to someone who is not a scientist. With each new business problem that I work on, the other person may speak for hours, but if that person is using terminology that I haven't grown accustomed to, then I too brace myself not to understand. Remember the three questions: what, why, how! Keep using them until the model is approved and you, my fellow scientist, have learned and understood why you drank so much coffee, zoned out to your favourite music and lost track of time, made horrendous flowcharts in the attempt to solve it, coded for hours and finally delivered.
