Understanding Journalism: the other half of Data Visualization

I learned a lot about structuring data and turning it into information, and then into something more. Setting a hypothesis, structuring your data, and defining things in terms of metrics was an enlightening process that no doubt helped my design thinking process.

But while it was an exciting foray into a different field, it hadn't addressed my main goal: data visualization. As I discussed earlier, data visualization is mainly used for three purposes in Data Science: exploring a dataset, building models, and presenting data to your stakeholders. The last purpose, presenting to stakeholders, was my primary reason for learning Data Visualization.

However, data science wasn’t the best place to learn how to do that. It’s not as though Data Scientists lack a method for learning Data Visualization.

Listen to most Data Scientists, and they'll give you a very structured methodology for data visualization: first, learn Python or R; next, build a structured database; then use visualization libraries such as D3.js (a JavaScript library) to create stunning visualizations. These are powerful tools, and they're on my short list of things to learn next.

But I still didn’t know what made something a good or bad visualization for my audience. I wanted to understand the reasoning behind why I should choose a specific visualization method, what that meant, and how to best convey it to my audience.

Until now, I had learned about understanding the data and forming a cohesive dataset based on my investigations. But now I needed to take the next step: I needed to understand not only how to choose the most appropriate chart, but also how to tell a story with it.

And to understand that, let’s first revisit where we started, with the Data Pyramid.

DIKW: Data, Information, Knowledge, Wisdom

As a reminder, the DIKW pyramid describes the process by which humans understand and gain insights from data.

The first step is turning reality into data. This step is what happens when you write down surveys, conduct interviews, or otherwise record observations based on what you’re doing.

The next step is turning that data into Information. With larger sets of data, this process often involves Data Science. This step is where Data Wrangling, Data Analysis, Data Normalization, and Standardization come into play. But efficiently structuring data can only get you so far.

The next step, turning Information into Knowledge, requires more than just ordered data: it requires an understanding of how your user understands or finds patterns in the data. This step is part of where I was running into problems with Data Science.

To explain this, let's return to the wine example I had for the pyramid. I said that crossing the Information to Knowledge gap requires understanding how users synthesize or organize information into a framework that allows them to make sense of data.

For example, that framework might include what type of wine pairs well with certain foods, your limit for wine consumption, or the appropriate steps for tasting wine. You can also tie it to specific experiences: for example, you like this wine brand because it reminds you of Italy.

However, crossing this gap requires more than well-structured data: it requires an understanding of your users. And this was something I learned when looking at the data science model of data visualization.

Data Science encodes, users decode

According to Evelyn Munster, Data UX/Visualization Designer for DesignNation.eu, data visualizations are a way for Data professionals to condense a great deal of data into a single image. The best analogy that I’ve seen for this process is the QR code.

https://en.wikipedia.org/wiki/QR_code

If you're unaware of what QR codes are, they are essentially data encoded into a matrix barcode which, when scanned, allows the data to be decoded and often points to a website or a location. In this model, Data professionals are encoders who choose how to represent various pieces of data through visualization. They encode this data into specific visual forms, such as bar or line charts, to reduce complex data into a compact visual format. I have no problem with this; it makes sense so far.

However, the second half of the process happens when the decoder, who is the reader, cracks the translation key by relying on their graphicacy: the ability to understand axes, labels, and other elements of the chart.

The process almost sounds like scanning a QR code with a phone app, with one specific difference:

Instead of a fancy phone with standardized specifications, we’re relying on our diverse audience to decode the data in their minds.

If the visualization decodes successfully, it can yield insights, understanding, and even knowledge. But if it doesn’t? Then it’s just information.

So if that's the case, how do we design visualizations that can be easily decoded? From UX Design, I knew that answering that question required taking the time to understand your users.

To illustrate that, let's talk about one of the most famous examples: which chart do I choose?

Which chart to use, and why it’s often not enough

One of the most critical questions I had when starting out was this: based on my data, which chart should I use? According to most, this is a problem that’s long been solved.

There is a somewhat famous chart by Dr. Andrew Abela in which he shows which chart you should use based on the data you’ve collected.

© Dr. Andrew V. Abela, 2021, www.extremepresentation.com

It's a very effective way to make sure that the chart lines up with the type of data you have. It's orderly and easy to follow, and yet it frequently misses the point of finding the proper visualization.

For example, if I wanted to show the Relationship or Distribution of two variables, that diagram would lead me to believe that the best chart to use is a scatter plot.

But what if my audience doesn’t know what a scatter plot is?

I've chosen the correct chart according to the data that I have, but I’ve only solved half the equation by doing that. What am I missing?

According to Cole Nussbaumer Knaflic, author of Storytelling with Data, the question of which chart to choose has already been solved. But the answer isn’t what you might think.

The answer is whatever chart will be easiest for your audience to read. This is otherwise known as the golden rule of information visualization: choose the most straightforward presentation method.
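To make that concrete, here is a minimal Python sketch of the idea. The lookup table is my own heavily simplified assumption, not a reproduction of Dr. Abela's chart, and the names CANDIDATE_CHARTS and choose_chart are hypothetical. It treats the data-driven answer as a shortlist, then filters that shortlist by what the audience can already read.

```python
# Hypothetical, heavily simplified shortlist of charts per purpose.
# This is an illustrative assumption, NOT a reproduction of Abela's chart.
CANDIDATE_CHARTS = {
    "comparison": ["bar chart", "line chart"],
    "relationship": ["scatter plot", "bubble chart"],
    "distribution": ["histogram", "scatter plot"],
    "composition": ["stacked bar chart", "pie chart"],
}


def choose_chart(purpose, audience_can_read):
    """Return the first data-appropriate chart the audience already knows.

    Returns None when no candidate is familiar to the audience.
    """
    for chart in CANDIDATE_CHARTS.get(purpose, []):
        if chart in audience_can_read:
            return chart
    return None


# An audience that is only comfortable with bar and line charts.
audience = {"bar chart", "line chart"}
comparison_choice = choose_chart("comparison", audience)
relationship_choice = choose_chart("relationship", audience)
```

With this audience, the comparison request resolves to a bar chart, while the relationship request returns None. That None is the interesting case: the data says "scatter plot," but the audience can't decode one, so you either teach the chart first or rethink the presentation.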

But this is where I started to hit a wall with Data Science. Data Science is focused almost solely on the data: how to structure it, how to think about it, and how to build visualizations with it and its key metrics. As for who the audience was or how to present to them, it didn’t seem to have those answers.

UX Design taught me to user test, understand my audience, and spend time iterating on designs. As a result, I’ve become painfully aware of how my first drafts can miss the mark, which is why I would always iterate after feedback. So the thought of trying to create the right visualization on the first attempt, without user testing or feedback, seemed strange to me.

But I found my guidance in two unexpected places: Journalism and Psychology. By spending more time understanding concepts that are near-universal across audiences, I could build a better baseline visualization.

This is how I learned the Journalist’s approach to Data Visualization.