Using treemaps to understand hierarchical data

How to make sense of a complex visualization technique.

May 26, 2021

I didn’t understand the point of treemaps until I worked with a large structured dataset.

As part of an extensive user research effort, we collected open-ended survey data from 130 participants. After standardizing the data and doing thematic analysis, the next step was to figure out how to visualize it.

The analysis left me with a dataset of over 20 themes and several different categories of respondents. This was much larger than anything I was used to visualizing: what was I supposed to do?

That’s when I first started to learn about treemaps. To understand why they’re useful, let’s look at the question they were created to answer: what’s taking up all my hard drive space?

Understanding hierarchical data

In 1990, Ben Shneiderman, a Human-Computer Interaction and Information Visualization pioneer, had an issue: his hard drive was full of files, but he didn’t know what was taking up so much space.

Rather than sort through several directories, he devised a way to visualize the file hierarchy called treemapping: the technique fills all available space with nested rectangles, each sized according to how big a file is.

https://en.wikipedia.org/wiki/Treemapping
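To make the idea concrete, here is a minimal sketch of the simplest treemap layout, often called slice-and-dice: each directory’s rectangle is divided among its children in proportion to their size, and the direction of the cut alternates at every level. The directory names and sizes are invented for illustration, the sketch assumes every directory contains at least one non-empty file, and real tools generally use more refined layouts.

```python
def total_size(node):
    """Size of a file (a number) or of a directory (a dict of children)."""
    if isinstance(node, dict):
        return sum(total_size(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, width, height, depth=0, path=""):
    """Return one (path, x, y, width, height) rectangle per file."""
    if not isinstance(node, dict):
        return [(path, x, y, width, height)]

    rects = []
    total = total_size(node)
    offset = 0.0  # fraction of the parent rectangle already handed out
    for name, child in node.items():
        share = total_size(child) / total
        child_path = f"{path}/{name}"
        if depth % 2 == 0:
            # Even depth: cut the parent into side-by-side strips.
            rects += slice_and_dice(child, x + offset * width, y,
                                    share * width, height, depth + 1, child_path)
        else:
            # Odd depth: cut the parent into stacked strips.
            rects += slice_and_dice(child, x, y + offset * height,
                                    width, share * height, depth + 1, child_path)
        offset += share
    return rects


# Hypothetical hard drive contents, sizes in MB (not real data).
drive = {
    "photos": {"2019": 300, "2020": 100},
    "music": 400,
    "documents": {"taxes.pdf": 50, "thesis.docx": 150},
}

# Lay the drive out on a 100 x 100 canvas.
rectangles = slice_and_dice(drive, 0, 0, 100, 100)
for path, x, y, w, h in rectangles:
    print(f"{path}: x={x:.1f}, y={y:.1f}, w={w:.1f}, h={h:.1f}, area={w * h:.0f}")
```

Every file’s area ends up proportional to its size, and the rectangles together fill the whole canvas, which is what makes the layout so space-efficient.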

A treemap lets you quickly make sense of what is on the hard drive, but looking at it closely, we can see a problem: we can’t compare categories easily. To understand why, let’s revisit the accuracy of visual cues.

According to Cleveland and McGill’s research, humans perceive some visual cues more accurately than others. Treemaps rely on area, one of the less accurately perceived cues, to show data.

This means that any comparisons we try to draw between data points are harder to make.

Why, then, would we want to use this? Because sometimes you want to glance at your entire dataset.

Examining the data landscape at a glance

When it comes to displaying large amounts of hierarchical data visually, few charts can match up to a treemap.

In particular, treemaps excel at guiding the user toward a few prominent categories, which stand out by their size within the larger landscape. We can see this in the most popular usage of treemaps: visualizing the stock market.

Source: https://finviz.com/map.ashx?t=sec_all&st=w52

We can see many different dimensions of data just by glancing at it, such as:

How profitable an individual stock is, on a scale from red to green
The exact % increase or decrease for some of the stocks
How different companies within a sector compare with one another (for example, GOOGL appears to be 1.3 times as large as FB)
How a specific sector is performing in general (Basic Materials seems to be performing poorly for the most part)
Etc.

Because so much data is packed into so little space, the user can take in the whole hierarchy at a glance.

But that doesn’t mean that treemaps are only useful in finance. The treemap can often still be the right choice if you want to show a data hierarchy in a space-efficient manner. Some examples of this include:

News aggregators, such as Newsmap (a Google News treemap)
Photo galleries
Dashboards

But a treemap isn’t always the right choice, even for hierarchical data.

The ideal use case for treemaps

Before deciding whether to use a treemap, ask yourself three questions.

Do I care about showing the composition of the data?

Charts are used for one of four reasons: showing Comparison, Relationship, Distribution, or Composition of data.

The Composition of data is better known as the ‘part-whole’ schema: it asks how big one piece of information is compared to the total. But it is sometimes confused with Comparison.

The difference between the two categories is this: let’s say that I’m showing five categories (Category A-E) that add up to 1000 units. I’m focusing on Category B, which contains 250 units.

If I cared about Comparison, I would focus on the number 250. This gives me an easy comparison point with other categories (e.g., Category A has 300 units), so I’d probably use a bar chart with bars of different lengths.

If I cared about Composition, though, I would focus on the fact that Category B is 250/1000, or 25% of the total. I might therefore represent it as a stacked bar chart.
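The same distinction can be sketched in a few lines of Python. Category A (300 units), Category B (250 units), and the 1000-unit total come from the example above; the values for C, D, and E are made up so that the total works out.

```python
# A and B are from the example in the text; C, D, and E are hypothetical
# filler values chosen so that everything adds up to 1000 units.
units = {"A": 300, "B": 250, "C": 200, "D": 150, "E": 100}

total = sum(units.values())

# Comparison: look at the raw values side by side.
difference = units["A"] - units["B"]
print(f"Category A has {difference} more units than Category B")

# Composition: look at each value as a share of the whole.
shares = {name: value / total for name, value in units.items()}
for name, share in shares.items():
    print(f"Category {name}: {share:.0%} of the total")
```

The data is identical in both cases; only the question being asked changes, and with it the chart that answers it best.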

Thinking about your chart's purpose is going to be crucial at this stage: treemaps are bad for comparison, which is one of the more common reasons for creating a chart. As a result, make sure you’re not choosing a treemap for that reason.

Do the data values vary widely?

According to the Nielsen Norman Group, treemaps work well when you want to visualize two main dimensions:

A positive quantitative value, which is the area of your rectangle
A categorical or 2nd quantitative value, which is the color of your rectangle.

However, if the data values are too similar, the treemap loses its purpose, which is to show the largest categories at a glance. In that case, a bar chart or another visualization is better suited to the data.

Treemaps should only be considered as a visualization choice when the data is large, hierarchical, and unequal in size.
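One rough way to check whether values are “unequal in size” before committing to a treemap is to measure how many of them are tied and how far apart the extremes are. The sketch below is only an illustration: the function names and the sample numbers are hypothetical, it assumes all values are positive, and there is no agreed cutoff for what counts as “too similar”.

```python
from collections import Counter


def tied_share(values):
    """Fraction of values that are exactly equal to at least one other value."""
    counts = Counter(values)
    tied = sum(count for count in counts.values() if count > 1)
    return tied / len(values)


def spread_ratio(values):
    """How many times larger the biggest value is than the smallest."""
    return max(values) / min(values)


# Hypothetical counts per category, invented for illustration.
similar = [12, 12, 12, 11, 13, 12, 14, 10]
varied = [120, 45, 30, 12, 8, 5, 3, 2]

for label, values in [("similar", similar), ("varied", varied)]:
    print(f"{label}: {tied_share(values):.0%} of values tied, "
          f"largest is {spread_ratio(values):.1f}x the smallest")
```

A high share of tied values or a small spread suggests the rectangles would look nearly identical, which is a hint that a bar chart may serve the data better.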

A sample dataset, shown as a table, pie chart, stacked bar chart, and treemap

Treemaps’ use of area as a visual cue makes it hard to compare categories through pre-attentive attributes. This makes the treemap a less useful chart in most cases, except when the data is complex enough to make the alternatives worse.

But there’s one more question that you have to ask yourself before considering treemaps: is your visualization interactive?

Are you creating a dynamic visualization?

I would probably never use a treemap for a static visualization for one main reason: you can’t read the smaller values.

https://www.nngroup.com/articles/treemaps/

Without being able to zoom into specific categories, I have no idea what any of this means at a glance. Also, there’s not enough space for any labels.

Compare that with a horizontal bar chart, where we can accurately read even the smallest data values, and there’s no doubt which one to use.

https://www.createwithdata.com/chartjs-and-csv/

If you are designing a treemap, your readers will often want to zoom in and figure out what is going on in the smaller parts of the chart, which requires a level of interactivity. Whether you build them in Tableau, Python, or any other tool, treemaps tend to work best when users can explore the data further.

https://www.nngroup.com/articles/treemaps/

Why I ended up not going with treemaps

After considering these questions with my dataset, I did not go with a treemap: I changed my categorization structure and used a stacked bar chart instead.

The reason was that, upon further analysis, the values in the dataset were too similar. While a few categories were outliers, too many themes had the same value. Around a quarter of the rectangles would have been the same size, defeating the purpose of a treemap.

But exploring treemaps gave me another visualization tool to add to my toolbox. So if you’re working with large, hierarchical datasets, consider whether a treemap is the right tool for the job.

Kai Wong is a UX Designer, Author, and Data Visualization advocate. His latest book, Data Persuasion, talks about learning Data Visualization from a Designer’s perspective and how UX can benefit Data Visualization.

Data-Informed Design by Kai Wong
