Hands-on Data Visualization practice #3

Apr 27, 2021

Sorry, this was a bit late. I struggled with another week’s visualization and decided to skip that for the time being and move to another visualization.

Initial thoughts

When I first saw this visualization, I immediately knew that I was not the intended audience. I don't know much about coal, but even setting that aside, it took me a while just to understand the presentation format.

It was only after digging around in the data that I understood that each hyphenated name refers to two places: the District name comes first, and the State name comes second. For example, Korba-Chhattisgarh means that the District is Korba and the State is Chhattisgarh.
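If you want to do the same split in code, here is a minimal sketch. It assumes every label follows the "District-State" pattern and that the state name itself never contains a hyphen; the `Place` type and `split_label` function are names I made up for illustration, not anything from the dataset.

```python
from typing import NamedTuple


class Place(NamedTuple):
    district: str
    state: str


def split_label(label: str) -> Place:
    """Split a 'District-State' label at its last hyphen.

    Assumption: the state name never contains a hyphen, so the last
    hyphen in the label is the separator.
    """
    district, sep, state = label.rpartition("-")
    if not sep:
        raise ValueError(f"expected 'District-State', got {label!r}")
    return Place(district.strip(), state.strip())


place = split_label("Korba-Chhattisgarh")
print(place.district, "|", place.state)
```

Splitting at the last hyphen rather than the first is a judgment call: it keeps a hyphenated district name intact, at the cost of breaking if a state name ever had a hyphen in it.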

Source: Sandeep Pai and Hisham Zerriffi. A novel dataset for analyzing sub-national socioeconomic developments in the Indian coal industry, IOP SciNotes, https://doi.org/10.1088/2633-1357/abdbbb

What works and what doesn’t?

This seems like a standard horizontal bar chart, but there are many things here that make this weird.

First and foremost are the two variables: the # of mines and the coal production. Based on the labels, it seems like one is a plain count (e.g., 5 coal mines) while the other is measured in Metric Tons (e.g., 10 Metric Tons). This makes for a slightly unusual combination of variables, but it’s not too weird.

However, what's harder to see is how these two variables relate to one another. For example, a few areas (Korba-Chhattisgarh, Singrauli-Madhya Pradesh, Angul-Odisha) seem to produce far more coal than their # of mines would suggest. Is this related in any way? Are they part of a particular region that is known for its coal?

Lastly, I don’t know how the bars are ordered. It almost seems to be sorted by coal production, except that doesn’t hold for the later entries. It’s not sorted by # of coal mines either, and at a glance, I can’t tell if it’s sorted by location (I later learned that it wasn’t).
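Rather than eyeballing the order, you could test each candidate sort key directly. This is only a sketch: the rows below are made-up placeholders (not values from the dataset), and `is_sorted_by` is a hypothetical helper name.

```python
def is_sorted_by(rows, key, descending=True):
    """Return True if rows already appear in sorted order for the given key."""
    values = [row[key] for row in rows]
    return values == sorted(values, reverse=descending)


# Placeholder rows in chart order, top to bottom. NOT real figures.
rows = [
    {"label": "District B-State X", "mines": 5, "production": 90.0},
    {"label": "District A-State Y", "mines": 12, "production": 60.0},
    {"label": "District C-State X", "mines": 3, "production": 75.0},
]

for key in ("production", "mines", "label"):
    for descending in (True, False):
        direction = "descending" if descending else "ascending"
        print(key, direction, is_sorted_by(rows, key, descending))
```

If every combination comes back False, as it does for these placeholder rows, the chart has no single-column ordering, which matches the feeling I had looking at the original.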

What did I do to make this better?

My initial thought, when I first saw this, was to make it a map. Since I knew nothing about these locations, I wanted to see how they fit together geographically.

However, this ended up not really giving me that much information. There were over 450 mines within this dataset, scattered among certain regions and areas.

While I could have made this a choropleth map, with values shaded by range, I don’t think it would have given me the right comparison: either it would have shown which states have the most production/mines (but not which districts), or it would have split the map even further into individual districts (which would make it harder to read).

So I decided to stick with the horizontal bar chart and address the organization issues instead: I would give an overview of the # of mines and coal production per state, with a further breakdown by district within each state.
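The state-then-district grouping can be sketched in a few lines. To be clear, this is not how I built the chart, just one way to get the same shape of data; the records are made-up placeholders and `summarize_by_state` is a hypothetical name.

```python
from collections import defaultdict

# (district, state, number of mines, production). Placeholder values only.
records = [
    ("District A", "State X", 4, 30.0),
    ("District B", "State X", 2, 10.0),
    ("District C", "State Y", 6, 25.0),
]


def summarize_by_state(records):
    """Total mines and production per state, keeping the district breakdown."""
    totals = defaultdict(lambda: {"mines": 0, "production": 0.0, "districts": []})
    for district, state, mines, production in records:
        entry = totals[state]
        entry["mines"] += mines
        entry["production"] += production
        entry["districts"].append((district, mines, production))

    # Order states by total production, largest first, and do the same
    # for the districts inside each state.
    ordered = sorted(totals.items(), key=lambda kv: kv[1]["production"], reverse=True)
    for _, entry in ordered:
        entry["districts"].sort(key=lambda d: d[2], reverse=True)
    return ordered


for state, entry in summarize_by_state(records):
    print(state, entry["mines"], entry["production"])
    for district, mines, production in entry["districts"]:
        print("   ", district, mines, production)
```

Sorting both levels by production gives the chart one consistent ordering rule, which was the thing I found missing in the original.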

Doing this allows us to still read the chart as before, while also letting us zoom in on certain states if we would like. For example, it’s easier to see that certain states, such as Maharashtra, have coal production spread evenly across their districts. In contrast, Madhya Pradesh has a single district that produces most of its coal.
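One rough way to put a number on "even" versus "one district dominates" is the share of a state's production that comes from its largest district. The figures below are placeholders I invented to show the two patterns, not the actual values for Maharashtra or Madhya Pradesh, and `top_district_share` is a hypothetical helper.

```python
def top_district_share(productions):
    """Fraction of a state's total production from its largest district."""
    total = sum(productions)
    if total <= 0:
        raise ValueError("total production must be positive")
    return max(productions) / total


# Placeholder district-level production figures. NOT real data.
evenly_spread = [10.0, 10.0, 10.0, 10.0]
one_dominant = [45.0, 3.0, 2.0]

print(round(top_district_share(evenly_spread), 2))
print(round(top_district_share(one_dominant), 2))
```

A share close to 1 divided by the number of districts means production is spread evenly; a share close to 1 means a single district carries the state.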

Data-Informed Design by Kai Wong
