The need for structured data
Photo by Fausto García-Menéndez on Unsplash
One of the most important lessons I've learned about data science is the need for structure.
We've talked about understanding what metric you are interested in, but we also need to know precisely how to test for it.
And this comes down to the scale and scope of quantitative data. As I mentioned before, one of the things we give up when working with large datasets is the ability to edit items manually. This trade-off is why we need to make sure the dataset is organized so that computer queries can easily return answers.
Without structured data tied to the metrics we're interested in, we can't visualize anything meaningful: trying to do so would only produce misleading visualizations.
As a result, we need to figure out how to create structured data. Structured data is data whose elements are addressable for analysis. It's usually organized into a repository like a database and has a well-defined structure, consistent order, and relational keys.
Relational keys are used to tie specific data to a category, such as storing numbers in a column labeled "Number of Sales." When data is stored in this fashion, Data Scientists can run what's called a query through a coding language like SQL, which asks the database specific questions like "How many people bought our products on March 2nd?". With this structure, the computer can easily look over the dataset and quickly return an answer.
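To make this concrete, here's a minimal sketch of that kind of query using Python's built-in sqlite3 module. The table, column names, and rows are all hypothetical, invented purely for illustration:

```python
import sqlite3

# Hypothetical "sales" table; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, sale_date TEXT, num_sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("alice", "2021-03-02", 2), ("bob", "2021-03-02", 1), ("carol", "2021-03-05", 3)],
)

# "How many people bought our products on March 2nd?"
count = conn.execute(
    "SELECT COUNT(DISTINCT customer) FROM sales WHERE sale_date = '2021-03-02'"
).fetchone()[0]
print(count)  # → 2
```

Because every sale lives in a well-defined column with a consistent date format, the question becomes a one-line query instead of a manual search.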
Turning unstructured data into a structured dataset is a process that can take Data Scientists a lot of time and effort. This is called data wrangling, and I’ll talk about it in a later section, but I bring up structure to talk about another related problem.
If you're planning to use a tool like Google Analytics to gather data, it's very likely that you'll be generating a dataset that is mostly structured. However, you may have dozens of metrics and thousands of data points to consider: how can you figure out how to narrow it down?
It may not seem like that big a deal at this point, but it can have drastic effects later on for reasons you might not expect: file size and processing power.
Part of the structuring process is eliminating variables that may not be useful to your problem. It's often good practice to keep both a complete dataset of all structured data and a narrowed-down dataset for the specific issue at hand. Working with a slimmed-down dataset has clear advantages: with big data, file sizes can easily reach 1 GB or more, which means every query or large-scale visualization can require a ton of processing power.
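The "complete dataset plus narrowed copy" practice can be sketched in a few lines of plain Python. The field names here are hypothetical examples, not a real schema:

```python
# Full structured dataset: each row is a dict of all collected metrics.
# (Field names are invented for illustration.)
full_dataset = [
    {"user_id": 1, "num_sales": 3, "time_on_page": 42.0, "browser": "Firefox"},
    {"user_id": 2, "num_sales": 0, "time_on_page": 5.5, "browser": "Chrome"},
]

# Keep the complete dataset above, and also a narrowed copy holding
# only the columns relevant to the sales question at hand.
relevant = ["user_id", "num_sales"]
narrowed = [{key: row[key] for key in relevant} for row in full_dataset]
print(narrowed)
```

The narrowed copy is smaller and faster to query, while the full dataset remains available if a later question needs the other columns.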
In the last section, I talked about using the dimensions of data to narrow down some of the metrics of interest. However, you still may be left with several possibilities. How do you figure out what to focus on, so as not to be overwhelmed with data?
You build a hypothesis through Data-aware design.
Data-aware design and hypothesis building
Let's go back to what I said earlier about Data-aware Design.
Data-aware design allows you to apply design thinking to understand the questions that need to be answered through research, set up data collection, decide what data types should be captured, and plan how to utilize them in the future. This level of thinking is needed to ensure that the correct data is collected to address the right questions, and it often involves others, such as business strategists and developers.
And it's one of the best ways of narrowing your dataset to the most relevant data. Addressing research questions and data collection ahead of time saves you the time and effort you'd otherwise have to spend structuring and narrowing down a large dataset.
Designing with Data, by Rochelle King et al., provided me with critical lessons on exactly how to do this. This book showcases an experimentation framework that uses the scientific method to design your data collection in a methodical and focused way.
This book has two principles for incorporating Data-aware Design at the beginning of the process: Hypothesis building and test cells.
Hypothesis building
According to Designing with Data, hypothesis building is a method that allows you to structure your research in a template like this:
For [user group(s)], if [change] then [effect] because [rationale], which will impact [measure].
Or this:
We predict that [doing this/building this feature/creating this experience] for [these people/personas] will achieve [these outcomes] because of [these reasons]. We will know this is true when we see [this impact on our metric of interest].
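Either template can also be captured as a small data structure, which keeps every hypothesis in a consistent, comparable shape. This is my own sketch, not something from the book; the example values are invented:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One hypothesis in the template:
    For [user group(s)], if [change] then [effect]
    because [rationale], which will impact [measure]."""
    user_group: str
    change: str
    effect: str
    rationale: str
    measure: str

    def __str__(self) -> str:
        return (f"For {self.user_group}, if {self.change} then {self.effect} "
                f"because {self.rationale}, which will impact {self.measure}.")

# A hypothetical example, echoing the senior-citizens scenario below.
h = Hypothesis(
    user_group="senior citizens",
    change="we enlarge the checkout buttons",
    effect="more purchases will be completed",
    rationale="small tap targets cause abandoned carts",
    measure="number of sales",
)
print(h)
```

Forcing every hypothesis through the same fields makes gaps obvious: if you can't fill in the rationale or the measure, the hypothesis isn't ready to test.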
This is important for several reasons.
First, it spells out some key things that are otherwise assumed, such as:
Who exactly our users are
What we are planning to do or change
Why we are doing this
What we expect to happen
How this relates to the business
These distinctions are crucial in establishing common ground at the beginning of the project because not everyone may think the same way about these topics. Even if they did, they might not be able to explain them to other people.
For example, someone on the team might think that our users are our regular customers without realizing that this tool is specifically tailored for senior citizens.
Or someone might think the reason we're trying to redesign this application is that the current process is completely unusable, rather than just somewhat flawed.
The more you're able to define a specific topic, the better it is for the data collection process. A slight misunderstanding at this stage can have a huge impact later on, as we'll see in a later example.
But to build a hypothesis, we first need to take a step back and see where we are in this process.
At this stage, you've defined some of your metrics of interest based on the data dimensions.
The next step is to think about the area of focus for achieving that goal. Here, you're either attempting to solve a problem for your users or to find an opportunity area to offer something that didn't exist before. This could take the form of fixing minor localized issues with the interface (specific design interactions or elements), or of a site-wide opportunity (such as redesigning an entire website).
For example, suppose your goal was to increase the number of sales on your website. In that case, you might break it into several approaches: you might try to make existing customers into recurring customers, bring in new users, or possibly encourage users to buy more per purchase.
Identifying these areas of focus helps determine exactly how many tests or hypotheses we will run. However, you likely won't create hypotheses for all of these areas: your product owner or manager may decide not to pursue some of them at this time, whether because you don't have the budget, you have data that contradicts one of your hypotheses, or the area simply isn't a high priority.
After that comes the hypothesis. At this point, you might have various hypotheses that you want to test based on what you've discovered.
To illustrate this, let's look at a case study to see how this process has been organized so far.
Case study: PS3 on Netflix
When Netflix first started to create their app for the PlayStation 3, they had never developed an app for the television. As a result, their first version was quite basic, and they hoped to optimize the experience with the next version. However, this was a new field at the time: not only did they lack confidence in their own instincts about what works, they suspected their competitors did too. So while they weren't happy with the current design, they felt the right design might not exist on the market yet.
So they needed to figure out how to think about the best way to design a new application using this experimentation framework.
They knew what metric they would use to define success for their goal: viewing hours.
From there, they brainstormed multiple hypotheses to consider, such as whether presenting the whole movie catalog might be best, whether they should replicate their website design, or whether they should replicate a TV-watching experience (i.e., channel flipping).
Hypothesis
Each of these was created through the Design Thinking process: by brainstorming several different mental models that their users might be familiar with (TV, Netflix.com, and movie catalogs), they could turn them into very different ideas for testing. From there, they turned their ideas into hypotheses by following a template:
Because of [these reasons], we believe that [doing this/building this feature/creating this experience] will achieve [improvement in metric of interest].
These were the four hypotheses that Netflix generated:
"Because people are used to the TV experience (Reasons), we believe structuring our interface to match that mental model (Doing X) will result in more hours of content consumption (Achieve Y)."
"Because people are used to the Netflix website (Reasons), we believe structuring our interface to match that mental model (Doing X) will result in more hours of content consumption (Achieve Y)."
"Because people expressed issues with seeing a portion of our catalog (Reasons), giving users more choice in the selection of movies on the home page (Doing X) will allow them to find a movie they like, resulting in more hours of content consumption (Achieve Y)."
"Because people usually have a certain movie genre in mind when browsing (Reasons), we believe changing our navigation to organize movies in a particular genre (Doing X) will allow them to find movies they like, resulting in more hours of content consumption (Achieve Y)."
Each of these hypotheses represented a different design and a possible route they might explore with testing. In Netflix's case, because they had invested a lot of time and effort into this, they decided to test all of these hypotheses. However, at this stage, you may choose to reconsider whether you want to pursue all of your hypotheses: this is when you would spend time considering whether each hypothesis is likely to have an impact at least as big as the Minimum Detectable Effect (MDE).
The MDE is the minimum difference you would want to achieve with your tests when compared with your control. For example, gaining 10% more new users compared to the current system. Do you believe that a certain hypothesis, such as changing the color of a button, will have that type of impact? If not, then we can choose not to move forward with testing that hypothesis at the time.
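The MDE also drives how much data you need to collect. A standard way to see this is the sample-size formula for a two-proportion A/B test under the normal approximation; this is a general statistical sketch, not a formula from the book, and the 10%-to-11% numbers are invented for illustration:

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-proportion A/B test
    (normal approximation). mde_abs is the minimum absolute lift to detect."""
    p_treat = p_baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# Detecting a lift from a 10% to an 11% conversion rate takes roughly
# fifteen thousand users per group at 95% confidence and 80% power.
print(sample_size_per_group(0.10, 0.01))
```

This is why small-MDE hypotheses, like changing a button color, are often dropped: the traffic required to detect a tiny effect may simply not be worth it.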
After this comes the next portion: creating test cells.
Test Cells
With a strong hypothesis comes the ability to create many different designs: after all, there may not be one single solution to a hypothesis. As a result, you may create different designs, each acting as a possible solution to the hypothesis that you're presenting. This is where test cells come into play.
In the case of Netflix, it was a matter of different treatments, or test cells, that emphasized either breadth or depth.
So for the hypothesis:
"Because people expressed issues with seeing a limited portion of our catalog (Reasons), giving users more choice in the selection of movies and TV shows on their home page (Doing X) will allow them to find a movie they like, resulting in more hours of content consumption (Achieve Y)."
They tried several different approaches to this. In the control group, they had 25 rows of categories on the home page with 75 titles in each row. However, there were two ways that they could give users more choice on the page:
Add a wider selection of movies to each row
Add more/different genres or categories
As a result, they incorporated a number of different test cells in their testing.
Cell A: 25 rows and 100 movies per row
Cell B: 50 rows and 75 movies per row
Cell C: 50 rows and 100 movies per row.
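The structure of these cells can be laid out in a few lines of code. The row and title counts come from the case study above; expressing them as data makes it easy to see how much "choice" each cell actually adds:

```python
# Netflix's control and test cells from the case study,
# expressed as (rows, titles_per_row) pairs.
cells = {
    "Control": (25, 75),
    "Cell A": (25, 100),
    "Cell B": (50, 75),
    "Cell C": (50, 100),
}

for name, (rows, per_row) in cells.items():
    print(f"{name}: {rows * per_row} titles on the home page")
```

Cell A and Cell B each vary one dimension against the control (breadth of a row versus number of rows), while Cell C varies both, which is what lets the later analysis attribute any change in viewing hours to a specific kind of added choice.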
By doing this, they not only had a set of easily organized hypotheses they could choose to pursue; they could also run A/B tests on each of their test cells to compare different design elements very easily. Doing this allowed them to break down a site-wide problem into specific sections for testing and data collection.
Your test cells might involve different elements, such as the placement of buttons, information hierarchy, color palettes, or other small changes. However, setting up these structures is more important than you might think.
Volume, comparison and the need for structure
So why did Netflix end up setting up these test cells in this manner? Because they needed a consistent way of being able to compare across hypotheses.
After all, if Cell B for Hypothesis 3 and Cell A for Hypothesis 2 scored the best in the previous tests, then the next step would be to figure out a way to compare those two designs.
Without this sort of structure, you could easily end up with biased tests, unformatted data, or lost data. One of the things we need to remember is that quantitative work often involves a much larger volume of data, meaning that we need to think like a Data Scientist.
But at the same time, Data-aware design means applying Design Thinking to our data collection process. And one of the strongest aspects of Design Thinking is the pairing of divergent and convergent thinking: by taking the time to go broad and experimental with our ideas before narrowing them down to the best solution, we can find design solutions we hadn't thought of before.
This structure allowed them to make use of divergent and convergent thinking, while also making sure that the data that was collected was high quality, consistent, and formatted correctly.
A dataset, especially one based on testing multiple hypotheses, can lead to headaches if things are formatted incorrectly.
For example, if in these test cells, one test was set up to capture the Time field in a 24-hr format (13:00) while another was set up for AM/PM (1:00 PM), then you might get thousands of rows of data that you would need to standardize into one of these formats. That by itself isn't that bad.
But now imagine that, across all of these test cells, you had to fix thousands of Time and Date rows, not to mention that errors in some fields show up as 999 instead of blank.
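Cleaning up that kind of mess looks something like the following sketch, using only Python's standard library. The function name and the specific formats are illustrative, mirroring the 24-hr/AM-PM mismatch and the 999 error sentinel described above:

```python
from datetime import datetime

def to_24h(raw):
    """Normalize a time string to 24-hr 'HH:MM'; treat the 999 error
    sentinel (and empty values) as missing."""
    if raw in ("999", "", None):
        return None
    for fmt in ("%H:%M", "%I:%M %p"):  # try 24-hr first, then AM/PM
        try:
            return datetime.strptime(raw, fmt).strftime("%H:%M")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized time format: {raw!r}")

print(to_24h("13:00"))    # → 13:00
print(to_24h("1:00 PM"))  # → 13:00
print(to_24h("999"))      # → None
```

Writing this once is easy; the pain comes from having to discover, diagnose, and apply it across thousands of rows after the fact, which is exactly what consistent test setup avoids.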
Or imagine that one of your executives believes that they know the correct solution to the problem, so they don't see the point in testing.
Or another team member thinks that one design is perfect, so we don't need to design anything else.
It may seem like this structure is unnecessary, until things start to break down or errors start to pop up. But I'll say 7 words that will immediately convince any Data professional reading this:
Data-aware Design can reduce Data Wrangling.
We'll explore what this means in a later section. However, by breaking it down in this manner, we can ensure high quality and consistent data across any tests that you might run, which is a huge advantage for anyone working with data.
And having metrics, hypotheses, and structures finally allows us to do what I was hoping to do from the start: Data Visualization.