tl;dr: When designing a chart, most people try to come up with the ‘best way to visualize the data’. This often results in charts that are unobvious or useless to readers, though. Instead, we should try to design charts that best answer a specific question or that best communicate a specific insight about the data, even though such charts don’t answer all questions that readers might have about the data.
Like any field, data visualization has some common misconceptions floating around in it. There’s one, though, that I think has done more damage than any other, which is the assumption that…
“When designing a chart, the goal is to find the overall best way to visualize the data.”
“WTF are you talking about?”
How can that be a misconception? Am I suggesting that your goal should be to find a bad way to visualize the data? Obviously not. What am I saying, then?
Well, have a look at the data in the table below and three potential ways of visualizing it for our company’s CEO. Which of the three graphs do you think is the best way to visualize this data: Graph A, B, or C?
The answer, of course, is that any one of these graphs could be ‘the best way to visualize this data’, depending on what, specifically, we need to say about the data:
If the CEO needs to know which regions have the highest expenses, then Graph A is ‘the best way to visualize this data’.
If the CEO needs to know which regions are doing a better or worse job of sticking to their budget, then Graph B is ‘the best way to visualize this data’.
If the CEO needs to know which regions are contributing most to the company’s overall budget overage, then Graph C is ‘the best way to visualize this data’.
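All three graphs are built from the same table; they just plot different quantities derived from it. The article's table isn't reproduced in this text, so the minimal Python sketch below assumes a table with a budget and an actual-expenses figure per region, using hypothetical region names and amounts. It also assumes that Graph B plots each region's percentage variance from budget and that Graph C plots each region's dollar overage. The article doesn't spell out exactly how those graphs were built, so treat the metrics as illustrative.

```python
from dataclasses import dataclass


@dataclass
class RegionExpenses:
    """One row of a hypothetical table: a region's budget and actual expenses."""
    region: str
    budget: float
    actual: float


def by_total_expenses(rows: list[RegionExpenses]) -> list[RegionExpenses]:
    """Graph A's question: which regions spend the most? Sort by actual expenses, descending."""
    return sorted(rows, key=lambda r: r.actual, reverse=True)


def pct_over_budget(row: RegionExpenses) -> float:
    """Graph B's question (assumed metric): percent over (+) or under (-) budget."""
    return (row.actual - row.budget) / row.budget * 100


def dollars_over_budget(row: RegionExpenses) -> float:
    """Graph C's question (assumed metric): dollars contributed to the total overage."""
    return row.actual - row.budget


# Hypothetical figures, for illustration only (not the article's data).
rows = [
    RegionExpenses("North", budget=500.0, actual=550.0),
    RegionExpenses("South", budget=100.0, actual=130.0),
    RegionExpenses("West", budget=300.0, actual=290.0),
]

print("A:", [r.region for r in by_total_expenses(rows)])
print("B:", {r.region: round(pct_over_budget(r), 1) for r in rows})
print("C:", {r.region: dollars_over_budget(r) for r in rows})
```

With these made-up numbers, North ranks first by total expenses and by dollar overage, while South is furthest over budget in percentage terms. A chart designed around one of these questions would therefore put a different region in the spotlight than a chart designed around another.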
Is any one of these graphs the ‘overall best way to visualize this data’, or the ‘truest representation of this data’? How would we even go about determining that? All three—and many other possible variations—are potentially ‘the best way to visualize this data’, depending on what, specifically, we need to say about the data. None of them is the ‘overall best way to visualize this data’, or ‘the best representation of this data’. In fact, there’s never a single, ‘overall best way’ to visualize any dataset; there are only ‘best ways to say different things about the data’, such as which regions have the highest or lowest expenses, or which regions are doing a better or worse job of sticking to their budgets.
That’s the harsh reality of data visualization that few people seem to realize: Charts never ‘show the data’; they always just say a few specific things about the data. Different ways of visualizing the same dataset make different insights about that data more obvious, less obvious, or not visible at all. Yes, it would be awesome if we could make charts that ‘just show the data’, i.e., that make all possible insights obvious or that answer all possible questions that readers might have about the data, but those charts don’t exist.
Well, if we try to create a chart that makes all possible insights obvious or that answers all possible questions that readers might have about the data, we’ll always end up with a ‘spaghetti chart’:
Even this doesn’t answer every question that the CEO might have about this data, though. For example, if the CEO wanted to quickly see what fraction of total expenses each region represents, or how these expenses compare to those of the previous year, we’d need to add even more clutter. Indeed, we’d never stop adding clutter to our chart in a quest to ‘just show the data’ because there’s always a virtually unlimited number of things that we could say about any dataset.
“Why don’t we just use a table, then?”
Well, tables do ‘just show the data’ without saying anything about the data. Indeed, tables don’t make any insights obvious at all. For example, based on the table alone in the scenario above, is it obvious which regions are doing a better or worse job of sticking to their budget? Or what fraction of total expenses each region represents? Sure, the reader can get those insights, but they’re going to have to work for them and possibly do some calculations, and they’re far less likely to notice interesting or unexpected patterns or relationships in a table of numbers than in a graph.
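The ‘fraction of total expenses’ question is a good example of the work a table pushes onto the reader: every value has to be divided by a total that the table may not even show. A minimal Python sketch of that calculation follows. The function name, region names, and figures are hypothetical.

```python
def share_of_total(expenses: dict[str, float]) -> dict[str, float]:
    """Return each region's expenses as a percentage of the total across all regions."""
    total = sum(expenses.values())
    if total == 0:
        raise ValueError("total expenses must be non-zero")
    return {region: amount / total * 100 for region, amount in expenses.items()}


# Hypothetical expense figures, for illustration only.
expenses = {"North": 550.0, "South": 130.0, "West": 320.0}
shares = share_of_total(expenses)

# Print regions from largest to smallest share, as a reader scanning a table would have to work out.
for region, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region}: {pct:.1f}%")
```

A graph built for this question (for example, one that plots these percentages directly) hands the reader the answer. A table of raw expenses leaves them to do this arithmetic themselves.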
Tables are also many times slower to consume than graphs and require a lot more cognitive effort to process, which substantially increases the risk that readers won’t get the insights they need from a table, or will just skip over it altogether. In most situations, then, saying a few things about the data (i.e., showing a graph) is far more useful than saying nothing about the data (i.e., showing a table).
“So, what does all this mean when it comes to actually designing charts?”
The next time you sit down to create a new chart, instead of asking yourself, “What’s the best way to visualize this data?”, ask yourself, “Do I know why I’m creating this chart?”, i.e., do you know what specific insight or answer you need the chart to communicate about the data? If the answer to that question is “no” (which it will be surprisingly often), you need to step away from the charting software and go find out. Perhaps you’ll need to do some exploratory analysis or speak more with the target audience, but, one way or another, you need to figure out what, specifically, your chart needs to say about the data. If you don’t, many of your design choices (chart type, color palette, etc.) will be quasi-random guesses, and the chances that the audience will get what they need from your chart will be low.
Once you’ve figured out what, specifically, your chart needs to say about the data, the next step is to accept that whatever design you come up with is going to communicate that specific insight or answer that specific question clearly (hopefully, anyway…), but there will be many other potentially interesting questions and insights that won’t be obvious in your chart, or possibly not visible at all. Not only is that O.K., it’s the only way it can work (unless you give your audience a spaghetti chart).
What happens if, try as you might, you can’t find out specifically why the audience needs to see a particular dataset or chart? For example, perhaps the CEO has simply asked for “expenses for each department” and you don’t have the opportunity to ask them why they need that information because they’re too busy to meet with you. These are unpleasant situations to be in, but they do happen. In my Practical Charts course, we discuss strategies for increasing the odds that we end up giving the audience something that will be at least somewhat useful to them, but these strategies will have to be a topic for a future article since this one’s already longer than I’d like it to be. The bottom line, though, is that if we design a chart without knowing specifically what it needs to communicate about the data, it probably won’t be as useful to the audience as it could be.
“So, are you also saying that…”
No. I want to be clear about a few things that I’m not saying:
I’m not saying that all the ways to visualize a given dataset are ‘potentially best’ ways. For any dataset, there are plenty of ways to visualize it that aren’t useful in any plausible scenario, that are fundamentally confusing, or that are just plain misleading:
Outside of obviously bad ways such as these, though, there are always many ‘best ways’ to visualize any dataset.
I’m not saying that, because there’s never a single ‘overall best way to visualize this data’, whether one chart is better than another comes down to personal opinion or preference. For any given scenario (the nature of the data + what we need to say about that data + knowledge of the audience), different chart designs will be objectively better or worse ways to visualize that data for that scenario. How could we know if one chart design is objectively better than another for a given scenario? We could recruit representative members of our target audience and run an experiment to test the different chart designs to determine which one most effectively answers the question at hand or communicates the insight we need to communicate, and that ultimately best achieves whatever effect we want to have on the target audience.
Of course, we usually don’t have the time or resources to run such experiments, so part of learning data visualization involves getting good at making educated guesses about which chart designs would perform best, were we to test them experimentally with members of our target audience. Having some knowledge of major findings from data visualization research studies is helpful and can make those guesses more educated, but research findings generally aren’t specific enough to point to the best chart in a specific scenario.
Whether we have the resources to determine which chart design is objectively better or not, though, the fact remains that one of the designs is always objectively better than the others. It’s not an inherently subjective assessment.
I’m not saying that, as long as you know specifically what you need to say about the data, you’ll automatically be able to design an effective chart. It takes a fair amount of skill to take some data, a specific reason why the audience needs to see that data, and knowledge of the target audience (level of dataviz sophistication, current concerns, etc.), and turn all that information into an effective chart. The chart creator has to know how to choose chart types, chart arrangements, color palettes, scale formatting, and how to make many other types of design decisions. These are the skills that I teach in my Practical Charts course, and it’s 14 hours long…
“Umm, this seems kind of obvious…”
The fact that there isn’t a single ‘overall best’ way to visualize a given dataset may seem obvious to some when it’s spelled out like this, but getting out of the mindset of ‘trying to find the best way to visualize this data’ and into the mindset of ‘designing the chart that best communicates a specific insight or best answers a specific question’ requires a fundamental shift in thinking that relatively few people seem to have made. I regularly hear even well-known experts discussing which chart design ‘best represents the data’ without even mentioning what, exactly, the chart is supposed to do. As I see it, though, that’s like arguing about whether a hammer or a screwdriver is ‘the best tool’ without ever mentioning if we need to pound in a nail or tighten a screw.
“But is this really the biggest misconception in data visualization?”
I think so, yes…
It’s very widespread. While some people have fully internalized the idea of trying to find the best way to answer a specific question or communicate a specific insight, most still try to find ‘the best way to visualize this data’, without considering the specific reason why the audience needs to see that data in the first place.
It’s caused innumerable arguments regarding which of two (or more) chart designs is ‘better’, which could have been instantly resolved if everyone involved had realized that one chart design would be ‘the best chart’ in one scenario, and the other chart design would be ‘the best chart’ in a different scenario.
If we design a chart by trying to find ‘the best way to visualize this data’, there’s a dramatically higher risk that the target audience will find the resulting chart to be too unobvious—or possibly even useless—because many of our design choices (chart type, color palette, highlighting, etc.) will be guesses since they won’t be geared around communicating a specific insight or answer.
Trying to find ‘the best way to visualize this data’ makes designing effective charts a lot harder than it needs to be. Once we realize that all charts just say a few things about the data, it becomes a lot easier to choose chart types, color palettes, scale formats, etc. in light of the specific insight or answer that we need to communicate. We’re no longer trying futilely to design charts that anticipate every possible question that the audience might have about the data, or trying to find some ‘overall best’ representation of the data that doesn’t actually exist.
Let me know your thoughts in the comments, though. Do you have a different take on this idea?
By the way...
If you’re interested in attending my Practical Charts or Practical Dashboards course, here’s a list of my upcoming open-registration workshops.
In Kieran Healy's article, he discussed how the most obvious problems of data visualization tend to be aesthetic, substantive, and perceptual.
What are some of the problems that may arise when trying to visualize data using charts and graphs?
- Using the wrong chart type.
- The poor use of a 3D chart.
- The presentation of misleading or bad data.
- Inconsistent scale across the data represented.
- A visually cluttered graph.
One of the greatest mistakes you can make in your data visualization efforts is to deploy a data visualization tool with little to no communication or training for employees. We can't emphasize enough that installing a new tool won't be adequate to bring transformative insight.
What is a weakness of data visualization?
Drawbacks of interactive data visualizations
Interactive data visualizations come with some drawbacks, such as requiring more time, effort, and skills to design, develop, and maintain than static charts, and potentially increasing the complexity and cost of the data analysis process.
- Data Quality: accuracy, completeness, consistency, format, integrity, and timeliness.
- Not Choosing the Right Data Visualization Tools.
- Confusing Color Palette.
- Analytical & Technical Challenges.
- A 3D bar chart gone wrong.
- A pie chart that should have been a bar chart.
- A continuous line chart used to show discrete data.
- A misleading geography visual.
- A confusing graphic.
There are certain factors that limit our ability to visualize. The biggest and most common problem is the limitation of algorithms, which comes down to human input: our minds try to focus all of their attention on the main point of the data we are processing and trying to visualize.
Which of the following is incorrect regarding data visualization?
Explanation: The statement that data visualization decreases insights and leads to slower decisions is false.
What are the disadvantages of visualization diagrams?
- It gives an estimate, not precision – While the data may be precise in predicting situations, a visualization of it only gives an estimate. ...
- Biased – ...
- Lack of support – ...
- Improper design issues – ...
- Wrongly focused people can miss core messages –
- Using the wrong graph type. ...
- Using too many graph types. ...
- Labeling axes inconsistently. ...
- Using a graph with too many series of data. ...
- Placing graph titles in odd places. ...
- Compressing or expanding axis scales to fit data trends. ...
- Using too many or too few tick marks on the axis scale.
Misleading and confusing images can skew the data and lead to misinformation guiding important decisions. Deceptive data visualizations lead to residual effects like miscommunication and a loss of trust.
What are some common errors that researchers make when presenting data?
- Your graphs don't have a clear message. ...
- You haven't chosen the most suitable plot type. ...
- You haven't been selective enough. ...
- The axis intercepts aren't appropriate. ...
- Your plot is cluttered. ...
- Your axis titles and legends are confusing or repetitive.
We've divided them into three related categories: completeness, correctness, and clarity. To envision how all these fit together, imagine that your data is the pieces of a puzzle. To get value out of your data, you need to assemble the puzzle (that is, do the work of data quality).
What three things are needed in order to have successful data visualization?
Understand the audience, work within a clear framework, and tell a good story.
What are the three most important principles of data visualization?
- Use patterns (of chart types, colors, or other design elements) to identify similar types of information.
- Use proportion carefully so that differences in design size fairly represent differences in value.
- Be skeptical.
A bad graph is constructed in a way that does not convey data in a manner that is clear to the audience. For example, a graph may not have labels for one or both axes. The person who created the graph may have done it unintentionally, but it is still considered a bad data visualization example.
Which two are not benefits of visualization?
Reduced status reporting overhead is not a benefit of visualisation of work.
Which is not a benefit of data visualization?
One of the drawbacks of data visualization is that different groups within the audience may interpret the same visualization differently, which matters if data visualization is to be considered a new form of communication.
What are the 3 Vs of big data challenges?
Dubbed the three Vs (volume, velocity, and variety), these are key to understanding how we can measure big data and just how different 'big data' is from old-fashioned data.
What are three major concerns when dealing with large datasets?
Managing huge data sets used to be a problem that only the largest of enterprises had to deal with. Now everyone – from the college student who's developing the next Yelp, to IBM – is dealing with the three V's that define Big Data: volume, velocity and variety, and the Big Data security issues that come with them.
Avoid creating cluttered visualizations.
Cluttered visualizations that include too many visual elements, such as multiple text boxes and graphic layers, lead to audience confusion. When a visual is too busy to be read and understood effectively, it needs a more pointed focus.
The lack of an appropriate fit between the task and the visual representation can be misleading. Some visualizations are based on pre-defined forms or templates that are not adequate for the communication task at hand or the information to be represented.
How not to visualize data?
- Using the Wrong Type of Chart or Graph. There are many types of charts or graphs you can leverage to represent data visually. ...
- Including Too Many Variables. The point of generating a data visualization is to tell a story. ...
- Using Inconsistent Scales. ...
- Unclear Linear vs. ...
- Poor Color Choices.
Noah Iliinsky discusses the four pillars of effective visualization design (purpose, content, structure, and format), as well as design types to avoid.
Is data visualization difficult?
Of course, the difficulty that comes with learning a new skill is somewhat subjective. The challenges of learning data visualization depend on whether you have a background in data analytics, if you know basic design concepts, and how familiar you are with programs such as Microsoft Excel and Tableau.
What are 4 characteristics of data visualization?
Accurate: The visualization should accurately represent the data and its trends. Clear: Your visualization should be easy to understand. Empowering: The reader should know what action to take after viewing your visualization. Succinct: Your message shouldn't take long to resonate.
Which of the following is not a common data visualization tool?
Microsoft Excel is not a type of visualization tool, but it is a powerful tool to help analyze data sets.
Which chart type should be avoided when visualizing data?
The worst thing you can do to your visual report is to include 3D charts! Just don't do it. Sure, they look beautiful and out of the ordinary compared to 2D charts, but that is all there is to them. 3D charts tend to skew how we perceive the data, thereby conveying inaccurate information.
What are the negative effects of visuals?
- Costly: The expense of using visual communication techniques is higher than that of using other techniques. ...
- Not Easy to Interpret: ...
- Incomplete Approach: ...
- Time Wastage: ...
- May be Difficult to Understand: ...
- Not Suitable for General Readers: ...
Data visualization helps to tell stories by curating data into a form easier to understand, highlighting the trends and outliers. A good visualization tells a story, removing the noise from data and highlighting useful information.
A visualization may be ineffective for a number of reasons. It might be too confusing or complex to be interpreted by the intended audience, or some of the data may have been distorted, occluded or lost during the mapping process.
What are two ways that graphs can be misleading?
Graphs can be misleading if they include manipulations to the axes or scales, if they are missing relevant information, if the intervals on an axis are not the same size, if two y-axes are included, or if the graph includes cherry-picked data.
What is the most common way graphs can be misleading?
Arguably, the most common form of misleading graph is one whose Y-axis has been manipulated. When comparing large numbers with each other, many people exclude zero from the Y-axis in order to better show the differences between instances.
What are 5 ways in which data and graphs can be changed to be misleading?
- Excessive usage.
- Biased labeling.
- Pie chart.
- Improper scaling.
- Truncated graph.
- Axis changes.
- No scale.
- Improper intervals or units.
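Truncated graphs and axis changes mislead for an arithmetic reason: a bar's drawn length is measured from the axis baseline, so when the baseline is raised above zero, the ratio between two bars' lengths no longer matches the ratio between their values. A minimal Python sketch of that effect follows. The function name and the example values are hypothetical, and it only models bars, whose length encodes the value.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar lengths for values a and b when the Y-axis starts at `baseline`.

    With baseline == 0 this equals a / b, the true ratio of the values.
    """
    if not (a > baseline and b > baseline):
        raise ValueError("both values must lie above the baseline to be drawn as bars")
    return (a - baseline) / (b - baseline)


true_ratio = apparent_ratio(105, 100)              # axis starts at zero
truncated = apparent_ratio(105, 100, baseline=95)  # axis starts at 95
print(f"true: {true_ratio:.2f}x, drawn with truncated axis: {truncated:.2f}x")
```

Here a 5% difference in the data is drawn as one bar twice as long as the other, which is exactly the exaggeration that excluding zero from the Y-axis produces.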
In 2007, toothpaste company Colgate ran an ad stating that 80% of dentists recommend their product. Based on the promotion, many shoppers assumed Colgate was the best choice for their dental health. But this wasn't necessarily true. In reality, this is a famous example of misleading statistics.
Why is the pie chart controversial in data visualization?
Pies and doughnuts fail because quantity is represented by slices, and humans aren't particularly good at estimating quantity from angles, which is the skill needed. Matching the labels and the slices can also be hard work, and small percentages (which might be important) are tricky to show.
What are the three main types of data error?
- Sampling error.
- Non-sampling error.
- Importance of error.
Incorrect data inputs are typically the most common error that may occur in data entry. An unintentional mistype may lead to a more severe problem in the short or even long term. It can also lead to wrong information, disorganization, and incorrect records within the organization.
What are the 3 must-know data visualization principles?
- Know your audience.
- Keep things simple.
- Use the right chart type.
- Use colors wisely.
- Highlight the most important information.
- Avoid clutter.
Avoid data distortions.
Data distortions occur when some shapes in a visual are scaled out of proportion to the others that are depicted. Distortions can not only be distracting in visuals but also have the potential to mislead an audience.
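One common way to put a number on this kind of distortion is Edward Tufte's "lie factor": the size of the effect shown in the graphic divided by the size of the effect in the data, where each "size of effect" is the relative change from the first value to the second. A lie factor near 1 means the graphic is faithful. The minimal Python sketch below uses hypothetical function names and numbers to show the classic case of an icon that is scaled in both width and height when the underlying value only doubles. It assumes readers judge the icon by its area.

```python
def size_of_effect(first: float, second: float) -> float:
    """Relative change from first to second (first must be non-zero)."""
    return (second - first) / first


def lie_factor(data_first: float, data_second: float,
               graphic_first: float, graphic_second: float) -> float:
    """Effect shown in the graphic divided by the effect in the data (1.0 means faithful)."""
    return size_of_effect(graphic_first, graphic_second) / size_of_effect(data_first, data_second)


# Hypothetical example: the value doubles (10 -> 20), and the icon representing it is
# scaled 2x in both width and height, so its area grows 4x (1 -> 4 square units).
value_before, value_after = 10.0, 20.0
scale = value_after / value_before
area_before, area_after = 1.0, 1.0 * scale**2

print(lie_factor(value_before, value_after, area_before, area_after))
```

The data shows a 100% increase but the icon's area shows a 300% increase, giving a lie factor of 3. Scaling only one dimension of the shape (or using bars of equal width) would keep the lie factor at 1.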
Using the wrong graphs/charts for their particular purpose. Not making the best use of colors. Creating misleading graphs/charts. Trying to incorporate too much information in one graph.
What are the 4 elements of data visualization?
Successful data visualization will be achieved when these four elements are present: information, story, goal and visual form.
What is the most important rule for data visualization?
“The No. 1 rule for good data visualization is to let your data breathe,” says David Wurst of Webcitz. “When it comes to data visualization, one of the most common mistakes people make is trying to cram too much visual information into a single design.”
What is problem visualization?
Visualizing a problem helps us understand it ourselves and then gain consensus with others on it. It also allows us to determine if we are all seeing it in the same way. Drawing something also lays it out spatially, allowing people to see relations, sequence and connections, or whatever we want to depict.
What is the most common but unavoidable error in data analytics?
Unfortunately, regardless of how well laid out the experiment is and how carefully the person conducting it follows the steps, mistakes and errors are unavoidable. The most common type of error is experimental error.
What are three challenges encountered by companies that embrace big data analytics?
This data needs to be analyzed to enhance decision making, but companies encounter several challenges with big data. These include data quality, storage, a lack of data science professionals, validating data, and accumulating data from different sources.