In today’s data-driven economy, data analytics is a powerful differentiator for businesses to gain insights into virtually any aspect of their organization, giving them an edge over the competition. As such, companies are investing a huge amount into analytics tools: IDC projects that global revenues for big data and business analytics will grow from $150.8 billion in 2017 to more than $210 billion in 2020.

Unfortunately, without the right approach and skill sets, data initiatives have a high failure rate, and organizations fail to reap the true value of analytics. Avoiding some of the pitfalls of data analysis helps that investment pay off.

Here are some common pitfalls of data analysis that can lead to incorrect or misleading information:

Not working with clean data

Understanding the data you are analyzing is vitally important, because a lack of understanding can lead to confusion and bias. Often, data comes from more than one source, and combining or merging data from multiple sources can introduce errors. Businesses end up comparing datasets that bear no relevance to one another, or working with missing, inaccurate or ‘dirty’ data. In the end, this generates misleading interpretations.

Starting with clean data is essential for producing accurate analysis. A skilled data science leader is key to setting up and providing proper views of the datasets, tailored to the needs of business users across the organization. It also requires instilling a best-practices approach to bringing in data and understanding the relationships between the tables that have been set up, especially when they come from multiple data sources. For instance, analysts should only bring in data from tables they understand, not from areas they don’t.
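As a minimal sketch of that kind of check, the Python below audits two sources before they are combined. The record layouts, the `customer_id` key and the helper `audit_join` are hypothetical, invented here for illustration. The idea is simply to surface missing keys, unmatched keys and duplicates before any analysis is run.

```python
def audit_join(left, right, key):
    """Report data-quality problems before combining two sources.

    `left` and `right` are lists of dicts (a hypothetical record layout).
    """
    left_keys = {row[key] for row in left if row.get(key) is not None}
    right_keys = {row[key] for row in right if row.get(key) is not None}
    return {
        # Rows whose join key is missing cannot be matched at all.
        "left_missing_key": sum(1 for row in left if row.get(key) is None),
        "right_missing_key": sum(1 for row in right if row.get(key) is None),
        # Keys present in one source but not in the other.
        "only_in_left": sorted(left_keys - right_keys),
        "only_in_right": sorted(right_keys - left_keys),
        # Duplicate keys on the right would multiply rows in a join.
        "duplicate_right_keys": sorted(
            k for k in right_keys
            if sum(1 for row in right if row.get(key) == k) > 1
        ),
    }


# Made-up sample data, for illustration only.
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": None, "amount": 45.0},
    {"customer_id": 4, "amount": 60.0},
]
customers = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
    {"customer_id": 2, "region": "North"},
    {"customer_id": 3, "region": "South"},
]

report = audit_join(orders, customers, "customer_id")
for check, result in report.items():
    print(f"{check}: {result}")
```

Any non-empty entry in the report is a reason to pause and understand the tables better before merging them.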

Starting with the conclusion

Sometimes, the business team will have an idea of the desired end result – a thesis – for their analysis. As such, they start by seeking patterns in the data that support that hypothesis to the exclusion of other pertinent pieces of data.

Approaching data exploration with a specific conclusion in mind can easily lead to bias in data analysis. Bringing in another set of eyes is prudent, both to offer an opposing viewpoint and to validate the approach.

Jumping to conclusions too early

It’s easy for businesses to focus on data that’s irrelevant to the problem: they get distracted by data that isn’t directly connected to the goal, or they focus on a measure that obscures the real picture or yields misleading results.

Here’s an example to illustrate this concept:

In Figure 1, a sales manager is enthused because the average order value is rising month over month, but this single measure does not show the bigger picture:

Figure 1 – Average Order Value by Month. Source: Datahero

Figure 2 – Average Order Value by Department by Month. Source: Datahero

Figure 2 drills into the next level of the sales data, by department, and shows the context and the reason for the growth: the increase in the monthly average comes from an increase in women’s shoe purchases.

Without this more holistic analysis of the sales data, the retailer may end up spending on a marketing campaign that is not targeted at the area that most needs a sales lift. Drilling into the next level of data gives the business a more accurate picture, so that it can take appropriate action.
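The drill-down described above can be sketched in a few lines of Python. The order values, department names and the helper `average_by` are all made up for illustration and are not the data behind the figures. The point is only that the same rows, grouped one level deeper, explain where a rising average comes from.

```python
from collections import defaultdict
from statistics import mean

# Made-up orders: (month, department, order value). Illustrative only.
orders = [
    ("Jan", "Women's Shoes", 60.0), ("Jan", "Menswear", 50.0), ("Jan", "Home", 40.0),
    ("Feb", "Women's Shoes", 90.0), ("Feb", "Menswear", 50.0), ("Feb", "Home", 40.0),
    ("Mar", "Women's Shoes", 120.0), ("Mar", "Menswear", 50.0), ("Mar", "Home", 40.0),
]


def average_by(rows, key_func):
    """Average the order value (third field) within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_func(row)].append(row[2])
    return {group: mean(values) for group, values in groups.items()}


# Top-level view: the average order value per month rises steadily.
by_month = average_by(orders, lambda r: r[0])

# Drill-down: the same average split by department within each month.
by_month_dept = average_by(orders, lambda r: (r[0], r[1]))

for month, avg in by_month.items():
    print(f"{month}: {avg:.2f}")
for (month, dept), avg in by_month_dept.items():
    print(f"{month} / {dept}: {avg:.2f}")
```

In this invented dataset the monthly average climbs from 50 to 70, yet only one department changes at all; the other two are flat.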

Correlation and causation

This one is my favorite. When looking at data, it is very easy to spot correlations and draw conclusions from them. Coincidences happen! Take the chart in Figure 3. There is a 99.26% correlation between the divorce rate in Maine and the per capita consumption of margarine. Correlation: yes. Causation: no.

Figure 3 – Per capita consumption of margarine. Source: Tylervigen.com

Consider the following:

  • Is it coincidence? In this case it is possible: the source has analyzed many datasets and produced many amusing correlations.
  • Common causal variable? Is there a third factor C that causes both A and B? An increase in wealth may mean the ability to splash out on butter, and so less margarine bought. An increase in wealth could also remove financial stress from the family unit, thereby reducing the divorce rate.
  • Reverse causation? Does the decrease in margarine consumption lower the divorce rate? It’s possible that butter tastes better and that this leads to a happier family unit.

For causation to be considered, coincidence has to be ruled out, there cannot be a common causal variable, and reverse causation has to be ruled out as well.
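To make the common-causal-variable check concrete, here is a small Python sketch that computes the Pearson correlation between three series. All of the numbers are invented for illustration and are not the data behind Figure 3. The names `margarine`, `divorce` and `wealth`, and the helper `pearson`, are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented yearly series, for illustration only.
margarine = [8.0, 7.0, 6.5, 6.0, 5.0, 4.5, 4.0, 3.5]  # series A
divorce = [5.0, 4.8, 4.6, 4.5, 4.3, 4.2, 4.1, 4.0]    # series B
wealth = [1, 2, 3, 4, 5, 6, 7, 8]                      # candidate factor C

r_ab = pearson(margarine, divorce)
r_ac = pearson(margarine, wealth)
r_bc = pearson(divorce, wealth)

print(f"A vs B: {r_ab:.3f}")
print(f"A vs C: {r_ac:.3f}")
print(f"B vs C: {r_bc:.3f}")
```

Here A and B are strongly correlated, but each is also strongly (negatively) correlated with C. The coefficient cannot tell these explanations apart, which is why the three checks have to be reasoned through rather than read off the number.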

To derive actionable business insights from data analytics, it’s imperative to begin with a good grasp of the data being analyzed. This entails working with clean data rather than inaccurate or dirty datasets, and not jumping to hasty conclusions. Furthermore, to ensure that business teams gain an accurate and full picture, bring in another set of eyes to confirm the results.

In an upcoming post, we’ll examine the dangers of data visualization and how it can mislead if not used appropriately.

What pitfalls or dangers of data analysis have you experienced?

Learn more about how we’re giving analytical clients access to a multitude of data sources with our ODBC and JDBC drivers for data connectivity.