Data Bias

Each of us as data professionals will bring some bias into our work. Even the most objective among us has a tendency to lean toward the answers we assume to be true, whether we are troubleshooting a technical problem or creating a new user-facing data solution.

Is data bias bad? The answer is more complicated than a simple yes or no.

Solution bias

Consider one example of data bias: trying to find a solution to a problem. This might mean troubleshooting a data anomaly or, on a larger scale, choosing software or a service to address a business pain point. Someone who has solved a similar problem in the past will likely assume that the solution is something they’ve used before. Drawing on anecdotal data from their own experience, they’ll start with the likeliest cause they know of. If that doesn’t work, they’ll move on to other possible root causes they are aware of; only when those are exhausted will they move into research mode to look for causes they haven’t yet experienced.

In most cases, experienced professionals will benefit from this data bias. For example, if a DBA learns of a full SQL Server transaction log, one of the first things they’ll check is whether the log has been backed up recently. This type of solution bias is often a time-saver.

However, the most obvious solution isn’t always the correct one, so we have to guard against getting cut by Occam’s Razor. Going too far down the assumption path can lead to wasted time or other bad outcomes.
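The troubleshooting order described above (familiar causes first, research only after those are exhausted) can be sketched in a few lines of Python. This is only an illustration: the `diagnose` function, the cause labels, and the stand-in checks are all hypothetical, and real checks would query the server rather than return fixed values.

```python
from typing import Callable, Optional

# A known cause pairs a label with a check that returns True when that cause
# explains the problem. Labels and checks here are hypothetical.
KnownCause = tuple[str, Callable[[], bool]]


def diagnose(known_causes: list[KnownCause]) -> Optional[str]:
    """Return the first known cause whose check confirms it, else None.

    Causes are tried in the order given, so the caller's experience
    (the solution bias) decides what gets checked first.
    """
    for label, check in known_causes:
        if check():
            return label
    # Nothing familiar matched: time to switch to research mode.
    return None


# Stand-in checks for a full transaction log, ordered by what experience
# suggests is most likely. The fixed True/False values are assumptions.
causes: list[KnownCause] = [
    ("log not backed up recently", lambda: False),
    ("long-running open transaction", lambda: True),
]

result = diagnose(causes)
print(result or "no known cause matched; research further")
```

The ordering of the list is where the bias lives. It saves time when the first guess is right, and the `None` result is the explicit signal to stop guessing and start researching.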

Interpretation bias

Interpretation bias occurs when individuals look at the same data and draw very different conclusions. Everyone has interpretation bias (it’s impossible to thumb through Facebook or Twitter without seeing examples of it), but the issue is particularly problematic when it occurs with data professionals. Because we are very often the gatekeepers of the data, it is essential to recognize our biases so that they don’t seep into the queries, reports, and systems we create to deliver information to data consumers.

In most cases, this interpretation bias is accidental and not malicious. I can remember a few projects in which I began my data analysis with the assumption that some fact was true, and then worked backwards from there to find the data to support that conclusion. If you start with a conclusion in mind, you can always find a creative way to interrogate the data to support it.

Avoiding data bias

It’s almost impossible to rid oneself of data bias entirely, but there are some ways to mitigate it:

  • Recognize and acknowledge your potential data biases.
  • Start with a question, not an answer.
  • Use your experience to guide you, but don’t be blinded by it.
  • Ask for a second opinion from others who may not share your biases.

Data bias can taint your perspective, but it doesn’t have to take away from your effectiveness as a data professional. Recognizing and guarding against it will help keep bias from leaking into your work product.
