
Data is not always right

After looking at the data of 80 tech companies — what have I learned? (Part I)

The past five years have given me a tremendous opportunity to see firsthand the data of over eighty VC-backed tech companies. That is close to 100 teams and 300 individuals. Naturally, I’ve seen a lot of data — very detailed information on every transaction, activity, click, and interaction. What you might expect of me now is to go around urging everyone to collect as much information as possible in order to make data-driven decisions. Instead, I think all of us in the data profession should be honest about the pitfalls of always championing a data-centric approach. Here is why:

  • Data is not always right
  • Not everyone is naturally comfortable with understanding data

Let’s dive right into it…

Data is not always right!

Most of us are familiar with the notion of “garbage in, garbage out,” so this is not a new idea. And yet, I would take it a step further. Even when the input data is good, the data used in analytics today is often not correct. In tech we generally focus on simplifying conclusions and interpretations for C-level leadership, while we ourselves try to handle the underlying complexity. But therein lies a problem: when it comes to data, there is an inverse relationship between informational completeness and inference simplicity.


This applies to all phases of problem solving — from how data is transported to how it is modeled and analyzed. The more assumptions individuals make at each phase of the process, the more likely the resulting conclusions are wrong or off. I commonly encounter statisticians who are clearly familiar with the technical pitfalls of, say, relying on high-dimensional datasets (e.g. spuriously inflated in-sample R²), and yet cannot see the forest for the trees when it comes to their overall analysis.
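To make the high-dimensionality pitfall concrete, here is a minimal sketch (the numbers are hypothetical, not drawn from the companies discussed): fitting an ordinary least-squares model with many random features to a pure-noise target produces a deceptively high in-sample R², even though the features carry no information at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 90           # 100 observations, 90 meaningless features
X = rng.normal(size=(n, p))
y = rng.normal(size=n)   # target is pure noise, unrelated to X

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# In-sample R² = 1 - SS_res / SS_tot; with p close to n this is
# typically well above 0.8 despite there being no real signal
r2 = 1 - (residuals**2).sum() / ((y - y.mean()) ** 2).sum()
print(f"In-sample R² on pure noise: {r2:.2f}")
```

A held-out test set, of course, would expose the model immediately — which is exactly the kind of step that gets skipped when the goal is a tidy conclusion for leadership.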

So what ought a modern data practitioner to do in such circumstances? “Embrace the uncertainty” would be my advice. When organizational leaders implicitly expect the data team to answer every analytical question, they ignore the tangible costs of getting to the truth — to them, the time a data specialist spends chasing an answer seems unimportant in the grand scheme of things. Reframing the problem in terms of expected uncertainty allows leaders to seek alternative avenues — like having the CEO visit key customers — with more transparent cost-benefit calculations (e.g. the CEO’s time to make the trip). The added benefit is that priorities shift from finding analytical answers to discovering the truth.

What I’ve seen the best teams do is embrace this uncertainty head-on. Some make it clear in every presentation that they are confident about directional accuracy only. They acknowledge that their snapshot data could be off, but that in itself would not undermine their recommendations. Others, such as my former boss, Lloyd Tabb (http://tomtunguz.com/ab-testing-saas/), are well aware of the problem of small numbers in SaaS companies. They might inspect data to learn more about customers’ behavior, but they will shy away from drawing statistical inferences from small samples.
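The small-numbers problem is easy to demonstrate. Below is a sketch with hypothetical conversion figures (not data from any of the companies discussed): at sample sizes typical of an early-stage SaaS A/B test, even an apparently large lift comes with a confidence interval so wide that no directional conclusion is justified.

```python
import math

def diff_ci(conv_a: int, n_a: int, conv_b: int, n_b: int, z: float = 1.96):
    """Approximate 95% Wald confidence interval for p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Variant B "wins" 9/30 vs 6/30 — a 50% relative lift on paper —
# yet the interval on the absolute difference comfortably spans zero.
lo, hi = diff_ci(6, 30, 9, 30)
print(f"95% CI for lift: [{lo:+.2f}, {hi:+.2f}]")
```

With thirty users per arm, the honest summary is “we cannot tell yet,” which is precisely why teams like Tabb’s treat such data as a source of customer insight rather than statistical proof.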

Not everyone is meant to work with data.

This next point is going to be controversial, though it really should not be. I’ve had older organizational leaders hint at this in private conversations. Through my own fault, I’ve only recently begun to see it clearly: people’s brains work completely differently. Ray Dalio’s Principles has several chapters on this very topic, so I won’t dwell on it here. In brief, there are many dimensions in which someone might flourish while having no ability to read or interpret data. And that is OK.

Good teams embrace non-data-driven approaches just as much as data-driven ones. Snapchat was famously a good example of this in its early years: it made successful growth decisions through design thinking rather than the commonplace reliance on A/B testing. So why do most organizations I come across believe in putting data in front of every employee and rewarding data-driven behavior? I think it stems from insecurity about how teams are managed. When data and authority are the key frameworks for making decisions and resolving arguments, managing a company is more straightforward.

The problem is that data is often just an excuse to transfer decision-making power away from those who are less adept at interpreting data. This might, for example, mean giving more say over the product experience to someone with a business degree than to someone with a creative design background. Neither is really a standalone qualifier, in my opinion. Data, like everything else, can be manipulated to fit our own biases. Stories can be told to fit our own biases. Stories and data together can be manipulated to fit someone’s bias. That’s just how it is, and we should not tell ourselves otherwise.

What I suggest instead is that companies draw the line between “having optional access” and “requiring mandatory use.” This might make managing some teams more challenging: data is a convenient common framework for resolving arguments, and allowing multiple ways of making a decision forces teams to find alternative ways to self-organize. Ultimately, while we keep insisting that there is some panacea, every organization and team requires its own rules for effective decision making (see my earlier reference to Ray Dalio).

Bottom line: data is important, but it is only one of many important factors. We can practice our profession and encourage the use of data when it makes sense, while embracing other possibilities as well. And that is OK.