Counting is Hard: The Two Biggest Problems and How to Fix Them

Jon Madden
13 min read · May 31, 2019

Takeaways

  1. Masked complexity: Metrics are hard because they mask complexity and are massively open to interpretation.
  2. Expected interoperability: Metrics are hard because people have different expectations around application, precision, and lifespans.
  3. We can massively improve how our organizations deal with data and metrics by leveraging the same core Product Management skills we use to build amazing features.

Talking About Data

Storytelling is a critical skill for Product Managers — the best PMs excel at weaving compelling narratives that illustrate how the next big thing will dramatically improve the user journey. Getting these stories right and reflecting the reality of user experience requires invoking data* and metrics**. And yet, skills in this area are generally hard-won. For most PMs, these skills are acquired informally through practice and repetition. It is therefore no wonder that inside of many successful companies there is a constant struggle to get metrics right.

At the heart of this struggle lie two major problems: masked complexity and expected interoperability. These are deep and pernicious problems, ones that are often hard to recognize, and yet both can be overcome with careful thought and sound communication skills. Product Managers already leverage those skills to build amazing things; so too should we apply them to the world of data. After all, data will already be central to nearly every amazing feature we might build, and it is nearly inescapable that requirements will emerge around metrics and reporting. We should therefore embrace the world of data with open arms and improve ourselves there just as we would with any other skill.

In this article, I’ll describe these problems in depth and then lay out a series of strategies to tackle them in your organization. My hope is that this helps you level up your metrics communications skills so that instead of getting stuck fighting metrics fires you can focus on what matters: building products that delight your users.

* data: Individual markers indicating that something occurred, which are captured in a structured way and stored in a system that allows quantification. Examples: pageviews, button clicks, server HTTP responses.

** metrics: A named method of counting data points in a specific way, often incorporating a time-bound component. Examples: pageviews per user, button clicks per page, server HTTP responses per day.
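To make the distinction between the two concrete, here is a minimal sketch in Python (the event fields and values are invented for illustration, not drawn from any real logging schema) showing how a metric is simply a named way of counting data points:

```python
from collections import defaultdict

# Data: individual markers that something occurred, captured in a structured way.
# Each dict below is a hypothetical pageview event; the fields are illustrative.
pageview_events = [
    {"user_id": "u1", "page": "/home", "day": "2019-05-30"},
    {"user_id": "u1", "page": "/biz/taco-spot", "day": "2019-05-30"},
    {"user_id": "u2", "page": "/home", "day": "2019-05-31"},
]

def pageviews_per_user(events):
    """Metric: a named method of counting data points -- here, pageviews per user."""
    counts = defaultdict(int)
    for event in events:
        counts[event["user_id"]] += 1
    return dict(counts)

print(pageviews_per_user(pageview_events))  # {'u1': 2, 'u2': 1}
```

The data points are the raw markers; “pageviews per user” only comes into existence once we commit to a specific way of counting them.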

Who Am I To Talk About Data?

My 9-year career at Yelp has revolved around data, but I didn’t start there. I started as a front-end engineer, writing CSS and HTML. However, every project I touched involved data and metrics, and soon I found that the best way to improve the user experience at Yelp was by leveraging our analytics. As a result, I’ve worked as an engineer on logging systems; managed analytics suites and trained our teams on how to use them; built growth-oriented feature teams with data-driven missions; operated for years as the primary owner of many of Yelp’s publicly reported metrics; and most recently I’ve overseen a multi-year strategic reinvestment in Yelp’s core data, experimentation, and analysis systems. Most of my knowledge in this field has been acquired “the hard way”. This article is my attempt to distill those lessons so that you can accelerate your career, improve your company’s efficiency, and help elevate general discourse and numeracy throughout our industry.

Note that I’ve aimed this article directly at an audience of “Product Managers”, and I reference the field and PM skills often. However, I firmly believe that anyone working closely with data can and should get involved in metrics management, practice these skills, and be a part of improving our data-driven world.

Masked complexity

noun

A process or situation involving a high level of complications, where the nature of the complications is concealed from view.

“The Director of Sales was surprised to find that there was so much masked complexity involved in the quarterly report on productivity.”

The problem of intangibility

At the core of this problem is that the internet is inherently intangible and business practices are still catching up. Let’s explore some simplified analogies with well-known non-internet-based business models to paint some contrast. Ford can accurately count the trucks it moves off the assembly line, McDonald’s knows how many boxes of burgers it ships to each location, and Delta Airlines keeps meticulous track of how many people board each airplane. Without doubt the logistics behind these metrics are complex, but all describe physical interactions in the real world. Not only could these interactions be verified through observation, but due to their tangibility they are inherently understandable. In our mind’s eye we can picture the journey of a truck frame through an assembly line. We know that the F-150 at the end of the line did not just manifest there, but rather that its existence is due to the complex interactions of workers, machines, and thousands of parts. The complexity of this truck is understandable, even to a lay-person.

What then does a lay-person understand of the term “user” in the context of an internet company? Certainly they can trace their own experience using a device to interact with a website or an app. The logical leap that they would be called a “user” is an easy one. Ergo, when a company states “we had 100 million website users in 2018,” one might reasonably conclude that this means their personal experience was duplicated by 100 million other people. One reaches for personal experience as a reference point and interprets the metric through that frame; it is the most likely lens through which “users” will be understood. However, the term “user” hides an enormous amount of complexity.

In reality, the metric and meaning that each company imbues into the term “user” is bespoke, as the industry lacks a true standard. Each time a new definition for “users” is created, the surface area of potential misunderstanding grows. Contributing factors include: expectations that simple language can be used to describe complex technical systems; expectations that counting things in software is easy; expectations that “users” is a universal term; expectations that counting systems simply work. All of these things add up. The complexity of metrics in the software industry is masked.

Unlike Ford, we cannot measure our users with the same physicality and undeniable presence imparted by a freshly built truck off the assembly line. Nor can we look to the lot adjacent to our factory and count the day’s output. We have no such tangible objects. And yet, our minds cannot help but default towards a simplistic view. It is only with great effort that the full complexity comes into view. Let’s explore some of what is behind the mask.

One term, numerous business definitions

Nearly every tech company counts its “users” in some way or another. Those with S-1s even document these definitions publicly! And yet, it remains a term with a multitude of meanings, only some of which describe what you actually care about.

Here are just some of the things “users” could mean (a short sketch after this list shows how a few of these definitions can diverge over the same data):

  • Unique devices
  • Paying customers
  • Logged-in user accounts
  • All valid rows in a user account table
  • Users who initiate a product interaction
  • Devices that receive a push notification
  • Any of the above, but filtered by specific criteria (e.g.: US only, mobile app only, highly engaged only, verified accounts only, etc.)
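As promised above, here is a minimal, hypothetical Python sketch (the field names and event types are invented) showing how three of these definitions produce three different counts over the exact same event log:

```python
# Two business definitions of "users" computed over the same events can give
# very different numbers. All fields and values below are illustrative only.
events = [
    {"device_id": "d1", "account_id": "a1", "action": "search"},
    {"device_id": "d1", "account_id": None, "action": "pageview"},       # logged out
    {"device_id": "d2", "account_id": "a1", "action": "pageview"},       # same account, new device
    {"device_id": "d3", "account_id": None, "action": "push_received"},  # no interaction initiated
]

unique_devices = {e["device_id"] for e in events}
logged_in_accounts = {e["account_id"] for e in events if e["account_id"]}
initiating_users = {e["device_id"] for e in events if e["action"] in {"search", "pageview"}}

print(len(unique_devices))      # 3
print(len(logged_in_accounts))  # 1
print(len(initiating_users))    # 2
```

Each of those numbers is a defensible answer to “how many users do we have?”, which is exactly why an unstated choice of definition causes so much confusion.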

One term, numerous technical definitions

The above are all “business logic” definitions. That is, they make no statements about technical details and assume perfect fidelity in underlying data systems. The real world of building large-scale applications presents numerous challenges to one’s assumptions. For a given business definition, it is a messy process to find an appropriate and matching technical definition. Doing so will invariably require some level of compromise and imperfection.

Here are just some of the difficulties to contend with at a technical level (a small filtering sketch follows this list):

  • Cookies are used to count anonymous web users, but they are easy to change and reset. A given user can readily generate multiple cookies by using multiple browsers, private browsing modes, privacy extensions, etc., and this is often undetectable. This is not evasive behavior from users, but rather features of the browser ecosystem being used as intended.
  • Bots and scrapers infest the internet. The more successful a business, the more likely it is to have a robotic infestation. The more sophisticated a malicious actor is, the more likely they are to appear similar to real users. Bots have the potential to generate millions of new unique “user” tokens in a day. It can be challenging and cost-prohibitive to address this problem at all, and perhaps impossible to know if one has been fully successful.
  • Machine communication. All product data points exist due to machine-to-machine communication. This communication may not always be a faithful record of user-initiated activity. Modern-day mobile applications are in daily if not hourly contact with servers to request background updates, provide recent locations, or receive push notifications. These features can bring delight to our users, but they also can clutter up our analytics. Differentiating what set of machine communication truly indicates a human engaging with our product can be a time-consuming prospect, especially with a code base that is constantly in flux.
  • Counting systems. When dealing with millions of users and billions of data points per day, we must turn to high-powered tools such as Redshift, Presto, Hadoop, etc. in order to gain insights and ask questions with speed. Properly deploying and maintaining these systems can entail costs that equal or exceed those of building user-facing features. Inevitably errors will occur (e.g.: data duplication, data loss, data mis-labeling). Unlike consumer-facing products, where user-experience bugs are often highly noticeable, here they can be hard to detect and take time to find.
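To make the bot and machine-communication points concrete, here is the filtering sketch promised above: a naive, hypothetical Python pre-filter (the markers and event names are invented, and real detection is far more involved) that illustrates why a raw event count overstates the number of humans:

```python
# A naive, hypothetical pre-count filter: drop obvious bot traffic and pure
# machine-to-machine events before counting "users". Real systems need far more
# sophistication; this only shows why raw event counts overstate human activity.
BOT_MARKERS = ("bot", "crawler", "spider")                  # illustrative, not exhaustive
BACKGROUND_ACTIONS = {"push_received", "background_sync"}   # hypothetical event names

def looks_like_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

def countable_users(events) -> int:
    """Count unique device tokens after filtering bots and background traffic."""
    devices = set()
    for e in events:
        if looks_like_bot(e.get("user_agent", "")):
            continue
        if e.get("action") in BACKGROUND_ACTIONS:
            continue
        devices.add(e["device_id"])
    return len(devices)

events = [
    {"device_id": "d1", "user_agent": "Mozilla/5.0", "action": "pageview"},
    {"device_id": "d2", "user_agent": "ExampleBot/1.0", "action": "pageview"},
    {"device_id": "d3", "user_agent": "Mozilla/5.0", "action": "push_received"},
]
print(countable_users(events))  # 1 -- only d1 represents a plausible human interaction
```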

A single term, nearly infinite complexity

As we’ve now seen, a seemingly simple term like “users” has multiple interpretations at the business and technical levels, the expertise to deal with all this nuance is rare and hard-won, and the realities of day-to-day business operations do not engender deep introspection. This complexity isn’t obvious, so it’s easy to understand how frustration can grow around a seemingly simple request such as “count all the users.” It is our job as Product Managers to internalize this complexity, create appropriate pathways for the rest of the company to interact with it, and set context and expectations clearly. I have some suggestions further down on how to accomplish such a feat.

Expected Interoperability

noun

The anticipation that a metric will be seamlessly usable across a variety of contexts.

“The head of marketing reasonably expected the term ‘users’ to be interoperable between our vendor’s data and our internal analytics.”

As a direct result of the masked complexity comes the expectation that a metric be applicable in any circumstance. As we’ve just seen, simply talking about and defining metrics involves far more complexity than one would expect. Actually doing work with those metrics lays bare the cost of these complexities.

Let’s say that you’ve managed to avoid the above mentioned pitfalls around masked complexity. You now have an operating metric that is well understood. Product Managers and Engineers are speaking the same language. Congratulations! And yet, bugs keep happening, the metric keeps breaking, and it’s hard to pin down why. What’s going on? Quite possibly, you’ve run into problems with unclear expectations around interoperability. In short, what you want to do with the metric may not be clear and it may not align with what others want to do with it, or with how they express those desires in practice. Let’s dig into some specific examples of how these expectations don’t quite line up.

Metric lifespans

The meaning, technical definition, and technical investment of a metric will vary wildly based upon the context in which a question is asked.

Imagine the context of an analyst providing insight on a very specific metric. For example, “How many active users did we see last Friday in NYC?” At the cost of a moderate amount of one-off work, they can provide a very precise answer. Along with this answer comes the expectation that once an answer is provided, there won’t be too many follow-ups. Therefore, our analyst is unlikely to invest the time required to support providing continued high-precision answers for this question in the future. This is just one reason why the effective “lifespan” of such a metric is often very short. By “lifespan,” I mean the length of time from when I first ask a question to the last moment when I can expect to still receive a valid answer.

On the opposite end, high-level metrics (e.g.: a company’s KPIs or north star metrics) generally have far less exacting requirements around precision. The difference between 12.55% year-over-year growth and 12.54% isn’t that meaningful. High-level metrics do, however, have far more demands in terms of lifespan, as they are mostly used to measure year-over-year growth. Any meaningful changes in methodology or underlying data can cause large and concerning fluctuations, often warranting expensive investigations. Because of these demands, the technical definition of a high-level metric is the most rigid definition you are likely to encounter.

Failing to plan for the lifespan of a metric appropriately will inevitably lead to disappointment. It is unreasonable to assume that a metric developed a year ago to answer a very specific question could now be used interchangeably with a high-level metric — after all, so much has changed over that year in terms of logging, processing, data at rest, and the product itself. So, when we discuss our well-defined “user” metric with engineering and analyst teams, we must also put it into proper context. As Product Managers, framing and context are our bread and butter. If we can put context around the next big feature investment for the company, then so too can we put context around the metrics we’ll use to determine if that feature is successful. We can make this better! Let’s explore some ideas that should help.

What You Can Do

  1. Formalize the definition of your metric
  2. Make a metrics checklist
  3. Set expectations

Write a formal definition

Take the time to carefully consider and define exactly what it is you want to count. Try your best to capture the business definition, and collaborate with data engineers to capture accurate technical definitions. Describe what “slices” or dimensions you might want on these metrics (e.g. “user counts per city in the US”). Describe intended uses, implications, and nuances of the metrics. Write down questions you want these metrics to answer. The more effort you invest in this formal definition, the easier it will be to ensure you’re speaking the same language as everyone else. This is how we go from the ephemeral “users” to a more tangible concept with a shared mental model.
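One lightweight way to make such a definition tangible is to keep it as a structured, version-controlled record. Here is a hypothetical sketch using a Python dataclass; every field name and value below is invented purely for illustration, and the exact format matters far less than the act of writing it down:

```python
from dataclasses import dataclass

@dataclass
class MetricDefinition:
    """A formal, written-down definition of a metric -- a hypothetical sketch."""
    name: str
    business_definition: str
    technical_definition: str
    dimensions: list            # "slices" such as city or platform
    intended_uses: str
    lifespan: str               # how long answers are expected to stay valid
    error_tolerance: str

active_users_us = MetricDefinition(
    name="active_users_us",
    business_definition="Logged-in accounts that initiate a product interaction, US only.",
    technical_definition="Distinct account_id in the interaction event log, filtered to US "
                         "locales, excluding known bots and background machine traffic.",
    dimensions=["city", "platform"],
    intended_uses="Weekly product reviews; not intended for external reporting.",
    lifespan="Supported for roughly one year; revisit if logging changes.",
    error_tolerance="Plus or minus 2% is acceptable for this use case.",
)
```

Kept somewhere engineers, analysts, and fellow PMs can all find and amend it, a record like this becomes the shared mental model the paragraph above describes.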

Make Checklists

Define your metrics:

  • Business definition
  • Technical definition
  • Use case
  • Lifespan
  • Error tolerance
  • Expected future use cases

Review with the following parties:

  • Analyst experts: data engineers, data scientists, analyst teams
  • Fellow PMs
  • Execs: Reporting this metric upwards and outwards? Make sure your expectations are aligned with your execs’!

Vet your metrics (a small sketch of one such check follows this list):

  • Interrogating your numbers to compare against expectations / prior results
  • Investigating any inflections you might find and providing explanations
  • Discussing with your dev team how to ensure the data remains trusted over time
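One of these vetting steps can be partly automated. Below is a minimal, hypothetical Python sketch (the numbers and the 10% threshold are invented) that flags week-over-week inflections worth investigating before the number gets trusted or reported:

```python
# A hypothetical vetting check: flag week-over-week swings that exceed what we
# would normally expect, so inflections get investigated rather than reported blindly.
weekly_active_users = {          # invented numbers for illustration
    "2019-W18": 1_020_000,
    "2019-W19": 1_015_000,
    "2019-W20": 1_310_000,       # suspicious jump -- logging change? bot wave? real growth?
}

MAX_EXPECTED_SWING = 0.10        # assume a 10% week-over-week swing is our normal range

weeks = sorted(weekly_active_users)
for prev, curr in zip(weeks, weeks[1:]):
    change = (weekly_active_users[curr] - weekly_active_users[prev]) / weekly_active_users[prev]
    if abs(change) > MAX_EXPECTED_SWING:
        print(f"{curr}: {change:+.1%} vs {prev} -- investigate before trusting this number")
```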

Compare against prior art

Look at other metrics with similar names to yours. Assume most people will confuse them. Understand what those metrics are, how they work, whether you trust them, and how they differ from yours.

Compare against related metrics

Look at actively used metrics that are directly related to yours. If they are a subset of your metrics, are the counts actually smaller and within expectations? If you add up your metrics with related metrics, are they expected to equal a whole, and if so, do they actually add up correctly? Can you plot your metric alongside related metrics on the same graph?
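A few of these checks are cheap to automate. Here is a minimal, hypothetical Python sketch (the metric names and numbers are invented) of the subset and partition checks described above:

```python
# A hypothetical sanity check against related metrics: a subset should never exceed
# its parent, and parts that claim to partition the whole should sum to it.
metrics = {
    "all_users": 1_000_000,        # illustrative numbers only
    "mobile_app_users": 620_000,   # expected to be a subset of all_users
    "web_users": 380_000,          # expected to be the remaining partition
}

assert metrics["mobile_app_users"] <= metrics["all_users"], "subset exceeds parent"
assert metrics["mobile_app_users"] + metrics["web_users"] == metrics["all_users"], \
    "partitions do not sum to the whole -- investigate overlap or missing data"
print("related-metric checks passed")
```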

Communicate your metrics widely

Let everyone know that you’ve got your metric down! Be the data leadership you want to see in your company!

Document your metrics

Record all the good work you’ve done, not only for posterity, but also to help everyone involved keep the definition correct over time!

Set Expectations

If you’ve done the above steps, you now have powerful tools at your disposal. Use them to communicate effectively. Elevate discussions around your metrics to the same level as discussions of user features. Make it clear that you prioritize getting the data right, which includes committing the resources to achieve trust in your metrics. Make it clear that paying these costs up front will spare much greater pain and turmoil down the road. Celebrate every query that is written, every dashboard that is created, and every chart that is shared. Create positive feedback loops for your team that help them learn the value of investing in metrics and data. Be the change you wish to see in your data-driven culture.

How this adds up

By following the above steps, you will create change for the better in your company and in the industry. For those who need to know the formal definition of a metric (or are simply curious), you’ll have quality documentation at your disposal. For those who are instrumenting the metrics, you’ll have set clear guidance and expectations so that they can plan appropriately. For those who deal with executive or upwards reporting, you’ll now be well versed when discussing your metric. For the developer team tasked with building these metrics, you’ll provide clear instructions and expectations, allowing them to build and monitor appropriately and stop more bugs before they occur. For the next Product Manager to use this metric, you’ll have created an institutional record that empowers them to focus on building the next great feature.

Wrapping Up

I think it’s easy to look at the above and go “wow, that sounds like an awful lot of work, why would anyone bother?” And yet, our industry is dominated by a data-driven culture. Nothing else offers the same combination of power and truth. This boils down to two simple reasons:

  1. Compelling stories. Metrics — when done correctly and imbued with trust — have the ability to tell deeply compelling stories. When I say my recently launched app has over 1 million users, you can imagine yourself using the app and then multiply yourself over and over again until you’ve lost count. Perhaps you can’t form a mental picture of “1 million users”, but you know it’s an overwhelming number. This is the positive flip side of the masked complexity issue discussed above — metrics have the ability to tap into the psyche in a remarkable way.
  2. Truth at scale. The beauty and power of the software world is that a small team of Developers and Product Managers can produce amazing experiences and then disseminate them instantly to the whole world. Working at this massive scale requires the aggregation of user journeys into powerful metrics, which in turn leads to good decision making. This is the only way we can improve our products to reach the next millions of users. This is telling the truth of user experience at scale.

Although it is difficult and expensive to do metrics well, the rewards are most assuredly worth it. High-quality metrics tell us if we’re making people’s lives easier or more difficult; filled with joy or pock-marked with frustration. They are the feedback mechanism that Product Managers rely on most. It is incumbent upon us to invest the time, carry the torch, get the metrics right, and tell those millions of user stories faithfully.

(Note: A special thanks to Carl Bialik and Travis Brooks for excellent feedback on this article)
