Enterprise Lean Startup: Part II – Measures for Learning

One of the key differences between traditional agile approaches and Lean Startup practices is that agile places the highest value on working product, whereas Lean Startup prioritizes learning. As Ash Maurya observed, “The initial goal of a startup is to learn, not to scale.” This is not to denigrate building working product, but rather to emphasize that if we don’t know we’re doing the right thing, then it doesn’t matter how much of that thing we can do.

Enterprise Lean Startup & Learning

So, if learning is so important, how do we ensure we are learning? As I have discussed in the past, we frequently misinterpret experiences and can draw the wrong lessons. Unfortunately, we are simply not impartial judges of what is going on around us. As Roger Bacon observed when he helped bring concepts like independent verification and the scientific method to Europe, we learn best by using observation and objective measurement. These concepts are not lost on us today, and you could think about the build-measure-learn loop of Lean Startup as the scientific method applied to product development.

The natural next question, then, is what do we measure, and how do we ensure we are measuring objectively? Many startups today are web products selling directly to consumers. As such, their business model provides a high number of transactions. As we try to apply enterprise lean startup inside large organizations, however, we see that this model does not hold. Business-to-business transactions are usually much larger in scale and lower in frequency. Internal groups don’t have customers who opt to buy their product, as they frequently hold an effective monopoly over their “market.” These dynamics can be obstacles to effectively using a build-measure-learn loop, but creative teams and organizations are finding ways to do it well. Let’s take a look at a couple of models.

Measures of Behavior, Not Sentiment

There are quite a few experts who have identified valuable ways to look at direct customer metrics, which generally take the shape of a sales funnel. Dave McClure has an excellent summary of these with his “pirate metrics”, and Ash Maurya elaborates on building actionable metrics by using cohorts. From these we can draw a key guiding principle: valuable measures should be based on actual behaviors, as opposed to what people say they want. This is a powerful lesson, and one demonstrated in spades whenever organizations fail to appreciate it.

Whenever I teach Scrum Product Owner classes or other sessions on business analysis, I love using the cautionary tale of the Ford Edsel. Based heavily on feedback from focus groups of potential customers, the product development team for the Edsel piled on the numerous features their users told them they would want. Those conducting the research failed to appreciate that when someone is asked, in the abstract, whether they “want” something, the answer is almost always yes. Consequently, the Edsel was unveiled with a very high price tag, comparable to a high-end Cadillac. The cumulative effect of all those “yeses” created a car that was more expensive than potential buyers were willing to spend. For those who have never heard of the Ford Edsel, that is because the car was discontinued after only three years. Personally, I have found this to be a truth whenever talking to customers about what to build: what they say and what they do are frequently very different things.
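To make the cohort idea concrete, here is a minimal sketch of cohort-based retention, in the spirit of Maurya’s actionable metrics. The event log and field names are hypothetical, invented for illustration; the point is simply that each metric is scoped to the group of users who arrived in the same period, so a change shipped in week 1 can be compared against week 0 without the two populations blurring together.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, signup_week, week_in_which_user_was_active)
events = [
    ("u1", 0, 0), ("u1", 0, 1), ("u2", 0, 0),
    ("u3", 1, 1), ("u3", 1, 2), ("u4", 1, 1),
]

cohorts = defaultdict(set)    # signup_week -> users who signed up that week
activity = defaultdict(set)   # (signup_week, active_week) -> users seen active
for user, signup_week, active_week in events:
    cohorts[signup_week].add(user)
    activity[(signup_week, active_week)].add(user)

def retention(signup_week, weeks_later):
    """Fraction of one signup cohort still active N weeks after signing up."""
    cohort = cohorts[signup_week]
    still_active = activity[(signup_week, signup_week + weeks_later)]
    return len(still_active & cohort) / len(cohort)

print(retention(0, 1))  # cohort 0 is {u1, u2}; only u1 is active in week 1 -> 0.5
```

Comparing `retention(0, 1)` against `retention(1, 1)` tells you whether whatever changed between the two cohorts moved the behavior you care about.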

Enterprise Lean Startup & Customer Behavior

This focus on customer behavior is why most people in the Lean Startup community point to actual registrations or sales as the primary measure of progress. These measures are unambiguous and represent customers undertaking the behavior we want: purchasing our product. However, as we discussed earlier in this post, what can we use to gain rapid learning while evolving a product or solution when sales figures are insufficient or simply not applicable?

By now, it is pretty clear that Netflix made a good bet with its recent remake of the old BBC show “House of Cards”. Within the first few weeks, some 10% of people with Netflix access had viewed the show (on average, they had watched about six episodes), and the reviews are quite good. The show is now the most popular streamed television show. Netflix has been using its data for some time to build a powerful recommendation engine, something it famously crowdsourced at one point with the Netflix Prize. But Netflix’s data collection goes well beyond which shows people watch. With its streaming service, Netflix measures what devices people use, when they watch, and even their behavior while watching (such as pausing, watching straight through, or re-watching specific scenes). What gets downright interesting about the organization today is how it is now using that data to produce new products. Let’s look a little closer at just what Netflix learned about its subscribers before purchasing the license for this television series.

Based on measurements of what its current subscriber base was watching, Netflix knew there was potential interest in British television, as well as in a series about political intrigue. But the insights went even further. Kevin Spacey was cast in the lead role, and David Fincher was given the job of director, not based on the maneuverings of their agents, but on Netflix’s assessment of patterns among its users, who were watching movies from those two people; the short answer is that they were favorites. When promoting the series, Netflix even went so far as to produce ten different trailers targeting different groups based on their preferences. So while we can now look at the commercial success of this venture, looking closer we can see that Netflix went through a very methodical process based on painstaking research into what its users were doing, as opposed to what they said they wanted, and the payoff has been quite nice: of the Netflix subscribers who watched the series, 86% reported that they were more likely to keep their subscription. While I suspect Netflix will never share the raw data, you can imagine it is also watching new subscriptions and retention in relation to this content.

I will be the first to admit that the stakes for this bet are quite large. Netflix purchased two seasons for $100 million. For perspective, the company currently has total revenues of about $3.6 billion and cash or other short-term assets of about $750 million. If we dig deeper, this is actually Netflix’s second major experiment; prior to this larger purchase, Netflix pioneered original content with the first season of the show Lilyhammer, which had a production cost of only about $6 million. House of Cards is only the beginning: Netflix is now helping to produce other new shows, as well as additional seasons of canceled shows such as Arrested Development, a particular favorite of mine. I should say, Netflix’s fate is far from certain, and it has stumbled along the way, most notably with a terribly received attempt to split the DVD and streaming businesses last year. The point remains that Netflix demonstrates the power of leveraging measures of user behavior beyond the traditional measures of registrations and referrals.

If we take this idea one step further and abstract it to the typical organization, we get to the advantage a large enterprise has: lots of data. Whereas startups have virtually no users they can watch and observe, existing institutions have a vast collection of people consuming whatever product or service they offer. This user behavior can be a primary resource for internal groups, which may never get sales data because their customers are purely internal. Imagine you are the product owner for an internal project management system. You have no sales, as all of the company’s project managers are supposed to use your system, so you can’t validate your product that way. However, not all behavior is created equal. By adding some simple tracking code to the pages, you can begin to watch people’s behavior through the system much like a marketing team watches people move through a public website. You could identify key behaviors like setting up a project, checking in daily, or using more advanced features like alerts or dependency management. These can provide tells about who is simply using the system because it is mandated and who is actually using it to run their work.
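The distinction between mandated and genuine use can be sketched with a few lines of analysis over such an event log. The events, action names, and threshold below are all hypothetical, assumed for illustration; the idea is to define a set of “engaged” actions up front and count how many distinct ones each user performs.

```python
# Hypothetical click/event log for an internal project management tool.
events = [
    {"user": "pm_a", "action": "create_project"},
    {"user": "pm_a", "action": "daily_checkin"},
    {"user": "pm_a", "action": "set_alert"},
    {"user": "pm_b", "action": "view_dashboard"},  # logs in, does nothing more
]

# Actions that suggest someone is actually running their work in the tool,
# not merely satisfying a mandate to log in.
ENGAGED_ACTIONS = {"create_project", "daily_checkin",
                   "set_alert", "manage_dependency"}

def engaged_users(events, threshold=2):
    """Users who performed at least `threshold` distinct engaged actions."""
    seen = {}
    for e in events:
        seen.setdefault(e["user"], set()).add(e["action"])
    return {user for user, actions in seen.items()
            if len(actions & ENGAGED_ACTIONS) >= threshold}

print(engaged_users(events))  # {'pm_a'} — pm_b never left the dashboard
```

In practice you would also segment these counts by time period, much like the cohorts above, so that a release can be judged by whether it moved people from the `pm_b` bucket into the `pm_a` bucket.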

Let the Users Speak

Earlier, I cautioned against relying on user feedback, but I should acknowledge that sometimes we can get really interesting insights from the people using whatever product or service we offer. The challenge is to frame the inquiry in a meaningful way. The Net Promoter Score is a simple example: people are asked whether they would recommend the product to a friend, rather than whether they personally like it. The Innovation Games® technique “Buy a Feature” offers another interesting twist, where people simulate an auction. The one I find most interesting right now is the approach embodied in the SenseMaker® suite. This tool collects individual stories and lets the storytellers add their own significance and interpretation. The data is then aggregated, providing a view of common experiences, stories, and interpretations so that people watching the system can identify opportunities that may not be obvious, spot early changes, and gain other insights from the self-organization of people’s stories. Systems like this can be quite useful in enterprise lean startup, where people may be just a little more willing to offer feedback and spend the time to make this sort of data accurate.

Putting it All Together

Views of Measurement

Let’s step back and take a conceptual look at measurement. The most traditional measurements organizations use I would call “measures of impact,” for lack of a better term. These represent things like site visits, annual satisfaction surveys, and other broad measures. It is not that these are bad, but rather that they are not necessarily actionable. If we go back to our analogy of Lean Startup as the scientific method for product development, too many variables change within a year to identify any one of them as having a major impact one way or the other. Organizations may well want to keep some of these measures, as they show overall aggregate trends, but they are not terribly useful for learning from specific experiments.

This brings us to actionable metrics. These should represent user behavior over a finite time horizon or measurement cohort, so that you can validate the impact of discrete changes. Sales, subscriptions, and referrals are such measures. In the case of enterprise lean startup for internal organizations, people will most likely look to specific behaviors that can serve as a valid proxy for their value proposition. For example, one company I worked with was building a voice response system to take the load off of call agents. They had one primary metric that they cared about: the percentage of callers who needed to speak with an agent. While not glamorous, this measure represented real value they were delivering and served as a valuable barometer for how well a given release of their system was performing.
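A proxy metric like the voice response team’s can be computed directly from call records, broken out by release so each deployment becomes a testable experiment. The call log and release labels below are hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical call records: (release_version, caller_needed_a_live_agent)
calls = [
    ("v1.0", True), ("v1.0", True), ("v1.0", False), ("v1.0", False),
    ("v1.1", True), ("v1.1", False), ("v1.1", False), ("v1.1", False),
]

def agent_rate_by_release(calls):
    """Percentage of callers who still needed a live agent, per release."""
    totals, escalated = Counter(), Counter()
    for release, needed_agent in calls:
        totals[release] += 1
        if needed_agent:
            escalated[release] += 1
    return {r: 100.0 * escalated[r] / totals[r] for r in totals}

print(agent_rate_by_release(calls))  # {'v1.0': 50.0, 'v1.1': 25.0}
```

A drop from one release to the next is direct, behavioral evidence that the change delivered value; a flat or rising number invalidates the hypothesis behind the release just as clearly.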

Last, we have something slightly different. Let’s call these forward-looking, or prospective, measures. They represent how we go about collecting data to build our next round of hypotheses. For many people, this is a very anecdotal affair, and as we have seen, an overdependence on things like focus groups, or even requirements interviews, can be treacherous to product development. There is a lot of interesting activity in this area around observing behavior, putting people into simulations or games, and even collecting narrative stories from users. Companies like Netflix are using big data to go even further, trying to find novel solutions by statistically analyzing historical data. This is where established organizations have a huge advantage over startups, simply because they have lots of data. Of course, this can also be a risk, because an unvalidated hypothesis is no different from an educated guess. I have seen organizations, flush with data to analyze, become enamored with the projections and forget that they need to incrementally test and validate those assumptions, or even worse, feel that because they have so much data, they don’t need to experiment.

Of course, rapid measurements are only useful if you can accelerate your delivery to the point where you can rapidly test hypotheses, but that discussion will have to wait for Part III.