# Incremental Innovation

A Bayesian approach to changing the world.

“Innovation” is a good word for what I mean, but there are a few other phrases that relate to the concept I’m really trying to address: “making a difference”, “changing the world”, “doing great things”, “establishing a legacy”. In antiquity this is exemplified by what Alexander the Great did for military strategy and Aristotle did for philosophy. Today it’s what turns kids in dorm rooms and hippies in garages into billionaires in jeans and turtlenecks. In its highest realization, it gets people on lists of historical greats. This notion appears in every field. In literature there are the works of Dickens and Hemingway. In art there are Michelangelo and van Gogh. In science there are Newton, Darwin, and Einstein. In business there is the acumen of Steve Jobs and Elon Musk. N.W.A. did it for rap, the Beatles did it for rock. We attribute innovation to people and groups alike. Whether we’re thinking of “Google” or “Larry Page”, the point is the same – we’re describing a phenomenon that seems to somehow transcend the ordinary improvements that 99% of the world produces in favor of transformational improvement. This is the phenomenon I mean when I say innovation.

Zero to One Innovation

I’m going to borrow language from Peter Thiel’s Zero to One. The work focuses on innovation in the context of startup companies, but it relates directly to my broader concept. Thiel describes innovation as a “zero to one” improvement in some field. To go from zero to one is to do something that no one else is doing, or to create something that did not exist before. Thiel contrasts this with events that take fields from one to n. For example, the discovery of penicillin is a “zero to one” event, while opening another pizza shop in Manhattan is a “one to n” event. Penicillin is innovative, the pizza shop is not. Thiel qualifies this model by requiring that successful innovation also introduces an order of magnitude (10x) improvement over the competition. Incremental improvement isn’t enough. A car that is 5% more fuel efficient than any other car is not world-changing and will not dominate the field, but the development of cars is a world-changing event. Cars can be said to be at least an order of magnitude more effective than horse-drawn carriages. In my experience, Thiel accurately describes the model of innovation that most people hold. The innovative agent of change must be better than existing agents. In order to generate the legacy that great innovation deserves, it can’t just be a little better – it must be a lot better. To be better than existing practices the innovation must be new, or if not new, at least rediscovered. I’ll call these conditions the “Zero to One (Zt1) conditions” for innovation regardless of whether Thiel himself appreciates this definition, because I’ve seen many people walk away from his book with this impression.

I’ll use Michelangelo to demonstrate Thiel’s model in the wild. Michelangelo’s preeminent biographer, Giorgio Vasari, identifies the Zt1 conditions of innovation:

“One [biography], by Giorgio Vasari, proposed that [Michelangelo] was the pinnacle of all artistic achievement since the beginning of the Renaissance, a viewpoint that continued to have currency in art history for centuries.” Vasari on Michelangelo’s David: “in it may be seen most beautiful contours of legs, with attachments of limbs and slender outlines of flanks that are divine; nor has there ever been seen a pose so easy, or any grace to equal that in this work, or feet, hands and head so well in accord, one member with another, in harmony, design, and excellence of artistry. And, of a truth, whoever has seen this work need not trouble to see any other work executed in sculpture, either in our own or in other times, by no matter what craftsman.”

Vasari claims Michelangelo demonstrated a new level of artistic merit, one that is significantly better than any before him. He even claims that Michelangelo created works significantly better than anything that came after. Attacking a claim this extreme is too close to weak manning for my taste, but my point is that this mode of thinking is common. In an attempt to develop a coherent model of a world in which one person achieves such prestige and influence in art, we come to the conclusion that the skills of the artist and the merits of his work must have been far greater than those of his competitors.

Incremental Innovation

This would be boring if I didn’t tell you there are significant problems with the Zt1 model of innovation. I hold that Michelangelo does not represent an order of magnitude improvement in the world of Renaissance art. People get this impression because the fields of painting and sculpture were growing an order of magnitude faster than normal. He was on the right point of the S-curve of innovation. I argue that the contributions Michelangelo made to the fields in both theory and execution can be characterized as incremental relative to his peers at the time, and there is nothing particularly anomalous about the man himself. The important differences between my model and Zt1 innovation come not from the effects of innovation on its field, but rather from what innovation looks like from the inside.

I can make the differences between my model and the conventional model concrete. I’ll continue to use Michelangelo as an explanatory example, but this is a general framework that can be applied to any innovator (or innovation, with some tweaking).

We model the relative skills of a collection of agents in some field with a normal distribution. The incremental innovation hypothesis is what scientists and statisticians call the null hypothesis $H_{0}$. It holds that the sample in question, the innovator, is not significantly different from the general population. I take general population to refer to the competitors of the innovator. This is the correct way to interpret general population because it is not an impressive claim to say that Michelangelo was significantly more skilled in art than the average citizen of Florence. The Zt1 model of innovation means to say that the innovator made a significant improvement in their field. Thus our population must be restricted to active members of the field in order to model the phenomenon that we’re addressing. You wouldn’t argue that LeBron James is the GOAT by comparing him to an actuary playing pick-up after work.

$H_{0}$: The innovator or innovation is not significantly better than the competition.

What do I mean by “not significantly better”? I really mean that at least one of the innovator’s contemporaries could be considered as skillful or more skillful than the innovator. This implies that in a plausible alternate universe, a competitor could be switched out for the innovator and attain a comparable legacy. We make this concrete by specifying how far from the mean of the talent distribution the innovator needs to be in order to conclude that they were truly irreplaceable. For the sake of discussing the general framework, I’ll use the three sigma rule as a conservative rule of thumb.

This should really be adjusted depending on the size of the population (number of competitors to the innovation) and the strength of the claim of greatness (Do you agree that in order to be considered as great as Michelangelo you must be once-in-a-lifetime? Maybe it’s once every two lifetimes…maybe it’s once ever. How rare must you be to be as great as Michelangelo?). In any case, the three sigma rule is the rule I’m going to adopt because it is just about the weakest claim I can make that models the Zt1 conditions. Three-sigma corresponds to roughly 99.9% of the competitors being worse than Michelangelo. If this holds, then for 1,000 artists in Renaissance Italy there would be about 1 artist as good as or better than Michelangelo. The more artists there were, the more sigmas we need to maintain that he was unique among his peers. Again I’m going to try to avoid weak-manning and targeting the hyperbolic Vasari, who is likely defending a 10+ sigma alternate hypothesis.
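To make the three sigma rule of thumb concrete, here is a minimal Python sketch that computes the one-sided tail of a normal distribution and the expected number of competitors beyond a given threshold. The function names are my own, and the population of 1,000 is the illustrative figure used above, not a historical count. The exact one-sided tail at three sigma is about 0.135%, which the text rounds to 0.1%.

```python
import math


def upper_tail(k_sigma: float) -> float:
    """P(Z > k_sigma) for a standard normal variable Z (one-sided tail)."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2))


def expected_peers_above(k_sigma: float, n_competitors: int) -> float:
    """Expected number of competitors more than k_sigma above the mean."""
    return n_competitors * upper_tail(k_sigma)


if __name__ == "__main__":
    # 1,000 is the illustrative population size from the text (an assumption).
    for k in (2, 3, 4, 5):
        print(
            f"{k} sigma: tail = {upper_tail(k):.2e}, "
            f"expected among 1,000 = {expected_peers_above(k, 1000):.3f}"
        )
```

Raising `n_competitors` while holding the threshold fixed raises the expected count, which is why a larger field demands more sigmas to keep the innovator unique.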

The alternate hypothesis $H$ is the negation of the null hypothesis – that Michelangelo’s skill as an artist is more than three sigma above the mean of his competitors. From this it follows that (given there were 1,000 artists in Italy during his lifetime) Michelangelo was at least a once-in-a-lifetime event, while allowing that perhaps he was a once-ever event.

These hypotheses line up with the Zt1 conditions of innovation. A unique innovation is one that only happens once in the lifetime of the innovators. There is only one in the population. This takes the field from zero to one. We have a separate issue of whether an innovation is 10x better in some sense than what’s currently being done – this raises the question of how you apply multiplication to whatever the x-axis of our normal distribution is. I won’t get into that, but as far as the Zt1 conditions go, this framing seems to line up with Thiel’s language.

The General Case for Incrementalism: A Bayesian Analysis

So we’ve defined our two hypotheses – the incrementalist model $H_{0}$ and the Zt1 model $H$. We also have a body of evidence E which encompasses all observations about Michelangelo and his peers, their works and the impact of their works, facts about Renaissance Italy, and really anything that can be used to support either claim. E is shared among both hypotheses. Like a good rationalist, I appeal to Bayes’ Theorem when applicable.

$P(H|E)=\frac{P(E|H)P(H)}{P(E)}$

And similarly for the null hypothesis,

$P(H_{0}|E)=\frac{P(E|H_{0})P(H_{0})}{P(E)}$

$P(H)$ and $P(H_{0})$ are given by construction. For the three-sigma interpretation, $P(H_{0})$ is 0.999 and $P(H)$ is 0.001. If you think your innovator is more special than one in one thousand, you need more sigmas: $P(H_{0})$ gets higher and $P(H)$ gets lower. To compare the two hypotheses, we can look at the ratio of their probabilities given the evidence:

$\frac{P(H|E)}{P(H_{0}|E)} = \frac{P(E|H)P(H)}{P(E|H_{0})P(H_{0})}$

If this ratio is greater than one, the Zt1 model is more likely to be correct than my model. If this ratio is less than one, my model is more likely to be correct. We have our knowns so far:

$P(H_{0}) = 0.999$

$P(H) = 0.001$

$P(E|H) = 1$

Why am I going with $P(E|H) = 1$? If you assume that an innovator is at least a once-in-a-lifetime event, or make a stronger claim, then the probability that the innovator will produce innovations deserving of the esteem he or she has achieved is probably close to 1. Equating this probability with 1 is a conservative assumption that gives the benefit of the doubt to the Zt1 model of innovation.

The core of my argument, and the place where discussions about these innovators normally lead, rests in the evaluation of $P(E|H_{0})$. $P(E|H_{0})$ is the degree of belief that one of the competitors of the innovator could have produced the innovations presented in E – that the innovator is not of significantly higher merit than his contemporaries.

In order to go from the most contentious part of the discussion, $P(E|H_{0})$, to the conclusions, $P(H|E)$ and $P(H_{0}|E)$, we must apply Bayes’ Theorem – which means multiplying by the Bayesian priors $P(H)$ and $P(H_{0})$. This is a step people rarely take, and skipping it is the base rate fallacy. It’s why Bayesian rationality is an important movement – recognizing this fallacy helps us avoid incorrect models of the universe.

We have

$\frac{P(H|E)}{P(H_{0}|E)} = \frac{P(E|H)P(H)}{P(E|H_{0})P(H_{0})}$

$\frac{P(H|E)}{P(H_{0}|E)} = \frac{(1)(0.001)}{P(E|H_{0})(0.999)}$

$\frac{P(H|E)}{P(H_{0}|E)} = \frac{1}{999 \, P(E|H_{0})}$

In order to consider the alternate hypothesis more likely than the null, we must convince ourselves that $\frac{P(H|E)}{P(H_{0}|E)} > 1$. For this to hold, $P(E|H_{0})$, which is our degree of belief that the innovator could be swapped out with a peer, must be less than 1/999, or roughly 0.001.
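The odds calculation above can be sketched in a few lines of Python. This is only an illustration under the numbers assumed in this post ($P(E|H) = 1$, $P(H) = 0.001$), and the function names are hypothetical. It computes the posterior odds and the break-even value of $P(E|H_{0})$ at which the two hypotheses are equally likely.

```python
def posterior_odds(p_e_given_h: float, p_e_given_h0: float, prior_h: float) -> float:
    """Posterior odds P(H|E) / P(H0|E); P(E) cancels out of the ratio."""
    prior_h0 = 1.0 - prior_h  # H0 is the negation of H
    return (p_e_given_h * prior_h) / (p_e_given_h0 * prior_h0)


def break_even_likelihood(p_e_given_h: float, prior_h: float) -> float:
    """Value of P(E|H0) at which the posterior odds equal exactly 1."""
    return p_e_given_h * prior_h / (1.0 - prior_h)


if __name__ == "__main__":
    # Assumed inputs from the text: P(E|H) = 1 and a three-sigma prior of 0.001.
    threshold = break_even_likelihood(1.0, 0.001)
    print(f"P(E|H0) must be below {threshold:.6f} for H to win")
    print(f"odds at P(E|H0) = 0.5: {posterior_odds(1.0, 0.5, 0.001):.5f}")
```

Changing `prior_h` shows how the bar moves: a stronger claim of uniqueness (a smaller prior) pushes the break-even value of $P(E|H_{0})$ even lower.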

Now we see why it’s so important to consider Bayes’ Theorem in detail. Normally when discussing this model, people get to this point and think “it seems more likely than not that Michelangelo was unique in his time, so he must have had significant artistic merit compared to his peers”. This is blatantly wrong. You can’t just conclude that it’s “more likely than not”. With the Bayesian interpretation of probability, you have to come to the conclusion that the evidence is roughly 1,000 times more likely if Michelangelo was uniquely meritorious than if he was not. This is the standard that must be met for an innovator to satisfy the Zt1 conditions of innovation.
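To see how far “more likely than not” falls short of this standard, the following Python sketch converts a few candidate values of $P(E|H_{0})$ into a posterior probability for the Zt1 hypothesis. The candidate values are made up for illustration, and the defaults are the numbers assumed in this post ($P(E|H) = 1$, $P(H) = 0.001$).

```python
def posterior_h(
    p_e_given_h0: float, p_e_given_h: float = 1.0, prior_h: float = 0.001
) -> float:
    """P(H|E) via Bayes' Theorem, with H0 taken as the negation of H."""
    numerator = p_e_given_h * prior_h
    evidence = numerator + p_e_given_h0 * (1.0 - prior_h)  # P(E), total probability
    return numerator / evidence


if __name__ == "__main__":
    # Illustrative guesses for P(E|H0); none of these are estimates from evidence.
    for guess in (0.5, 0.1, 0.01, 0.001):
        print(f"P(E|H0) = {guess:<5} -> P(H|E) = {posterior_h(guess):.4f}")
```

Even a coin-flip belief that a peer could have done the same work, $P(E|H_{0}) = 0.5$, leaves the Zt1 hypothesis with a posterior of about 0.2%. The posterior only reaches roughly one half when $P(E|H_{0})$ drops to about 0.001.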

Individual Examples

Harry was wondering if he could even get a Bayesian calculation out of this. Of course, the point of a subjective Bayesian calculation wasn’t that, after you made up a bunch of numbers, multiplying them out would give you an exactly right answer. The real point was that the process of making up numbers would force you to tally all the relevant facts and weigh all the relative probabilities. Like realizing, as soon as you actually thought about the probability of the Dark Mark not fading if You-Know-Who was dead, that the probability wasn’t low enough for the observation to count as strong evidence. One version of the process was to tally hypotheses and list out evidence, make up all the numbers, do the calculation, and then throw out the final answer and go with your brain’s gut feeling after you’d forced it to really weigh everything.

– Eliezer Yudkowsky in Harry Potter and the Methods of Rationality Chapter 86

Now we have to evaluate $P(E|H_{0})$. This is the most important and most subjective portion of the argument (although you could argue any of these numbers are subjective, but that’s less contentious than what I’m about to get into). The evaluation dives into the details of the innovator and his competitors. It requires knowledge of the works in the field at the time of the innovator’s life, knowledge of the works of the innovator, and knowledge of how these innovations were leveraged and through what means. You have to gather all the evidence E that you can to address this claim. I will only be making claims on rough order of magnitude estimates, but I hope to convey that the claims I make are absolutely reasonable given the evidence at hand.

I have a post with my arguments for $P(E|H_{0})$ – the degree of belief that another innovator could have established similar results – for Michelangelo. I think after reading this you can formulate your own conclusions for any innovator of choice. I have thoughts on covering more (Einstein, Jobs, Alexander the Great, the Beatles) but haven’t committed time to filling those out yet. If you’re honest with yourself and diligent in your research, I think you’ll find that most fail to meet the Zt1 conditions. I’m open to the notion that some innovators really were truly unique in their time, but we must realize exactly how hard it is to really justify that claim.

My goal with this argument is not to convince you that there are no real heroes, or to rob you of role models. I seek a model of innovation that is as accurate as possible. We should not overlook a number of cognitive heuristics that come into play with the formulation of the Zt1 model of innovation – anchoring on fame and influence, the illusory truth effect, fundamental attribution error, and perhaps most of all the base rate fallacy. I’m also playing around with the idea that there’s some fundamental human inclination to hero worship, although that’s for another time. We should also recognize there is incentive from established innovators to propagate the Zt1 model of innovation. When an innovation is believed to be a “zero to one” event, rather than a nudge above the competition, it’s that much easier to market and profit from it.

My aim is not to present a bleak view of the world in which only external forces determine who changes the world. I do believe in the importance of individual virtues – and I suggest that the best way to maximize your chances of making an impact is to recognize the true drivers of innovation.