Estimated Reading Time: 16 minutes
Algorithms to Live By is the work of Brian Christian and Tom Griffiths. The book is a mélange of cognitive science, psychology and economics. The authors are at home in this interdisciplinary territory: Brian studied computer science and psychology before going on to graduate work in English, and Tom studied psychology and statistics before becoming a professor at UC Berkeley.
First of all, I have to say the book is not an easy read. It is filled with mathematical formulas, algorithms, and words that send you reaching for the dictionary every fifteen minutes. The book doesn’t quite live up to its name, mainly because the writers didn’t get the balance right between computer science and the real-life application of algorithms. After the first chapters, Brian and Tom place too much emphasis on the scientific side of the book, and it can get exhausting and baffling for readers. Still, it is a book I recommend reading, as it enhances both your general knowledge and your vocabulary. There are also a few actionable ideas in the book that I will briefly mention further in the article.
Secondly, we should get one thing straight: the book is not about the hidden algorithms the human brain uses; rather, it introduces engineered computer algorithms in the context of day-to-day life.
Optimal Stopping: When to stop looking
Any dynamic system subject to the constraints of space and time is up against a core set of fundamental, unavoidable problems. If we stop thinking of computers as merely our tools, pieces of wisdom emerge that we can apply to our decision-making processes and our daily lives.
The 37% Rule, the Least Recently Used criterion for handling overflowing caches, and the Upper Confidence Bound, for example, can simply be transferred over to our human problems.
Using an optimal algorithm should be a relief even if you don’t get the results you are looking for.
The first thing you learn from the book is how to apply the 37% rule to a look-then-leap process. When you are looking for a new apartment or dating, you are trying to find the right balance between establishing your standards (looking) and satisfying the standards you have set (leaping). When is a good time to buy the apartment, or to settle down and marry?
In most cases, we have no idea how to assign scores to individual applicants or lovers. We use ordinal numbers to relatively rank lovers and applicants instead of rating their skills using cardinal numbers.
The 37% rule derives from optimal stopping’s most famous puzzle, which has come to be known as the “secretary problem.”
Spend the first 37% of your house hunt with your checkbook left at home, just setting your standards; after that, commit to the first place that beats everything you have seen. The same challenge appears in even more fraught settings such as dating or hiring. Assuming you date between the ages of 18 and 40, the rule suggests it’s best to settle down at 26.1 years old. Hiring for a job? Interview the first 37% of applicants without committing, then hire the first applicant who meets the standard they set.
Although this is the best strategy, it has a 63% failure rate. That is sobering, but there is a silver lining: if we hired at random from a pool of a hundred applicants we’d have a 1% chance of success, and in a pool of a million applicants a 0.0001% chance. So the bigger the applicant pool gets, the more valuable knowing the optimal algorithm becomes. If we have an objective criterion (say, we give the applicants a test, or we are a gold digger looking to marry rich), then we apply the Threshold Rule instead: immediately accept any candidate above a certain percentile, judged, for example, by a suitor’s net worth or an applicant’s test score.
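The 37% rule and its roughly 37% success rate are easy to check by simulation. Below is a minimal sketch I wrote to illustrate it (the function names and the candidate-scoring setup are my own, not from the book):

```python
import random

def secretary_search(scores, look_fraction=0.37):
    """Look at the first `look_fraction` of candidates without committing,
    then leap: pick the first candidate better than everyone seen so far.
    `scores` holds distinct values; higher is better."""
    n = len(scores)
    cutoff = int(n * look_fraction)
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for score in scores[cutoff:]:
        if score > best_seen:
            return score
    return scores[-1]  # forced to take the last candidate

def success_rate(n=100, trials=20_000, seed=0):
    """Fraction of trials in which the strategy lands the single best candidate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = list(range(n))
        rng.shuffle(scores)
        if secretary_search(scores) == n - 1:
            wins += 1
    return wins / trials
```

Running `success_rate()` lands close to 0.37, matching the failure rate the chapter cites.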
Explore/Exploit: The Latest vs. the Greatest
Every day we are constantly forced to choose between trying new things or sticking with our favorite ones because life is a balance between novelty and tradition, between the latest and the greatest. The unanswered question is: What is the proper balance?
Age-worn aphorisms acknowledge this scenario but they fail to provide a concise solution to the problem.
For a computer scientist, exploration is gathering information, and exploitation is using the information you have to get a good result. Never exploring is no way to live. However, “carpe diem” is overrated and somewhat self-contradictory as we can also have its inverse: “Start learning a new language, make small talk with a stranger, because life is long, and who knows what joy could blossom over many years’ time”.
More important than the balance between exploring and exploiting is the interval over which we plan to enjoy the results. We are more likely to try a new restaurant when we move to a city than when we are about to leave it. Likewise, as the years pass we are more likely to spend time with family and existing friends than to meet new people.
To frame the problem, Tom and Brian present the explore/exploit tradeoff as a scenario called the “multi-armed bandit”. The odd name comes from the slang term for slot machines (“one-armed bandits”), because this is another situation where we must choose between staying at a machine that has paid off before and trying a new one that might pay off even better.
One of the solutions to the “multi-armed bandit” is Win-Stay, Lose-Shift: keep pulling an arm as long as it pays off, and switch arms after a loss. Although this simple strategy is far from a complete solution, it performs reliably better than chance.
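Win-Stay, Lose-Shift fits in a few lines. Here is a sketch under assumed conditions (the two payout probabilities are hypothetical, and each win pays one unit):

```python
import random

def win_stay_lose_shift(payout_probs, pulls=10_000, seed=1):
    """Play a multi-armed bandit: stay on the current arm after a win,
    shift to the next arm after a loss. Returns total winnings.
    `payout_probs` lists each arm's (hypothetical) win probability."""
    rng = random.Random(seed)
    arm = 0
    winnings = 0
    for _ in range(pulls):
        if rng.random() < payout_probs[arm]:
            winnings += 1                         # win: stay on this arm
        else:
            arm = (arm + 1) % len(payout_probs)   # lose: shift to the next arm
    return winnings
```

With arms paying off 60% and 30% of the time, pulling at random would average 0.45 per pull; Win-Stay, Lose-Shift does noticeably better because it spends more time on the better arm.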
The Gittins Index is another way you can approach the “multi-armed bandit” problem. Gittins conceived the goal as maximizing payoffs not for a fixed interval of time, but for a future that is endless yet discounted. The value assigned to payoffs decreases geometrically; each restaurant visit you make is worth a constant fraction of the last one.
Regrets, Optimism and Upper Confidence Bound
Regret is a result of comparing what we actually did with what would have been best in hindsight.
The most popular algorithm that offers the guarantee of minimal regret is known as Upper Confidence Bound.
Upper Confidence Bound algorithms implement a principle dubbed “optimism in the face of uncertainty”. Optimism can be perfectly rational if we focus on the best an option could be, given the data we already have. This kind of algorithm injects a dose of exploration into the decision-making process. A slot machine that has paid off once in two pulls has a higher upper confidence bound than one that has paid off five times in ten. By the same principle, a restaurant you have never been to could, for all you know, be better than your favorite one.
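One standard version of this idea is the UCB1 algorithm: pull the arm with the highest optimistic estimate, i.e. its average payout plus an exploration bonus that shrinks as the arm accumulates data. A minimal sketch (the payout probabilities are hypothetical):

```python
import math
import random

def ucb1(payout_probs, pulls=5_000, seed=2):
    """UCB1 sketch: each round, pick the arm maximizing
    mean payout + sqrt(2 * ln(t) / pulls_of_arm).
    Returns how many times each arm was pulled."""
    rng = random.Random(seed)
    n = len(payout_probs)
    counts = [0] * n
    totals = [0.0] * n
    for t in range(1, pulls + 1):
        if t <= n:
            arm = t - 1   # pull each arm once to initialize
        else:
            arm = max(range(n), key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts
```

After a few thousand pulls, the best arm dominates the pull counts while the weaker arms still get occasional optimistic tries, which is exactly the regret-minimizing behavior the chapter describes.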
Unlike most animals, which run from predators from the day they are born, humans require a long period before becoming autonomous and competent. We use that extended period of dependence, childhood, as a way of solving the exploration/exploitation trade-off, and like the most successful algorithms, we explore heavily at the beginning.
If we consider the long-term effect of our decisions, rather than treating each one as if it were our last, we soon realize that exploration makes sense.
As we grow older we tend to maximize social and emotional gains by reducing our number of friends and spending more time with family. We exploit more as a rational response to having less time left to enjoy things in life.
Sorting: Making Order
Sorting was in many ways what brought the computer into being. The US government used Herman Hollerith’s punched-card tabulating machine for the 1890 census; the system stored, counted, and sorted information on punched manila cards. The company Hollerith founded merged with several others in 1911 and eventually became International Business Machines, or IBM.
We refer to Google as a search engine, but that is something of a misnomer. It stood the test of time thanks to its ability to sort results so that the most relevant ones reach the user first.
The Bubble Sort: to alphabetize a shelf of books, you skim along it for adjacent out-of-order pairs, swap them, and continue to the end of the shelf, repeating passes until no books are out of order. This takes at most n passes through n books, for O(n²) comparisons. It is still far better than shuffling all the books together and hoping we get lucky, which takes factorial time on average.
Insertion Sort: you pull all the books off the shelf and put them back one by one. You place the first book in the middle of the shelf, then compare each remaining book against those already shelved and insert it into its proper place. It’s somewhat better than Bubble Sort in practice, but not much faster: still quadratic time.
Mergesort: you split the books into pairs and sort each pair, then merge the pairs into ordered sets of four, and finally merge those sets into a fully sorted shelf. Mergesort’s power comes from its complexity falling between linear and quadratic time: O(n log n). It is the best we can do if we must sort n items via a series of head-to-head comparisons.
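The split-and-merge process described above translates directly into code. A compact sketch:

```python
def mergesort(items):
    """Sort by splitting in half, recursing, then merging the two
    sorted halves: O(n log n) comparisons in total."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = mergesort(items[:mid]), mergesort(items[mid:])
    # Merge: repeatedly take the smaller front element of the two halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```

Each merge pass over n items is linear, and there are about log n levels of merging, which is where O(n log n) comes from.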
Mergesort is as important in the history of sorting as sorting in the history of computing.
Bucket Sort: if you don’t need a fully ordered set, you can group your books without any item-to-item comparisons in O(n), linear time. For example, the Preston Sort Center groups books into buckets based on their branch destination.
Comparison Counting Sort: each item is compared to all others, generating a report of how many items it is bigger than. This number can be used directly as the item’s rank. The first example of a Comparison Counting Sort that I could think of was the Alexa Ranking of websites.
Computer science is all about tradeoffs: exploration versus exploitation, but also sorting versus searching. We must assess whether the time spent sorting is a worthwhile preemptive strike against the effort it will take to search. In some cases the balance tilts toward searching: sometimes ordering your books will take more time and energy than scanning through them ever will.
Sorting in sports
Few people know that Charles Lutwidge Dodgson, better known by his pen name Lewis Carroll, had mathematical talents as well. In one of his works, “Lawn Tennis Tournaments: The True Method of Assigning Prizes with a Proof of the Fallacy of the Present Method”, he complained that in a single-elimination tennis tournament the true second-best player could be any of the players eliminated by the best, not just the last one eliminated. The silver medal, in other words, is a lie: the chance that the second-best player actually wins it is only 16/31.
The most comprehensive sorting algorithm in sport is the Round-Robin, where each player or team plays every one of the other n-1. It is the most laborious format, but it solves the problem with Single Elimination.
Sorting in sports is different because leagues are explicitly designed not to determine the rankings as quickly and expeditiously as possible, but to maintain tension throughout the season. Ladder tournaments and bracket tournaments are cases in point.
Caching: Forget about it
Computers regularly have to trade size against speed. Maurice Wilkes was the first to realize that we can deliberately hold on to pieces of information likely to be needed later, anticipating similar future requests and thereby dramatically speeding up processes. The idea was implemented in the IBM 360/85 in the late 1960s and acquired the name “cache”. Laszlo Belady, a Hungarian immigrant who arrived in the US with just $1,000 and his diploma in his pocket, would write a heavily cited piece of computer science research on caching. After all, this was a man who really knew what to leave behind and what to take with him.
Because a cache holds only a small fraction of memory, data must eventually be overwritten, and a cache eviction policy comes into play. But how do we choose what to evict?
We could try Random Eviction; First-In, First-Out (FIFO); or Least Recently Used (LRU). The last of these clearly outperforms the first two.
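An LRU cache is simple to sketch. Here is a minimal version I wrote for illustration, using Python’s ordered dictionary to track recency (the class and method names are my own):

```python
from collections import OrderedDict

class LRUCache:
    """A fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # oldest access first, newest last

    def get(self, key, default=None):
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            return self.entries[key]
        return default

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

Every read or write moves the entry to the “recent” end; when the cache overflows, the entry at the other end, the one untouched longest, is the one discarded.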
If you are deciding which clothes to throw away, LRU is a good solution: discard the clothes you have worn least recently. And if you jog regularly, it is a good idea to keep your heart-rate belt and running gear somewhere near the front door.
Michael Ramscar at the University of Tübingen has suggested that “cognitive decline”, the difficulty remembering things as we get older, may be an unavoidable consequence of the sheer amount of information we have accumulated. Forgetting certain things might just be our LRU cache at work, not a sign of a failing mind.
Scheduling: First Things First
When it comes to scheduling these are the most popular algorithms:
Earliest Due Date – fairly intuitive: you schedule tasks in order of their due dates, without considering how long each one takes. It is the scheduling analog of handling customers in the order they arrive.
Moore’s Algorithm: if you are trying to minimize the number of foods that spoil, this is the optimal algorithm. Start with Earliest Due Date, but the moment you see an item whose deadline you will miss, throw out the item that takes the longest to consume and move on to the next expiry date. This way we consume the largest possible number of products before they expire.
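That procedure, often called the Moore-Hodgson algorithm in the scheduling literature, can be sketched as follows (the job tuples and function name are my own framing):

```python
import heapq

def moore_hodgson(jobs):
    """Minimize the number of late jobs (spoiled foods).
    `jobs` is a list of (processing_time, due_date) pairs.
    Schedule by earliest due date; whenever a job would finish late,
    drop the longest job scheduled so far."""
    on_time = []      # max-heap of scheduled jobs, keyed by -processing_time
    dropped = []
    elapsed = 0
    for time_needed, due in sorted(jobs, key=lambda j: j[1]):
        heapq.heappush(on_time, (-time_needed, due))
        elapsed += time_needed
        if elapsed > due:                        # someone would finish late:
            longest, _ = heapq.heappop(on_time)  # remove the longest job
            elapsed += longest                   # `longest` is negated, so this subtracts
            dropped.append(-longest)
    return len(on_time), dropped
```

Dropping the longest job frees the most schedule time per item sacrificed, which is why this greedy step is optimal for minimizing the count of late jobs.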
Shortest Processing Time: always do the quickest task you can. This algorithm minimizes the sum of completion times, shrinking your task list as fast as possible, which may bring some relief. It also justifies performing right away any task that takes less than two minutes. With importance weights added, the rule becomes: prioritize a task that takes twice as long as another only if it is twice as important.
Interrupt Coalescing: life is a constant tradeoff between responsiveness and throughput: how quickly you react to tasks versus how many you ultimately complete.
If you have several bills to pay, it’s best not to pay each one as it arrives but to set a date on which you pay all of them, as long as none is due before that date. Computers themselves behave this way: they wait for some fixed interval and then check everything at once.
The Copernican Principle
There are times in life when we must make predictions from a single data point. Arriving at a bus stop, we wonder when the next bus will come; the only available data may be how long ago the last bus left, or an estimate based on the number of people waiting.
When J. Richard Gott first saw the Berlin Wall, he asked himself how long the wall would stand. He assumed the moment of his visit was nothing special, so on average he would be arriving halfway through the wall’s lifetime. The wall, then, was likely to stand for about as long as it had already stood.
He named this straightforward idea the Copernican Principle, and it is a simple algorithm for making all sorts of predictions. Of course, we can’t predict that a 90-year-old man will live to 180, because we already know a lot about human life spans: the richer our prior information, the more accurate our predictions can be.
Human life spans follow a roughly normal distribution. The average life span is about 76 years: a single-digit life span is considered tragic, a double-digit one normal, and a triple-digit one extraordinary.
This distribution doesn’t apply to everything in real life. Take box-office grosses: some movies rake in petty cash, while occasionally a movie like Titanic breaks every income record.
This pattern is called a “power-law distribution”, or a “scale-free distribution”.
For any power-law distribution we apply the Multiplicative Rule: multiply the amount observed so far by a constant. With no prior information at all the constant is 2 (the Copernican case); for movie grosses it is about 1.4. If a movie has grossed $10 million so far, we can predict it will top out around $14 million.
There is a third category, the Erlang distribution, used mainly by urban planners and architects to model car and pedestrian traffic. In this distribution the events are completely independent of one another, and it gives us the Additive Rule: always predict that things will go on for a constant amount longer.
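The three prediction rules can be collected into one small sketch. The 1.4 multiplier and the 76-year average come from the examples above; the additive constant of 10 is a purely hypothetical placeholder, since the real constant depends on the process being modeled:

```python
def predict(observed, distribution):
    """Predict an eventual total from a single observation, using the
    chapter's three rules of thumb. Constants are illustrative only."""
    if distribution == "power-law":
        return observed * 1.4      # Multiplicative Rule (movie-gross constant)
    if distribution == "erlang":
        return observed + 10       # Additive Rule; 10 is a hypothetical constant
    # Normal-distribution rule: predict the average (76 years for life spans),
    # unless the observation has already passed it.
    return max(observed, 76)
```

So a movie at $10 million predicts about $14 million, a 9-year-old predicts living to the average of 76, and a 90-year-old predicts at least 90.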
All three patterns result directly from applying the Reverend Thomas Bayes’s rule.
Knowing which kind of distribution you are facing can make a real difference. When the Harvard biologist Stephen Jay Gould discovered he had cancer, he found that half of all patients with his form of the disease died within eight months of diagnosis. But that figure said nothing about the distribution of the survivors. He soon realized that survival times followed a power-law distribution with a tail stretching far out to the right, and he saw no reason why he shouldn’t be in that long tail extending well beyond the eight-month median. He lived another 20 years.
Overfitting: When to think less
Why do the foods that taste so good tend to be so bad for us? Because taste is our body’s proxy metric for health. Fat, sugar, and salt are important nutrients, and for hundreds of thousands of years being attracted to foods containing them made for a sustainable diet. In recent history, food has been heavily engineered and those nutrients added far beyond reasonable amounts; we no longer eat the fruits, grains, and meat that traditionally made up the human diet. When overfitting creeps in like this, it can lead to disastrous outcomes.
So you have to be very careful of what you incent people to do, because various incentive structures create all sorts of consequences that you can’t anticipate.
At a job-placement company, employees were given incentives based on the number of interviews they conducted. They started running through interviews as quickly as possible, forgetting that their role was to help the interviewees find a job. There are also stories of police officers shot during standoffs because they wasted time gathering spent cartridge cases from the ground, exactly as they had been drilled to do in training. They had started to overfit.
The best way to detect overfitting is Cross-Validation. Simply put, Cross-Validation assesses how well a model fits not only the data it was given but also data it hasn’t seen yet.
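A bare-bones k-fold version of the idea looks like this. It is a generic sketch: the caller supplies a `fit` function that builds a model from data and an `error` function that scores a model on held-out data (both names are my own):

```python
def cross_validate(fit, error, points, folds=5):
    """k-fold cross-validation: hold out each fold in turn, fit the
    model on the remaining data, and score it on the data it never saw."""
    scores = []
    for k in range(folds):
        held_out = points[k::folds]                            # the unseen fold
        training = [p for i, p in enumerate(points) if i % folds != k]
        model = fit(training)
        scores.append(error(model, held_out))
    return sum(scores) / folds
```

A model that merely memorizes its training data will score badly on the held-out folds, which is exactly the signal that it has overfit.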
When you start designing something, sketch with a Sharpie: thick lines keep you from wasting time on insignificant details, as you would with a ballpoint pen. The authors borrowed this advice from Jason Fried’s book ReWork.
Relaxation: Let it slide
When you encounter a problem that seems intractable, it’s best to apply Constraint Relaxation: remove some of the problem’s constraints to obtain the problem you wish you had, solve that, and from there you may find it easier to solve the initial problem.
Randomness: When to leave it to chance
Sometimes randomized approaches can outperform even the best deterministic ones. Sampling can provide answers where analysis fails.
One of the oldest problems of all time is determining which numbers are prime. Eratosthenes came up with the first systematic approach, his famous sieve.
Gary Miller developed a much faster algorithm for testing primality, but it didn’t always give a correct answer. Michael Rabin, a Turing Award winner (the computer science equivalent of a Nobel Prize), soon discovered that the test errs no more than a quarter of the time. Apply the algorithm again and the probability of wrongly passing a composite number drops by another factor of four. Repeat the procedure ten times and the probability of a false result is one in four to the tenth power; check another five times and it is roughly one in a billion. This was a step outside the traditional deterministic world of computer science, taken to solve a supposedly intractable problem.
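The resulting Miller-Rabin test is short enough to sketch in full. Each random round either proves the number composite or lets a composite slip through with probability at most 1/4, so the error shrinks geometrically with the number of rounds:

```python
import random

def probably_prime(n, rounds=10, seed=None):
    """Miller-Rabin primality test. A False answer is certain;
    a True answer is wrong with probability at most (1/4)**rounds."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 as d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    rng = random.Random(seed)
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)       # random witness
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue                      # this witness says "maybe prime"
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                  # definitely composite
    return True                           # almost certainly prime
```

This is the randomized trade the chapter describes: we give up certainty in exchange for speed, then buy the certainty back, a factor of four at a time, by repeating the test.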
The study of prime numbers was once considered the most useless branch of mathematics; now it underpins the encryption used in banking worldwide.
Networking: How we connect
Human connection rests on a foundation of protocols: gestures of assent, handshakes, hellos, and politesse. Machine connections are much the same. The internet is also a collection of many protocols, the most common of which is the Transmission Control Protocol (TCP).
The first telegraph message was the Bible quote WHAT HATH GOD WROUGHT, and so was the second: the message was sent back to Morse as a confirmation of receipt.
The Tragedy of the Commons
The ecologist Garrett Hardin took the two-player prisoner’s dilemma and hypothetically scaled it up to the size of a village. A “commons” of public pasture was available to the villagers on the condition that each used only what their livestock required. Everybody grazed a little more, reasoning that the marginal harm was too small and that their action alone could not destroy the pasture. Yet when everybody acted this way, the result was a ruined commons.
This is exactly what happened in the US with leaded gasoline: it was ten cents cheaper but a hazard to the environment, and nobody felt they were doing harm as long as so many others were using it too.
Information Cascades: The tragic rationalities of bubbles
Usually, it is good for us to pay attention to the way others behave and add their knowledge to our own. Most of the time, a restaurant that is always full serves excellent food. On the other hand, following the crowd is not always rational.
When we always use other people’s actions as guidance, economic disasters such as the 2007-2009 mortgage crisis can occur. To understand financial bubbles, it helps to look at the bidding process in auctions.
The sealed-bid first-price auction: each participant writes a bid in a sealed envelope, and the highest offer wins. In this type of auction, the winner tends to overpay for the goods.
The Dutch auction: the price gradually decreases until one participant is willing to pay for the goods being auctioned. Dutch auctions are more common than you might think: a local shop marking down its unsold items is running a descending-price auction.
The English auction: the most familiar kind. Bidders keep raising the price until all but one drop out.
In these types of auction you don’t know the other bidders’ beliefs, only their bids. There is a real chance they aren’t valuing the goods independently at all: they adjust their bids to yours while you adjust yours to theirs, irrationally driving up the price of the items being sold. This is how an information cascade forms.
A biology text, The Making of a Fly, was listed on Amazon in April 2011 for $23,698,655.93 (plus $3.99 shipping). How was this possible? Two of the sellers had set their prices to adjust automatically as a fraction of each other’s, with no upper limit, and the prices spiraled out of control. Something similar may have driven the flash crash of the stock market in 2010.
The solution: The Vickrey auction
The Nobel Prize-winning economist William Vickrey devised a type of auction similar to the sealed-bid first-price format: every bidder writes an offer in a sealed envelope, but the winner pays not the amount of their own bid but that of the second-place bidder.
In this type of auction, your best strategy is to bid your true value: bidding far more than you value the item is clearly silly, while bidding less doesn’t save you money, since you will pay the second-place bidder’s offer anyway.
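The mechanism itself is almost trivially small. A sketch (bidder names here are made up for illustration):

```python
def vickrey_winner(bids):
    """Second-price sealed-bid (Vickrey) auction: the highest bidder wins,
    but pays the second-highest bid. `bids` maps bidder name to bid amount."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    # With a single bidder there is no second price, so they pay their own bid.
    price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price
```

Because the price you pay is set by someone else’s bid, shading your own bid can only cost you the item without ever lowering the price, which is exactly why truthful bidding dominates.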
All the algorithmic approaches mentioned above transfer readily to how humans approach decisions in their lives. We should use the optimal algorithm even when it doesn’t guarantee the results we are looking for.