The Master Algorithm

Rating: 9/10

Author: Pedro Domingos

Read The Original

Recommended By

Bill Gates, Peter Norvig, and Walter Isaacson



High-Level Thoughts

The book is extremely insightful if you have previous experience in the machine learning space. You don't need to understand the complex math behind classic algorithms such as support vector machines or k-nearest neighbors, just a broad sense of what they do. If you're interested in an easier-to-read introduction to machine learning and how it will shape our economy, I highly recommend Prediction Machines. The book explores what Domingos believes will make up the algorithm that will rule all other algorithms and change every facet of our lives. It reads not like a dry textbook but like a story of the different tribes of researchers in the world of artificial intelligence, the tactics they use, and how they believe the future will be shaped.



The Master Algorithm Summary and Notes

A learning algorithm is like a master craftsman: every one of its productions is different and exquisitely tailored to the customer's needs.

CH 1 The Machine Learning Revolution

  • Every algorithm can be reduced to just three operations: AND, OR, and NOT (a minimal sketch follows this list).
  • If programmers are minor gods, the complexity monster is the devil himself.
  • Learning algorithms are the seeds, data is the soil and the learned programs are the grown plants. The Machine Learning expert is the farmer.
  • Machine learning is a subfield of artificial intelligence.
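
As a toy illustration of that reduction, here's XOR built from nothing but AND, OR, and NOT; a minimal sketch of my own, not anything from the book itself:

```python
# Minimal sketch: XOR expressed purely in terms of AND, OR, and NOT,
# illustrating the claim that any algorithm reduces to these three operations.

def NOT(a: bool) -> bool:
    return not a

def AND(a: bool, b: bool) -> bool:
    return a and b

def OR(a: bool, b: bool) -> bool:
    return a or b

def XOR(a: bool, b: bool) -> bool:
    # a XOR b = (a OR b) AND NOT (a AND b)
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (False, True):
    for b in (False, True):
        print(a, b, XOR(a, b))
```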

CH 2 The Master Algorithm

  • The Master Algorithm -> All knowledge - past, present, and future - can be derived from data by a single, universal learning algorithm.
  • Our brain is an example of the master algorithm, so one path is to reverse engineer the brain to learn how to create the algorithm.
  • God created not species but the algorithm for creating species - Charles Babbage
  • Evolution is another path to creating the master algorithm.
  • Maybe everything is just an overarching optimization problem.
  • Bayes theorem is a machine that turns data into knowledge.
  • One definition of Artificial Intelligence is that it consists of finding heuristic solutions to NP-complete problems.
  • The Master Algorithm will combine research from the five tribes: symbolists, connectionists, evolutionaries, Bayesians, and analogizers.

CH 3 Hume's Problem of Induction

  • Rationalists believe that the senses deceive and that logical reasoning is the only sure path to knowledge. Ex: Mathematicians, Lawyers, Computer Science Theorists.
  • Empiricists believe that all reasoning is fallible and that knowledge must come from observation and experimentation. Ex: Hackers, Machine Learners, Journalists, Doctors and Scientists.
  • The machine-learning problem: generalizing to cases that we haven't seen before.
  • "Learning is forgetting the details as much as its remembering the important parts"
  • data mining - torturing the data until it confesses
  • Overfitting happens when you have too many hypotheses and not enough data to tell them apart. It is seriously exacerbated by noise.
  • A cell is like a tiny computer and the DNA is the program running on it. Change the DNA and a skin cell can become a neuron.
  • Inverse deduction is very computationally intensive and has an "all or none" logical character.
  • A decision tree is like playing a game of twenty questions with an instance (see the sketch after this list).
  • The symbolists' core belief is that all intelligence can be reduced to manipulating symbols. Theirs is the shortest path to the master algorithm because it doesn't require us to figure out how evolution or the brain works, and it avoids the mathematical complexities of Bayesianism.
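
To make the twenty-questions analogy concrete, here's a hypothetical toy decision tree written as the nested yes/no questions it really is (the play-tennis example is a classic teaching dataset, not from the book):

```python
# Hypothetical toy example: a decision tree classifying whether to play tennis,
# written as the nested yes/no questions it really is ("twenty questions").

def play_tennis(outlook: str, humidity: str, wind: str) -> bool:
    if outlook == "sunny":           # Question 1: is it sunny?
        return humidity == "normal"  # Question 2: is the humidity normal?
    if outlook == "rain":            # Question 1': is it raining?
        return wind == "weak"        # Question 2': is the wind weak?
    return True                      # overcast: always play

print(play_tennis("sunny", "high", "weak"))   # False
print(play_tennis("rain", "normal", "weak"))  # True
```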

CH 4 How Does Your Brain Learn?

  • Hebb's Rule is the cornerstone of connectionism: neurons that fire together wire together (a minimal sketch follows this list).
  • A Boltzmann machine has a mix of sensory and hidden neurons (analogous to the retina and the brain, respectively).
  • Children's learning is not a steady improvement but an accumulation of S curves.
  • Backpropagation is the connectionists' master algorithm.
  • Backprop could in theory learn a detailed model of the cell, with a multilayer perceptron predicting each variable as a function of its immediate causes.
  • One of the keys to deep learning's resurgence this decade is the autoencoder -> a multilayer perceptron whose output is the same as its input. It works not unlike a file compression tool, and it can turn a noisy, distorted image into a clean one.
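
A minimal sketch of Hebb's rule in code, assuming a toy setup where one output unit fires whenever two particular inputs co-fire (the data, variable names, and learning rate are made up for illustration):

```python
import numpy as np

# Minimal sketch of Hebb's rule ("neurons that fire together wire together"):
# the weight between two units grows in proportion to the product of their
# activations. The setup and learning rate are illustrative, not from the book.

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(100, 5))   # 100 binary input patterns, 5 neurons
y = x[:, 0] & x[:, 1]                   # output fires when neurons 0 and 1 co-fire

w = np.zeros(5)
lr = 0.1
for pattern, out in zip(x, y):
    w += lr * out * pattern             # Hebbian update: co-activity strengthens w

print(w)  # weights 0 and 1 grow largest: the co-firing connections wired together
```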

CH 5 Evolution: Nature's Learning Algorithm

  • Genetic algorithms are less likely than backprop to get stuck in local optima and are in principle better able to come up with something truly new (a toy example follows this list).
  • exploration-exploitation dilemma -> choose between repeating the best move you've found so far, or trying new moves that gather information which may lead to even better payoffs.
  • A genetic algorithm is like the ringleader of a group of gamblers, playing slot machines in every casino in town at the same time.
  • The key is combining the evolution of the brain with neural learning. (Structure learning + weight learning)
  • The Baldwin effect -> individual learning widens the fitness peak and can thereby influence and speed up evolution.
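
Here's a toy genetic algorithm in the spirit of the chapter, maximizing the number of 1-bits in a bit string ("one-max"); the population size, mutation rate, and fitness function are my illustrative choices, not the book's:

```python
import random

# Toy genetic algorithm maximizing the number of 1-bits in a bit string.
# Selection keeps the fittest half; children come from crossover plus mutation.

random.seed(0)
GENOME, POP, GENS = 20, 30, 40

def fitness(g): return sum(g)

def crossover(a, b):
    cut = random.randrange(1, GENOME)          # single-point crossover
    return a[:cut] + b[cut:]

def mutate(g, rate=0.02):
    return [bit ^ (random.random() < rate) for bit in g]

pop = [[random.randint(0, 1) for _ in range(GENOME)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]                   # selection: keep the fittest half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

print(max(fitness(g) for g in pop))            # approaches 20 as evolution proceeds
```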

CH 6 In the Church of the Reverend Bayes

  • Bayes theorem is just a simple rule for updating your degree of belief in a hypothesis when you receive new evidence: if the evidence is consistent with the hypothesis, the probability of the hypothesis goes up; if not, it goes down (a worked example follows this list).
  • P(cause | effect) = P(cause) × P(effect | cause) / P(effect)
  • Probability is not a frequency but a subjective degree of belief (the Bayesian view).
  • One way to avoid combinatorial explosion is to treat the variables as independent given the cause -> the Naive Bayes classifier.
  • Naive Bayes scales great
  • Google's PageRank and Apple's Siri both build on Markov chains.
  • A Bayesian network is a "generative model," a recipe for probabilistically generating a state of the world. It tells a story: A happened, which led to B; at the same time, C also happened.
  • Google Ads uses a giant Bayesian network to automatically choose which ads to place on web pages, and Microsoft's Xbox Live uses a Bayesian network to rate players and match players of similar skill.
  • We can put a prior distribution on any class of hypotheses, including neural networks.
  • A Markov Network is a set of features and corresponding weights, which together define a probability distribution. They are a staple in many areas such as computer vision.
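
A worked instance of the formula above, using the classic disease-test numbers (the probabilities are illustrative, not from the book):

```python
# Worked instance of P(cause | effect) = P(cause) * P(effect | cause) / P(effect),
# with a rare disease as the cause and a positive test as the effect.

p_disease = 0.01                      # P(cause): prior degree of belief
p_pos_given_disease = 0.99            # P(effect | cause): test sensitivity
p_pos_given_healthy = 0.05            # false-positive rate

# P(effect): total probability of a positive test
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: updated degree of belief in the disease after a positive test
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(round(p_disease_given_pos, 3))  # ≈ 0.167: the belief went up, but not to certainty
```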

CH 7 You are what you Resemble

  • Analogizers can learn from as little as one example because they never form a model.
  • The nearest-neighbor algorithm is one example and support vector machines are another (a k-NN sketch follows this list).
  • collaborative filtering system -> people who agreed in the past are likely to agree again in the future. (Netflix)
  • The curse of dimensionality -> as the number of dimensions goes up, the number of training examples you need goes up exponentially.
  • With SVMs we can learn smooth frontiers. In an SVM, the active constraints are the support vectors, since their margin is already the smallest it's allowed to be.
  • SVM weights have a single optimum instead of many local ones, so learning them reliably is much easier.
  • The most important part of analogical learning is how to measure similarity.
  • The second part is figuring out what we can infer about the new object based on similar ones we've found.
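
A minimal nearest-neighbor sketch: no model is ever built, and prediction is just a majority vote among the most similar stored examples (the data points are made up):

```python
import math
from collections import Counter

# Minimal k-nearest-neighbor sketch: store the training examples as-is and
# classify a new point by majority vote among its k nearest neighbors.

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]

def predict(x, k=3):
    dist = lambda ex: math.dist(ex[0], x)         # similarity = Euclidean distance
    nearest = sorted(train, key=dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(predict((1.1, 0.9)))  # "A": its nearest neighbors agree
```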

CH 8 Learning Without a Teacher

  • K-means algorithm -> repeatedly assign observations to the nearest cluster, then update each cluster's center (a minimal sketch follows this list).
  • Dimensionality reduction is essential for coping with big data, example -> reducing thousands of pixels to a few features such as eyes, mouth, nose.
  • The whole process of finding principal components can all be accomplished in one shot with a bit of linear algebra.
  • Principal Component Analysis (PCA) is to unsupervised learning what linear regression is to supervised learning.
  • Reinforcement learning -> machine learning algorithms that explore on their own, flail, hit on rewards, and figure out how to get them again in the future.
  • However, reinforcement learning alone struggles with complex problems that require planning or abstract knowledge. This is where chunking comes in, a method of learning that is key to the master algorithm.
  • In relational learning, every feature template we create ties together the parameters of all its instances.
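
A minimal k-means sketch of that assign-then-update loop, on synthetic two-blob data (the blob locations and cluster count are illustrative):

```python
import numpy as np

# Minimal k-means sketch: alternate between assigning points to the nearest
# center and moving each center to the mean of its assigned cluster.

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

centers = pts[rng.choice(len(pts), 2, replace=False)]
for _ in range(10):
    # assignment step: index of the nearest center for each point
    labels = np.argmin(np.linalg.norm(pts[:, None] - centers[None], axis=2), axis=1)
    # update step: each center becomes the mean of its assigned points
    centers = np.array([pts[labels == k].mean(axis=0) for k in range(2)])

print(centers.round(2))  # one center lands near (0, 0), the other near (5, 5)
```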

CH 9 The Pieces of the Puzzle Fall into Place

  • The Master Algorithm is the unifier of machine learning: it lets any application use any learner, by abstracting learners into a common form that is all the applications need to know.
  • One of the cleverest metalearners is boosting: instead of combining different learners, boosting repeatedly applies the same classifier to the data, using each new model to correct the previous ones' mistakes (a sketch follows this list).
  • In computer science, a problem isn't really solved until it's solved efficiently.
  • Alchemy takes initial knowledge as input, in the form of logical formulas, besides just data. Alchemy doesn't learn from scratch; it is like an inductive Turing machine, which we can program to behave as a very powerful or a very restricted learner.
  • Alchemy provides a unifier for machine learning in the same way that the Internet provides one for computer networks.
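
A sketch of boosting in the AdaBoost style described above, using one-dimensional threshold "stumps" as the repeated classifier; the data and all constants are illustrative:

```python
import numpy as np

# Sketch of boosting (AdaBoost-style): each round reweights the examples the
# previous stump got wrong, so the next stump must correct its predecessor's
# mistakes. The target labels below can't be fit by any single stump.

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = np.where((x > 3) & (x < 7), 1, -1)

w = np.full(len(x), 1 / len(x))                 # example weights
stumps = []
for _ in range(20):
    # pick the threshold/sign pair with the lowest weighted error
    t, s = min(((t, s) for t in np.linspace(0, 10, 50) for s in (1, -1)),
               key=lambda ts: np.sum(w * (np.where(x > ts[0], ts[1], -ts[1]) != y)))
    pred = np.where(x > t, s, -s)
    err = max(np.sum(w * (pred != y)), 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)       # this stump's vote weight
    w *= np.exp(-alpha * y * pred)              # upweight the mistakes
    w /= w.sum()
    stumps.append((t, s, alpha))

final = np.sign(sum(a * np.where(x > t, s, -s) for t, s, a in stumps))
print((final == y).mean())                      # ensemble accuracy near 1.0
```

The weighted stumps together carve out the interval that no single stump could fit, which is exactly the "correct the previous one's mistakes" dynamic the chapter describes.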

CH 10 This is the World on Machine Learning

  • The master algorithm will essentially create a digital half for each and every human being, one that does your bidding. It will be able to go on a million dates and job interviews for you.
  • Facebook -> as its learning algorithms improve, it gets more and more value out of the data.
  • A company that charges a subscription fee to aggregate all your data and create this ultimate model of you could become the most valuable company in the world.
  • A model of you based on all your data is much better than a thousand models based on a thousand slivers.
  • Today, most people are unaware of both how much data about them is being gathered and what the potential costs and benefits are.
  • Algorithms can predict stock fluctuations but have no clue how they relate to politics. The more context a job requires, the less likely a computer will be able to do it.
  • Don't fight the revolution, ride it and use it to your advantage.
  • The real story of automation is not what it replaces but what it enables.
  • The chance of an AI equipped with the Master Algorithm taking over the world is 0% -> computers don't have a will of their own.
  • Parallax effect -> things that are closer appear to move faster.