Modern AI – ‘machine learning’ – enables software to perform difficult tasks more effectively by learning through training instead of following sets of rules. Deep learning, a subset of machine learning, is delivering breakthrough results in fields including computer vision and language processing.
Explore our AI Playbook, a blueprint for developing and deploying AI, at www.mmcventures.com/research.
Coined in 1956 by Dartmouth Assistant Professor John McCarthy, Artificial Intelligence (AI) is a broad term that refers to hardware or software that exhibits behaviour which appears intelligent. AI is “the science and engineering of making intelligent machines, especially intelligent computer programs” (John McCarthy).
“AI is a general term that refers to hardware or software that exhibits behaviour which appears intelligent.”
Basic AI has existed for decades, in the form of rules-based programs that display rudimentary intelligence in specific contexts.
‘Expert systems’ were a popular form of early AI. Programmers codified into software a body of knowledge regarding a specific field and a set of rules. Together, these components were designed to mimic a human expert’s decision-making process.
SRI International’s PROSPECTOR system of 1977 (Fig. 1) was intended to assist geologists’ mineral exploration work. Incorporating extensive subject matter information and over 1,000 rules, the system was designed to emulate the process followed by a geologist investigating the potential of a drilling site (Fig. 2).
While expert systems experienced some success (PROSPECTOR predicted the existence of an unknown molybdenum deposit in Washington State), their capabilities were typically limited.
Source: SRI International
Source: SRI International
Rules-based systems are limited because many real-world challenges are too complex, or too subtle, to be solved by programs that follow sets of rules written by people. Providing a medical diagnosis, operating a vehicle, optimising the performance of an industrial asset (Fig. 3) and developing an optimised investment portfolio are examples of complex problems. Each involves processing large volumes of data with numerous variables and non-linear relationships between inputs and outputs. It is impractical, and frequently impossible, to write a set of rules – such as a set of ‘if…then’ statements – that will produce useful and consistent results.
Source: Alamy
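To illustrate this limitation, the sketch below shows a deliberately naive, hypothetical rules-based fraud check in Python. The fields and thresholds are assumptions chosen for illustration only, not drawn from any real system.

```python
# A hypothetical, deliberately naive rules-based fraud check.
# The fields and thresholds are illustrative assumptions only.
def is_fraudulent(transaction: dict) -> bool:
    # Rule 1: very large transactions are suspicious
    if transaction["amount"] > 5000:
        return True
    # Rule 2: purchases made outside the cardholder's home country
    if transaction["country"] != transaction["home_country"]:
        return True
    # Rule 3: many transactions within a short window
    if transaction["transactions_last_hour"] > 10:
        return True
    return False

# Each rule is easy to state but brittle: a traveller triggers Rule 2, while a
# fraudster making many small domestic purchases evades all three. Capturing
# every interaction between variables this way quickly becomes impractical,
# which is the gap machine learning addresses.
print(is_fraudulent({"amount": 120, "country": "FR",
                     "home_country": "GB", "transactions_last_hour": 2}))
```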
What if the burden of finding solutions to complex problems could be transferred from the programmer to their program? This is the promise of modern AI.
Excitement regarding modern AI relates to a set of techniques called machine learning, where advances have been significant and rapid. Machine learning is a sub-set of AI (Fig. 4). All machine learning is AI, but not all AI is machine learning.
Machine learning shifts much of the burden of writing intelligent software from the programmer to their program, enabling more complex and subtle problems to be solved. Instead of codifying rules for programs to follow, programmers enable programs to learn. Machine learning is the “field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel).
Machine learning algorithms learn through training. In a simplified example, an algorithm is fed inputs – training data – whose outputs are usually known in advance (‘supervised learning’). The algorithm processes the input data to produce a prediction or recommendation. The difference between the algorithm’s output and the correct output is determined. If the algorithm’s output is incorrect, the processing function in the algorithm changes to improve the accuracy of its predictions. Initially the results of a machine learning algorithm will be poor. However, as larger volumes of training data are provided, a program’s predictions can become highly accurate (Fig. 5).
Source: MMC Ventures
Source: Michael Nielsen. Note: The size of data set required to train a machine learning algorithm is context dependent and cannot be generalised
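A minimal sketch of this training process is shown below, assuming Python and the scikit-learn library. The task and data are synthetic; the point is simply that prediction accuracy improves as the algorithm is given more labelled examples.

```python
# A minimal sketch of supervised learning, assuming scikit-learn is installed.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic training data: inputs X with known outputs y ('supervised learning')
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (50, 500, 5000):                   # progressively larger training sets
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:n], y_train[:n])     # learn from n labelled examples
    accuracy = model.score(X_test, y_test)  # compare predictions with known outputs
    print(f"{n:>5} training examples -> accuracy {accuracy:.2f}")
```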
The defining characteristic of a machine learning algorithm, therefore, is that the quality of its predictions improves with experience. Typically, the more relevant data provided to a machine learning system, the more effective its predictions (up to a point).
By learning through practice, instead of following sets of rules, machine learning systems deliver better solutions than rules-based systems to numerous prediction and optimisation challenges.
There are more than 15 approaches to machine learning. Each uses a different form of algorithmic architecture to optimise predictions based on input data.
One, deep learning, is delivering breakthrough results in new domains. We explain deep learning below. Others receive less attention but are widely used, given their utility and applicability to a broad range of use cases. Popular machine learning algorithms beyond deep learning include:
Each approach offers advantages and disadvantages, and combinations are frequently used (an ‘ensemble’ method). In practice, developers often experiment to determine which approach is most effective.
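As an illustration, the sketch below combines three common algorithms into a simple voting ensemble, assuming Python and scikit-learn. The particular algorithms chosen are illustrative assumptions rather than recommendations.

```python
# A minimal sketch of an 'ensemble' that combines several machine learning
# approaches, assuming scikit-learn. Data is synthetic and illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Each base model makes its own prediction; the ensemble takes a majority vote.
ensemble = VotingClassifier(estimators=[
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("decision_tree", DecisionTreeClassifier(max_depth=5)),
    ("nearest_neighbours", KNeighborsClassifier()),
])
ensemble.fit(X, y)
print(ensemble.score(X, y))
```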
Machine learning can be applied to a wide variety of prediction and optimisation challenges. Examples include: assessing whether a credit card transaction is fraudulent; identifying products a person is likely to buy given their prior purchases; and predicting when an industrial asset is likely to experience mechanical failure.
“The defining characteristic of a machine learning algorithm is that the quality of its predictions improves with experience.”
Even with the power of general machine learning, it is difficult to develop programs that perform certain tasks well – such as understanding speech or recognising objects in images.
In these cases, programmers cannot specify, in advance, the features in the input data that a program should examine. For example, it is difficult to write a program that identifies images of dogs. Dogs vary significantly in their visual appearance. These variations are too broad to be described by a set of rules that will consistently enable correct classification (Fig. 6). Even if an exhaustive set of rules could be created, the approach would not be scalable; a new set of rules would be required for every type of object we wished to classify.
Deep learning is delivering breakthrough results in these use cases. Deep learning is a sub-set of machine learning and one of many approaches to it (Fig. 7). All deep learning is machine learning, but not all machine learning is deep learning.
Source: Google Images
Source: MMC Ventures
“Even with the power of general machine learning, it is difficult to develop programs that perform certain tasks well – such as understanding speech or recognising objects in images.”
Source: MMC Ventures
Deep learning is valuable because it transfers an additional burden – the process of feature extraction – from the programmer to their program (Fig. 8).
Humans learn to complete subtle tasks, such as recognising objects and understanding speech, not by following rules but through practice and feedback. As children, individuals experience the world (see a dog), make a prediction (‘dog’) and receive feedback. Humans learn through training.
Deep learning works by loosely recreating the mechanism of the brain (Fig. 9) in software (Fig. 10). With deep learning we model the brain, not the world.
To undertake deep learning, developers create artificial neurons – software-based calculators that approximate, crudely, the function of neurons in a brain. Artificial neurons are connected together to form a neural network. The network receives an input (such as a picture of a dog), extracts features and offers a determination. If the output of the neural network is incorrect, the connections between the neurons adjust to alter its future predictions. Initially the network’s predictions will frequently be incorrect. However, as the network is fed many examples (potentially, millions) in a domain, the connections between neurons become finely tuned. When analysing new examples, the artificial neural network will then make consistently correct determinations.
“To undertake deep learning, developers create artificial neurons – software-based calculators that approximate, crudely, the function of neurons in a brain.”
Source: iStock
Source: MMC Ventures
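The adjustment mechanism described above can be illustrated with a minimal sketch in Python: a single artificial neuron whose connection weights change whenever its output is wrong. The toy task (learning the logical AND of two inputs) is an assumption for illustration; real networks contain many neurons and require far more data.

```python
# A minimal sketch, using NumPy, of the adjustment mechanism described above:
# when the output is wrong, the connection weights change so that future
# predictions improve. One artificial neuron learning a toy task (logical AND).
import numpy as np

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 0, 0, 1])           # correct outputs, known in advance

weights = np.zeros(2)                       # connection strengths, initially untuned
bias = 0.0
learning_rate = 0.1

for _ in range(20):                         # repeated exposure to the examples
    for x, target in zip(inputs, targets):
        prediction = 1 if x @ weights + bias > 0 else 0
        error = target - prediction         # difference from the correct output
        weights += learning_rate * error * x    # adjust connections when wrong
        bias += learning_rate * error

print([1 if x @ weights + bias > 0 else 0 for x in inputs])  # -> [0, 0, 0, 1]
```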
Deep learning has unlocked significant new capabilities, particularly in the domains of vision and language. Deep learning enables:
Deep learning is not suited to every problem. Typically, deep learning requires large data sets for training. Training and operating a neural network also demand extensive processing power. Further, it can be difficult to identify how a neural network developed a specific prediction – a challenge of ‘explainability’.
However, by freeing programmers from the burden of feature extraction, deep learning has delivered effective prediction engines for a range of important use cases and is a powerful tool in the AI developer’s arsenal.
“Deep learning has delivered effective prediction engines for a range of important use cases and is a powerful tool in the AI developer’s arsenal.”
Source: Museum of Computer Science, MTV, CA
Source: Google / Pixel Buds
Deep learning involves creating artificial neural networks – collections of software-based calculators (artificial neurons) connected to one another.
An artificial neuron (Fig. 13) has one or more inputs. The neuron performs a mathematical function on its inputs to deliver an output. The output depends on the weights given to each input and on the configuration of the input-output function in the neuron. The input-output function can vary; the three common forms below are sketched in code after the list. An artificial neuron may be a:
• linear unit (the output is proportional to the total weighted input);
• threshold unit (the output is set to one of two levels, depending on whether the total input is above a specified value);
• sigmoid unit (the output varies continuously, but not linearly as the input changes).
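A minimal sketch of these three unit types, assuming Python and NumPy, is shown below; the inputs and weights are illustrative.

```python
# A minimal sketch of the three input-output functions listed above, assuming
# a neuron that first computes the weighted sum of its inputs.
import numpy as np

def weighted_sum(inputs, weights, bias=0.0):
    return np.dot(inputs, weights) + bias

def linear_unit(total):
    return total                                 # output proportional to total weighted input

def threshold_unit(total, threshold=0.0):
    return 1.0 if total > threshold else 0.0     # output set to one of two levels

def sigmoid_unit(total):
    return 1.0 / (1.0 + np.exp(-total))          # varies continuously, but not linearly

total = weighted_sum([0.5, 0.8], [0.9, -0.2])    # illustrative inputs and weights
print(linear_unit(total), threshold_unit(total), sigmoid_unit(total))
```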
An artificial neural network (Fig. 14) is created when artificial neurons are connected to each other. The output of one neuron becomes an input for another.
“An artificial neural network is created when artificial neurons are connected together. The output of one neuron becomes an input for another.”
Source: MMC Ventures
Source: MMC Ventures
Neural networks are organised into multiple layers of neurons (Fig. 15) – hence ‘deep’ learning. An input layer receives information to be processed, such as a set of pictures. An output layer delivers results. Between the input and output layers are layers referred to as ‘hidden layers’, where features are detected. Typically, the outputs of neurons in one layer of a network all serve as inputs to each neuron in the next layer.
Source: MMC Ventures
“Neural networks are organised into multiple layers of neurons – hence ‘deep’ learning. An input layer receives information to be processed, such as a set of pictures. An output layer delivers results.”
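A minimal sketch of such a layered network is shown below, assuming Python and the PyTorch library. The layer sizes are illustrative assumptions; each fully connected layer feeds every neuron’s output into every neuron of the next layer.

```python
# A minimal sketch of a layered ('deep') network, assuming PyTorch is available.
import torch
from torch import nn

network = nn.Sequential(
    nn.Linear(784, 128),   # input layer -> first hidden layer (e.g. a flattened 28x28 image)
    nn.Sigmoid(),
    nn.Linear(128, 64),    # first hidden layer -> second hidden layer
    nn.Sigmoid(),
    nn.Linear(64, 10),     # second hidden layer -> output layer (10 classes)
)

example_input = torch.rand(1, 784)     # one flattened 28x28 picture
print(network(example_input).shape)    # -> torch.Size([1, 10])
```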
Fig. 16 illustrates a neural network designed to recognise pictures of human faces. When pictures are fed into the neural network, the first hidden layers identify patterns of local contrast (low-level features such as edges). As images traverse the hidden layers, progressively higher-level features are identified. Based on its training, at its output layer the neural network will deliver a probability that the picture is of a human face.
Typically, neural networks are trained by exposing them to a large number of labelled examples. As errors are detected, the weightings of the connections between neurons adjust to offer improved results. When the optimisation process has been repeated extensively, the system is deployed to assess unlabelled images.
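The training loop described above might be sketched as follows, assuming Python and PyTorch. The data here is random and purely illustrative; in practice the inputs would be labelled images and the targets their known classes.

```python
# A minimal sketch of training on labelled examples, assuming PyTorch.
import torch
from torch import nn

network = nn.Sequential(nn.Linear(20, 16), nn.Sigmoid(), nn.Linear(16, 2))
examples = torch.rand(100, 20)                 # stand-in for labelled examples
labels = torch.randint(0, 2, (100,))           # their known classes

loss_fn = nn.CrossEntropyLoss()                # measures the error of the outputs
optimiser = torch.optim.SGD(network.parameters(), lr=0.1)

for epoch in range(10):                        # repeated exposure to the examples
    outputs = network(examples)
    loss = loss_fn(outputs, labels)            # how wrong the predictions are
    optimiser.zero_grad()
    loss.backward()                            # trace errors back through the layers
    optimiser.step()                           # adjust the connection weightings

# Once trained, the network can be applied to new, unlabelled examples.
new_example = torch.rand(1, 20)
print(network(new_example).argmax(dim=1))
```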
The structure and operation of the neural network in Fig. 16 are simple (and simplified), but structures vary and most are more complex. Architectural variations include: connecting neurons on the same layer; varying the number of neurons per layer; and connecting neurons’ outputs back into earlier layers of the network (‘recurrent neural networks’).
It takes considerable skill to design and improve a neural network. AI professionals undertake multiple steps including: structuring the network for a particular application; providing suitable training data; adjusting the structure of the network according to progress; and combining multiple approaches to optimise results.
Source: MMC Ventures, Andrew Ng