Reinforcement learning, a growing focus of attention in the field of artificial intelligence, opens up new possibilities in the industrial sector.

Artificial intelligence often proceeds in indirect and unexpected ways. Machine learning, for example, has long used – and still widely uses – a method that consists in pairing input data with the expected output. The algorithm learns from thousands or millions of labelled examples and uses them to connect, for instance, images with categories or classes. The drawback is that all of those examples have to be collected and labelled beforehand.

“A new method – reinforcement learning – overcomes that problem,” says Erik Lenten, Chief Technology Officer at Axians, the VINCI Energies ICT brand.

The fundamental difference between reinforcement learning and so-called supervised methods is the algorithm’s ability to iteratively try out or explore several solutions, observe the response of the environment and adjust its own behaviour to arrive at the best strategy. In other words, the machine learns autonomously, by trial and error.

The technique is based on a system of “rewards” in which the algorithm is penalised when it makes a mistake and rewarded when it takes the right decision. It thus optimises its own decision-making. The developer of the reinforcement learning model only needs to define the rules that determine whether the AI will be punished or rewarded.
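The reward mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it uses tabular Q-learning, one common reinforcement learning algorithm, on an invented six-position corridor. The corridor, the reward values and the names N_STATES, reward, train and greedy_policy are all assumptions made for the example.

```python
import random

N_STATES = 6        # hypothetical corridor, positions 0..5; position 5 is the goal
ACTIONS = (-1, 1)   # step left or step right


def reward(next_state):
    """The only rule the developer writes: what is rewarded, what is penalised."""
    return 1.0 if next_state == N_STATES - 1 else -0.1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[state][action] estimates the long-term value of each decision
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Trial and error: explore occasionally, otherwise use the best known action
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[state][i])
            nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            r = reward(nxt)
            # Adjust behaviour according to the response of the environment
            q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q


def greedy_policy(q):
    """Best learned action for every non-goal position."""
    return [ACTIONS[max((0, 1), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]
```

Calling greedy_policy(train()) returns the step the algorithm has learned to take from each position. Nothing in the code says "walk right"; that behaviour emerges from the reward rule alone.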

Elon Musk’s video game

Amazon has developed a reinforcement learning prototype called AWS DeepRacer, a miniature autonomous race car that is required to “stay on the track”. It is penalised when it leaves the track and rewarded when it remains on it, while pursuing the objective of “going as fast as possible”. The experiment, which is open to developers around the world via an international championship, uses a 3D simulator in which training steadily improves the vehicle’s performance. You can train the model in the virtual simulator and, once it is sufficiently trained, download it and race on a real track. The experiment also provides an understanding of reinforcement learning and enables developers to use the technique in their own software. In similar fashion, British startup Wayve taught an autonomous car to follow a straight line in just one day.
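A reward rule of the kind described here, stay on the track and go fast, might look like the sketch below. It assumes the developer supplies a Python function that receives a dictionary of sensor values; the keys "all_wheels_on_track" and "speed" follow the naming used in AWS DeepRacer's reward function documentation but should be checked against the current reference. The MAX_SPEED constant and the weighting are illustrative assumptions, not Amazon's recommended values.

```python
def reward_function(params):
    """Reward staying on the track first, then reward speed."""
    # Leaving the track is penalised with a near-zero reward
    if not params["all_wheels_on_track"]:
        return 1e-3

    # Assumed top speed in metres per second (illustrative value)
    MAX_SPEED = 4.0

    # On the track: a base reward plus a bonus that grows with speed
    speed_bonus = min(params["speed"], MAX_SPEED) / MAX_SPEED
    return float(1.0 + speed_bonus)
```

With these assumed weights, a car that is on the track at half the top speed earns 1.5, while a car that has left the track earns almost nothing, whatever its speed.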

But the most telling example is probably OpenAI Five, developed by OpenAI, the non-profit research organisation co-founded by Elon Musk. It trained through reinforcement learning for the equivalent of 40,000 years of play to master the Dota 2 video game. Five is now able, by itself, to beat an entire team of professional Dota 2 players.

“In an industrial setting, reinforcement learning could be used to determine the best production parameters by trial and error.”

To what extent can reinforcement learning be used in the industrial sector? “Conceivably, the technique could be used in a production line, for example, to optimise the process while taking account of the interaction among different machines. Reinforcement learning could control settings and adjust decisions depending on results,” says Erik Lenten.

“Obviously,” adds the Axians CTO, “that process would be impossible in a real-life situation. But by building a digital twin of the production line, we could use simulations to determine the best production parameters by trial and error.”
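The digital twin idea can be sketched as follows. Everything here is hypothetical: digital_twin is a stand-in function, not a real simulator, and its output curve, noise level and candidate settings are invented for illustration. The search uses an epsilon-greedy strategy, one of the simplest forms of learning by trial and error; a real production line, with interacting machines and changing states, would call for a fuller reinforcement learning method.

```python
import random

# Hypothetical candidate settings for one machine (for example a relative line speed)
SETTINGS = [0.6, 0.8, 1.0, 1.2, 1.4]


def digital_twin(setting, rng):
    """Stand-in simulator: noisy output that peaks at setting 1.0 (invented curve)."""
    return 100.0 - 80.0 * (setting - 1.0) ** 2 + rng.gauss(0.0, 1.0)


def find_best_setting(trials=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(SETTINGS)
    means = [0.0] * len(SETTINGS)
    for _ in range(trials):
        # Explore at random now and then, otherwise reuse the best setting seen so far
        if rng.random() < epsilon or 0 in counts:
            i = rng.randrange(len(SETTINGS))
        else:
            i = max(range(len(SETTINGS)), key=lambda k: means[k])
        result = digital_twin(SETTINGS[i], rng)  # the trial runs in simulation only
        counts[i] += 1
        means[i] += (result - means[i]) / counts[i]  # running average of the output
    best = max(range(len(SETTINGS)), key=lambda k: means[k])
    return SETTINGS[best]
```

All 2,000 trials run against the simulated line, so poor settings cost nothing in real output. Only the setting returned by find_best_setting would then be tried on the physical machines.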