A modern city dweller takes in about 2,000 kilocalories a day, provided they neither overeat nor go hungry, and expends roughly the same amount.
By comparison, a single state-of-the-art server processor is far more power-hungry, and the tasks in question require extensive data centres with dozens of racks of servers built on such CPUs. The difference in energy efficiency is obvious, and it is not in favour of the smart machines.
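To put the figure above on the same scale as processor power ratings, the daily intake can be converted to average watts. The sketch below is only an illustration of that arithmetic; the function name is made up for this example, and the conversion factor of 4184 J per kilocalorie is the standard thermochemical value rather than something stated in the article.

```python
# Convert a daily energy intake in kilocalories to average power in watts.
KCAL_TO_JOULES = 4184.0          # 1 kcal (thermochemical) = 4184 J
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s


def kcal_per_day_to_watts(kcal_per_day: float) -> float:
    """Average power, in watts, of spending `kcal_per_day` over 24 hours."""
    return kcal_per_day * KCAL_TO_JOULES / SECONDS_PER_DAY


human_watts = kcal_per_day_to_watts(2000)
print(f"2,000 kcal/day is roughly {human_watts:.1f} W on average")
```

This is the budget of the whole body, not of the brain alone.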
As they push artificial intelligence forward on today's hardware, programmers, scientists and engineers increasingly notice that the computing systems familiar to us, based on the von Neumann architecture, are in principle incapable of matching the energy efficiency of the biological brain. There may be several ways out of this impasse, including the extremely promising quantum computers, which themselves face a number of challenges.
A far more achievable goal is to build AI on the basis of analog computing or, more precisely, with its active use.
Drawing parallels
Not only humans but any animal with a reasonably developed nervous system can draw parallels between formally unrelated events, a phenomenon known since Pavlov's time as the conditioned reflex. Even in worms and molluscs, neurobiologists successfully form and then study behavioural stereotypes: typical reactions of the animal to a situation it has encountered before.
Humans use analogies in the process of understanding objective reality; take, for instance, the symbolic hunting scenes preserved in rock paintings. Based on studies of modern tribes untouched by civilization, anthropologists argue that many of our ancient ancestors painted hunting scenes not after the fact but as preparation for the hunt, believing that a painted spear striking a painted deer would bring an equally successful kill in reality.
More complex ways of building artificial analogies to real events date to the last centuries BCE, when the first such devices known to us began to appear in the Greek city-states (poleis). The only surviving representative of this class of devices, the Antikythera mechanism, is dated to the 2nd century BCE, although ancient texts mention such systems at earlier times.
Can you tell me what saros it is now?
At the turn of the 19th and 20th centuries, the remains of an ancient ship that had sunk more than two thousand years earlier were discovered off the island of Antikythera in the Aegean Sea. The ship had been carrying a vast collection of art and other rarities: Greek divers raised several marble and bronze statues, the remains of luxurious furniture, ceramics, glassware and coins.
Not surprisingly, the staff of the Archaeological Museum of Athens, where the finds were taken, concentrated first on restoring the most striking and attractive artifacts, paying little attention to the shapeless lumps of bronze raised from the seabed along with them, heavily corroded and thickly covered with marine sediment. Only a couple of years later did the archaeologist Valerios Stais notice that one of these lumps clearly contained, beneath the lime deposits, a large bronze gear about 13 cm in diameter with many dozens of teeth. A closer examination of the artifact revealed other structural elements that resembled clockwork.
The scientist suggested that the Antikythera find was the earliest known example of a mechanical device for predicting the movements of celestial bodies, but he was immediately ridiculed by his more skeptical colleagues. Senior archaeologists stated authoritatively that the technology needed to produce such fine work did not exist in antiquity, and that this was most likely an anachronism, in which an object is attributed to an earlier date than the one it really belongs to. Supposedly, a more or less modern marine chronometer had been dropped over the site of the ancient shipwreck.
It took a scientist more than 20 years, working with nuclear physicists, to X-ray all 82 surviving fragments of the unknown device from the Antikythera treasure in every possible projection. The crowning result of this work was the first publication on the Antikythera mechanism as a device for calculating saroses and planetary movements in advance.
Astronomers use the term saros for a period of just over 18 years after which the Sun-Earth-Moon system returns to the same mutual arrangement of its continuously moving elements. For an observer on the ground this means that one saros later the complete cycle of eclipses, 41 solar and 29 lunar, is reproduced in the same order and at the same intervals as before.
This pattern was already known to the ancient Babylonians, and it is this pattern that was embodied in the main gears of the Antikythera mechanism. Strictly speaking, the saros does not consist of a whole number of days: it lasts 6585 1/3 days. For convenience, ancient astronomers therefore preferred to work with a period equal to three saroses, the exeligmos. In the early 2000s, researchers showed that it was the exeligmos that the 37 main gears of the Antikythera mechanism reproduced.
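The reason for preferring three saroses can be checked with exact fractions. The snippet below is only a sketch of that arithmetic; the variable names are invented for the example, and the 365.25-day year used for the rough conversion is an assumption, not a figure from the article.

```python
from fractions import Fraction

# Length of one saros as given in the text: 6585 and 1/3 days.
SAROS_DAYS = Fraction(6585) + Fraction(1, 3)

# Three saroses make one exeligmos.
exeligmos_days = 3 * SAROS_DAYS

# A denominator other than 1 means "not a whole number of days".
saros_is_whole = SAROS_DAYS.denominator == 1
exeligmos_is_whole = exeligmos_days.denominator == 1

# Rough length in years, assuming a 365.25-day year.
exeligmos_years = float(exeligmos_days) / 365.25

print(saros_is_whole, exeligmos_is_whole, exeligmos_days)
```

The leftover third of a day in each saros adds up to exactly one day over three saroses, which is what makes the longer period convenient for calculation.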
Later, scientists argued that the 82 fragments found on the seabed represent only about a third of the elements of the original mechanism, and that in addition to predicting solar and lunar eclipses for a given point on the Earth's surface it could reproduce the movements across the celestial sphere of all five planets known to the ancients: Mercury, Venus, Mars, Jupiter and Saturn.
Interestingly, the anti-aircraft fire-control devices widely used in the 20th century can be considered ideological, though not direct, heirs of the Antikythera mechanism. These are electromechanical computers that predict the position of a moving target in space for as long as the gun layer keeps it in the crosshairs of the sight, and aim the anti-aircraft gun accordingly.
Pour it in, comrade!
In 1936, the Soviet engineer Vladimir Lukyanov created the hydraulic integrator, the world's first computing machine for solving partial differential equations.
Lukyanov suggested that cracks in concrete were caused primarily not by the schemes of enemies of the socialist homeland or by the negligence of workers, but by the way heat spreads through the concrete mass. The nature of heat transfer within a concrete block, he reasoned, could depend on the temperature regime, on the quality and composition of the mix, and on the technology used to carry out the work.
In theory, equations can be written for all these relationships, but they are partial differential equations, which are very hard to solve analytically. Lukyanov turned to the work of his predecessors: it had already been proven that one physical process can stand in for another in a simulation if the equations describing them are identical. The engineer saw a direct analogy between the way heat spreads through an obviously non-uniform concrete block and the flow of water through a complex system of interconnected tubes with variable hydraulic resistance.
Lukyanov's method of hydraulic analogy reduced an extremely difficult mathematical problem to the task of building a system of vessels connected by tubes of different diameters and then observing the laminar flow of water through it. The simplest case, one-dimensional heat propagation in a solid wall heated uniformly on one side, shows how this works.
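The wall is divided into layers, each of which is represented by a vessel; the vessels are connected by tubes whose hydraulic resistance corresponds to the thermal resistance of the layer.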
Once the initial conditions have been set, it is enough to open all the taps at the same time and simply watch how fast and where the liquid levels change. The taps can be closed at any moment to study the situation at a particular stage. Drum recorders were later added to the design, and in 1941 Lukyanov proposed a two-dimensional hydraulic integrator made up of separate sections.
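The one-dimensional case can be imitated numerically to see what the water levels would do. The sketch below is not a reconstruction of Lukyanov's machine; it is a minimal model under stated assumptions: water level stands for temperature, vessel cross-section for the heat capacity of a layer, tube resistance for its thermal resistance, laminar flow is taken as proportional to the level difference, the heated side is a large reservoir at a constant level, and the far side of the wall is insulated. The function and parameter names are hypothetical.

```python
def simulate_hydraulic_wall(levels, areas, resistances, source_level, dt, steps):
    """Explicit time-stepping of a chain of vessels joined by tubes.

    levels[i]       water level in vessel i (stands for temperature)
    areas[i]        cross-section of vessel i (stands for heat capacity)
    resistances[i]  hydraulic resistance of the tube feeding vessel i from
                    the left; resistances[0] connects vessel 0 to the source
    source_level    level of a large reservoir kept constant (heated side)
    """
    levels = list(levels)
    n = len(levels)
    for _ in range(steps):
        # Laminar flow through each tube is proportional to the level difference.
        flows = []
        for i in range(n):
            left = source_level if i == 0 else levels[i - 1]
            flows.append((left - levels[i]) / resistances[i])
        # Net inflow changes each level; the last vessel has no outlet,
        # which models an insulated far side of the wall.
        for i in range(n):
            outflow = flows[i + 1] if i + 1 < n else 0.0
            levels[i] += dt * (flows[i] - outflow) / areas[i]
    return levels


start = [0.0] * 5            # five layers, all "cold" at the beginning
areas = [1.0] * 5            # equal heat capacities (assumed)
tubes = [1.0] * 5            # equal thermal resistances (assumed)

early = simulate_hydraulic_wall(start, areas, tubes, 100.0, dt=0.05, steps=20)
late = simulate_hydraulic_wall(start, areas, tubes, 100.0, dt=0.05, steps=20000)
print([round(x, 2) for x in early])
print([round(x, 2) for x in late])
```

Stopping the loop after a small number of steps plays the role of closing the taps to inspect an intermediate stage; letting it run long enough shows every level approaching that of the source.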
Digital beats analog, analog beats digital
The engineering solution proved so successful that general-purpose digital computers became able to compete with mass-produced hydraulic analog computers at solving partial differential equations only in the early 1980s.
Because the behaviour of functions of many variables in many fields is described by equations identical to those of hydrodynamics, computers built from hydraulic cells began to appear around the world. Suffice it to recall the MONIAC, built by a New Zealand economist and used to model economic processes in national markets. Indeed, a number of models of key macroeconomic processes are built on partial differential equations.
Today the computing power of von Neumann machines is many times greater than it was almost half a century ago, but they also have to deal with fundamentally different tasks: the ones in which biological neural structures have specialized over billions of years of evolution. These include image recognition, operating under uncertainty, and making critical decisions on the basis of incomplete data.
Creating direct artificial analogs of biological "thinking" structures is still a long way off. It seems logical to model the behaviour of living neurons in the RAM of von Neumann systems, and this is already being done with success. The main obstacle here is the excessive energy cost, mentioned at the outset, of calculations performed by classical semiconductor devices.
It is precisely to improve the energy efficiency of digital machines that model neurons that analog computing devices are being proposed. This time they are not mechanical assemblies but devices that rely, like all processors in PCs and smartphones, on semiconductor memory: not on volatile RAM, from which information disappears almost instantly once the power is cut, but on non-volatile memory (NVM).
What's wrong with von Neumann?
The traditional structure of a semiconductor computing device still involves physically separating the processor from the memory. Programs and data, in accordance with von Neumann's principles, are stored in memory, and the processor accesses it over the data bus both to fetch the raw information for its calculations and to write back their results.
The obvious weak point here, sometimes even called the von Neumann bottleneck, is the bandwidth of the computer's internal data bus. The IT industry's customary way of widening it over the decades, by steadily raising the processor's clock frequency and parallelizing its operations and by increasing the width and speed of the bus itself, works very well for a vast range of tasks. Unfortunately, machine learning, on which the current concept of AI is built, does not fully fit into that range.
The point is that the basic principle of machine learning is the emulation of biological neural networks. Virtual neurons with multiple inputs are created in the computer's memory. Signals arrive at each neuron's inputs with different weights, that is, with different significance. Put crudely, if the task is to determine whether a picture shows a cat, the features "has a tail", "has whiskers" and "has clawed paws" will carry moderately positive weights, while the feature "has horns" will certainly carry a large negative weight.
Let us look at the operation of a small section of a neural network in more detail. The description is theoretical, but judging by the level that even today's best AI systems have reached, it is already clear how productive this path is.
Let us roughly assume that receptors send electrical signals to a multilayer network of neurons, which after processing them activates the effectors, the converters of signals into physical action. For example: the visual receptors register the traffic light switching to red; the neural network weighs the risk of losing a minute to the delay against the risk of being injured in an accident; the effectors give the motor system the command to stop at the crossing.
In this model the neural network is a structure made up of one-dimensional layers, each of which is a set of independent neurons. A neuron has multiple signal inputs, called dendrites, and a single output channel, called the axon. The point of contact between two neurons is called a synapse; as a signal passes through the synapse it is modulated, that is, strengthened or weakened, as a result of learning.
In the model described, each neuron therefore acts as a calculator of the weighted sum of the signals reaching it from other neurons. The biological brain is far more complex, but in the mathematical model this complexity is offset by an important modification of the basic neuron: instead of strictly one axon, several outputs are allowed. As a result, layers can be stacked one after another: each neuron in a layer computes the weighted sum of the signals from every neuron of the previous layer and then passes the result of that summation on, again with weights, to the next layer.
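The layer-by-layer summation described here can be written out in a few lines. The sketch below follows the article's simplified model, in which a neuron does nothing but weighted summation; practical networks also apply a nonlinear function after each sum, which is left out on purpose. All names and numbers are made up for illustration.

```python
def weighted_sums(inputs, weights):
    """One layer: each neuron sums all inputs multiplied by its own weights.

    weights[j][i] is the weight of the synapse from input i to neuron j.
    """
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def forward(inputs, layers):
    """Pass the signals through the layers one after another."""
    signals = inputs
    for weights in layers:
        signals = weighted_sums(signals, weights)
    return signals


receptors = [1.0, 0.5, -1.0]            # three input signals
hidden = [[0.2, 0.4, 0.1],              # two neurons, three synapses each
          [-0.3, 0.8, 0.5]]
output = [[1.0, -2.0]]                  # one neuron fed by both hidden neurons

effectors = forward(receptors, [hidden, output])
print(effectors)
```

Each call to `weighted_sums` is a matrix-vector multiplication, which is why the rest of the discussion keeps returning to the cost of multiply-and-add operations.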
A learning model with a single layer of virtual neurons is essentially a checklist of key features. By adding up all the listed criteria, each multiplied by its weight, the system obtains a numerical estimate of the probability that the image in question shows a cat. In practice, however, the more complex the required result, the more intermediate steps are needed.
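A toy version of such a checklist, using the cat features mentioned earlier, might look as follows. The weights and the bias are invented for illustration, and squashing the weighted sum into the range from 0 to 1 with a logistic function is an assumption of this sketch: the article only says that the sum yields a numerical probability.

```python
import math

# Hypothetical weights: moderately positive for cat-like features,
# strongly negative for horns.
CAT_WEIGHTS = {"tail": 1.0, "whiskers": 1.5, "clawed_paws": 1.0, "horns": -6.0}


def cat_probability(features, weights=CAT_WEIGHTS, bias=-2.0):
    """Weighted checklist: sum weight * feature, then map to (0, 1)."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))   # logistic function


cat_like = {"tail": 1, "whiskers": 1, "clawed_paws": 1, "horns": 0}
goat_like = {"tail": 1, "whiskers": 0, "clawed_paws": 0, "horns": 1}

p_cat = cat_probability(cat_like)
p_goat = cat_probability(goat_like)
print(round(p_cat, 3), round(p_goat, 4))
```

A single large negative weight is enough to override several moderately positive ones, which is the behaviour the horn example is meant to convey.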
Such a neural network is trained by successively adjusting the synaptic weights of each layer of neurons. Even for the basic single-layer network it is clear how laborious this is: the weights are first set to some starting values, the signals pass through the network, the network produces a result at the effectors, and that result is compared with the reference. If they differ, the weights are changed and the full cycle of weighted summations is repeated. The principle by which the weights in each layer are changed for the next iteration is a separate question; what matters most is that every step requires a large number of multiplications and additions.
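The cycle of "run the signals, compare with the reference, adjust the weights, repeat" can be shown on the smallest possible case. The article deliberately leaves the weight-update principle open, so the sketch below plugs in the classic perceptron rule purely as one simple choice; the training data (the logical OR of two inputs), the learning rate and all names are assumptions of the example.

```python
def predict(weights, bias, inputs):
    """Weighted sum followed by a threshold: output 1 if the sum is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, n_inputs, rate=0.1, max_epochs=100):
    weights, bias = [0.0] * n_inputs, 0.0               # starting values
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            # Compare the network's answer with the reference.
            diff = target - predict(weights, bias, inputs)
            if diff != 0:
                errors += 1
                # Perceptron rule: nudge each weight towards the right answer.
                weights = [w + rate * diff * x for w, x in zip(weights, inputs)]
                bias += rate * diff
        if errors == 0:                                  # a full pass with no mistakes
            break
    return weights, bias


samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(samples, n_inputs=2)
print(weights, bias)
```

Even this two-input example repeats the weighted summation for every sample on every pass; in a real network the same loop runs over millions of weights.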
As a result, a well-developed AI system operates with hundreds of interconnected layers, and for each combination of connections the weighted summation is performed over and over again. All of this is achieved by continuously pumping gigantic streams of data across the internal bus, from RAM to the CPU and back. Truly smart computers such as Deep Blue, the first to beat a reigning world chess champion, or AlphaGo, which defeated a world champion at Go, consumed 200 to 300 kW, while their opponents made do with the same good old 20 W biological brain. If humanity really intends to apply artificial intelligence more and more widely to its everyday tasks, far less energy-hungry ways of performing weighted summation will have to be invented.
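Two quick calculations make the scale of the problem concrete: how many multiply-and-add operations one pass through a fully connected network takes, and how the power figures quoted above compare. The layer sizes below are hypothetical and chosen only for illustration; the wattages are the ones given in the text.

```python
def multiply_adds(layer_sizes):
    """Multiply-add operations for one pass through fully connected layers.

    Every neuron of a layer is connected to every neuron of the next one,
    so each pair of adjacent layers costs (size of one) * (size of the other).
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


sizes = [1000, 500, 10]                 # hypothetical: inputs, hidden, outputs
ops_per_pass = multiply_adds(sizes)

machine_watts = (200_000, 300_000)      # 200 to 300 kW, as quoted in the text
brain_watts = 20
power_ratio = tuple(w // brain_watts for w in machine_watts)

print(ops_per_pass, power_ratio)
```

Every one of those operations needs a weight fetched from memory, so the operation count is also a rough measure of the traffic across the data bus.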
Better than a cache
Strictly speaking, modern processors already have direct access, bypassing the data bus, to a limited amount of memory. This is the processor cache, which nowadays is multi-level and quite large. Incidentally, graphics adapters suit AI tasks so well precisely because they contain a huge number of compute cores that are connected almost directly to an impressive amount of RAM. In practice, however, this is not enough for machine learning aimed at really complex tasks.
Today a number of IT engineers see computing directly in memory (compute-in-memory, CIM) as the most suitable way around the von Neumann bottleneck, and not in DRAM of one kind or another but in non-volatile memory (NVM), in large part because keeping intermediate data in NVM requires no energy.
Fine, but where is the computation in this, if all we are doing is moving the data being processed from DRAM to non-volatile memory? The point is that an NVM module is essentially a rectangular matrix of cells that store electric charge, which makes it possible to model directly the nodes, the connections and, above all, the weights used in weighted summation. The operations performed on this matrix are in fact analogous to the work of a biological neural network: not a virtual simulation of it at the software level but a physical analog, and this is extremely important.
One of the companies working in this area today, Mythic, uses an NVM module integrated directly with the processor's compute core to store the weight matrix of the layer currently being processed in a machine learning operation, or of several layers if the available capacity allows. IBM is betting on phase-change memory (PCM) cells for the weight matrix.
At the hardware level, CIM can be implemented on a variety of component bases, including magnetoresistive memory, which is particularly important for applications in today's rapidly developing Internet of Things.
Make it happen
Now for the most important part: how the multiplication itself is done. In short, the vector of input signals is set by a set of voltage levels, one for each input channel. The weight matrix is an electrical microstructure with a corresponding distribution of resistances at its nodes. From here on, quite naturally for an analog computer, the calculation consists in letting objective physical laws play out: Ohm's law for each of the signals passing through the matrix of resistors, while the weighted signals are summed according to Kirchhoff's rules for adding currents in complex circuits.
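The same multiplication can be imitated in software to see why no digital arithmetic is needed in the hardware. The sketch below models an idealized resistive matrix: every cell obeys Ohm's law, and the currents meeting on a shared output wire add up according to Kirchhoff's current law. It assumes ideal wires with no resistance and output lines held at zero potential, and it stores each weight as a conductance (the reciprocal of resistance); the function name and the values are hypothetical.

```python
def crossbar_currents(voltages, conductances):
    """Output current of every column of an idealized resistive matrix.

    voltages[i]         voltage applied to input row i, in volts
    conductances[i][j]  conductance (1 / resistance) of the cell joining
                        row i to column j, in siemens
    """
    n_columns = len(conductances[0])
    currents = []
    for j in range(n_columns):
        # Ohm's law gives the current through each cell: I = V * G.
        cell_currents = [v * row[j] for v, row in zip(voltages, conductances)]
        # Kirchhoff's current law: currents meeting on the column wire add up.
        currents.append(sum(cell_currents))
    return currents


voltages = [0.2, 0.1, 0.3]                       # input signals, volts
conductances = [[1e-6, 4e-6],                    # weights, siemens
                [2e-6, 5e-6],
                [3e-6, 6e-6]]

currents = crossbar_currents(voltages, conductances)
print(currents)                                  # amperes, one value per column
```

In the physical device the loop bodies are not executed by anything: the products and the sums appear simultaneously as currents the moment the voltages are applied. A conductance cannot be negative, so real designs need some extra arrangement to represent negative weights; that detail is outside this sketch.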
The first stage of processing is shown on the left: the transmission of a signal from each receptor to a neuron is represented by applying a voltage to the corresponding horizontal line of the NVM module. The weight of each synapse (w1, w2 and so on) is encoded by the resistance of an individual PCM element in the matrix.
The resulting charge values are held on capacitors. The inputs on the left are now fed from these synaptic capacitors, and the PCM elements are assigned new resistance values corresponding to the next set of weights. As a result, the same method of weighted summation, again in purely analog fashion and without any digital operations, produces the final charge values at the effectors.
From a technical point of view, an NVM-based computing device lies somewhere between a memory module and a general-purpose processor in complexity. For example, the Mythic M1076 analog matrix processor, built on a 40 nm process, contains 76 CMS-based matrices, all of which fit into the M.2 expansion card format.
According to the developer's representatives, using such a chip together with the NVIDIA Jetson Xavier NX module, a typical choice for building a modern digital neural network, increases the effective performance of the NVIDIA compute unit two to three times. A full-size PCIe card with 15 analog matrix processors on board promises performance of up to 400 TOPS and room for up to 1.28 billion synaptic weights, while consuming no more than 75 W. IBM is moving in a similar direction: it is already testing an analog NVM computing device based on phase-change memory elements with a working matrix of 784 × 250 elements, and it is already possible to run one's own AI task on this device directly online.
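The quoted figures can be turned into two derived numbers that are easier to compare: throughput per watt for the Mythic card and the number of weights in the IBM matrix. The snippet only rearranges numbers given in the text; the variable names are invented, and because 75 W is stated as an upper bound, the efficiency figure is a lower bound at peak throughput.

```python
# Figures quoted for the Mythic PCIe card.
mythic_tops = 400            # peak throughput, tera-operations per second
mythic_max_watts = 75        # power draw stated as "no more than 75 W"

# At peak throughput the card delivers at least this many TOPS per watt.
min_tops_per_watt = mythic_tops / mythic_max_watts

# Size of the IBM phase-change working matrix.
ibm_rows, ibm_cols = 784, 250
ibm_weights = ibm_rows * ibm_cols

print(f"{min_tops_per_watt:.2f} TOPS/W, {ibm_weights} weights")
```

These are vendor-quoted peak values, so they indicate an order of magnitude rather than measured performance on a particular task.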
Energy-efficient AI on analog hardware can quite literally bring to life many elements of the Internet of Things, both industrial and domestic, that urgently need the ability to analyse what is happening in real time and respond adequately: from video surveillance cameras to autonomous vehicles and drones. Experts estimate that the analog computing devices already available will cut energy consumption in neuromorphic calculations by a factor of 10. Given the current state of the IT industry, an especially attractive feature is their reliance on mature manufacturing processes, which means that production of conventional Boolean-logic microprocessors need not be cut back to make room for them, and which ultimately makes it possible to build high-grade digital computing systems across a wide range of parameters.