There have been multiple waves of Artificial Intelligence (AI) research. The first wave began when Alan Turing posed the question "Can machines think?" in his seminal 1950 paper,1 ushering in the quest for AI. This quest followed many paths. Prof. Frank Rosenblatt proposed and developed a single-layer artificial neural net called the "Perceptron," hardware that can now be found in the Smithsonian museum in Washington DC; unfortunately, he passed away too early to advance the work further. In response to the perceptron, MIT Prof. Marvin Minsky advocated a rule-based ("If…, Then…") approach for computers, which set the direction for AI funding and progress.2 The second wave began in March 2016, when Google DeepMind's AlphaGo beat the Korean Go champion Lee Sedol 4-1; AlphaGo used supervised deep learning with labeled training data. Since 2017, Dr. Harold Szu and his collaborators have systematically developed a learning system that emulates three intelligences found in the brain: logical IQ, emotional IQ, and claustrum IQ (cf. the compilation book from the open journal MedCrave Bionics & Biomechanics). The following is a new contribution answering an important question raised by the two co-authors: what is the essential difference between Artificial Neural Networks trained by supervised learning via Least Mean Squares and those trained by unsupervised learning via Minimum Free Energy? The quick answer: no difference; but the devil, if any, is in the details.
We believe that the machine-learning ability of the n-th wave of Artificial Intelligence (AI) will eventually approach the Darwinian animal-survival level of Natural Intelligence (NI) as n > 4. We observe that animals satisfying the sufficient conditions of "having homeostatic brains at constant temperature regardless of the external environment, and being equipped with the power of paired sensors" shall exhibit NI at the survival level. For example, Homo sapiens have 5 pairs of sensors: two eyes, two ears, two nostrils, the two sides of the tongue, and two sensing hands. We believe this is an adaptive trait for fast pre-processing for survival, i.e., "when the sensors agree, there is a signal; when the sensors disagree, it is noise." In this short communication, we shall show that NI follows a minimum free energy cost function for its learning rule, derived from thermodynamics, rather than the classical least mean squares (LMS) cost function, derived from statistics, which has previously organized many machine-learning systems. The performance cost function will be the essential difference. We begin with a brief numerical aside on the paired-sensor heuristic, followed by the summary theorem.
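The following minimal sketch is our own illustration (the toy sinusoidal signal, the noise level, and the sensor model are assumptions, not taken from the text): the half-sum of two co-located sensors reinforces whatever they agree on, while the half-difference cancels the common signal and isolates the disagreement, which is treated as noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# A common signal seen by both paired sensors, plus independent noise in each.
t = np.linspace(0.0, 1.0, 1000)
signal = np.sin(2 * np.pi * 5 * t)
left = signal + 0.3 * rng.standard_normal(t.size)
right = signal + 0.3 * rng.standard_normal(t.size)

# "When the sensors agree, there is a signal": the half-sum keeps the common
# component and averages down the independent noise.
agreement = 0.5 * (left + right)

# "When the sensors disagree, it is noise": the half-difference cancels the
# common signal and keeps only the disagreement (a noise estimate).
disagreement = 0.5 * (left - right)

print("RMS error of one sensor alone  :", np.sqrt(np.mean((left - signal) ** 2)))
print("RMS error of the agreement     :", np.sqrt(np.mean((agreement - signal) ** 2)))
print("RMS of the disagreement (noise):", np.sqrt(np.mean(disagreement ** 2)))
```

The agreement channel has a lower root-mean-square error than either single sensor, which is the fast, label-free pre-processing advantage claimed for paired sensors.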
Theorem of minimum free energy for natural intelligence
Unsupervised learning based on Minimum Free Energy may be derived from the first two laws of thermodynamics. The second law states that the change of heat energy is proportional to the change of Boltzmann entropy, the proportionality constant being the Kelvin absolute temperature.
$\Delta Q = T\,\Delta S$  (1)
Then we can begin with Ludwig Boltzmann's definition of entropy, as formulated by Max Planck:3
$S = k_B \log W$  (2)
(3)
We define the free energy of the brain, or the useful energy of the brain, as the total energy less the thermal energy:
$H = E - TS$  (4)
Derivation
- From the first law of thermodynamics, energy is conserved between the heat energy of the environmental thermal reservoir, kept at the constant homeostatic temperature, and the internal energy of the brain. Thus, when we integrate and drop the integration constant, the change of the brain's internal energy is tracked by the heat exchanged with the reservoir.
- We have thereby arrived at the Minimum Free Energy principle: at constant brain temperature the free energy (4) can only decrease, so its minimum serves as the learning cost function, where the temperature is the constant homeostatic temperature of the brain.
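For completeness, here is a minimal sketch of one standard route from the two laws to the Minimum Free Energy principle; the intermediate steps are our own spelling-out, using the Clausius form of the second law and the notation of Eqs. (1)–(4), with $T_o$ denoting the constant homeostatic brain temperature:

$\Delta S \ge \dfrac{\Delta Q}{T_o}$ (second law; equality, as in Eq. (1), holds for reversible exchange),

$\Delta E = \Delta Q$ (first law, assuming no mechanical work is exchanged),

hence $\Delta H = \Delta E - T_o\,\Delta S \le 0$ at constant $T_o$.

Every spontaneous change therefore lowers the free energy $H = E - T_oS$, which is why its minimum can serve as the cost function for learning.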
Now we must move to the anatomy of brain neural physiology. Our brains have approximately 10 billion neurons, which have sigmoid-threshold output firing rates (while the sigmoid is approximately linear near the threshold, it saturates nonlinearly away from the threshold). Neurons are represented by the following model (Figure 1):
$v_i = \sigma(u_i) = \dfrac{1}{1+\exp(-u_i/k_B T_o)}$  (5)
and approximately 100 billion neuroglial cells working in the glymphatic system of our brains. Our unsupervised learning rule requires their symbiotic collaboration as follows (Figure 2). We shall mathematically introduce A.M. Lyapunov's control theory of monotonic convergence4,5 as a constraint on our model of brain free energy.
Appendix A: Derivation of Sigmoid Logic
The two neuronal input/output (I/O) states must be normalized, and the normalization turns out to be the sigmoid logic. This is consistently obtained from the canonical probability of the usable brain energy as follows:

$\sigma(u_i) = \dfrac{\exp(u_i/k_B T_o)}{1 + \exp(u_i/k_B T_o)} = \dfrac{1}{1 + \exp(-u_i/k_B T_o)}$

where $u_i$ is the net input of the i-th neuron relative to its firing threshold, $k_B$ is the Boltzmann constant, and $T_o$ is the constant homeostatic brain temperature, for the two-state (firing versus resting) canonical ensemble.
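As a quick numerical check that the two-state canonical (Boltzmann) probability is exactly the sigmoid logic, here is a small sketch; the temperature scale kT and the input range are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def canonical_firing_probability(u, kT=1.0):
    """Two-state canonical probability: firing state at energy -u, resting state at energy 0."""
    z_fire = np.exp(u / kT)   # Boltzmann weight of the firing state
    z_rest = 1.0              # Boltzmann weight of the resting state
    return z_fire / (z_fire + z_rest)

def sigmoid(u, kT=1.0):
    """The sigmoid logic of Eq. (5)."""
    return 1.0 / (1.0 + np.exp(-u / kT))

u = np.linspace(-5.0, 5.0, 11)
# The two expressions agree to machine precision: normalizing the two I/O
# states by the canonical probability yields the sigmoid.
print(np.allclose(canonical_firing_probability(u), sigmoid(u)))  # True
```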
Let us consider the calcium ions used as communication vehicles among neurons: repelling one another, they cross the axon road like ducks walking and quacking in a line-up, ushered along by the housekeeping neuroglial cells, which are roughly ten times more numerous and ten times smaller.
The streaming term is set to zero at the wave front of the calcium-ion diffusion, and we have thus derived Albert Einstein's diffusion equation (a familiar macroscopic example is the smoke diffusion of the infamous San Francisco fire, cf.4):

$\dfrac{\partial \rho}{\partial t} = D\,\nabla^{2}\rho$  (6)

where $\rho$ is the calcium-ion concentration and $D$ is the diffusion coefficient.
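To illustrate Eq. (6) numerically, the following is a minimal one-dimensional finite-difference sketch of how an initial pulse of calcium concentration spreads; the grid size, diffusion coefficient, and time step are illustrative assumptions chosen only to satisfy the explicit-scheme stability condition.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the paper); dt <= dx**2 / (2*D) for stability.
n, dx, D, dt, steps = 101, 1.0, 1.0, 0.2, 500

rho = np.zeros(n)
rho[n // 2] = 1.0  # an initial pulse of calcium-ion concentration

for _ in range(steps):
    # Explicit finite-difference form of d(rho)/dt = D * d^2(rho)/dx^2,
    # with reflecting (zero-flux) boundaries.
    lap = np.zeros_like(rho)
    lap[1:-1] = rho[2:] - 2.0 * rho[1:-1] + rho[:-2]
    lap[0] = rho[1] - rho[0]
    lap[-1] = rho[-2] - rho[-1]
    rho = rho + dt * D * lap / dx**2

# The pulse spreads out while the total amount of calcium is conserved.
print("total concentration:", rho.sum())
print("peak concentration :", rho.max())
```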
If and only if the following learning rule holds will the learning converge exponentially:
$\dfrac{dW_{ij}}{dt} = -\dfrac{\partial H}{\partial W_{ij}}$  (7)
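The role of Lyapunov's monotonic-convergence theorem invoked above can be spelled out with a short chain-rule sketch (our own spelling-out, assuming the free energy depends on time only through the synaptic weights):

$\dfrac{dH}{dt} = \sum_{ij}\dfrac{\partial H}{\partial W_{ij}}\,\dfrac{dW_{ij}}{dt} = -\sum_{ij}\left(\dfrac{\partial H}{\partial W_{ij}}\right)^{2} \le 0$

so the free energy acts as a Lyapunov function: it decreases monotonically under the learning rule (7) and is stationary only when every gradient component vanishes.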
We introduce the dendritic sum of the output firing rates as follows:
$u_i = \sum_j W_{ij}\,v_j$  (8)
(9)
Canadian neurophysiologist Donald O. Hebb observed decades ago that the rule for changing the synaptic weight matrix is "neurons that fire together wire together,"6
which, in our model, defines the neuroglial cells as supplying the negative slope of the free energy, matching Hebb's product of firing rates:

$\dfrac{dW_{ij}}{dt} = -\dfrac{\partial H}{\partial W_{ij}} = v_i\,v_j$  (10)
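A minimal code sketch of what such a Hebbian, free-energy-descending update looks like (our illustration; the network size, learning rate, and the Hopfield-style quadratic free energy are assumptions made only for demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)

n_neurons = 8
v = rng.random(n_neurons)             # output firing rates of Eq. (5), in [0, 1]
W = np.zeros((n_neurons, n_neurons))  # synaptic weight matrix
eta = 0.1                             # learning rate (assumed)

def free_energy(W, v):
    """Hopfield-style quadratic free energy H = -1/2 v^T W v (an assumed form)."""
    return -0.5 * v @ W @ v

# Hebbian update: "neurons that fire together wire together".
# For the quadratic H above, -dH/dW_ij is proportional to v_i * v_j, so the
# Hebbian outer product points downhill along the free-energy slope of Eq. (7).
for _ in range(20):
    W += eta * np.outer(v, v)

print("free energy after Hebbian learning:", free_energy(W, v))  # below its initial value of 0
```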
In the standard PDP book,4 Prof. Geoffrey Hinton (of the University of Toronto, Canada, and later also of Google in Silicon Valley; cf. Yoshua Bengio7) gives Backward Error Propagation supervised learning as:
so that the learning change of the synaptic weight matrix follows the negative slope of the error energy,
$\dfrac{dW_{ij}}{dt} = -\dfrac{\partial E_{LMS}}{\partial W_{ij}}$  (12)
(13a)
(13b)
This one and the same learning rule governs our brain models, whether as unsupervised Minimum Free Energy (MFE) or supervised least mean squares (LMS) learning.
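To underline that only the cost function differs while the sequential update itself is shared, here is a minimal sketch (ours; the toy one-neuron model, the data, the learning rate, and in particular the unlabeled surrogate cost standing in for the free energy are all assumptions) in which the very same update W <- W - eta * dCost/dW is driven either by a labeled LMS error or by an unlabeled cost:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

x = rng.random((50, 3))                                   # toy inputs (assumed)
d = (x @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)    # toy labels, used only by LMS

def lms_cost(W):
    """Supervised least-mean-squares error energy: requires the labels d."""
    v = sigmoid(x @ W)
    return 0.5 * np.mean((d - v) ** 2)

def unlabeled_cost(W):
    """Unsupervised surrogate cost (an assumed stand-in for the free energy):
    the mean output entropy, which uses no labels at all."""
    v = np.clip(sigmoid(x @ W), 1e-9, 1 - 1e-9)
    return np.mean(-v * np.log(v) - (1 - v) * np.log(1 - v))

def numerical_grad(cost, W, h=1e-6):
    """Finite-difference gradient, so the identical update serves either cost."""
    g = np.zeros_like(W)
    for k in range(W.size):
        Wp, Wm = W.copy(), W.copy()
        Wp[k] += h
        Wm[k] -= h
        g[k] = (cost(Wp) - cost(Wm)) / (2.0 * h)
    return g

eta = 1.0
for cost, name in [(lms_cost, "supervised LMS"), (unlabeled_cost, "unsupervised surrogate")]:
    W = 0.1 * rng.standard_normal(3)
    for _ in range(300):
        W -= eta * numerical_grad(cost, W)   # one and the same sequential learning rule
    print(name, "-> final cost:", round(float(cost(W)), 4))
```

Only the cost function (and hence whether labels are consulted) changes between the two runs; the update rule is literally the same line of code, which is the sense in which the devil is only in the details.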
We have reviewed the fundamentals of artificial neural network (ANN) modeling of Biological Neural Networks (BNN), which may yield an ANN that can compete on the basis of Charles Darwin's "survival of the fittest." We found that for weakly nonlinear systems there is only one sequential learning rule. Thus Natural Intelligence (NI) and Artificial Intelligence (AI) share a similar learning rule even though they are derived from different origins: the former from thermodynamics and the latter from statistics. This might answer the question as posed by Prof. Yann LeCun7 of the NYU Courant Institute in his YouTube lectures, or by Prof. Andrew Ng of Stanford, who teaches Deep Learning at the commercial online school Coursera. The only difference between the methods is when to apply each: with a labeled dataset, use supervised learning; with an unlabeled dataset, use unsupervised learning.
Funding was received from Parian LLC.
None.