Opinion Volume 3 Issue 3
1Dept. of Biomedical Engineering, Catholic University of America, USA
2Parian LLC, Saratoga, USA
3Netflix Corporation, Los Gatos, USA
Correspondence: Department of Biomedical Engineering, The Catholic University of America, Washington DC, USA
Received: May 07, 2019 | Published: May 9, 2019
Citation: Szu HH, Ho P, Sun Y. Unified learning rules based on different cost functions. MOJ App Bio Biomech. 2019;3(3):45-47. DOI: 10.15406/mojabb.2019.03.00102
There have been multiple waves of Artificial Intelligence (AI) research. The first wave began nearly seven decades ago, when Alan Turing posed the question “Can machines think?” in his seminal 1950 paper,1 ushering in the quest for Artificial Intelligence (AI). This quest followed many paths: Prof. Frank Rosenblatt proposed and developed a single-layer Artificial Neural Network called the “Perceptron,” which can now be found in the Smithsonian museum in Washington DC; unfortunately he passed away too early to advance the work further. In response to the Perceptron, MIT Prof. Marvin Minsky proposed rule-based systems for computers, built on “If …, Then …” rules, which set the direction for funding and progress in Artificial Intelligence.2 The second wave began in March 2016, when Google’s AlphaGo beat the Korean Go genius Lee Sedol 4-1; AlphaGo used supervised deep learning with labeled training data. Since 2017, Dr. Harold Szu and his collaborators have systematically developed a learning system that emulates three intelligences found in the brain: logical IQ, emotional IQ, and Claustrum IQ (cf. the compilation book from the open journal “MedCrave Bionics & Biomechanics”). The following is a new contribution answering a common and important question from the two accompanying authors: what is the essential difference between Artificial Neural Networks using supervised learning via Least Mean Squares and unsupervised learning via Minimum Free Energy? The quick answer: no difference; but the devil, if any, is in the details.
We believe that the machine learning ability of the n-th wave of Artificial Intelligence (AI) will eventually approach the Darwinian animal-survival level of Natural Intelligence (NI), as n > 4. We observe that animals satisfying the sufficient conditions of “having homeostatic brains at constant temperature regardless of the external environment, and being equipped with the power of paired sensors” shall exhibit NI at the survival level. For example, Homo sapiens have 5 pairs of sensors: two eyes, two ears, two nostrils, two sides of the tongue, and two sensing hands. We believe this is an adaptive trait enabling fast pre-processing for survival, i.e., “when the paired sensors agree, there is a signal; when they disagree, it is noise.” In this short communication, we shall show that NI follows a minimum free energy (MFE) cost function for its learning rule, derived from thermodynamics, rather than the classical least mean squares (LMS) cost function derived from statistics that has previously organized many machine learning systems. The performance cost function will be the essential difference. We begin with the following summary theorem:
Theorem of minimum free energy for natural intelligence
Unsupervised learning based on Minimum Free Energy may be derived from the first two laws of thermodynamics. The second law defines the change of heat energy to be proportional to the change of Boltzmann entropy, the proportionality constant being the Kelvin absolute temperature $T_0$:
$\Delta Q = T_0\,\Delta S$ (1)
Then we can begin with Ludwig Boltzmann’s definition of entropy (as formulated by Max Planck):3
$S_{tot} \equiv k_B \log W_{tot}$ (2)
$W_{tot} = \exp\!\left(\frac{S_{tot}}{k_B}\right) = \exp\!\left(\frac{S_{tot}\,T_0}{k_B T_0}\right) = \exp\!\left(\frac{(S_{res}+S_{brain})\,T_0}{k_B T_0}\right) \equiv \exp\!\left(\frac{-H_{brain}}{k_B T_0}\right)$ (3)
We define the free energy of the brain, i.e., the useful energy of the brain, as the total energy less the thermal energy:
$H_{brain} \equiv E_{brain} - T_0\,S_{brain}$ (4)
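For completeness, a minimal sketch of the intervening bookkeeping, assuming the standard canonical-ensemble picture in which the heat reservoir supplies the brain’s energy at the constant temperature $T_0$, so that $T_0\,S_{res} = -E_{brain}$:

\begin{align*}
T_0\,S_{tot} &= T_0\,(S_{res} + S_{brain}) = -E_{brain} + T_0\,S_{brain} \equiv -H_{brain},\\
W_{tot} &= \exp\!\left(\frac{S_{tot}}{k_B}\right) = \exp\!\left(\frac{T_0\,S_{tot}}{k_B T_0}\right) = \exp\!\left(\frac{-H_{brain}}{k_B T_0}\right).
\end{align*}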
Derivation
where $H_{brain} \equiv E_{brain} - T_0\,S_{brain}$ as defined above.
Now we must turn to the anatomy of brain neurophysiology. Our brains have approximately 10 billion neurons, whose output firing rates follow a sigmoid threshold (the sigmoid is linear near the threshold and saturates nonlinearly away from it). Neurons are represented by the following model (Figure 1):
$y_i = \sigma(D_i); \qquad D_i \equiv \sum_j [W_{i,j}]\,x_j \equiv [W_{i,\alpha}]\,x_\alpha$ (5)
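For illustration, a minimal numerical sketch of the neuron model of Eq. (5), in Python, with a small weight matrix and input chosen arbitrarily (not taken from the paper):

import numpy as np

def sigmoid(d):
    """Sigmoid-threshold firing rate: linear near the threshold, saturating away from it."""
    return 1.0 / (1.0 + np.exp(-d))

def neuron_layer(W, x):
    """Eq. (5): dendrite sums D_i = sum_j W[i,j] x[j], output firing rates y_i = sigma(D_i)."""
    D = W @ x
    return sigmoid(D)

# Illustrative numbers only (not from the paper).
W = np.array([[0.5, -0.2], [0.1, 0.8]])
x = np.array([1.0, 0.3])
print(neuron_layer(W, x))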
Our brains also contain some 100 billion neuroglial cells working in the glymphatic system. Our unsupervised learning rule requires the symbiotic collaboration of neurons and glia, as follows (Figure 2). We shall mathematically introduce A. M. Lyapunov’s control theory of monotonic convergence4,5 as a constraint on our model of brain free energy.
The two neuronal input/output (I/O) states must be normalized, with a norm that turns out to be the sigmoid logic. This is consistently obtained from the canonical probability of the usable brain energy as follows:
$\mathrm{Input}_{norm} = \frac{\exp(-\beta H^{input}_{brain})}{\exp(-\beta H^{input}_{brain}) + \exp(-\beta H^{output}_{brain})} = \frac{1}{1 + \exp\!\left(-\beta\,(H^{output}_{brain} - H^{input}_{brain})\right)} \equiv \sigma(x)$

where $H^{I/O}_{brain} \equiv E^{I/O}_{brain} - S\,T_0$ and $\beta \equiv \frac{1}{k_B T_0}$,
for $T_0 = 27\,^{\circ}\mathrm{C} + 273 = 300\ \mathrm{K}$, so that $k_B T_0 \approx \tfrac{1}{40}\ \mathrm{eV}$.
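A minimal numerical sketch (in Python) of the canonical two-state normalization above, with hypothetical input/output brain energies chosen only for illustration; it also checks that $k_B T_0$ at 300 K is roughly 1/40 eV:

import numpy as np

K_B_EV = 8.617e-5           # Boltzmann constant in eV/K
T0 = 300.0                  # 27 C + 273 = 300 K
beta = 1.0 / (K_B_EV * T0)  # inverse thermal energy

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical usable brain energies of the input and output states, in eV (illustration only).
H_input, H_output = 0.010, 0.035

canonical = np.exp(-beta * H_input) / (np.exp(-beta * H_input) + np.exp(-beta * H_output))
print(K_B_EV * T0)                                      # ~0.026 eV, i.e. about 1/40 eV
print(canonical, sigmoid(beta * (H_output - H_input)))  # the two values agree, as derived above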
Let us consider the calcium ions used as communication vehicles among neurons, repelling one another like “ducks” walking and quacking across the axon road, but ushered into a line-up by the housekeeping neuroglia cells, which are roughly ten times more numerous and ten times smaller.
$y = \sigma(x) = \frac{1}{1+\exp(-x)} \equiv -\frac{\varphi'}{\varphi} = -\frac{d\log\varphi(x)}{dx}$ (calcium-ion representation), so that $\frac{dy}{dx} = y^2 - y$.

LHS: $\frac{d\sigma}{dx} = -\frac{\varphi''}{\varphi} + \left(\frac{\varphi'}{\varphi}\right)^2$; RHS: $\left(\frac{\varphi'}{\varphi}\right)^2 + \frac{\varphi'}{\varphi}$; equating the two gives $\varphi' = -\varphi''$.

The streaming term is set to zero at the wave front of the diffusing calcium ions, and we have thereby derived Albert Einstein’s diffusion equation: $\varphi_t = \varphi''$.
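For illustration, a minimal finite-difference sketch (in Python) of the diffusion equation $\varphi_t = \varphi''$ derived above, applied to a hypothetical initial calcium-ion concentration profile; all grid parameters are assumptions:

import numpy as np

# Explicit finite-difference solution of phi_t = phi_xx (unit diffusion coefficient).
nx, dx, dt, steps = 101, 0.1, 0.004, 500                  # dt/dx^2 = 0.4 <= 0.5 keeps the scheme stable
phi = np.exp(-((np.arange(nx) * dx - 5.0) ** 2))          # hypothetical initial concentration bump

for _ in range(steps):
    lap = (np.roll(phi, -1) - 2 * phi + np.roll(phi, 1)) / dx**2
    lap[0] = lap[-1] = 0.0     # hold the boundaries fixed
    phi = phi + dt * lap       # phi_t = phi''

print(phi.max())               # the peak spreads and lowers, as expected for diffusion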
(Figure: the infamous San Francisco fire, with smoke diffusion; cf.4)
Applying the chain rule, the Lyapunov condition of monotonic convergence for the brain free energy reads:

$\frac{\Delta H_{brain}}{\Delta t} = \frac{\Delta H_{brain}}{\Delta [W_{i,j}]}\,\frac{\Delta [W_{i,j}]}{\Delta t} = -\left(\frac{\Delta [W_{i,j}]}{\Delta t}\right)^2 \le 0$ (6)
If and only if the following learning rule holds will learning converge exponentially:
$\frac{\Delta [W_{i,j}]}{\Delta t} = -\frac{\Delta H_{brain}}{\Delta [W_{i,j}]}$ (7)
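For illustration, a minimal Python sketch of the Lyapunov argument of Eqs. (6)-(7), using a toy quadratic stand-in for $H_{brain}$ (an assumption made only to keep the example self-contained); the printed cost decreases monotonically, as Eq. (6) requires:

import numpy as np

def H(W):
    """Toy stand-in for the brain free energy (not the paper's actual H_brain)."""
    return 0.5 * np.sum(W ** 2)

def dH_dW(W):
    return W

W = np.array([[1.0, -2.0], [0.5, 1.5]])
dt = 0.1
for step in range(5):
    dW_dt = -dH_dW(W)       # Eq. (7): dW/dt = -dH/dW
    W = W + dt * dW_dt
    # Eq. (6): dH/dt = (dH/dW)(dW/dt) = -(dW/dt)^2 <= 0, so H must fall at every step
    print(step, H(W))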
We introduce the dendrite sum of the output firing rates as follows:
$D_i \equiv [W_{i\alpha}]\,y_\alpha$; thus $\frac{\Delta D_i}{\Delta [W_{i,j}]} = y_j$ (8)
$\frac{\Delta H_{brain}}{\Delta [W_{i,j}]} = \frac{\Delta H_{brain}}{\Delta D_i}\,\frac{\Delta D_i}{\Delta [W_{i,j}]} \equiv -g_i\,y_j$ (9)
Canadian neurophysiologist Donald O. Hebb observed seven decades ago that the rule for changing the synaptic weight matrix is “neurons that fire together wire together,”6
which defines the neuroglia-cell gain $g_i$ to be the negative slope of the brain free energy, so that $\Delta H_{brain} \le 0$:
$g_i \equiv -\frac{\Delta H_{brain}}{\Delta D_i}$ (10)
In the standard PDP book,4 Prof. Geoffrey Hinton (University of Toronto; now also Chief Scientist at Google in Silicon Valley; protégé: Yoshua Bengio7) gives the corresponding backward error propagation rule for supervised learning,
so that the learning change in the synaptic weight matrix follows the negative slope of the cost (error) energy:
$\frac{\Delta [W_{i,j}]}{\Delta t} = -\frac{\Delta H_{brain}}{\Delta [W_{i,j}]} = g_i\,y_j$ (12)
$[W_{i,j}]_{new} = [W_{i,j}]_{old} + \Delta [W_{i,j}]$ (13a)
$\Delta [W_{i,j}] \cong \Delta t\; g_i\,y_j$ (13b)
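For illustration, a minimal Python sketch of the update rule of Eqs. (5) and (12)-(13). The same Hebbian product $\Delta [W_{i,j}] \cong \Delta t\, g_i\, y_j$ is applied twice: once with $g_i$ taken from the supervised LMS error slope (which requires labeled targets), and once from a hypothetical unsupervised stand-in cost; neither cost is the paper's exact $H_{brain}$, they only illustrate that the update rule itself is unchanged:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(d):
    return 1.0 / (1.0 + np.exp(-d))

def learn(W, x, dt, g_fn, steps=200):
    """Eqs. (5), (12)-(13): y = sigma(W x), then W_new = W_old + dt * outer(g, presynaptic rates)."""
    for _ in range(steps):
        y = sigmoid(W @ x)           # Eq. (5)
        g = g_fn(y)                  # glial gain g_i = minus the slope of the cost w.r.t. D_i
        W = W + dt * np.outer(g, x)  # Eqs. (13a)-(13b): Hebbian product of g_i and the presynaptic rate
    return W

x = np.array([1.0, 0.5, -0.3])          # illustrative presynaptic firing rates
W0 = rng.normal(0.0, 0.1, size=(2, 3))  # small random initial synaptic weights
t = np.array([0.9, 0.2])                # labeled targets, needed only in the supervised case

# Supervised LMS: g_i = (t_i - y_i) y_i (1 - y_i), minus the slope of (1/2) sum (t - y)^2 through the sigmoid.
W_lms = learn(W0, x, dt=0.5, g_fn=lambda y: (t - y) * y * (1 - y))

# Unsupervised stand-in cost (illustration only, not the paper's H_brain):
# push each output away from the undecided value 1/2, i.e. descend -(1/2) sum (y - 1/2)^2.
W_mfe = learn(W0, x, dt=0.5, g_fn=lambda y: (y - 0.5) * y * (1 - y))

print(sigmoid(W_lms @ x))  # moves toward the labels t
print(sigmoid(W_mfe @ x))  # drifts away from 1/2 without any labels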
This is the learning rule of our brain models, whether driven by the unsupervised minimum free energy (MFE) or by the supervised least mean squares (LMS) cost function.
©2019 Szu, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and building upon the work non-commercially.