Computational Cognitive Neuroscience
Course Notes of Computational Cognitive Neuroscience by Prof. 鄭士康.
Course Information
- Lecturer: 鄭士康
- Time: Fri. 789
- Location: EE2-146
- Homeworks:
- HW1: 10/25 (Topic free)
- HW2: 11/29
- Group presentation: 1/22
- Handouts (G suite): https://drive.google.com/drive/u/1/folders/1GOcrwV9lR5Xk4bqPCXhxLc600DXRYl4K
Central Questions
- Could machine perceive and think like humans?
- Turing test
- Stimuli -> acquire -> store -> transform (process, emotion) -> recall -> response (actions)
Cognitive Psychology
- Assumption: materialism: mind = brain function
- Later became Cognitive Neuroscience
- Models: Box and arrow -> Computational (mechanistic) vs Statistical model
- Neuronal network connections
Artificial intelligence
- Reductionism
- Search space of parameters
- General problem solver
- Expert systems (symbol and rule-based)
- Symbol processing ≢ intelligence (Chinese room argument)
- Does the machine really know semantics from the symbols and rules?
- Mimicking biological neural networks (H&H neuron model) -> spiking neuron network & Hebbian learning
- Perceptron : Limitations by Minsky (unable to solve XOR problem) -> 1st winter of AI
- Multilayer and backpropagation: connectionism
- Parallel distributed processing (1986): actually neural networks (a taboo by then)
- Convolution neuronal networks (CNNs)
- Computer vision
- Similar to image processing in the visual cortex
- Decomposition of features: stripes, angles, colors, etc.
- Does intelligence emerge from complex networks?
- Dynamicism
- embodied approach
- Feedback system
- Systems of non-linear DEs
- Cybernetics: control system for ML (system identification)
- Bayesian approach : pure statistics regardless of underlying mechanism
Biological plausibility
- Low = little similarity to biological counterpart
- e.g. expert systems
- CNN: medium BP
- SpiNNator and Nengo: high BP
Levels (scales) of nervous system
- Focused on mesoscopic scale (neurons and synapses) in this course
Building a brain with math models
Why?
Feymann: What I cannot create, I do not understand.
- Understanding brain functions -> health (AD, PD, HD)
- AI modeling and applications
3D brain structure
The scale of brain models
- Neuron
- Small clusters of neurons
- Large scale connections (connectomes)
Neuron biology
- dendrite
- soma
- axon and myelin sheath
Hodgkin and Huxley model (1952)
- Math model from recordings of squid giant axon
- Action potential
- Biophysically accurate, but harder to do numerical analysis
- Chance and Design by Alan Hodgkin
Derived models
- Simpler models with action potentials and multiple inputs
- Leaky, Integrated and Fire model (LIF model)
- LEBRA: single equation for a neuron, no spatial components
- Compartment model of dendrite, soma, and axon.
- Delay effect (+)
- Discretization of the partial differential equation (PDE) model
- Could Delayed Differential Equations (DDEs) used in this context?
- Data (from fMIR, DTI, …) rich and theory poor
- Large-scale models (connectomes)
- Neuromorphic hardware
NEF (Neural Engineering Network) & SPA (Semantic Pointer Architecture)
Semantic Pointer
Semantics important for both symbolic and NN models
Example : autoencoder
- Dimension reduction layer by layer (raw data -> symbols)
- Similar to visual cortex and associative areas
- Reverse the network to adjust the weights
- Loss = predicted - input
Spaun model: Autoencoders to process multiple sensory inputs as well as motor functions and decision making (transformation, working memory, reward, selection).
Ewert’s Question: How is neural activity coordinated, learned and controlled?
- Capturing semantics
- Encoding syntactic structures
- Controlling information flow?
- Memory, learning?
Embodied semantics
- Neural firing patterns
- High dimensional vector symbolic architectures
Working memory
- 7 +/- 2 items, with highest recall for the 1st and the last item
Spike-Timing-Dependent plasticity (STDP)
- non-linear mapping for learning through synapses
Spiking models
- Keywords: spike firing rate, tuning curves, *Poisson models
- Adrian’s frog leg test: loading induced spikes in the sciatic nerve
- Stereotyped signals = spikes
- Firing rate is a function to stimuli
- Fatigue (adaptation) over time
Neural responses
Raster plot: dot = one spike. x: time; y: neuron id
Firing rate histogram: x: time; y: # of spikes
Neural signal response: with Dirac delta function (signal processing?)
Individual spikes -> Firing rates (in Hz) with a windows (moving average)
Similar to pulse density modulation (PDM)
Tuning curve
- x: stimuli trait; y: response
- e.g. visual cortical neuron response to line orientation
- Present in both sensory and motor cortices
Poisson process for spike firing
- Poisson process: a random process with constant rate (or average waiting time).
- The probability
Pwithnevents fired in a periodTgiven a firing ratercould be expressed by:
Rate code v.s. temporal code
- Dense firing for the former, sparse firing for the latter
- Population code (a group of neurons firing)
Encoding / decoding
- encoding: stimuli $x(t)$ -> spikes $\delta (t-t_i)$
- decoding: spikes $\delta (t-t_i)$ -> interpretation of stimuli $\hat x(t)$
Neural Physiology
- Neuron: dendrites, soma, axon
- Synapses: neurotransmitter / electrical conduction
- AP from axon => Graded potential in dendrite / soma
- Temporal / spatial summation of graded potential: AP in axial hillock
Excitable membrane
- Phospholipid bilayer (plasma membrane) as barrier
- Integral / peripheral proteins: ion carriers and channels
- Selected permeability to ions: Na / K gradients
Action potential
- Voltage-gated Na channel: both positive and negative feedback (fast)
- Voltage-gated K channel: negative feedback (slow)
- Leaky chloride channel: helping maintaining resting potential (constant)
- Refractory period (5 ms): available Na fraction is too low for AP
- Nodes of Ranvier and myelin sheath: accelerates AP conduction
Neurotransmitters
- Signaling molecules in the synaptic cleft
- AP -> Ca influx -> vesicle release -> receptor binding -> graded potentials (EPSP/IPSP) -> recycle / degradation of neurotransmitters
Neural models
- Features to reproduce: Integrating input, AP spikes, refractory period
Electrical activity of neurons
- Nernst equation for one species of ion across a semipermeable membrane
- GHK voltage equation for multiple ions
- Quasi-ohmic assumption for ion channels $I_x = g_x (V_m-E_x)$
- Membrane as capacitor (1 $\mu F/ cm^2$)
- Equivalent circuit: An RC circuit
HH model
- GHK voltage equation not applicable (not in steady state)
- Using Kirchhoff’s current law to get voltage change over time
- Parameters from experiments on the squid giant axon
- K channel: gating variable n
α and β are determined by voltage (membrane potential)
Na channel: two gating variables, m and h
αs and βs are determined by voltage (membrane potential)
Considerations
- Model fidelity (biological relevance) vs simplicity (ease to simulate and analyze)
- Biological plausibility
Dynamic system theory
A system of ODEs
e.g. Butterfly effect (chaos system): small deviation of initial conditions > huge different results
Morris-Lecar neuron model
- Similar to the HH model (KCL)
- Ca, K, and Cl ions
- two state variables: voltage (V) and one variable (w) for K
- using tanh and cosh functions
Phase plane analysis
- Stability: Eigenvalues of rhs Jacobian matrix in the steady-state
- External current (Ie) = 0: single stable steady-state (intersection of V and w nullclines)
- Increasing Ie: shifting V null cline => unstable steady-state (limit cycle)
- Bifurcation: V vs Ie
Integrate and fire (IF) model
- A simple RC circuit
- Single state variable (V)
- Use of conditional statements to control spiking firing and refractory period
- Used in nengo (plus leaky = LIF model)
- Firing rate adaption: IF model + more terms
Izhikevich model
- Two state variables
- Realistic spike patterns by adjusting parameters
- Could be used in large systems (100T synapses)
Compartment model
- Spatial discretization for neuron models
- Coupled RC circuits -> FEM grids
Filters
- Presynaptic AP -> synapse neurotransmitter release -> Postsynaptic potentials
- Approximated by an LTI(linear, time invariant) system
- Linear: superposition
- Time invariant: unchanged with time shifting
- Impulse response: given a impulse (delta function) -> h(t), transformed results
- Convolution: h(t) instead of the system itself
- Fourier transform: Convolution -> multiplication
Synapse model
- Synapse = RC low pass filters with time scale = $\tau$
- $\tau$ is dependent on types of neurotransmitter and receptors
Intro to brain
Prerequisite
- Simple linear algebra (vector and matrix operations)
- Graph theory: connections
Reverse engineering the brain
engineeringchallenges.org- Complexity, scale, connection, plasticity, low-power
- Design: brain scheme; designer: natural selection
Why a brain
- To survive and thrive.
- Brainless (single-celled organisms): simple perceptions and reactions. Some endogenous activity
- Simple brain (C. elegans): aversive response and body movement
- Connectome routing study (as in EDA) showed 90% of the neurons are in the optimal positions
- General scheme: sensory -> CNS -> motor (with endogenous states (thoughts) in the CNS)
Design constraints
- Information theory (information efficiency)
- Energy efficiency
- Space efficiency
- Human brain is already relatively larger than almost all animals
Evolution of the brain in Cordates
- Dorsal neural tube -> differentiation respecting sensory, motor, and inter connections
Central pattern generator
- The brainless walking cat: endogenous activity in the spinal cord
- Main functioN unit in the CNS
nengo programming
Classes
- Network: model itself
- Node: input signal
- Ensemble: neuronss
- Connection: synapses
- Probe: output
- Simulator: simulator (literally)
Integrator implementation
- Similar to the Euler method in numerical integration
import matplotlib.pyplot as plt
import nengofrom nengo.processes
import Piecewise
# The model
model = nengo.Network(label='Integrator')
with model:
# Neurons representing one number
A = nengo.Ensemble(100, dimensions=1)
# Input signal
src = nengo.Node(Piecewise({0: 0, 0.2: 1, 1: 0, 2: -2, 3: 0, 4: 1,5: 0}))
tau = 0.1
# Connect the population to itself
# transform: transformation matrix
# synapse: time scale of low pass filter
nengo.Connection(A, A, transform=[ [1] ], synapse=tau)
nengo.Connection(src, A, transform=[ [tau] ], synapse=tau)
input_probe = nengo.Probe(src)
A_probe = nengo.Probe(A, synapse=0.01)
# Create our simulator
with nengo.Simulator(model) as sim:
# Run it for 6 seconds
sim.run(6)
# Plot the decoded output of the ensemble
plt.figure()
plt.plot(sim.trange(), sim.data[input_probe], label="Input")
plt.plot(sim.trange(), sim.data[A_probe], 'k', label="Integrator output")
plt.legend();
plt.show()Oscillator implementation
Harmonic oscillator: one 2nd order ODE -> two 1st order ODEs
nengo:
import matplotlib.pyplot as plt
import nengo
from nengo.processes import Piecewise
# Create the model object
model = nengo.Network(label='Oscillator')
with model:
# Neurons representing 2 numbers (dim = 2)
neurons = nengo.Ensemble(200, dimensions=2)
# Input signal
src = nengo.Node(Piecewise({0: [1, 0], 0.1: [0, 0]}))
nengo.Connection(src, neurons)
# Create the feedback connection. Note the transformation matrix
nengo.Connection(neurons, neurons, transform=[ [1, 1], [-1, 1] ], synapse=0.1)
input_probe = nengo.Probe(src, 'output')
neuron_probe = nengo.Probe(neurons, 'decoded_output', synapse=0.1)
# Create the simulator
with nengo.Simulator(model) as sim:
# Run it for 5 seconds
sim.run(5)
plt.figure()
plt.plot(sim.trange(), sim.data[neuron_probe])
plt.xlabel('Time (s)', fontsize='large')
plt.legend(['$x_0$', '$x_1$'])
data = sim.data[neuron_probe]
plt.figure()
plt.plot(data[:, 0], data[:, 1], label='Decoded Output')
plt.xlabel('$x_0$', fontsize=20)
plt.ylabel('$x_1$', fontsize=20)
plt.legend()
plt.show()Connectivity analysis
- Structural: anatomical structures e.g. water diffusion via DTI
- Functional: statistic, dynamic weights
- Effective: causal interactions (presynaptic spikes -> postsynaptic firing)
- ref. 因果革命
Microscale vs Macroscale
- Microscale: um ~ nm (synapses)
- Macroscale: mm (voxels) coherent regions
Graph theory
- Node: brain areas (or neurons)
- Edges: connections (or synapses)
- Represented by adjacency matrices (values = connection weights)
Types of networks
- Nodes in a circle; Connections in an adjacency matrix
- Measure: degrees of a node (inward / outward) / neighborhood (Modularity Q, Small-worldness S)
Random
Same edge probability
Scale-free
- Power law
- Fractal
- Increased robustness to neural damage
Regular
- Local connections only
Modular
- hierarchial clusters
- Built by attraction and repulsion between nodes
- In some biological neural networks
Small world
- Similar to social networks, sparse global connections
- A few hubs (opinion leaders) with high degrees (connecting edges)
- Rich hub organization in biological neural networks (10 times the connections to the average)
- Anatomical basis (maximize space / energy efficiency)
Neural Engineering Framework (NEF)
- By Eliasmith
- Intended for constant structures without synaptic plasticity
- Compared to SNNs (with learning = synaptic plasticity)
- Neural compiler (high level function <=> low level spikes)
Central problems
- Stimuli detection (sensors)
- Representation / manipulation of information (sensory n.)
- As spikes (pulse density modulation = PDM)
- Recall / transform (CNS)
Heterogeneity in realistic neural networks
- Different set of parameters for each neuron in response to stimuli
- Represented as tuning curves
Building NEF models with nengo
- Hypothesis / data / structure from the real counterpart
- Build NEF and check behavior
- Rinse and repeat
Central NEF principles
Representation
- Action potential: digital, non-linear encoding (axon hillock)
- Graded potential: analog, linear decoding (dendrite)
- Compared to ANNs:
- dendrite = weighted sum from other neurons
- axon hillock: non-linear activation function (real number output)
- Examples: Physical values: heat, light, velocity, position
- mimicking sensory neurons = transducer producing pulse signals
Transformation of encoding information by neuron clusters
Neural dynamics for an ensemble of neurons
HH model, LIF, control theory
PS
- Neurons are noisy
- In the NEF: the basic unit is an ensemble of neurons
- Post synaptic current: approximated by one time constant
Neural representation
Encoding / decoding
- Ensemble = Digital-analog converter like digital audio processing
Symbols used when neural coding
- x: strength of external stimuli
- J(x): x-induced current
- $a(x) = G[J(x)]$: firing rate of spikes ≈ activation function in ANNs
- Most important parameters
- $J_{th}$ (threshold current)
- $\tau_{ref}$ (refractory period → maximal spiking rate)
Population encoding
A group of neurons determine the value by their spikes collectively. Contrary to sparse coding.
Some linear algebra
- Any vector could be decomposed as an unique linear combination of basis vectors
- The most convenient ones are orthogonal bases e.g. sin / cos in Fourier series
- The stimuli through the ensemble could be estimated from the linear combination of weights of neurons with different tuning curves
- Simplest : two neuron model (on and off)
- Adding more and more neurons differing in tuning curves (more bases) = more accurate representation
Optimal ensemble linear encoder
- Calculated by solving a linear system
- Nengo derives the best set of weights for an ensemble of neurons automatically
- Adding Gaussian noise in fact enhanced the robustness of the matrix of tuning curves
Example: horizontal eye position in NEF
- System description
- Max firing rate = 300 Hz
- On-off neurons
- Goal: linear tuning curve
- How neurons work in abducent motor neuron: an integrator
- Populations, noise, and constraints
- Solution errors associated to the number of neurons
- Noise error
- Static error
- Rounding error
Vector encoding / decoding
- Similar to the scalar case, but replaced with vectors
- Automatically handled by the nengo framework
Nengo examples
RateEncoding.pyArmMovement2D.py
Neural transformation
- Linear
- Non-linear
- Weighting: positive (excitatory) / negative (inhibitory)
Multiplication
- Controlled integrator (memory)
- ref:
Multiplication.py - Traditional ANN counterpart: Neural clusters A and B fully connected to combination layer, respectively
- Making a subnetwork: factory function
Communication channel
- Output of one ensemble => Input of another ensemble
- Traditional ANN counterpart: fully-connected layers
- $w_{ji} = \alpha_je_jd_i$
- nengo: simply
Connection(A, B)
Static gain c (multiplication with a scalar)
- $w_{ji} = c\alpha_je_jd_i$
- nengo:
Connection(A, B, transform=c)
Addition
- c = a + b
- nengo:
Connection(A, C); Connection(B, C) - Adding two vectors: just change
dimension
Nonlinear transformation
- nengo: define a vector transformation function
f=>Connection(A, B, function=f)
Negative weight
- An ensemble of inhibitory neurons
Neural dynamics
- Neural control systems: non-linear, time-variant (modern control theory)
Representation
- 1st order ODEs
- State variables as a vector
- $\mathbf{x}(t) = \mathbf{x}(t - \Delta t) + f(t - \Delta t, \mathbf{x}(t - \Delta t))$
- Example: cellular automata finite state machine (Game of life)
Linear control theory
u: input, y: output, x: internal states $\mathbf{\dot{x}}(t) = A \mathbf{\dot{x}}(t) + B \mathbf{u}(t)$ $\mathbf{y}(t) = C \mathbf{x}(t) + D \mathbf{u}(t)$
Frequency response and stability analysis
- Laplace transform $L{f(t)} = \int^\infty_0e^{-st}f(t)dt = F(s)$
- Impulse response: $h(t) = \frac{1}{\tau}e^{-t/\tau}, \ H(s) = \frac{1}{1 + s\tau}$. Stable (pole at the left half plane)
- Convolution in the time domain = multiplication in the Laplace (s-domain)
Neural population model
- Linear decoder for post-synaptic current (PSC)
- $A^\prime = \tau A + I$
- $B^\prime = \tau B$
Recurrent connections
- Positive feedback:
Feedback1.py - Negative feedback:
Feedback2.py(without stimuli),Feedback3.py(with stimuli) - Dynamics:
Dynamics1.pyandDynamics2.py: step stimuli + feedback - Integrators: $A = \frac{-1}{\tau} I$
- Oscillators: $A = \begin{bmatrix} 0&1 cr\ -\omega^2&0 \cr \end{bmatrix}$
Equations for different levels
- Nengo: higher level
- Implementation: lower rate / spiking levels
Sensation and Perception
Environment (stimulation) (analog signal) -> sensory transduction (feature extraction) -> impulse signal (sensory nerve) -> perceptions (sensory cortex) -> processing (CNS) -> action selection (motor cortex) -> impulse signal (motor nerve) -> acuator(e.g. muscle) -> action
Perception
- Internal representation of stimuli impulses
- The experience in the association cortex (not necessary the same as the outside world)
- Book: making uo the mind
Psychophysical
e.g. Psychoacoustics: used in MP3 compression
- Threshold in quiet / noisy environment
- Equal-loudness contour in different frequencies
- Weber’s law: change perceived in percent change $S = klg\frac{I}{I_0}$
Vision
- Convergence of information inside retina
- 260M photoreceptor cells indirectly connected to 2M ganglion (optic nerve) cells)
- Dimension reduction (pooling / convolution)
- Need of learning to see (mechanism of amblyopia): Neural wiring in the visual tract and the visual cortex (training of CNNs)
V1: primary visual cortex
- Detection of oriented edges, grouped by cortical columns with sensitivity to different angles
- Similar to the tuning curve in NEF
Successively richer layers
Optic nerve -> LGN (thalamus) -> V1 -> V2 / V4 -> dorsal (metric) or ventral (identification) tracks
- Feature extraction
- Similar to convolutional neural network (CNNs)
- Demonstrated in fMRI
Ventral track
- What is the object?
- V2 / V4 -> Post. Inf. temporal (PIT) cortex -> Ant. Inf. temporal (AIT) cortex
- PIT: More complex features e.g. fusiform face area for fast facial recognition
- AIT: Classification of objects regardless of size, color, viewing angle…
- Hyperdimensional vector (EECS) = semantic pointer (NEF)
- Neural ensemble of 20000 in monkeys
- Thus the functions of the temporal lobe = categorizing the world:
- Primary and associative auditory
- Labeling visual objects
- Language processing for both visual and auditory cues
- Episodic memory formation by hippocampus
Dorsal track
- Where is the object?
- V1 -> V2 -> V5 -> parietal lobe (visual association area)
- metrical information and mathematics
- Motion detection and information for further actions
Ambiguous figures / optical illusions
Forms 2 attractors (interpretations)
e.g Necker cube
Feedback
- External cue and expectation (top down perception)
- Report to LGN about the error
Object perception
- In biology: robust recognition despite color, viewing angle differences (object consistency)
- View-dependent frame of reference vs. View-invariant (grammar pattern) frame of reference
Autoencoders
Ewert’s central problems
- Perception: encoding stimuli from analog to digital spikes
- Central processing: transformation and recall of information, action selection
- Action execution: decoding digital spikes to response
Autoencoder in traditional ANNs
- Compressing the input into a smaller (dim.) representation then expand to the estimation
- Hyper dimension vector in CS
- Semantic pointer in NEF
- Novelty detection: comparison of the input to the output from trained autoencoder
Basic machine learning
- For y = f(x), find f
- Training, testing, validation sets
- Learning curves: overfitting if overtraining
- Cross validation to reduce overfitting and increase testing accuracy
- K-fold cross validation
- SVM: once worked better than ANNs
- Converting low dim but complex border to higher dim. simpler (even linear) border by transformation of data points
Classical cognitive systems (expert system)
- Symbols and syntax processing (LISP)
- Failed due to low BP (unable to solve to meaning of symbols)
- Another attempt: connectionist (semantic space) => too complex
- Symbol binding system: 500M neurons to recognize simple sentences (fail)
- Until the semantic pointer hypothesis: explaining high level cognitive function
- Halle Berry neurons (grandmother neurons): highly selective to one category instances (sparse coding)
- However most instances are population coding
Semantic pointer and SPA
- Equals to hyperdimensional vector in the mathematical sense
- Presented by an ensemble of neurons in biology
- The semantic space (hyperdimensional space) holds information features
- Needs enough dimensions for the overwhelming number of concepts in the world
- Pointers = symbols = general concepts
- Indirect addressing of complex information
- Shallow and deep manipulation (dual coding theory)
- Efficient transformation (call by address)
- Shallow semantics (e.g. text mining): symbols and stats only, does not encode the meaning of words
- Nengo:
nengo-spa
Encoding information in the semantic pointer
Circular convolution for syntax processing
- Readily extract the information in SP after filtered some noise
- Does not incur extra dimensions
- Works on reals numbers (XOR works on binaries only)
- Solves Jackendoff’s challenges
- Binding problem : red + square vs green + circle
- Problem of 2: small star vs big star
- Problem of variable: blue fly (n.) vs. blue fly(v.): binding restrictions
- Binding in working memory vs long-term memory
One could combine multiple sources of input (word, visual, smell, auditory)
Action control
Behavioral pattern / coordination
Affordance competition hypothesis
- Affordance part: continuously updating the status
- Competition part: select best action by utility (spiking activity) In biology:
- Premotor / supplementary motor cortex
- Weighted summation of previously learned motor components (basis functions) -> desired movement
- Primary motor cortex
- Basal ganglia
- Caudate, putamen, globus pallidus, SN
- Excitation and inhibitory projections
- Dopaminergic neurons: reward expectation: reinforcement learning
- Movement initiation
- Direct, indirect, and hyperdirect pathways
- Cerebellum
- Learning and control of movements
- Error-driven (similar to back propagation): supervised learning
- Hippocampus: self-organizing (Hebbian, STDP): unsupervised learning
Neural optimal control hierarchy (NOCH)
Computational model by students of Eliasmith, including:
- Cortex (premotor)
- cerebellum
- basal ganglia
- motor cortex
- brain stem and spinal cord
Performing movement in robot arms
- Joint angle space [θ1, θ2, …]: degree of freedom
- Operational space (end point vector)
High level -> mid level -> low level control signals
Similar to the latter half of autoencoder.
Functional level model
Loop of
- Cortex: memory / transformations, crude selection
- Basal ganglia: utility -> action (cosine similarity)
- Thalamus: monitoring
Rules for manipulation
- Symbols, fuzzy logic, but not compatible to neural networks
- Basal ganglia: manipulation
- Rehearsal of alphabet
sequence.py
Attention
Timing of neuron’s response: ~15ms delay to make decision.
The less utility difference, the longer the latency.
- Parametric study on computational models
Tower of Hanoi task
- Perceptual strategy from symbolic calculation is not biologically plausible in Eliasmith paper (not learning the rule).
- 150k neurons
ACT-R architecture
Symbol -> neural networks
Comparative to fMRI BOLD signal.
Learning and memory
Ref: Neuroeconomics, decision making and the brain.
Learning: stimulus altered behavior. Not hardwired.
Memory: storage of learned information.
Learning in biology
- Neural level: synapse strength, neural gene expression
- Brain regions: coordination
Machine learning
- Weight changes in synaptic connections
- Neural activity states: dynamic stability (attractor)
Biological memories in detail
- Declarative (explicit) memory: medial temporal lobe and neocortex
- Events (episodic): 5W1H, past experience
- Facts (semantic): grammar, common sense (context-free)
- Non-declarative memory
- Procedural: basal ganglia
- Perceptual priming: short path for recall for previous stimuli
- Conditioning: cerebellum
- Non-associative: reflex
- Sensory memory: buffer
- 9-10 sec for schoic (hearing)
- 0.5 sec for iconic (vision)
Conditioning
- Pavlov’s dog: classical conditioning
- Skinner: operant conditioning
- Acquisition, extinction, spontaneous recovery (long-term memory)
Terms
- Memory: recall / recognize past experience
- Conditioning: associate event and response
- Learning: change behavior to stimuli
- Plasticity: change neural connections
- Functional: chemical connection change
- Structural: physical connection change
Hippocampus
Dentate gyrus -> CA3 -> CA1
- Long-term potentiation (LTP) upon high freq stimulation: enhances EPSP
- Long-term depression (LTD) upon los freq stimulation: inhibits EPSP
- Neural growth even at 40 y/o
Inside LTP / LTD
Neurotransmitters
- Glutamate (AMPAR, NMDAR) : excitatory
- GABA: inhibitory
Second messengers (mid-term effects)
Learning rules
Hebbian
Freud -> Hebb (1949): fire together, wire together
$ \Delta w = \epsilon\gamma_i\gamma_j $
$\epsilon$: learning rate
$\gamma_i$: postsynaptic firing rate
$\gamma_j$: presynaptic firing rate
STDP
- Spike-time-dependent plasticity from experimental data
- Pre synaptic spike then post one: LTP
- Post synaptic spike then pre one: LTD
hPES rule
Limitations on weight change
Reinforcement learning
E.g. operant conditioning (Skinner)
Value
- Expected value $E[ x ]$
- Expected utility $U(E[ x ]) \approx log(E[ x ])$
- Basic axiomatic form (Pareto)
- Weak axioms of revealed preference (WARP)
- Generated axioms of revealed preference (GARP)
Value function V(s) and prediction error
$V_{k+1}(s_k) = (1-\alpha)V_k(s_k) + \alpha\delta_k$
Error: $\delta_k = r_k - V_k(s_k)$
For multiple stimuli: Rescorla-Wagner model
$V_k^{net} = \Sigma V_{k}(stim)$
Biological RL
Dopamine reward pathway for movement and motivation.
Increased dopamine secretion for a sudden reward. The same as Error: $\delta_k = r_k - V_k(s_k)$
Decision making
Problem: no immediate feedback (reward) => need to think about the future and maximize aggregate reward
Bellman equation: reduction of recursive reward with temporal difference ($V_k(S_{t+1})- V_k(S_t)$)
$V(S_t) = r(S_t) + E[V(S_{t+1})|S_t]$
$\delta_t = r_t + V_k(S_{t+1})- V_k(S_t)$
Markov decision process
Q learning
- Q function $Q(s, \pi)$
- Policy $\pi(s)$: mapping state to actions
$Q_{t+1}(S_t, a_t) = Q_{t}(S_t, a_t) + \alpha\delta_t$
$\delta_t = r_t \gamma_{max}Q_{t+1}(S_t, a_t) - Q_{t}(S_t, a_t)$
SPAUN model
SPAUN = Semantic pointer architecture unified network, all things put together
- Single perceptual system (eye)
- Single motor system (arm)
- Background knowledge (SPA)
- Abilities
- Similar to human in working mem limitations (3-7)
- Behavior flexibility
- Adaptation to reward
- Confusion to invalid input