Computational Cognitive Neuroscience
Course Notes of Computational Cognitive Neuroscience by Prof. 鄭士康.
Course Information¶
- Lecturer: 鄭士康
- Time: Fri. 789
- Location: EE2-146
- Homeworks:
- HW1: 10/25 (Topic free)
- HW2: 11/29
- Group presentation: 1/22
- Handouts (G suite):
Central Questions¶
- Could machines perceive and think like humans?
- Turing test
- Stimuli -> acquire -> store -> transform (process, emotion) -> recall -> response (actions)
Cognitive Psychology¶
- Assumption: materialism: mind = brain function
- Later became Cognitive Neuroscience
- Models: Box and arrow -> Computational (mechanistic) vs Statistical model
- Neuronal network connections
Artificial intelligence¶
- Reductionism
- Search space of parameters
- General problem solver
- Expert systems (symbol and rule-based)
- Symbol processing ≢ intelligence (Chinese room argument)
- Does the machine really know semantics from the symbols and rules?
- Mimicking biological neural networks (H&H neuron model) -> spiking neuron network & Hebbian learning
- Perceptron : Limitations by Minsky (unable to solve XOR problem) -> 1st winter of AI
- Multilayer and backpropagation: connectionism
- Parallel distributed processing (1986): actually neural networks (a taboo by then)
- Convolutional neural networks (CNNs)
- Computer vision
- Similar to image processing in the visual cortex
- Decomposition of features: stripes, angles, colors, etc.
- Does intelligence emerge from complex networks?
- Dynamicism
- embodied approach
- Feedback system
- Systems of non-linear DEs
- Cybernetics: control system for ML (system identification)
- Bayesian approach : pure statistics regardless of underlying mechanism
Biological plausibility¶
- Low = little similarity to biological counterpart
- e.g. expert systems
- CNN: medium BP
- SpiNNaker and Nengo: high BP
Levels (scales) of nervous system¶
- Focused on mesoscopic scale (neurons and synapses) in this course
Building a brain with math models¶
Feynman: What I cannot create, I do not understand.
- Understanding brain functions -> health (AD, PD, HD)
- AI modeling and applications
3D brain structure¶
The scale of brain models¶
- Neuron
- Small clusters of neurons
- Large scale connections (connectomes)
Neuron biology¶
- dendrite
- soma
- axon and myelin sheath
Hodgkin and Huxley model (1952)¶
- Math model from recordings of squid giant axon
- Action potential
- Biophysically accurate, but harder to do numerical analysis
- Chance and Design by Alan Hodgkin
Derived models¶
- Simpler models with action potentials and multiple inputs
- Leaky integrate-and-fire model (LIF model)
- LEBRA: single equation for a neuron, no spatial components
- Compartment model of dendrite, soma, and axon.
- Delay effect (+)
- Discretization of the partial differential equation (PDE) model
- Could delay differential equations (DDEs) be used in this context?
- Data-rich (from fMRI, DTI, ...) but theory-poor
- Large-scale models (connectomes)
- Neuromorphic hardware
NEF (Neural Engineering Framework) & SPA (Semantic Pointer Architecture)¶
Semantic Pointer¶
- Semantics important for both symbolic and NN models
- Example : autoencoder
- Dimension reduction layer by layer (raw data -> symbols)
- Similar to visual cortex and associative areas
- Reverse the network to adjust the weights
Loss = predicted - input
Spaun model: Autoencoders to process multiple sensory inputs as well as motor functions and decision making (transformation, working memory, reward, selection).
Ewert's Question: How is neural activity coordinated, learned and controlled?
- Capturing semantics
- Encoding syntactic structures
- Controlling information flow?
- Memory, learning?
Embodied semantics¶
- Neural firing patterns
- High dimensional vector symbolic architectures
Working memory¶
- 7 +/- 2 items, with highest recall for the 1st and the last item
Spike-Timing-Dependent plasticity (STDP)¶
- non-linear mapping for learning through synapses
Spiking models¶
- Keywords: spike firing rate, tuning curves, Poisson models
- Adrian's frog leg test: loading induced spikes in the sciatic nerve
- Stereotyped signals = spikes
- Firing rate is a function of the stimulus
- Fatigue (adaptation) over time
Neural responses¶
- Raster plot: dot = one spike. x: time; y: neuron id
- Firing rate histogram: x: time; y: # of spikes
- Neural signal response: a sum of Dirac delta functions (signal processing?)
$$
\rho(t) = \sum_{i=1}^{N}\delta(t - t_i)
$$
Individual spikes -> firing rates (in Hz) with a window (moving average)
Similar to pulse density modulation (PDM)
Tuning curve¶
- x: stimuli trait; y: response
- e.g. visual cortical neuron response to line orientation
- Present in both sensory and motor cortices
Poisson process for spike firing¶
- Poisson process: a random process with a constant rate (or average waiting time).
- The probability of \(n\) events fired in a period \(T\) given a firing rate \(r\) could be expressed by:
$$
P_T[n] = \frac{(rT)^n}{n!} e^{-rT}
$$
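As a rough illustration of the Poisson spiking idea, the sketch below evaluates the standard Poisson probability \((rT)^n e^{-rT}/n!\) and draws a spike train with the usual small-time-step approximation (a spike occurs in each step with probability \(r\,\Delta t\)). The function names and the 20 Hz rate are made up for this example and are not from the lecture.

```python
import math
import random

def poisson_pmf(n, rate, period):
    """P(n spikes within `period` seconds) for a constant-rate Poisson process."""
    mean = rate * period
    return mean ** n * math.exp(-mean) / math.factorial(n)

def poisson_spike_train(rate, period, dt=0.001, seed=0):
    """Spike times from the approximation P(spike in one step) = rate * dt."""
    rng = random.Random(seed)
    steps = int(round(period / dt))
    return [i * dt for i in range(steps) if rng.random() < rate * dt]

rate, period = 20.0, 1.0  # hypothetical values: 20 Hz observed for 1 s
probs = [poisson_pmf(n, rate, period) for n in range(100)]
spikes = poisson_spike_train(rate, period)
print(f"P(n = 20) = {probs[20]:.3f}, spikes drawn = {len(spikes)}")
```

The step approximation only holds when `rate * dt` is much smaller than 1.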
Rate code vs. temporal code¶
- Dense firing for the former, sparse firing for the latter
- Population code (a group of neurons firing)
Encoding / decoding¶
- encoding: stimuli \(x(t)\) -> spikes \(\delta (t-t_i)\)
- decoding: spikes \(\delta (t-t_i)\) -> interpretation of stimuli \(\hat x(t)\)
Neural Physiology¶
- Neuron: dendrites, soma, axon
- Synapses: neurotransmitter / electrical conduction
- AP from axon => Graded potential in dendrite / soma
- Temporal / spatial summation of graded potentials: AP at the axon hillock
Excitable membrane¶
- Phospholipid bilayer (plasma membrane) as barrier
- Integral / peripheral proteins: ion carriers and channels
- Selective permeability to ions: Na / K gradients
Action potential¶
- Voltage-gated Na channel: both positive and negative feedback (fast)
- Voltage-gated K channel: negative feedback (slow)
- Leak chloride channel: helps maintain the resting potential (constant)
- Refractory period (5 ms): available Na fraction is too low for AP
- Nodes of Ranvier and myelin sheath: accelerates AP conduction
- Signaling molecules in the synaptic cleft
- AP -> Ca influx -> vesicle release -> receptor binding -> graded potentials (EPSP/IPSP) -> recycle / degradation of neurotransmitters
Neural models¶
- Features to reproduce: Integrating input, AP spikes, refractory period
Electrical activity of neurons¶
- Nernst equation for one species of ion across a semipermeable membrane
- GHK voltage equation for multiple ions
- Quasi-ohmic assumption for ion channels \(I_x = g_x (V_m-E_x)\)
- Membrane as capacitor (1 \(\mu F/ cm^2\))
- Equivalent circuit: An RC circuit
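To make the Nernst equation and the quasi-ohmic current concrete, here is a minimal sketch. The ion concentrations are typical textbook values assumed for illustration, not numbers from the lecture, and the conductance passed to `ohmic_current` is arbitrary.

```python
import math

R = 8.314    # gas constant, J/(mol K)
F = 96485.0  # Faraday constant, C/mol

def nernst(conc_out, conc_in, valence, temp_c=37.0):
    """Equilibrium (reversal) potential in mV for one ion species."""
    temp_k = temp_c + 273.15
    return 1000.0 * R * temp_k / (valence * F) * math.log(conc_out / conc_in)

def ohmic_current(g_x, v_m, e_x):
    """Quasi-ohmic channel current I_x = g_x (V_m - E_x)."""
    return g_x * (v_m - e_x)

# Assumed concentrations in mM (outside, inside)
E_K = nernst(5.0, 140.0, +1)
E_Na = nernst(145.0, 15.0, +1)
print(f"E_K = {E_K:.1f} mV, E_Na = {E_Na:.1f} mV")
print("I_K at -65 mV (g = 1):", ohmic_current(1.0, -65.0, E_K))
```

With these assumptions the K current at rest is outward (positive) and the Na current would be inward, which is the sign convention of \(I_x = g_x (V_m-E_x)\).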
HH model¶
- GHK voltage equation not applicable (not in steady state)
- Using Kirchhoff's current law to get voltage change over time
- Parameters from experiments on the squid giant axon
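- K channel: gating variable n
$$
\begin{aligned}
g_{K} &= \bar g_{K} n^4 \cr
\frac{dn}{dt} &= \alpha_n - n (\alpha_n + \beta_n) \cr
\end{aligned}
$$
α and β are determined by voltage (membrane potential)
- Na channel: two gating variables, m and h
$$
\begin{aligned}
g_{Na} &= \bar g_{Na} m^3h \cr
\frac{dm}{dt} &= \alpha_m - m (\alpha_m + \beta_m) \cr
\frac{dh}{dt} &= \alpha_h - h (\alpha_h + \beta_h) \cr
\end{aligned}
$$
αs and βs are determined by voltage (membrane potential)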
- Model fidelity (biological relevance) vs simplicity (ease to simulate and analyze)
- Biological plausibility
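The following is a minimal forward-Euler sketch of the HH model built from Kirchhoff's current law and the gating equations. The conductances, reversal potentials, and rate functions are the commonly quoted squid-axon values (resting potential shifted to about -65 mV); they are assumptions here, since the notes do not list them.

```python
import math

# Commonly used squid-axon constants (assumed; not given in the notes)
C_M = 1.0                              # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3      # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387  # reversal potentials, mV

def vtrap(x, y):
    """x / (1 - exp(-x / y)), using the limit y when x is close to 0."""
    return y if abs(x) < 1e-7 else x / (1.0 - math.exp(-x / y))

def rates(v):
    """Voltage-dependent opening (alpha) and closing (beta) rates in 1/ms."""
    a_n = 0.01 * vtrap(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    a_m = 0.1 * vtrap(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    return a_n, b_n, a_m, b_m, a_h, b_h

def simulate_hh(i_ext=10.0, t_end=50.0, dt=0.01):
    """Forward-Euler integration; returns the membrane potential trace in mV."""
    v = -65.0
    a_n, b_n, a_m, b_m, a_h, b_h = rates(v)
    # Start each gate at its steady state for the initial voltage
    n, m, h = a_n / (a_n + b_n), a_m / (a_m + b_m), a_h / (a_h + b_h)
    trace = []
    for _ in range(int(round(t_end / dt))):
        a_n, b_n, a_m, b_m, a_h, b_h = rates(v)
        i_na = G_NA * m ** 3 * h * (v - E_NA)
        i_k = G_K * n ** 4 * (v - E_K)
        i_l = G_L * (v - E_L)
        # Kirchhoff's current law: C dV/dt = I_ext - sum of ionic currents
        dv = (i_ext - i_na - i_k - i_l) / C_M
        n += dt * (a_n - n * (a_n + b_n))
        m += dt * (a_m - m * (a_m + b_m))
        h += dt * (a_h - h * (a_h + b_h))
        v += dt * dv
        trace.append(v)
    return trace

trace = simulate_hh()
# Count upward crossings of 0 mV as spikes
spikes = sum(1 for a, b in zip(trace, trace[1:]) if a < 0.0 <= b)
print(f"peak = {max(trace):.1f} mV, spikes in 50 ms = {spikes}")
```

Four coupled state variables and a small time step are the price of biophysical detail, which is why the simpler models elsewhere in these notes are preferred for large networks.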
Dynamic system theory¶
A system of ODEs
e.g. Butterfly effect (chaotic system): small deviations in initial conditions -> hugely different results
Morris-Lecar neuron model¶
- Similar to the HH model (KCL)
- Ca, K, and Cl ions
- two state variables: voltage (V) and one variable (w) for K
- using tanh and cosh functions
Phase plane analysis¶
- Stability: Eigenvalues of rhs Jacobian matrix in the steady-state
- External current (Ie) = 0: single stable steady-state (intersection of V and w nullclines)
- Increasing Ie: shifts the V nullcline => unstable steady-state (limit cycle)
- Bifurcation: V vs Ie
Integrate and fire (IF) model¶
- A simple RC circuit
- Single state variable (V)
- Use of conditional statements to control spike firing and the refractory period
- Used in nengo (plus leaky = LIF model)
- Firing rate adaptation: IF model + more terms
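A minimal sketch of a leaky integrate-and-fire neuron follows, with the membrane as an RC circuit plus conditional statements for the spike and the refractory period. Voltage is normalised so the threshold is 1. The time constants are illustrative assumptions, not values from the lecture.

```python
def simulate_lif(current, t_end=1.0, dt=1e-4, tau_rc=0.02, tau_ref=0.002, v_th=1.0):
    """Normalised LIF: tau_rc * dV/dt = -V + J. Returns spike times in seconds."""
    v = 0.0
    refractory_left = 0.0
    spikes = []
    for step in range(int(round(t_end / dt))):
        if refractory_left > 0.0:
            # Membrane is clamped during the refractory period
            refractory_left -= dt
            continue
        v += dt * (current - v) / tau_rc
        if v >= v_th:
            # Threshold crossing: record a spike, reset, start refractory period
            spikes.append(step * dt)
            v = 0.0
            refractory_left = tau_ref
    return spikes

for j in (0.9, 1.5, 2.0):
    print(f"J = {j}: {len(simulate_lif(j))} spikes in 1 s")
```

Inputs below threshold never fire, and stronger inputs fire faster, which is the shape a tuning curve is built from.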
Izhikevich model¶
- Two state variables
- Realistic spike patterns by adjusting parameters
- Could be used in large systems (100T synapses)
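The notes do not give the equations, so the sketch below assumes the form Izhikevich published in 2003: a quadratic voltage equation, a recovery variable u, and a reset rule. The default parameters are the commonly cited "regular spiking" set.

```python
def simulate_izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0, i_ext=10.0, t_end=1000.0, dt=0.25):
    """Two-variable Izhikevich neuron (time in ms). Returns spike times."""
    v, u = c, b * c
    spikes = []
    for step in range(int(round(t_end / dt))):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + i_ext)
        u += dt * a * (b * v - u)
        if v >= 30.0:
            # Spike: reset the voltage and bump the recovery variable
            spikes.append(step * dt)
            v = c
            u += d
    return spikes

print("spikes with input:", len(simulate_izhikevich(i_ext=10.0)))
print("spikes without input:", len(simulate_izhikevich(i_ext=0.0)))
```

Changing only `a`, `b`, `c`, `d` switches between firing patterns, and the update is cheap enough for very large networks.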
Compartment model¶
- Spatial discretization for neuron models
- Coupled RC circuits -> FEM grids
- Presynaptic AP -> synapse neurotransmitter release -> Postsynaptic potentials
- Approximated by an LTI(linear, time invariant) system
- Linear: superposition
- Time invariant: unchanged with time shifting
- Impulse response: given an impulse (delta function) -> h(t), the transformed result
- Convolution: h(t) instead of the system itself
- Fourier transform: Convolution -> multiplication
Synapse model¶
- Synapse = RC low pass filters with time scale = \(\tau\)
- \(\tau\) is dependent on types of neurotransmitter and receptors
Intro to brain¶
- Simple linear algebra (vector and matrix operations)
- Graph theory: connections
Reverse engineering the brain¶
- Complexity, scale, connection, plasticity, low-power
- Design: brain scheme; designer: natural selection
Why a brain¶
- To survive and thrive.
- Brainless (single-celled organisms): simple perceptions and reactions. Some endogenous activity
- Simple brain (C. elegans): aversive response and body movement
- Connectome routing study (as in EDA) showed 90% of the neurons are in the optimal positions
- General scheme: sensory -> CNS -> motor (with endogenous states (thoughts) in the CNS)
Design constraints¶
- Information theory (information efficiency)
- Energy efficiency
- Space efficiency
- Human brain is already relatively larger than almost all animals
Evolution of the brain in Chordates¶
- Dorsal neural tube -> differentiation respecting sensory, motor, and inter connections
Central pattern generator¶
- The brainless walking cat: endogenous activity in the spinal cord
- Main functional unit in the CNS
nengo programming¶
- Network: model itself
- Node: input signal
- Ensemble: neurons
- Connection: synapses
- Probe: output
- Simulator: simulator (literally)
Integrator implementation¶
- Similar to the Euler method in numerical integration
import matplotlib.pyplot as plt
import nengo
from nengo.processes import Piecewise

# The model
model = nengo.Network(label='Integrator')
with model:
    # Neurons representing one number
    A = nengo.Ensemble(100, dimensions=1)
    # Input signal
    src = nengo.Node(Piecewise({0: 0, 0.2: 1, 1: 0, 2: -2, 3: 0, 4: 1, 5: 0}))
    tau = 0.1
    # Connect the population to itself
    # transform: transformation matrix
    # synapse: time scale of low pass filter
    nengo.Connection(A, A, transform=[[1]], synapse=tau)
    nengo.Connection(src, A, transform=[[tau]], synapse=tau)
    input_probe = nengo.Probe(src)
    A_probe = nengo.Probe(A, synapse=0.01)

# Create our simulator
with nengo.Simulator(model) as sim:
    # Run it for 6 seconds
    sim.run(6)

# Plot the decoded output of the ensemble
plt.plot(sim.trange(), sim.data[input_probe], label="Input")
plt.plot(sim.trange(), sim.data[A_probe], 'k', label="Integrator output")
plt.legend()
plt.show()
Oscillator implementation¶
Harmonic oscillator: one 2nd order ODE -> two 1st order ODEs
import matplotlib.pyplot as plt
import nengo
from nengo.processes import Piecewise

# Create the model object
model = nengo.Network(label='Oscillator')
with model:
    # Neurons representing 2 numbers (dim = 2)
    neurons = nengo.Ensemble(200, dimensions=2)
    # Input signal: a short kick to start the oscillation
    src = nengo.Node(Piecewise({0: [1, 0], 0.1: [0, 0]}))
    nengo.Connection(src, neurons)
    # Create the feedback connection. Note the transformation matrix
    nengo.Connection(neurons, neurons, transform=[[1, 1], [-1, 1]], synapse=0.1)
    input_probe = nengo.Probe(src, 'output')
    neuron_probe = nengo.Probe(neurons, 'decoded_output', synapse=0.1)

# Create the simulator
with nengo.Simulator(model) as sim:
    # Run it for 5 seconds
    sim.run(5)

# Decoded state variables over time
plt.figure()
plt.plot(sim.trange(), sim.data[neuron_probe])
plt.xlabel('Time (s)', fontsize='large')
plt.legend(['$x_0$', '$x_1$'])

# Phase plot of the two state variables
data = sim.data[neuron_probe]
plt.figure()
plt.plot(data[:, 0], data[:, 1], label='Decoded Output')
plt.xlabel('$x_0$', fontsize=20)
plt.ylabel('$x_1$', fontsize=20)
plt.legend()
plt.show()
Connectivity analysis¶
- Structural: anatomical structures e.g. water diffusion via DTI
- Functional: statistical, dynamic weights
- Effective: causal interactions (presynaptic spikes -> postsynaptic firing)
- ref. The Book of Why by Judea Pearl (Chinese edition: 因果革命)
Microscale vs Macroscale¶
- Microscale: um ~ nm (synapses)
- Macroscale: mm (voxels) coherent regions
Graph theory¶
- Node: brain areas (or neurons)
- Edges: connections (or synapses)
- Represented by adjacency matrices (values = connection weights)
Types of networks¶
- Nodes in a circle; Connections in an adjacency matrix
- Measure: degrees of a node (inward / outward) / neighborhood (Modularity Q, Small-worldness S)
Same edge probability
- Power law
- Fractal
- Increased robustness to neural damage
- Local connections only
- hierarchical clusters
- Built by attraction and repulsion between nodes
- In some biological neural networks
Small world¶
- Similar to social networks, sparse global connections
- A few hubs (opinion leaders) with high degrees (connecting edges)
- Rich hub organization in biological neural networks (10 times the average number of connections)
- Anatomical basis (maximize space / energy efficiency)
Neural Engineering Framework (NEF)¶
- By Eliasmith
- Intended for constant structures without synaptic plasticity
- Compared to SNNs (with learning = synaptic plasticity)
- Neural compiler (high level function <=> low level spikes)
Central problems¶
- Stimuli detection (sensors)
- Representation / manipulation of information (sensory n.)
- As spikes (pulse density modulation = PDM)
- Recall / transform (CNS)
Heterogeneity in realistic neural networks¶
- Different set of parameters for each neuron in response to stimuli
- Represented as tuning curves
Building NEF models with nengo¶
- Hypothesis / data / structure from the real counterpart
- Build NEF and check behavior
- Rinse and repeat
Central NEF principles¶
- Action potential: digital, non-linear encoding (axon hillock)
- Graded potential: analog, linear decoding (dendrite)
- Compared to ANNs:
- dendrite = weighted sum from other neurons
- axon hillock: non-linear activation function (real number output)
- Examples: Physical values: heat, light, velocity, position
- mimicking sensory neurons = transducer producing pulse signals
Transformation of encoding information by neuron clusters¶
Neural dynamics for an ensemble of neurons¶
HH model, LIF, control theory
- Neurons are noisy
- In the NEF: the basic unit is an ensemble of neurons
- Post synaptic current: approximated by one time constant
Neural representation¶
Encoding / decoding¶
- Ensemble = Digital-analog converter like digital audio processing
Symbols used in neural coding¶
- x: strength of external stimuli
- J(x): x-induced current
- \(a(x) = G[J(x)]\): firing rate of spikes ≈ activation function in ANNs
- Most important parameters
- \(J_{th}\) (threshold current)
- \(\tau_{ref}\) (refractory period → maximal spiking rate)
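To illustrate \(a(x) = G[J(x)]\), the sketch below uses the standard steady-state firing rate of a LIF neuron as \(G\) and a linear current \(J(x) = \text{gain} \cdot e \cdot x + \text{bias}\). The formula is the usual LIF rate expression rather than something stated in the notes, and `gain`, `bias`, and `encoder` are illustrative names and values.

```python
import math

def lif_rate(j, j_th=1.0, tau_rc=0.02, tau_ref=0.002):
    """Steady-state LIF firing rate G[J] in Hz; zero below the threshold current."""
    if j <= j_th:
        return 0.0
    return 1.0 / (tau_ref - tau_rc * math.log(1.0 - j_th / j))

def tuning_curve(x, gain=2.0, bias=1.5, encoder=1.0):
    """a(x) = G[J(x)] with the assumed current J(x) = gain * encoder * x + bias."""
    return lif_rate(gain * encoder * x + bias)

for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"x = {x:+.1f}: a(x) = {tuning_curve(x):6.1f} Hz")
print("upper bound 1 / tau_ref =", 1.0 / 0.002, "Hz")
```

\(J_{th}\) sets where the curve leaves zero and \(\tau_{ref}\) caps the rate at \(1/\tau_{ref}\), which is why these two parameters matter most.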
Population encoding¶
A group of neurons determine the value by their spikes collectively.
Contrary to sparse coding.
Some linear algebra¶
- Any vector can be decomposed into a unique linear combination of basis vectors
- The most convenient ones are orthogonal bases e.g. sin / cos in Fourier series
- The stimuli through the ensemble could be estimated from the linear combination of weights of neurons with different tuning curves
- Simplest : two neuron model (on and off)
- Adding more and more neurons differing in tuning curves (more bases) = more accurate representation
Optimal ensemble linear encoder¶
- Calculated by solving a linear system
- Nengo derives the best set of weights for an ensemble of neurons automatically
- Adding Gaussian noise in fact enhanced the robustness of the matrix of tuning curves
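The linear system behind the optimal decoders can be sketched without Nengo. Below, six hypothetical neurons (three "on", three "off") with LIF tuning curves are decoded by regularised least squares, \((AA^T + S\sigma^2 I)\,d = Ax\), where the \(\sigma^2\) term stands in for the Gaussian noise. The neuron parameters, sample grid, and noise level are assumptions for illustration; Nengo's own solver may differ in detail.

```python
import math

def lif_rate(j, tau_rc=0.02, tau_ref=0.002):
    """Steady-state LIF rate for a normalised current j (threshold at 1)."""
    return 0.0 if j <= 1.0 else 1.0 / (tau_ref - tau_rc * math.log(1.0 - 1.0 / j))

def solve(matrix, rhs):
    """Solve matrix @ d = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    d = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * d[c] for c in range(r + 1, n))
        d[r] = (aug[r][n] - tail) / aug[r][r]
    return d

# Hypothetical ensemble: (encoder, x-intercept) per neuron
neurons = [(+1, -0.8), (+1, -0.3), (+1, 0.2), (-1, -0.8), (-1, -0.3), (-1, 0.2)]
gain = 2.0
xs = [-1.0 + 0.05 * k for k in range(41)]
# Tuning curves sampled over x: acts[i][k] = a_i(x_k)
acts = [[lif_rate(1.0 + gain * (e * x - c)) for x in xs] for e, c in neurons]

# Noise term: sigma is 10% of the maximal rate (an assumed level)
sigma = 0.1 * max(max(row) for row in acts)
n = len(neurons)
gram = [[sum(a * b for a, b in zip(acts[i], acts[j])) for j in range(n)] for i in range(n)]
for i in range(n):
    gram[i][i] += len(xs) * sigma ** 2
upsilon = [sum(a * x for a, x in zip(acts[i], xs)) for i in range(n)]
decoders = solve(gram, upsilon)

# Decoded estimate: weighted sum of the tuning curves
x_hat = [sum(d * acts[i][k] for i, d in enumerate(decoders)) for k in range(len(xs))]
rmse = math.sqrt(sum((x - xh) ** 2 for x, xh in zip(xs, x_hat)) / len(xs))
print("decoders:", [round(d, 5) for d in decoders])
print("RMSE of the decoded estimate:", round(rmse, 4))
```

The noise term keeps the matrix well conditioned, so no single neuron gets an extreme weight; that is one way to read the robustness remark above.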
Example: horizontal eye position in NEF¶
- System description
- Max firing rate = 300 Hz
- On-off neurons
- Goal: linear tuning curve
- How neurons work in abducent motor neuron: an integrator
- Populations, noise, and constraints
- Solution errors associated with the number of neurons
- Noise error
- Static error
- Rounding error
Vector encoding / decoding¶
- Similar to the scalar case, but replaced with vectors
- Automatically handled by the nengo framework
Nengo examples¶
Neural transformation¶
- Linear
- Non-linear
- Weighting: positive (excitatory) / negative (inhibitory)
- Controlled integrator (memory)
- ref:
- Traditional ANN counterpart: Neural clusters A and B fully connected to combination layer, respectively
- Making a subnetwork: factory function
Communication channel¶
- Output of one ensemble => Input of another ensemble
- Traditional ANN counterpart: fully-connected layers
- \(w_{ji} = \alpha_je_jd_i\)
- nengo: simply
Connection(A, B)
Static gain c (multiplication with a scalar)¶
- \(w_{ji} = c\alpha_je_jd_i\)
- nengo:
Connection(A, B, transform=c)
- c = a + b
- nengo:
Connection(A, C); Connection(B, C)
- Adding two vectors: just change
Nonlinear transformation¶
- nengo: define a vector transformation function
=>Connection(A, B, function=f)
Negative weight¶
- An ensemble of inhibitory neurons
Neural dynamics¶
- Neural control systems: non-linear, time-variant (modern control theory)
- 1st order ODEs
- State variables as a vector
- \(\mathbf{x}(t) = \mathbf{x}(t - \Delta t) + f(t - \Delta t, \mathbf{x}(t - \Delta t))\)
- Example: cellular automata finite state machine (Game of life)
Linear control theory¶
u: input, y: output, x: internal states
\(\mathbf{\dot{x}}(t) = A \mathbf{x}(t) + B \mathbf{u}(t)\)
\(\mathbf{y}(t) = C \mathbf{x}(t) + D \mathbf{u}(t)\)
Frequency response and stability analysis¶
- Laplace transform \(L\{f(t)\} = \int^\infty_0e^{-st}f(t)dt = F(s)\)
- Impulse response: \(h(t) = \frac{1}{\tau}e^{-t/\tau}, \ H(s) = \frac{1}{1 + s\tau}\). Stable (pole in the left half-plane)
- Convolution in the time domain = multiplication in the Laplace (s-domain)
Neural population model¶
- Linear decoder for post-synaptic current (PSC)
- \(A^\prime = \tau A + I\)
- \(B^\prime = \tau B\)
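One way to see where \(A^\prime\) and \(B^\prime\) come from: if the only dynamics available is the synaptic low-pass filter \(\tau \dot{x} = -x + (\text{input})\), then feeding it \(A^\prime x + B^\prime u\) gives \(\tau \dot{x} = -x + (\tau A + I)x + \tau B u\), which is \(\dot{x} = Ax + Bu\). The scalar sketch below checks this numerically for an integrator (\(A = 0\), \(B = 1\)); the function name, step input, and time constants are illustrative assumptions.

```python
def simulate(a, b, u_of_t, tau=0.1, t_end=1.0, dt=0.001):
    """Compare the filtered recurrent form against the desired dynamics (scalar case)."""
    a_rec = tau * a + 1.0  # A' = tau * A + I
    b_in = tau * b         # B' = tau * B
    x_syn, x_ref = 0.0, 0.0
    for step in range(int(round(t_end / dt))):
        u = u_of_t(step * dt)
        # Synaptic filter driven by the recurrent and input terms
        x_syn += dt * (-x_syn + a_rec * x_syn + b_in * u) / tau
        # Reference: dx/dt = A x + B u
        x_ref += dt * (a * x_ref + b * u)
    return x_syn, x_ref

# Integrator: A = 0, B = 1, input held at 1 for the first 0.5 s
x_syn, x_ref = simulate(a=0.0, b=1.0, u_of_t=lambda t: 1.0 if t < 0.5 else 0.0)
print("filtered form:", round(x_syn, 4), " reference:", round(x_ref, 4))
```

For the integrator this gives a recurrent weight of 1 and an input weight of `tau`, matching the transforms used in the nengo integrator code.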
Recurrent connections¶
- Positive feedback:
- Negative feedback:
(without stimuli),
(with stimuli) - Dynamics:
: step stimuli + feedback - Integrators: \(A = \frac{-1}{\tau} I\)
- Oscillators: \(A = \begin{bmatrix} 0&1 cr\ -\omega^2&0 \cr \end{bmatrix}\)
Equations for different levels¶
- Nengo: higher level
- Implementation: lower rate / spiking levels
Sensation and Perception¶
Environment (stimulation) (analog signal) -> sensory transduction (feature extraction) -> impulse signal (sensory nerve) -> perceptions (sensory cortex) -> processing (CNS) -> action selection (motor cortex) -> impulse signal (motor nerve) -> actuator (e.g. muscle) -> action
- Internal representation of stimuli impulses
- The experience in the association cortex (not necessarily the same as the outside world)
- Book: Making up the Mind
e.g. Psychoacoustics: used in MP3 compression
* Threshold in quiet / noisy environment
* Equal-loudness contour in different frequencies
* Weber-Fechner law: perceived change follows the percent change of the stimulus, \(S = k \lg\frac{I}{I_0}\)
- Convergence of information inside retina
- 260M photoreceptor cells indirectly connected to 2M ganglion (optic nerve) cells
- Dimension reduction (pooling / convolution)
- Need of learning to see (mechanism of amblyopia): Neural wiring in the visual tract and the visual cortex (training of CNNs)
V1: primary visual cortex¶
- Detection of oriented edges, grouped by cortical columns with sensitivity to different angles
- Similar to the tuning curve in NEF
Successively richer layers¶
Optic nerve -> LGN (thalamus) -> V1 -> V2 / V4 -> dorsal (metric) or ventral (identification) tracks
- Feature extraction
- Similar to convolutional neural network (CNNs)
- Demonstrated in fMRI
Ventral track¶
- What is the object?
- V2 / V4 -> Post. Inf. temporal (PIT) cortex -> Ant. Inf. temporal (AIT) cortex
- PIT: More complex features e.g. fusiform face area for fast facial recognition
- AIT: Classification of objects regardless of size, color, viewing angle...
- Hyperdimensional vector (EECS) = semantic pointer (NEF)
- Neural ensemble of 20000 in monkeys
- Thus the functions of the temporal lobe = categorizing the world:
- Primary and associative auditory
- Labeling visual objects
- Language processing for both visual and auditory cues
- Episodic memory formation by hippocampus
Dorsal track¶
- Where is the object?
- V1 -> V2 -> V5 -> parietal lobe (visual association area)
- metrical information and mathematics
- Motion detection and information for further actions
Ambiguous figures / optical illusions¶
Forms 2 attractors (interpretations)
e.g. Necker cube
- External cue and expectation (top down perception)
- Report to LGN about the error
Object perception¶
- In biology: robust recognition despite color, viewing angle differences (object constancy)
- View-dependent frame of reference vs. View-invariant (grammar pattern) frame of reference
Ewert's central problems¶
- Perception: encoding stimuli from analog to digital spikes
- Central processing: transformation and recall of information, action selection
- Action execution: decoding digital spikes to response
Autoencoder in traditional ANNs¶
- Compressing the input into a smaller (lower-dim.) representation, then expanding it back into an estimate of the input
- Hyper dimension vector in CS
- Semantic pointer in NEF
- Novelty detection: comparison of the input to the output from trained autoencoder
Basic machine learning¶
- For y = f(x), find f
- Training, testing, validation sets
- Learning curves: overfitting if overtraining
- Cross validation to reduce overfitting and increase testing accuracy
- K-fold cross validation
- SVM: once worked better than ANNs
- Converting low dim but complex border to higher dim. simpler (even linear) border by transformation of data points
Classical cognitive systems (expert system)¶
- Symbols and syntax processing (LISP)
- Failed due to low BP (unable to resolve the meaning of symbols)
- Another attempt: connectionist (semantic space) => too complex
- Symbol binding system: 500M neurons to recognize simple sentences (fail)
- Until the semantic pointer hypothesis: explaining high level cognitive function
- Halle Berry neurons (grandmother neurons): highly selective to one category instances (sparse coding)
- However most instances are population coding
Semantic pointer and SPA¶
- Equivalent to a hyperdimensional vector in the mathematical sense
- Presented by an ensemble of neurons in biology
- The semantic space (hyperdimensional space) holds information features
- Needs enough dimensions for the overwhelming number of concepts in the world
- Pointers = symbols = general concepts
- Indirect addressing of complex information
- Shallow and deep manipulation (dual coding theory)
- Efficient transformation (call by address)
- Shallow semantics (e.g. text mining): symbols and stats only, does not encode the meaning of words
- Nengo:
Encoding information in the semantic pointer¶
Circular convolution for syntax processing
* The bound information can be readily extracted from the SP after filtering out some noise
* Does not incur extra dimensions
* Works on real numbers (XOR works on binaries only)
* Solves Jackendoff's challenges
* Binding problem : red + square vs green + circle
* Problem of 2: small star vs big star
* Problem of variable: blue fly (n.) vs. blue fly(v.): binding restrictions
* Binding in working memory vs long-term memory
One could combine multiple sources of input (word, visual, smell, auditory)
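The binding example (red + square vs green + circle) can be sketched with plain circular convolution. Below, random vectors stand in for semantic pointers, and unbinding uses the approximate inverse (the vector with its indices reversed, keeping the first element), which is the usual choice for this kind of representation. The 256 dimensions and all names are illustrative assumptions.

```python
import math
import random

def circular_convolution(a, b):
    """Bind two vectors; the result has the same dimension as the inputs."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def approximate_inverse(a):
    """Involution a'[k] = a[-k mod n]; convolving with it approximately unbinds."""
    n = len(a)
    return [a[(-k) % n] for k in range(n)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def random_pointer(dim, rng):
    """Random vector with expected unit length, standing in for a semantic pointer."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]

rng = random.Random(0)
dim = 256
red, square, green, circle = (random_pointer(dim, rng) for _ in range(4))

# Scene = red (*) square + green (*) circle, still a single 256-dim vector
scene = [p + q for p, q in zip(circular_convolution(red, square),
                               circular_convolution(green, circle))]

# Query: which shape is bound to red?
answer = circular_convolution(scene, approximate_inverse(red))
sim_square = cosine(answer, square)
sim_circle = cosine(answer, circle)
print(f"similarity to square = {sim_square:.2f}, to circle = {sim_circle:.2f}")
```

The answer is a noisy copy of `square`, so in practice it is compared against a vocabulary of known pointers (a clean-up step) rather than used directly.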
Action control¶
Behavioral pattern / coordination
Affordance competition hypothesis¶
- Affordance part: continuously updating the status
- Competition part: select best action by utility (spiking activity)
In biology:
- Premotor / supplementary motor cortex
- Weighted summation of previously learned motor components (basis functions) -> desired movement
- Primary motor cortex
- Basal ganglia
- Caudate, putamen, globus pallidus, SN
- Excitation and inhibitory projections
- Dopaminergic neurons: reward expectation: reinforcement learning
- Movement initiation
- Direct, indirect, and hyperdirect pathways
- Cerebellum
- Learning and control of movements
- Error-driven (similar to back propagation): supervised learning
- Hippocampus: self-organizing (Hebbian, STDP): unsupervised learning
Neural optimal control hierarchy (NOCH)¶
Computational model by students of Eliasmith, including:
* Cortex (premotor)
* cerebellum
* basal ganglia
* motor cortex
* brain stem and spinal cord
Performing movement in robot arms¶
- Joint angle space [θ1, θ2, ...]: degree of freedom
- Operational space (end point vector)
High level -> mid level -> low level control signals
Similar to the latter half of autoencoder.
Functional level model¶
Loop of
* Cortex: memory / transformations, crude selection
* Basal ganglia: utility -> action (cosine similarity)
* Thalamus: monitoring
Rules for manipulation¶
- Symbols, fuzzy logic, but not compatible with neural networks
- Basal ganglia: manipulation
$$
\vec{s} = M_b \cdot \vec{w}
$$
- Rehearsal of alphabet
Timing of neuron's response: ~15ms delay to make decision.
The less utility difference, the longer the latency.
- Parametric study on computational models
Tower of Hanoi task¶
- Perceptual strategy from symbolic calculation is not biologically plausible in Eliasmith paper (not learning the rule).
- 150k neurons
ACT-R architecture¶
Symbol -> neural networks
Comparable to fMRI BOLD signal.
Learning and memory¶
Ref: Neuroeconomics, decision making and the brain.
Learning: stimulus altered behavior. Not hardwired.
Memory: storage of learned information.
Learning in biology¶
- Neural level: synapse strength, neural gene expression
- Brain regions: coordination
Machine learning¶
- Weight changes in synaptic connections
- Neural activity states: dynamic stability (attractor)
Biological memories in detail¶
- Declarative (explicit) memory: medial temporal lobe and neocortex
- Events (episodic): 5W1H, past experience
- Facts (semantic): grammar, common sense (context-free)
- Non-declarative memory
- Procedural: basal ganglia
- Perceptual priming: short path for recall for previous stimuli
- Conditioning: cerebellum
- Non-associative: reflex
- Sensory memory: buffer
- 9-10 sec for echoic (hearing)
- 0.5 sec for iconic (vision)
- Pavlov's dog: classical conditioning
- Skinner: operant conditioning
- Acquisition, extinction, spontaneous recovery (long-term memory)
- Memory: recall / recognize past experience
- Conditioning: associate event and response
- Learning: change behavior to stimuli
- Plasticity: change neural connections
- Functional: chemical connection change
- Structural: physical connection change
Dentate gyrus -> CA3 -> CA1
* Long-term potentiation (LTP) upon high freq stimulation: enhances EPSP
* Long-term depression (LTD) upon low freq stimulation: inhibits EPSP
* Neural growth even at 40 y/o
Inside LTP / LTD¶
* Glutamate (AMPAR, NMDAR) : excitatory
* GABA: inhibitory
Second messengers (mid-term effects)
Learning rules¶
- Freud -> Hebb (1949): fire together, wire together
$$
\Delta w = \epsilon\gamma_i\gamma_j
$$
\(\epsilon\): learning rate
\(\gamma_i\): postsynaptic firing rate
\(\gamma_j\): presynaptic firing rate
- Spike-time-dependent plasticity from experimental data
- Pre synaptic spike then post one: LTP
- Post synaptic spike then pre one: LTD
hPES rule¶
Limitations on weight change
Reinforcement learning¶
E.g. operant conditioning (Skinner)
- Expected value \(E[ x ]\)
- Expected utility \(U(E[ x ]) \approx log(E[ x ])\)
- Basic axiomatic form (Pareto)
- Weak axiom of revealed preference (WARP)
- Generalized axiom of revealed preference (GARP)
Value function V(s) and prediction error¶
\(V_{k+1}(s_k) = V_k(s_k) + \alpha\delta_k = (1-\alpha)V_k(s_k) + \alpha r_k\)
Error: \(\delta_k = r_k - V_k(s_k)\)
For multiple stimuli: Rescorla-Wagner model
\(V_k^{net} = \Sigma V_{k}(stim)\)
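A minimal sketch of the Rescorla-Wagner update follows: the prediction is the summed value of all stimuli present, and every present stimulus is updated with the shared prediction error. The trial schedule (light alone, then light plus tone) is a made-up example showing the classic blocking effect. The function name and learning rate are assumptions.

```python
def rescorla_wagner(trials, alpha=0.1):
    """trials: list of (stimuli_present, reward). Returns the learned value per stimulus."""
    values = {}
    for stimuli, reward in trials:
        v_net = sum(values.get(s, 0.0) for s in stimuli)  # V_net = sum of V(stim)
        delta = reward - v_net                            # prediction error
        for s in stimuli:
            values[s] = values.get(s, 0.0) + alpha * delta
    return values

# Phase 1: light alone predicts the reward. Phase 2: light and tone together.
trials = [(("light",), 1.0)] * 100 + [(("light", "tone"), 1.0)] * 100
values = rescorla_wagner(trials)
print({name: round(v, 4) for name, v in values.items()})
```

Because the light already predicts the reward, the error in phase 2 is close to zero and the tone learns almost nothing.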
Biological RL¶
Dopamine reward pathway for movement and motivation.
Increased dopamine secretion for a sudden reward. The same as Error: \(\delta_k = r_k - V_k(s_k)\)
Decision making¶
- Problem: no immediate feedback (reward) => need to think about the future and maximize aggregate reward
- Bellman equation: reduction of recursive reward with temporal difference (\(V_k(S_{t+1})- V_k(S_t)\))
\(V(S_t) = r(S_t) + E[V(S_{t+1})|S_t]\)
\(\delta_t = r_t + V_k(S_{t+1})- V_k(S_t)\)
* Markov decision process
* Q learning
* Q function \(Q(s, \pi)\)
* Policy \(\pi(s)\): mapping state to actions
\(Q_{t+1}(S_t, a_t) = Q_{t}(S_t, a_t) + \alpha\delta_t\)
\(\delta_t = r_t + \gamma \max_{a} Q_{t}(S_{t+1}, a) - Q_{t}(S_t, a_t)\)
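A minimal tabular Q-learning sketch on a made-up five-state corridor follows. The update is \(Q \leftarrow Q + \alpha\delta\), with \(\delta\) built from the reward plus the discounted best value of the next state. The environment, the purely random exploration policy, and the hyperparameters are all assumptions for illustration.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = step left, 1 = step right

def step(state, action):
    """Deterministic corridor: reward 1 only when the goal state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)  # purely exploratory behaviour policy
            nxt, reward = step(state, action)
            # Temporal-difference error with the greedy value of the next state
            delta = reward + gamma * max(q[nxt]) - q[state][action]
            q[state][action] += alpha * delta
            state = nxt
    return q

q = q_learning()
# Greedy policy for the non-terminal states
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print("greedy policy (1 = right):", policy)
print("Q values:", [[round(v, 3) for v in row] for row in q])
```

Although the reward only arrives at the goal, it propagates backwards through the discounted \(\max\) term, so earlier states learn to prefer moving right as well.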
SPAUN model¶
SPAUN = Semantic pointer architecture unified network, all things put together
- Single perceptual system (eye)
- Single motor system (arm)
- Background knowledge (SPA)
- Abilities
- Similar to human in working mem limitations (3-7)
- Behavior flexibility
- Adaptation to reward
- Confusion to invalid input