
Commit 8f5a650

Authored by Alexander Ororbia (ago109) and Alexander Ororbia
Dynamic synapses and updates to lessons (including operant conditioning) (#109)
* init commit of exp-syn material
* wrote dynamics for exp-syn
* wrote dynamics for exp-syn
* wrote test for exp-syn
* updates to exp-syn/testing lesson
* modded docs to reflect exp-syn
* integrated alpha-synapse
* integrated alpha-syn, cleaned up exp-syn
* cleaned up alpha/exp syn
* cleaned up help for exp/alpha-syn
* cleaned up help for exp/alpha-syn
* minor tweaks + init of rl-snn exhibit lesson (#108)
  Co-authored-by: Alexander Ororbia <ago@hal3.cs.rit.edu>
* began draft of dyn-syn lesson
* revised dyn-syn lesson further
* added some text to rl-snn lesson
* cleaned up dyn-syn/lesson
* cleaned up dyn-syn/lesson
* fixed scaling issue in exp/alpha syn
* update to docs
* cleaned up neurocog tutorials on plasticity
* cleaned up dyn-syn lesson
* cleaned up dyn-syn lesson
* integrated double-exp syn model
* mod to dyn-syn lesson
* finished v1 of rat-maze rl-snn

---------

Co-authored-by: Alexander Ororbia <ago@hal3.cs.rit.edu>
1 parent 0182340 commit 8f5a650

30 files changed (+1271, -74 lines)

docs/images/museum/rat_accuracy.jpg (34.9 KB)

docs/images/museum/rat_rewards.jpg (33.8 KB)

docs/images/museum/ratmaze.png (1.01 KB)

docs/images/museum/real_ratmaze.jpg (26.8 KB)

docs/modeling/synapses.md

+28
@@ -60,6 +60,34 @@ This synapse performs a deconvolutional transform of its input signals. Note tha

## Dynamic Synapse Types

### Exponential Synapse

This (chemical) synapse performs a linear transform of its input signals. Note that this synapse is "dynamic" in the sense that its efficacies are a function of their pre-synaptic inputs; there is no inherent form of long-term plasticity in this base implementation. Synaptic strength values can be viewed as being filtered/smoothed through an exponential kernel.

```{eval-rst}
.. autoclass:: ngclearn.components.ExponentialSynapse
  :noindex:

  .. automethod:: advance_state
    :noindex:
  .. automethod:: reset
    :noindex:
```
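
To make the exponential-kernel idea concrete, the following is a minimal NumPy sketch of a spike-driven conductance that jumps on each pre-synaptic pulse and then decays exponentially; it is only an illustration of the underlying dynamics (the parameter names `tau_syn` and `g_bar` and their values are placeholders, not ngc-learn's actual `ExponentialSynapse` internals):

```python
import numpy as np

dt = 0.1        # integration step size (ms)
T = 300         # number of simulation steps
tau_syn = 5.0   # exponential decay time constant (ms)
g_bar = 1.0     # conductance increment per pre-synaptic spike

rng = np.random.default_rng(0)
pre_spikes = (rng.random(T) < 0.02).astype(float)  # sparse, Poisson-like spike train

g = 0.0
g_trace = np.zeros(T)
for t in range(T):
    # Euler step of tau_syn * dg/dt = -g, plus a jump of g_bar per incoming spike
    g = g + dt * (-g / tau_syn) + g_bar * pre_spikes[t]
    g_trace[t] = g
# g_trace now holds the exponentially filtered synaptic conductance over time
```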

### Alpha Synapse

This (chemical) synapse performs a linear transform of its input signals. Note that this synapse is "dynamic" in the sense that its efficacies are a function of their pre-synaptic inputs; there is no inherent form of long-term plasticity in this base implementation. Synaptic strength values can be viewed as being filtered/smoothed through a kernel that models more realistic rise and fall times of synaptic conductance.

```{eval-rst}
.. autoclass:: ngclearn.components.AlphaSynapse
  :noindex:

  .. automethod:: advance_state
    :noindex:
  .. automethod:: reset
    :noindex:
```
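
For comparison, the alpha-shaped kernel can be sketched with one extra state variable so that the conductance rises smoothly before it decays; again, this is a minimal NumPy illustration under assumed parameter names, not the `AlphaSynapse` component itself:

```python
import numpy as np

dt, T = 0.1, 300
tau_syn = 5.0   # shared rise/decay time constant (ms), yielding an alpha-function kernel
g_bar = 1.0     # increment applied to the auxiliary variable per pre-synaptic spike

rng = np.random.default_rng(1)
pre_spikes = (rng.random(T) < 0.02).astype(float)

h, g = 0.0, 0.0
g_trace = np.zeros(T)
for t in range(T):
    # Two coupled first-order filters with equal time constants; after a single
    # spike, g(t) is proportional to (t / tau_syn) * exp(-t / tau_syn).
    h = h + dt * (-h / tau_syn) + g_bar * pre_spikes[t]
    g = g + dt * ((h - g) / tau_syn)
    g_trace[t] = g
```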

### Short-Term Plasticity (Dense) Synapse

This synapse performs a linear transform of its input signals. Note that this synapse is "dynamic" in the sense that it engages in short-term plasticity (STP), meaning that its efficacy values change as a function of its inputs/time (and simulated consumed resources), but it does not provide any long-term form of plasticity/adjustment.

docs/museum/rl_snn.md

+98 -1
@@ -11,18 +11,115 @@ exhibit can be found

## Modeling Operant Conditioning through Modulation

Operant conditioning refers to the idea that environmental stimuli can either increase or decrease the occurrence of (voluntary) behaviors; in other words, positive stimuli can lead to future repeats of a certain behavior whereas negative stimuli can lead to (i.e., punish) a decrease in future occurrences. Ultimately, operant conditioning shapes voluntary behavior through consequences: actions followed by rewards are repeated, while actions followed by punishing/negative outcomes diminish.

In this lesson, we will model a very simple case of operant conditioning for a neuronal motor circuit used to navigate a simple maze. The maze's design will be the rat T-maze, and the "rat" will be allowed to move, at a particular point in the maze, in one of four directions (up/North, down/South, left/West, and right/East). A positive reward will be supplied to our rat neuronal circuit if it makes progress towards the direction of the food (placed in the upper right corner of the T-maze) and a negative reward will be provided if it fails to make progress/gets stuck, i.e., a dense reward function will be employed. For the exhibit code that goes with this lesson, an implementation of this T-maze environment is provided, modeled in the same style/with the same agent API as the OpenAI gymnasium.
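
To make the dense-reward scheme concrete, below is a purely hypothetical sketch of such a reward rule; the helper `manhattan_distance`, the reward magnitudes, and the grid coordinates are made up for illustration and do not reflect the exhibit's actual T-maze implementation:

```python
def manhattan_distance(pos_a, pos_b):
    """Grid distance between two (row, col) positions."""
    return abs(pos_a[0] - pos_b[0]) + abs(pos_a[1] - pos_b[1])

def dense_reward(prev_pos, new_pos, goal_pos):
    """Positive reward for progress toward the food, negative reward otherwise."""
    if manhattan_distance(new_pos, goal_pos) < manhattan_distance(prev_pos, goal_pos):
        return +1.0   # made progress toward the food
    return -1.0       # moved away or got stuck

# Example: stepping one cell closer to a goal placed at (0, 4) yields +1.0
print(dense_reward(prev_pos=(3, 2), new_pos=(2, 2), goal_pos=(0, 4)))
```
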
### Reward-Modulated Spike-Timing-Dependent Plasticity (R-STDP)

Although [spike-timing-dependent plasticity](../tutorials/neurocog/stdp.md) (STDP) and [reward-modulated STDP](../tutorials/neurocog/mod_stdp.md) (MSTDP) are covered and analyzed in detail in the ngc-learn set of tutorials, we will briefly review here the evolution of synaptic strengths as prescribed by modulated STDP with eligibility traces. In effect, STDP prescribes changes in synaptic strength according to the idea that <i>neurons that fire together, wire together, except that timing matters</i> (a temporal interpretation of basic Hebbian learning). This means that, assuming we are able to record the spike times of the pre-synaptic and post-synaptic neurons (that a synaptic cable connects), we can, at any time-step $t$, produce an adjustment $\Delta W_{ij}(t)$ to a synapse via the following pair of correlational rules:

$$
\Delta W_{ij}(t) = A^+ \big(x_i s_j \big) - A^- \big(s_i x_j \big)
$$

where $s_j$ is the spike recorded at time $t$ of the post-synaptic neuron $j$ (and $x_j$ is an exponentially-decaying trace that tracks its spiking history) and $s_i$ is the spike recorded at time $t$ of the pre-synaptic neuron $i$ (and $x_i$ is an exponentially-decaying trace that tracks its pulse history). STDP, in the very simple form shown above, can effectively be described as balancing two types of alterations to a synaptic efficacy -- long-term potentiation (the first term, which increases synaptic strength) and long-term depression (the second term, which decreases synaptic strength).
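
As a concrete illustration, the trace-based rule above can be written in a few lines of NumPy; this is only a sketch (the constants `A_plus`, `A_minus`, and `tau_tr` are illustrative choices, and ngc-learn's own STDP synapse components handle these updates internally):

```python
import numpy as np

def stdp_step(s_pre, s_post, x_pre, x_post, dt=1.0, tau_tr=20.0,
              A_plus=1e-2, A_minus=1.2e-2):
    """One time-step of trace-based STDP: dW_ij = A+ * x_i * s_j - A- * s_i * x_j."""
    # exponentially-decaying spike traces, bumped by the current spikes
    x_pre = x_pre * np.exp(-dt / tau_tr) + s_pre
    x_post = x_post * np.exp(-dt / tau_tr) + s_post
    ltp = A_plus * np.outer(x_pre, s_post)   # pre-trace times post-spike (potentiation)
    ltd = A_minus * np.outer(s_pre, x_post)  # pre-spike times post-trace (depression)
    dW = ltp - ltd                           # shape: (num_pre, num_post)
    return dW, x_pre, x_post
```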

Modulated STDP is a three-factor variant of STDP that multiplies the final synaptic update by a third signal; e.g., the modulatory signal is often a reward (dopamine) intensity value, resulting in reward-modulated STDP. However, given that reward signals might be delayed or might not arrive/be available at every single time-step, it is common practice to extend a synapse to maintain a second quantity called an "eligibility trace", which is effectively another exponentially-decaying trace/filter (instantiated as an ODE that can be integrated via the Euler method or related tools) constructed to track the sequence of STDP updates applied across a window of time. Once a reward/modulator signal becomes available, the current trace is multiplied by the modulator to produce a change in synaptic efficacy. In essence, this update becomes:

$$
\Delta W_{ij} = \nu E_{ij}(t) r(t), \; \text{where } \; \tau_e \frac{\partial E_{ij}(t)}{\partial t} = -E_{ij}(t) + \Delta W_{ij}(t)
$$

where $r(t)$ is the dopamine supplied at some time $t$ and $\nu$ is some non-negative global learning rate. Note that MSTDP with eligibility traces (MSTDP-ET) is agnostic to the choice of local STDP/Hebbian update used to produce $\Delta W_{ij}(t)$ (for example, one could replace the trace-based STDP rule we presented above with BCM or a variant of weight-dependent STDP).
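
Continuing the sketch above, the eligibility trace simply low-pass filters the per-step STDP updates and is converted into an actual weight change only when a modulatory signal $r(t)$ arrives; the constants `tau_e` and `nu` below are again illustrative placeholders rather than the exhibit's settings:

```python
import numpy as np

def mstdp_et_step(E, dW_stdp, reward, dt=1.0, tau_e=50.0, nu=1.0):
    """Eligibility-trace update: tau_e dE/dt = -E + dW_stdp, then dW = nu * E * r(t)."""
    E = E + (dt / tau_e) * (-E + dW_stdp)  # Euler integration of the trace ODE
    dW = nu * E * reward                   # modulated weight change (zero when reward == 0)
    return dW, E

# usage sketch: E starts at np.zeros_like(W); dW_stdp comes from a rule such as stdp_step(...)
```
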
## The Spiking Neural Circuit Model

In this exhibit, we build one of the simplest possible spiking neural networks (SNNs) one could design to tackle a simple maze navigation problem such as the rat T-maze; specifically, a three-layer SNN where the first layer is a Poisson encoder and the second and third layers contain sets of recurrent leaky integrate-and-fire (LIF) neurons. The recurrence in our model is non-plastic and constructed such that a form of lateral competition is induced among the LIF units; i.e., the LIF neurons will be driven by a scaled, hollow-matrix-initialized recurrent weight matrix (which multiplies the spikes encountered at time $t - \Delta t$ by negative values), which (quickly yet roughly) approximates the effect of inhibitory neurons. For the synapses that transmit pulses from the sensory/input layer to the second layer, we will opt for a non-plastic, sparse mixture of excitatory and inhibitory strength values (much as in the model of <b>[1]</b>) to produce a reasonable encoding of the input Poisson spike trains. For the synapses that transmit pulses from the second layer to the third (control/action) layer, we will employ MSTDP-ET (as presented in the previous section) to adjust the non-negative efficacies in order to learn a basic reactive policy. We will call this very simple neuronal model the "reinforcement learning SNN" (RL-SNN).
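
To give a feel for the overall circuit, below is a minimal NumPy sketch of this kind of three-layer design -- a Poisson encoder, a hidden LIF layer laterally inhibited through a hollow (zero-diagonal) negative recurrent matrix, and a small LIF control layer; all layer sizes, time constants, and the inhibition scale are placeholders, and this is not the exhibit's actual ngc-learn model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_act = 100, 64, 4   # placeholder sizes (four actions: N/S/W/E)
dt, tau_m, v_thr = 1.0, 20.0, 1.0

def poisson_encode(pixels, max_rate=0.25):
    """Pixel intensities in [0, 1] become per-step spike probabilities."""
    return (rng.random(pixels.shape) < pixels * max_rate).astype(float)

def lif_step(v, current):
    """Leaky integrate-and-fire: leak toward zero, integrate input, spike and reset at threshold."""
    v = v + (dt / tau_m) * (-v + current)
    spikes = (v >= v_thr).astype(float)
    return v * (1.0 - spikes), spikes

# Non-plastic input->hidden synapses: a sparse mix of excitatory/inhibitory strengths
W1 = rng.normal(0.0, 0.5, size=(n_in, n_hid)) * (rng.random((n_in, n_hid)) < 0.1)
# Hollow, negative recurrent matrix -> rough lateral competition among hidden LIF units
V = -0.5 * (1.0 - np.eye(n_hid))
# Plastic hidden->action synapses (the efficacies MSTDP-ET would adapt)
W2 = rng.random((n_hid, n_act)) * 0.1

pixels = rng.random(n_in)  # stand-in for a flattened maze image
v_hid, v_act = np.zeros(n_hid), np.zeros(n_act)
s_hid_prev = np.zeros(n_hid)
act_counts = np.zeros(n_act)
for t in range(100):  # one stimulus-processing window
    s_in = poisson_encode(pixels)
    v_hid, s_hid = lif_step(v_hid, s_in @ W1 + s_hid_prev @ V)
    v_act, s_act = lif_step(v_act, s_hid @ W2)
    s_hid_prev = s_hid
    act_counts += s_act
action = int(np.argmax(act_counts))  # pick the most active control neuron as the action
```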

The SNN circuit will be provided raw pixels of the T-maze environment (note that this is a global view of the entire maze, as opposed to something more realistic such as an egocentric view of the sensory space), where a cross "+" marks the agent's current location and an "X" marks the location of the food substance/goal state. Shown below, the image on the left depicts a real-world rat T-maze whereas the one on the right is our implementation/simulation of the T-maze problem (and what our SNN circuit sees at the very start of an episode of the navigation problem).

-### Neuronal Dynamics
```{eval-rst}
.. table::
  :align: center

  +--------------------------------------------------+---------------------------------------------+
  | .. image:: ../images/museum/real_ratmaze.jpg      | .. image:: ../images/museum/ratmaze.png     |
  |    :width: 250px                                  |    :width: 200px                            |
  |    :align: center                                 |    :align: center                           |
  +--------------------------------------------------+---------------------------------------------+
```

## Running the RL-SNN Model

To fit the RL-SNN model described above, go to the `exhibits/rl_snn` sub-folder (this step assumes that you have git cloned the model museum repository code), and execute the RL-SNN's simulation script from the command line as follows:

```console
$ ./sim.sh
```

This will execute a simulation of the MSTDP-adapted SNN on the T-maze problem, specifically running four uniquely-seeded trials (i.e., four different "rat agents"), and produce two plots: one containing a smoothed curve of episodic rewards over time and another containing a smoothed task accuracy curve (as in, did the rat reach the goal state and obtain the food substance or not). You should obtain plots that look roughly like the two below.

```{eval-rst}
.. table::
  :align: center

  +--------------------------------------------------+--------------------------------------------------+
  | .. image:: ../images/museum/rat_rewards.jpg       | .. image:: ../images/museum/rat_accuracy.jpg      |
  |    :width: 400px                                  |    :width: 400px                                  |
  |    :align: center                                 |    :align: center                                 |
  +--------------------------------------------------+--------------------------------------------------+
```

Notice that we have provided a random-agent baseline (i.e., uniform random selection of one of the four possible directions to move at each step of an episode) to contrast the SNN rat motor circuit's performance against. As you may observe, the SNN circuit ultimately becomes conditioned to taking actions akin to the optimal policy -- go North/up if it perceives itself (marked as a cross "+") at the bottom of the T-maze, then go East/right once it has reached the top of the T-maze, continuing right upon perception of the food item (the goal state, marked with an "X").

The code has also been configured to produce a small video/GIF of the final episode, `episode200.gif`, in which the MSTDP weight changes have been disabled and the agent must rely solely on its memory of the uncovered policy to reach the goal state.

### Some Important Limitations

While the above MSTDP-ET-driven motor circuit model is useful and provides a simple model of operant conditioning in the context of a very simple maze navigation task, it is important to identify the assumptions/limitations of the above setup. Some important limitations or simplifications made to obtain a consistently working RL-SNN model are:

1. As mentioned earlier, the sensory input contains a global view of the maze navigation problem, i.e., a 2D bird's-eye view of the agent, its goal (the food substance), and its environment. More realistic, but far more difficult, versions of this problem would need to consider an egocentric view (making the problem a partially observable Markov decision process), a more realistic 3D representation of the environment, as well as more complex maze sizes and shapes for the agent/rat model to navigate.
2. The reward is only delayed with respect to the agent's stimulus processing window, meaning that the agent essentially receives a dopamine signal after each action is taken. If we ignore the SNN's stimulus processing time between video frames of the actual navigation problem, we can view our agent above as tackling what is known in reinforcement learning as a dense reward problem. A far more complex, yet more cognitively realistic, version of the problem is to administer a sparse reward, i.e., the rat motor circuit only receives a useful dopamine/reward stimulus at the end of an episode as opposed to after each action. The above MSTDP-ET model would struggle to solve the sparse reward problem, and more sophisticated models would be required to achieve successful outcomes, e.g., appealing to models of memory/cognitive maps, more intelligent forms of exploration, etc.
3. The SNN circuit itself only permits plastic synapses in its control layer (i.e., the synaptic connections between the second layer and the third output/control layer). The bottom layer is non-plastic and fixed, meaning that the agent model is dependent on the quality of the random initialization of the input-to-hidden encoding layer. The input-to-hidden synapses could be adapted with STDP (or MSTDP); however, the agent will not always successfully and stably converge to a consistent policy, as the encoding layer's effectiveness is highly dependent on how much of the environment the agent initially sees/explores (if the agent gets "stuck" at any point, STDP will tend to fill up the bottom layer's receptive fields with redundant information and make it more difficult for the control layer to learn the consequences of taking different actions).

<!-- References/Citations -->
## References

docs/source/ngclearn.components.neurons.graded.rst

-8
@@ -36,14 +36,6 @@ ngclearn.components.neurons.graded.rateCell module
   :undoc-members:
   :show-inheritance:

-ngclearn.components.neurons.graded.rateCellOld module
------------------------------------------------------
-
-.. automodule:: ngclearn.components.neurons.graded.rateCellOld
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
ngclearn.components.neurons.graded.rewardErrorCell module
---------------------------------------------------------

docs/source/ngclearn.components.synapses.hebbian.rst

-8
@@ -36,14 +36,6 @@ ngclearn.components.synapses.hebbian.hebbianSynapse module
   :undoc-members:
   :show-inheritance:

-ngclearn.components.synapses.hebbian.hebbianSynapseOld module
--------------------------------------------------------------
-
-.. automodule:: ngclearn.components.synapses.hebbian.hebbianSynapseOld
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
ngclearn.components.synapses.hebbian.traceSTDPSynapse module
------------------------------------------------------------

docs/source/ngclearn.components.synapses.rst

+24
@@ -23,6 +23,14 @@ ngclearn.components.synapses.STPDenseSynapse module
   :undoc-members:
   :show-inheritance:

ngclearn.components.synapses.alphaSynapse module
------------------------------------------------

.. automodule:: ngclearn.components.synapses.alphaSynapse
   :members:
   :undoc-members:
   :show-inheritance:

ngclearn.components.synapses.denseSynapse module
------------------------------------------------

@@ -31,6 +39,22 @@ ngclearn.components.synapses.denseSynapse module
   :undoc-members:
   :show-inheritance:

ngclearn.components.synapses.doubleExpSynapse module
----------------------------------------------------

.. automodule:: ngclearn.components.synapses.doubleExpSynapse
   :members:
   :undoc-members:
   :show-inheritance:

ngclearn.components.synapses.exponentialSynapse module
------------------------------------------------------

.. automodule:: ngclearn.components.synapses.exponentialSynapse
   :members:
   :undoc-members:
   :show-inheritance:

ngclearn.components.synapses.staticSynapse module
-------------------------------------------------
