Dynamic synapses and updates to lessons (including operant conditioning) (#109)
* init commit of exp-syn material
* wrote dynamics for exp-syn
* wrote dynamics for exp-syn
* wrote test for exp-syn
* updates to exp-syn/testing lesson
* modded docs to reflect exp-syn
* integrated alpha-synapse
* integrated alpha-syn, cleaned up exp-syn
* cleaned up alpha/exp syn
* cleaned up help for exp/alpha-syn
* cleaned up help for exp/alpha-syn
* minor tweaks + init of rl-snn exhibit lesson (#108)
Co-authored-by: Alexander Ororbia <ago@hal3.cs.rit.edu>
* began draft of dyn-syn lesson
* revised dyn-syn lesson further
* added some text to rl-snn lesson
* cleaned up dyn-syn/lesson
* cleaned up dyn-syn/lesson
* fixed scaling issue in exp/alpha syn
* update to docs
* cleaned up neurocog tutorials on plasticity
* cleaned up dyn-syn lesson
* cleaned up dyn-syn lesson
* integrated double-exp syn model
* mod to dyn-syn lesson
* finished v1 of rat-maze rl-snn
---------
Co-authored-by: Alexander Ororbia <ago@hal3.cs.rit.edu>
**docs/modeling/synapses.md** (+28)
## Dynamic Synapse Types
### Exponential Synapse
This (chemical) synapse performs a linear transform of its input signals. Note that this synapse is "dynamic" in the sense that its efficacies are a function of their pre-synaptic inputs; there is no inherent form of long-term plasticity in this base implementation. Synaptic strength values can be viewed as being filtered/smoothed through an exponential kernel.
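To make these dynamics concrete, below is a minimal NumPy sketch (an illustration, not ngc-learn's component API) of an exponentially-filtered conductance, assuming a decay time constant `tau_syn`, a maximal conductance increment `g_bar`, and Euler integration:

```python
import numpy as np

def step_exp_synapse(g, spikes, dt, tau_syn=3.0, g_bar=0.1):
    """One Euler step of exponential synapse dynamics:
    tau_syn dg/dt = -g, with g <- g + g_bar upon a pre-synaptic spike."""
    g = g + dt * (-g / tau_syn)   # exponential decay of conductance
    g = g + g_bar * spikes        # instantaneous jump on spikes (0/1 values)
    return g

# usage: filter a short, Poisson-like pre-synaptic spike train
g = np.zeros(4)                   # conductances for 4 pre-synaptic units
for t in range(10):
    spikes = (np.random.rand(4) < 0.2).astype(float)
    g = step_exp_synapse(g, spikes, dt=1.0)
```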
### Alpha Synapse

This (chemical) synapse performs a linear transform of its input signals. Note that this synapse is "dynamic" in the sense that its efficacies are a function of their pre-synaptic inputs; there is no inherent form of long-term plasticity in this base implementation. Synaptic strength values can be viewed as being filtered/smoothed through a kernel that models more realistic rise and fall times of synaptic conductance.
```{eval-rst}
.. autoclass:: ngclearn.components.AlphaSynapse
  :noindex:

  .. automethod:: advance_state
    :noindex:
  .. automethod:: reset
    :noindex:
```
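As a companion sketch (again illustrative, with the same assumed constants as above), the alpha kernel can be realized with two coupled filters, where an intermediate variable `h` drives the conductance `g`, yielding a finite rise time followed by an exponential decay:

```python
import numpy as np

def step_alpha_synapse(g, h, spikes, dt, tau_syn=3.0, g_bar=0.1):
    """One Euler step of alpha-kernel conductance dynamics:
    tau_syn dh/dt = -h (+ jump on spikes); tau_syn dg/dt = -g + h."""
    h = h + dt * (-h / tau_syn) + g_bar * spikes  # fast driving variable
    g = g + dt * ((-g + h) / tau_syn)             # conductance rises, then falls
    return g, h

g, h = np.zeros(4), np.zeros(4)
g, h = step_alpha_synapse(g, h, np.array([1., 0., 0., 1.]), dt=1.0)
```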
### Short-Term Plasticity (Dense) Synapse
This synapse performs a linear transform of its input signals. Note that this synapse is "dynamic" in the sense that it engages in short-term plasticity (STP), meaning that its efficacy values change as a function of its inputs/time (and simulated consumed resources), but it does not provide any long-term form of plasticity/adjustment.
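One common formulation of such resource-dependent dynamics is the Tsodyks-Markram model; the sketch below (with illustrative constants, not necessarily those used by the component) tracks a facilitation variable `u` and a resource variable `x` whose product transiently scales synaptic efficacy:

```python
import numpy as np

def step_stp(u, x, spikes, dt, U=0.2, tau_f=750.0, tau_d=50.0):
    """One Euler step of Tsodyks-Markram short-term plasticity.
    u: facilitation (release probability), x: available resources."""
    u = u + dt * (-u / tau_f)          # facilitation decays toward 0
    x = x + dt * ((1.0 - x) / tau_d)   # resources recover toward 1
    u = u + U * (1.0 - u) * spikes     # a spike boosts release probability
    efficacy = u * x                   # transient scaling of synaptic strength
    x = x - u * x * spikes             # a spike consumes resources
    return u, x, efficacy

u, x = np.zeros(4), np.ones(4)
u, x, eff = step_stp(u, x, np.array([1., 0., 1., 0.]), dt=1.0)
```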
**docs/museum/rl_snn.md** (+98, −1)
## Modeling Operant Conditioning through Modulation
Operant conditioning refers to the idea that environmental stimuli can either increase or decrease the occurrence of (voluntary) behaviors; in other words, positive stimuli can lead to future repeats of a certain behavior whereas negative stimuli can punish, i.e., decrease, its future occurrences. Ultimately, operant conditioning shapes voluntary behavior through consequences: actions followed by rewards are repeated whereas actions followed by negative/punishing outcomes diminish.
In this lesson, we will model a very simple case of operant conditioning for a neuronal motor circuit used to navigate a simple maze.
The maze's design will be the rat T-maze and the "rat" will be allowed to move, at a particular point in the maze, in one of four directions (up/North, down/South, left/West, and right/East). A positive reward will be supplied to our rat neuronal circuit if it makes progress towards the direction of the food (placed in the upper right corner of the T-maze) and a negative reward will be provided if it fails to make progress/gets stuck, i.e., a dense reward function will be employed. For the exhibit code that goes with this lesson, an implementation of this T-maze environment is provided, modeled in the same style/with the same agent API as the OpenAI gymnasium.
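To ground this description, here is a self-contained toy sketch of such a gym-style T-maze (a deliberate simplification for illustration only; the exhibit ships its own implementation, whose exact API and layout may differ):

```python
import numpy as np

class ToyTMaze:
    """Minimal gym-style T-maze: the agent starts at the base of the T and
    the food/goal sits in the upper-right corner; reward is dense."""
    def __init__(self):
        self.goal = np.array([0, 4])          # upper-right corner (row, col)
        self.reset()

    def reset(self):
        self.pos = np.array([4, 2])           # base of the T's stem
        return self._observe()

    def _observe(self):
        view = np.zeros((5, 5))               # global raw-pixel view of maze
        view[tuple(self.goal)] = 0.5          # "X" marks the food/goal
        view[tuple(self.pos)] = 1.0           # "+" marks the agent
        return view

    def step(self, action):
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # N/S/W/E
        old_dist = np.abs(self.goal - self.pos).sum()
        new_pos = np.clip(self.pos + moves[action], 0, 4)
        if new_pos[0] == 0 or new_pos[1] == 2:  # only the T's corridors are walkable
            self.pos = new_pos
        new_dist = np.abs(self.goal - self.pos).sum()
        reward = 1.0 if new_dist < old_dist else -1.0  # dense progress reward
        done = bool(new_dist == 0)
        return self._observe(), reward, done, {}
```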
Although [spike-timing-dependent plasticity](../tutorials/neurocog/stdp.md) (STDP) and [reward-modulated STDP](../tutorials/neurocog/mod_stdp.md) (MSTDP) are covered and analyzed in detail in the ngc-learn set of tutorials, we will briefly review the evolution of synaptic strengths as prescribed by modulated STDP with eligibility traces here. In effect, STDP prescribes changes in synaptic strength according to the idea that <i>neurons that fire together, wire together, except that timing matters</i> (a temporal interpretation of basic Hebbian learning). This means that, assuming we are able to record the spike times of the pre-synaptic and post-synaptic neurons (that a synaptic cable connects), we can, at any time-step $t$, produce an adjustment $\Delta W_{ij}(t)$ to a synapse via the following pair of correlational rules:
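$$
\Delta W_{ij}(t) \; = \; \underbrace{A_+ \, x_i(t) \, s_j(t)}_{\text{LTP}} \; - \; \underbrace{A_- \, x_j(t) \, s_i(t)}_{\text{LTD}}
$$

(the trace-based form above uses assumed non-negative amplitude constants $A_+$ and $A_-$, treated here as hyper-parameters)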
where $s_j$ is the spike recorded at time $t$ of the post-synaptic neuron $j$ (and $x_j$ is an exponentially-decaying trace that tracks its spiking history) and $s_i$ is the spike recorded at time $t$ of the pre-synaptic neuron $i$ (and $x_i$ is an exponentially-decaying trace that tracks its pulse history). STDP, in the very simple form above, can effectively be described as balancing two types of alterations to a synaptic efficacy: long-term potentiation (the first term, which increases synaptic strength) and long-term depression (the second term, which decreases synaptic strength).
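A compact NumPy sketch of this trace-based update (illustrative, not ngc-learn's component API) might look as follows:

```python
import numpy as np

def stdp_update(x_pre, x_post, s_pre, s_post, A_plus=1e-2, A_minus=1e-2):
    """Trace-based STDP: LTP when a post spike follows recent pre activity,
    LTD when a pre spike follows recent post activity."""
    ltp = A_plus * np.outer(x_pre, s_post)    # pre-trace x post-spike
    ltd = A_minus * np.outer(s_pre, x_post)   # pre-spike x post-trace
    return ltp - ltd                          # Delta W, shape (n_pre, n_post)

def decay_trace(x, s, dt, tau_tr=20.0):
    """Exponentially-decaying spike trace: tau_tr dx/dt = -x (+ spikes)."""
    return x + dt * (-x / tau_tr) + s
```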
Modulated STDP is a three-factor variant of STDP that multiplies the final synaptic update by a third signal, e.g., the modulatory signal is often a reward (dopamine) intensity value (resulting in reward-modulated STDP). However, given that reward signals might be delayed or not arrive/be available at every single time-step, it is common practice to extend a synapse to maintain a second value called an "eligibility trace", which is effectively another exponentially-decaying trace/filter (instantiated as an ODE that can be integrated via the Euler method or related tools) that is constructed to track a sequence of STDP updates applied across a window of time. Once a reward/modulator signal becomes available, the current trace is multiplied by the modulator to produce a change in synaptic efficacy.
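Concretely, with an assumed eligibility variable $E_{ij}(t)$ and decay time constant $\tau_e$, these two steps can be written in a form such as:

$$
\tau_e \frac{\partial E_{ij}(t)}{\partial t} = -E_{ij}(t) + \Delta W_{ij}(t), \qquad \Delta W^{r}_{ij}(t) = \nu \, r(t) \, E_{ij}(t)
$$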
where $r(t)$ is the dopamine supplied at some time $t$ and $\nu$ is some non-negative global learning rate. Note that MSTDP with eligibility traces (MSTDP-ET) is agnostic to the choice of local STDP/Hebbian update used to produce $\Delta W_{ij}(t)$ (for example, one could replace the trace-based STDP rule we presented above with BCM or a variant of weight-dependent STDP).
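A minimal Euler-integration sketch of this modulated update (with assumed constants, and with an added clip to keep efficacies non-negative, as the circuit below requires) is:

```python
import numpy as np

def mstdp_et_step(W, E, dW_stdp, r, dt, nu=1.0, tau_e=40.0):
    """One Euler step of MSTDP-ET: fold the local STDP update into an
    exponentially-decaying eligibility trace, then gate it by the reward."""
    E = E + dt * (-E / tau_e + dW_stdp)  # eligibility integrates STDP updates
    W = W + nu * r * E                   # reward/dopamine-modulated change
    return np.maximum(W, 0.0), E         # clip to non-negative efficacies
```

Here, `dW_stdp` is the local update produced by the trace-based STDP sketch shown earlier.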
## The Spiking Neural Circuit Model
In this exhibit, we build one of the simplest possible spiking neural networks (SNNs) one could design to tackle a simple maze navigation problem such as the rat T-maze; specifically, a three-layer SNN where the first layer is a Poisson encoder and the second and third layers contain sets of recurrent leaky integrate-and-fire (LIF) neurons. The recurrence in our model is to be non-plastic and constructed such that a form of lateral competition is induced among the LIF units, i.e., the LIF neurons will be driven by a scaled, hollow-matrix-initialized recurrent weight matrix (which will multiply spikes encountered at time $t - \Delta t$ by negative values), which will (quickly yet roughly) approximate the effect of inhibitory neurons. For the synapses that transmit pulses from the sensory/input layer to the second layer, we will opt for a non-plastic sparse mixture of excitatory and inhibitory strength values (much as in the model of <b>[1]</b>) to produce a reasonable encoding of the input Poisson spike trains. For the synapses that transmit pulses from the second layer to the third (control/action) layer, we will employ MSTDP-ET (as shown in the previous section) to adjust the non-negative efficacies in order to learn a basic reactive policy. We will call this very simple neuronal model the "reinforcement learning SNN" (RL-SNN).
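For instance, such a lateral-inhibition recurrent matrix could be instantiated as follows (a sketch; the scale factor is an assumed hyper-parameter):

```python
import numpy as np

n_units, inh_scale = 64, 0.5
# hollow matrix: ones everywhere except the diagonal (no self-inhibition)
V = -inh_scale * (np.ones((n_units, n_units)) - np.eye(n_units))
# recurrent drive at time t multiplies spikes from t - dt by negative values
lateral_current = lambda spikes: spikes @ V
```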
The SNN circuit will be provided raw pixels of the T-maze environment (however, this view is a global view of the entire maze, as opposed to something more realistic such as an egocentric view of the sensory space), where a cross "+" marks its current location and an "X" marks the location of the food substance/goal state. Shown below, the image to the left depicts a real-world rat T-maze whereas to the right is our implementation/simulation of the T-maze problem (and what our SNN circuit sees at the very start of an episode of the navigation problem).
To fit the RL-SNN model described above, go to the `exhibits/rl_snn` sub-folder (this step assumes that you have git cloned the model museum repo code), and execute the RL-SNN's simulation script from the command line as follows:

```console
$ ./sim.sh
```
which will execute a simulation of the MSTDP-adapted SNN on the T-maze problem, specifically executing four uniquely-seeded trial runs (i.e., four different "rat agents"), and produce two plots: one containing a smoothed curve of episodic rewards over time and another containing a smoothed task accuracy curve (as in, did the rat reach the goal-state and obtain the food substance or not). You should obtain plots that look roughly like the two below.