Pavlovian experiment: make a dog salivate by providing food whenever there’s a certain stimulus
Stimulus, then unconditioned stimulus (food) ⇒ unconditioned response (salivation)
Then, add a new stimulus ⟶ the animal will learn to associate it with the food
Same stimulus, but no food afterwards ⟶ the animal “unlearns” the pairing of the stimulus with the food (extinction)
- Stimulus: $u_i ∈ \lbrace 0, 1 \rbrace$
- Reward: $r_i ∈ \lbrace 0, 1 \rbrace$
- Predictor: $v_i ≝ w u_i$
- Loss (squared prediction error): $L_i ≝ δ_i^2 = (r_i - v_i)^2$
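Then, gradient descent on $L_i$ with respect to $w$: since $∂L_i/∂w = -2 δ_i u_i$, the update is (with the factor 2 absorbed into the learning rate $ε$):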
Rescorla-Wagner rule: $w ← w + ε u_i δ_i$
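A minimal sketch of the rule in Python, assuming a single stimulus present on every trial ($u_i = 1$) and an arbitrary learning rate `eps = 0.1`. The helper name `rescorla_wagner` and the trial counts are illustrative choices, not from the notes. It reproduces acquisition (stimulus followed by food) and then extinction (stimulus, no food).

```python
def rescorla_wagner(stimuli, rewards, eps=0.1, w=0.0):
    """Hypothetical helper: apply w <- w + eps * u_i * delta_i trial by trial.

    stimuli and rewards are sequences of 0/1 values (u_i, r_i).
    Returns the weight after each trial.
    """
    history = []
    for u, r in zip(stimuli, rewards):
        v = w * u               # prediction v_i = w u_i
        delta = r - v           # prediction error delta_i = r_i - v_i
        w += eps * u * delta    # Rescorla-Wagner update
        history.append(w)
    return history


# Acquisition: 100 trials with stimulus and food (u = 1, r = 1).
acq = rescorla_wagner([1] * 100, [1] * 100)
# Extinction: 100 trials with stimulus but no food, starting from the learnt weight.
ext = rescorla_wagner([1] * 100, [0] * 100, w=acq[-1])

print(f"after acquisition: w = {acq[-1]:.3f}")
print(f"after extinction:  w = {ext[-1]:.3f}")
```

With these settings $w$ rises towards 1 during acquisition and decays back towards 0 during extinction, each time geometrically at rate $1 - ε$.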
Partial reinforcement: the reward is delivered only with a certain probability $p$
⟹ the prediction $v_i$ fluctuates around $p$ (its expected value) ⟶ this matches what is observed in experiments
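A small simulation of partial reinforcement, assuming $u_i = 1$ on every trial, a Bernoulli reward with probability `p = 0.4`, `eps = 0.05` and a fixed random seed; all of these values and the helper name `simulate_partial_reinforcement` are illustrative assumptions. It shows the prediction settling into noisy fluctuations whose average is close to $p$.

```python
import random


def simulate_partial_reinforcement(p, n_trials=10_000, eps=0.05, seed=0):
    """Hypothetical helper: Rescorla-Wagner with u_i = 1 and r_i ~ Bernoulli(p).

    Returns the prediction v_i = w u_i recorded before each update.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    w = 0.0
    predictions = []
    for _ in range(n_trials):
        u = 1
        r = 1 if rng.random() < p else 0  # reward delivered with probability p
        v = w * u
        predictions.append(v)
        w += eps * u * (r - v)
    return predictions


p = 0.4
v = simulate_partial_reinforcement(p)
steady = v[1000:]  # discard the initial learning phase
mean_v = sum(steady) / len(steady)
print(f"mean prediction: {mean_v:.3f} (p = {p})")
print(f"range: [{min(steady):.2f}, {max(steady):.2f}]")
```

Since each update can be written $w ← (1 - ε) w + ε r_i$, the prediction stays in $[0, 1]$, and its long-run mean is $p$; a smaller $ε$ shrinks the fluctuations but slows learning.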
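Blocking: the animal can’t learn the association between the second stimulus and the reward if the reward is already predicted by the first stimulus. With several stimuli, the prediction becomes $v_i ≝ \mathbf{w} \cdot \mathbf{u}_i$ and the rule updates every weight, $\mathbf{w} ← \mathbf{w} + ε δ_i \mathbf{u}_i$; once the first stimulus fully predicts the reward, $δ_i ≈ 0$, so the weight of the second stimulus barely changes.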
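A sketch of blocking with the vector form of the rule, assuming two stimuli, `eps = 0.1` and 100 trials per phase (illustrative values). The helper `rescorla_wagner_vec` is hypothetical. A control run without pre-training on stimulus 1 shows that, absent blocking, both stimuli share the prediction equally.

```python
def rescorla_wagner_vec(trials, eps=0.1, w=None):
    """Hypothetical helper: vector Rescorla-Wagner rule.

    trials is a list of (u, r) pairs, u a tuple of 0/1 stimulus indicators.
    v = w . u, delta = r - v, w <- w + eps * delta * u.
    """
    n = len(trials[0][0])
    w = list(w) if w is not None else [0.0] * n
    for u, r in trials:
        v = sum(wj * uj for wj, uj in zip(w, u))  # prediction v = w . u
        delta = r - v                              # prediction error
        w = [wj + eps * delta * uj for wj, uj in zip(w, u)]
    return w


# Phase 1: stimulus 1 alone is paired with the reward.
w_after_phase1 = rescorla_wagner_vec([((1, 0), 1)] * 100)
# Phase 2: stimuli 1 and 2 together are paired with the reward.
blocked = rescorla_wagner_vec([((1, 1), 1)] * 100, w=w_after_phase1)
# Control: phase 2 only, without pre-training on stimulus 1.
control = rescorla_wagner_vec([((1, 1), 1)] * 100)

print("blocking:", [round(x, 3) for x in blocked])
print("control: ", [round(x, 3) for x in control])
```

In the blocking run the second weight stays near 0, while in the control run each weight ends near 0.5.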
Secondary conditioning: you want to transfer what the animal learnt with a first stimulus to a second one (make the animal forget about the first stimulus and just focus on the second one).
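In practice, the animal can learn this association without any reward, but the Rescorla-Wagner rule cannot account for it: with $r_i = 0$, the prediction error $δ_i = -v_i$ is negative as long as $v_i > 0$, so the weight of the second stimulus decreases instead of growing.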
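A sketch of why the rule fails here, assuming the usual two-phase protocol: stimulus 1 is first paired with the reward, then stimuli 1 and 2 are presented together with no reward. The helper `rescorla_wagner_vec`, `eps = 0.1` and the trial counts are illustrative assumptions.

```python
def rescorla_wagner_vec(trials, eps=0.1, w=None):
    """Hypothetical helper: vector Rescorla-Wagner rule over (u, r) trials.

    v = w . u, delta = r - v, w <- w + eps * delta * u.
    """
    w = list(w) if w is not None else [0.0] * len(trials[0][0])
    for u, r in trials:
        delta = r - sum(wj * uj for wj, uj in zip(w, u))  # prediction error
        w = [wj + eps * delta * uj for wj, uj in zip(w, u)]
    return w


# Phase 1: stimulus 1 alone is paired with the reward.
after_phase1 = rescorla_wagner_vec([((1, 0), 1)] * 100)
# Phase 2: stimulus 2 is presented with stimulus 1, but no reward is delivered.
after_phase2 = rescorla_wagner_vec([((1, 1), 0)] * 100, w=after_phase1)

print("after phase 1:", [round(x, 3) for x in after_phase1])
print("after phase 2:", [round(x, 3) for x in after_phase2])
```

Both weights receive the same update in phase 2, so their difference stays fixed while their sum is driven to 0: the second weight ends close to -0.5, meaning the model predicts that stimulus 2 signals the absence of reward rather than its presence.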