Lecture 4: Neuromimetic Navigation Strategies
Teacher: Benoît Girard
Navigation: the communities of roboticists and neuroscientists are intermingled
Robotics/Neuroscience: more or less the same algorithms/concepts, but not the same motivations
The ideas are more general than navigation ⇒ decision-making, etc. (they need not have anything to do with vision)
Neural basis for navigation:
- Basal ganglia: involved in Reinforcement Learning
- Superior Colliculus (SC): several layers, one of which is a map of the visual field ⟹ there is a mapping between the surface of the SC and the visual field
- the SC communicates with the basal ganglia
Taxonomy: different kinds of strategies:
- Target/Beacon approach: go straight toward the target
- Stimulus-triggered response
- Map-based: place-triggered (from place cells ⟶ model-free, you don’t have the transition function) VS topological/metric (model-based)
⟶ The complexity of spatial information processing increases from one strategy to the next.
Simpler taxonomy: response-based (at least 6 different strategies) VS map-based navigation
Ex: a rodent in a pool looking for a platform (so that it no longer has to swim). Here, reaching the platform is rewarding in itself.
Hidden platform with a cue (ex: a flag) indicating it ⟹ the aim is to select a few cues among all of them and focus on the relevant ones (perceptual discrimination among all the visual cues).
The cues themselves are not linked to the relevant actions (the actions are handled elsewhere, e.g. by the SC).
We remove all allocentric information: the animal looks for food in a dark room, and when it finds food it goes straight back to the nest (in a straight line) ⇒ the animal has performed path integration.
⟹ no learning is involved (the animal mentally sums its accumulated movements), BUT errors accumulate as well → not reliable for navigation (to be used only when there is no other choice)
Grid cells are likely to be involved in this
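Path integration as described above can be sketched as a running sum of self-motion. This is a minimal illustration under stated assumptions, not a model from the lecture: movements are given as hypothetical egocentric (turn, distance) pairs, and imprecise self-motion sensing is modelled, by assumption, as independent Gaussian noise on each perceived turn and distance. Since the estimate is only a sum, with no learning and no external correction, every noisy step adds error that is never removed.

```python
import math
import random


def path_integrate(steps, noise_std=0.0, rng=None):
    """Sum egocentric (turn, distance) moves into a position estimate.

    steps: list of (turn_radians, distance) pairs (hypothetical input format).
    noise_std: std-dev of the Gaussian noise assumed on each perceived turn
               and distance; 0.0 gives perfect integration.
    Returns the estimated (x, y), with the nest at the origin and the animal
    initially heading along +x.
    """
    if rng is None:
        rng = random.Random(0)
    x = y = heading = 0.0
    for turn, dist in steps:
        # Each perceived movement is corrupted independently...
        heading += turn + rng.gauss(0.0, noise_std)
        d = dist + rng.gauss(0.0, noise_std)
        # ...and simply added to the running estimate: no learning involved.
        x += d * math.cos(heading)
        y += d * math.sin(heading)
    return x, y


def homing_vector(x, y):
    """Direction (radians) and distance of the straight path back to the nest."""
    return math.atan2(-y, -x), math.hypot(x, y)


# Hypothetical foraging trip: 200 random egocentric moves of unit length.
trip_rng = random.Random(42)
trip = [(trip_rng.uniform(-0.5, 0.5), 1.0) for _ in range(200)]
true_pos = path_integrate(trip, noise_std=0.0)
est_pos = path_integrate(trip, noise_std=0.05, rng=random.Random(1))
error = math.hypot(est_pos[0] - true_pos[0], est_pos[1] - true_pos[1])
print("homing vector (estimated):", homing_vector(*est_pos))
print("position error after 200 moves:", error)
```

Lengthening the trip generally increases the error, since the noise is accumulated and never corrected.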
1936: rats already trained to follow a complicated path in a maze, then mutilated (deaf, blind, etc. → no allocentric input anymore) ⟶ they still end up finding their way through it, relying on path integration
Praxis ⇒ supervised learning first, then use of an automated routine and idiothetic information
TD-learning with a discount factor → curse of dimensionality: too many repetitions needed
But one advantage ⟶ easy computation (the price to pay is a long convergence time)
Other main drawback: relearning something takes even longer than the initial learning
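A minimal tabular sketch makes the convergence cost concrete. The task is a hypothetical 1-D corridor (not from the lecture), the learning rule is standard one-step Q-learning with discount factor `gamma`, and all parameter values are illustrative assumptions. Each update only moves value information back by one state, so early episodes are long random walks before the reward "reaches" the start.

```python
import random


def q_learning_track(n_states=10, goal=None, episodes=200, alpha=0.5,
                     gamma=0.9, epsilon=0.1, q=None, seed=0):
    """Tabular one-step Q-learning on a hypothetical 1-D corridor.

    States 0..n_states-1; actions 0 = step left, 1 = step right (walls clamp).
    Reaching `goal` gives reward 1 and ends the episode; each episode starts
    at the end of the corridor farthest from the goal.
    Pass an existing `q` to keep learning from old values (relearning).
    Returns the Q-table and the number of steps taken in each episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1 if goal is None else goal
    start = 0 if goal >= n_states // 2 else n_states - 1
    if q is None:
        q = [[0.0, 0.0] for _ in range(n_states)]
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = start, 0
        while s != goal:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(n_states - 1, max(0, s + (1 if a == 1 else -1)))
            done = s_next == goal
            r = 1.0 if done else 0.0
            # TD target: reward plus discounted best value of the next state;
            # the terminal goal state has no future value.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s, steps = s_next, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode


q, steps = q_learning_track(n_states=10, episodes=500)
print("steps in first 5 episodes:", steps[:5])
print("steps in last 5 episodes:", steps[-5:])

# Relearning: move the reward to the other end but keep the old Q-table.
q, relearn_steps = q_learning_track(n_states=10, goal=0, episodes=500, q=q, seed=1)
print("total steps, initial learning:", sum(steps))
print("total steps, relearning:", sum(relearn_steps))
```

Comparing the two printed totals is one way to probe the relearning cost on this toy task; the numbers depend on the seeds and parameters, so no particular outcome is claimed here.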
Other possibility: learn the graph of locations, then use it to go from $A$ to $B$ ⟶ problem: computing a shortest path is (at least) linear in the number of vertices and edges ⇒ costly compared to looking up the relevant values in a vector.
Q-learning is not practical either → it does no reasoning on the structure of space: you may need a long backward propagation of values to learn that a path you are exploring leads to a state you already know is not advantageous.
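The graph-based alternative can be sketched with breadth-first search, assuming for illustration an unweighted, hypothetical graph of maze locations stored as an adjacency dict. BFS is one algorithm with the linear O(V + E) cost; with weighted edges one would typically use Dijkstra's algorithm instead, at O((V + E) log V) with a binary heap.

```python
from collections import deque


def shortest_path(graph, a, b):
    """Breadth-first search from a to b on an unweighted location graph.

    graph: dict mapping each location to a list of neighbouring locations.
    Each vertex is enqueued at most once and each adjacency list scanned once,
    hence O(V + E) time. Returns the path as a list, or None if b is unreachable.
    """
    parent = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            # Follow parent links back to a, then reverse the result.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical maze: A and B are places, C1..C3 are corridors/junctions.
maze = {
    "A": ["C1"],
    "C1": ["A", "C2", "C3"],
    "C2": ["C1", "B"],
    "C3": ["C1"],
    "B": ["C2"],
}
print(shortest_path(maze, "A", "B"))  # ['A', 'C1', 'C2', 'B']
```

By contrast, a learned Q-table answers "which way now?" with one lookup per action at the current state, with no search over the graph.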
Egocentric sequential strategy
Learn based on sequences of your sensory input, not the places/locations you encounter.
You no longer use location as an input, but sensory information.
Then: a tradeoff must be found between location and sensory information.
Strategy combination: combine the Q-value estimates of the different systems by summing them.
Ex: rat in a maze: the input might classically be the intersections of the maze, but also the “shape of the corridor” (ex: T-shaped intersection, straight corridor, etc.). Then apply Q-learning to both and merge them.
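A minimal sketch of this combination, under assumptions the lecture does not fix: each expert keeps its own Q-table (plain dicts), the place-based one indexed by a hypothetical location label and the egocentric sequential one by a hypothetical corridor-shape label; both are updated with standard one-step Q-learning on the same reward, and the action chosen is the argmax of the summed Q-values.

```python
ACTIONS = ("left", "forward", "right")


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """One-step Q-learning update, applied to one expert's table."""
    row = q.setdefault(state, {a: 0.0 for a in ACTIONS})
    next_row = q.get(next_state, {})
    best_next = 0.0 if done else max(next_row.get(a, 0.0) for a in ACTIONS)
    row[action] += alpha * (reward + gamma * best_next - row[action])


def combined_action(q_place, q_view, location, view):
    """Greedy action on the sum of the two experts' Q-values.

    q_place: location label -> {action: value}  (place-based expert)
    q_view:  view label     -> {action: value}  (egocentric sequential expert)
    States an expert has never seen count as 0 for every action.
    """
    totals = {
        a: q_place.get(location, {}).get(a, 0.0) + q_view.get(view, {}).get(a, 0.0)
        for a in ACTIONS
    }
    return max(ACTIONS, key=totals.get), totals


# Hypothetical values: the two experts disagree at this junction.
q_place = {"junction_3": {"left": 0.2, "forward": 0.0, "right": 0.5}}
q_view = {"T-shape": {"left": 0.6, "forward": 0.0, "right": 0.1}}
action, totals = combined_action(q_place, q_view, "junction_3", "T-shape")
print(action)  # left (0.2 + 0.6 beats 0.5 + 0.1)

# After acting, each expert learns separately from the same reward.
q_update(q_place, "junction_3", action, 1.0, "goal", done=True)
q_update(q_view, "T-shape", action, 1.0, "goal", done=True)
```

Summing lets a confident expert outvote a weak one, as here where the view-based expert overrides the place-based preference for "right".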