Nice review
Adaptive Computation Time (ACT) lets a recurrent network vary the amount of computation it performs at each time step.
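A rough sketch of the halting rule behind Adaptive Computation Time, assuming the per-step halting probabilities are already given (in the actual model they come from a learned sigmoid unit). The function name `act_halting_weights` and the parameters `eps` and `max_steps` are illustrative, not taken from any library.

```python
def act_halting_weights(halting_probs, eps=0.01, max_steps=10):
    """Turn halting probabilities into per-ponder-step weights.

    Pondering stops at the first step where the cumulative halting
    probability reaches 1 - eps (or when the step budget runs out).
    The final step receives the remainder so the weights sum to 1.
    """
    weights = []
    cumulative = 0.0
    limit = min(len(halting_probs), max_steps)  # hypothetical step budget
    for n in range(limit):
        h = halting_probs[n]
        is_last = n == limit - 1
        if cumulative + h >= 1.0 - eps or is_last:
            weights.append(1.0 - cumulative)  # remainder goes to the last step
            break
        weights.append(h)
        cumulative += h
    return weights


# Easy inputs halt early, harder ones ponder longer.
easy = act_halting_weights([0.995, 0.5, 0.5])
hard = act_halting_weights([0.1, 0.3, 0.7, 0.9])
print(len(easy), len(hard))
```

The weights would then be used to average the intermediate states and outputs of the ponder steps, so the number of steps taken depends on the input.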
See Neural networks with memory
Neural Turing machine
Differentiable neural computer
See Attention in machine learning
Neural programmer-interpreter
http://www.thespermwhale.com/jaseweston/