Attention using RL
Lecture 8 - Generating Language with Attention [Chris Dyer]
Pervasive attention: https://github.com/elbayadm/attn2d
The fall of RNN / LSTM: https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0
Transformer
Attention Is All You Need (Vaswani et al., 2017): https://arxiv.org/abs/1706.03762
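
A minimal sketch of the scaled dot-product attention at the core of the Transformer, softmax(QK^T / sqrt(d_k)) V; the function name, shapes, and toy data below are illustrative, not taken from the lecture or paper code:

    # Scaled dot-product attention in plain NumPy (illustrative sketch).
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (n_q, d_v)."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
        return weights @ V                                # weighted sum of values

    # Toy usage: 3 queries attending over 4 key/value pairs.
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(3, 8))
    K = rng.normal(size=(4, 8))
    V = rng.normal(size=(4, 16))
    print(scaled_dot_product_attention(Q, K, V).shape)    # (3, 16)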