Natural language understanding

cosmos 22nd August 2019 at 1:41am
Artificial intelligence

Use Natural language processing and more Machine learning/Artificial intelligence methods to understand Natural language.

Language model, Text generation

https://rasa.ai/?ref=producthunt

New benchmark task: SuperGLUE (2019): https://ai.facebook.com/blog/new-advances-in-natural-language-processing-to-better-connect-people/

https://github.com/allenai/deep_qa

Many of the methods use Augmented RNNs, which have Memory, and Attention features. I think that multi-sensory integration should also be a crucial feature. So this would need to be related to Video understanding and Image understanding

Automatically make timelines from unsttructured text http://www.cs.ubc.ca/labs/imager/tr/2015/TimeLineCurator/

https://openai.com/blog/unsupervised-sentiment-neuron/https://openai.com/blog/language-unsupervised/

https://venturebeat.com/2019/08/13/nvidia-trains-worlds-largest-transformer-based-language-model/amp/

Dialogue systems

https://github.com/deepmipt/DeepPavlov?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=The%20Wild%20Week%20in%20AI

https://www.research.ibm.com/artificial-intelligence/project-debater/research.html


Language grounding

Grounding language means giving meaning to abstract symbols, like words. What is meaning? Well it just means anything that is well-defined in whatever universe you are considering. If your universe is a little simulated world a meaning could be a particular object, a particular series of actions or tasks, etc.

Language grounding can be approached as an unsupervised learning task, where patterns are found in both structured data, and language. One then tries to align these patterns.

This is a very difficult task, but deep learning is making good progress in this. This paper uses reinforcement and unsupervised learning to ground not only words relating to things, but to actions of the agent. It is even able to ground relations between objects like "next to", and properties like "red", "large", and actions like "pick up", "find", all of which it learns to interpret in the appropriate way in the environment. All of this without any supervision, except the occasional positive reward when it does a task right, and negative when it does it wrong.

For this they use an actor-critic algorithm which just updates the network's parameters in the direction which makes the policy (distribution over actions) have more predicted reward. This direction is computed using a math trick known as the Policy Gradients theorem.

The network takes as input a visual input from its camera, and a task description in language form (like "pick up pink ladder"), it combines these through some networks to produce a latent representation, which is finally transformed through another net into its policy, and an estimate of the value of the current state (value function used in the actor-critic algo)

[(unsupervised) autoregressive objectives] They make the network learn to predict its near future given its internal representation of the present visual input, and an action (temporal autoencoding). The network also learns to estimate the task description from its visual input alone (language prediction). Finally, they also include other auxiliary objectives that train the network, namely reward prediction and value replay. These task make the internal representations be better, and so it speeds up the training (which otherwise is slow because of sparse reward signals). In fact, they turned out crucial for the agent to learn anything useful at all!

Furthermore the language prediction module lets the network say, in words, what it thinks its currently doing, giving a very cool way to make the network evolution more interpretable!

After finding that the fully equipped agent (with autoregressive objectives) could learn vocabulary, they did some experiments where they found that the agent could learn new words faster if it had learned other words before. This is because before it can exhibit any lexical knowledge, the agent must learn various skills and capacities that are independent of the specifics of any particular language instruction. These include an awareness of objects as distinct from floors or walls; some capacity to sense ways in which those objects differ; and the ability to both look and move in the same direction.

They also trained the agent on a set of monograms or bigrams (1 or 2 words) specifying shape/colour/property. Then they tested the agent on combinations of words it had never seen. The fact that it was still able to succeed showed the agent’s ability to decompose phrases into constituent (emergent) lexical concepts. This reflects an ability that may be essential for human-like language learning in naturalistic environments, since linguistic stimuli rarely contain words in isolation.

Learning more complex tasks, involving more distracting objects, larger rooms and more difficult descriptions, were possible, only after the agent was trained to solve simpler tasks. This is known as curriculum learning, and it works just like schools. Curiously, from a quick estimate, their training time would correspond to no more than a few years of a human, I think. Finally, they used curriculum learning to make the agent learn the most complex tasks, where it could be instructed to not only "find X", but "find X in room Y", or "find X next to Y", the task being chosen randomly at each episode.

See the trained agent in action here: https://www.youtube.com/watch?v=wJjdu1bPJ04&feature=youtu.be