Maximise the distance of the closest point from the decision boundary.
The points closest to the decision boundary are the support vectors.
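Below is a minimal sketch of this idea for a linear classifier, not from the notes themselves: the standard hinge loss, where only points with margin below 1 (the support vectors) contribute to the loss and hence to the boundary. The names `hinge_loss`, `reg`, etc. are illustrative assumptions.

```python
import numpy as np

def hinge_loss(w, b, X, y, reg=0.01):
    """Average hinge loss plus L2 regularisation.
    X is (n, d), y is (n,) with labels in {-1, +1}."""
    margins = y * (X @ w + b)                 # signed margin of each point
    losses = np.maximum(0.0, 1.0 - margins)   # zero loss once margin >= 1
    return losses.mean() + reg * np.dot(w, w)
```

Only the points whose margin is below 1 produce a non-zero term, which is why the support vectors alone determine the max-margin boundary.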
Max-margin: learning a function that identifies sensible data (e.g. sentences that make sense). That is what the algorithm he explains does: find a probability distribution that is larger at the data points than anywhere else. This will, in particular, make the NN learn a good representation of the data, i.e. an embedding. For this we use the hinge loss. In practice, we do this
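A hedged sketch of how that margin idea could look for representation learning, under my own assumptions (the names `energy_net`, `x_data`, `x_other`, and the margin `m` are illustrative, not from the notes): push the model's score (energy) down on real data points and up elsewhere, with a hinge so pairs already separated by the margin contribute nothing.

```python
import torch

def contrastive_hinge(energy_net, x_data, x_other, m=1.0):
    """Margin (hinge) loss between real data points and other points."""
    e_pos = energy_net(x_data)    # energy at real data points (want it low)
    e_neg = energy_net(x_other)   # energy at other points (want it high)
    # Penalise only pairs where the other point is not at least m above the data point.
    return torch.clamp(m + e_pos - e_neg, min=0.0).mean()
```

The hinge means the network stops being pushed on pairs that already satisfy the margin, which is the same mechanism that makes only support vectors matter in the SVM case.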