See Learning theory, Learning curve
Generalization bound techniques
Basically, anything that restricts or controls the capacity of the hypothesis class. That is, anything that imposes an inductive bias!
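For orientation, the generic shape such bounds take (a standard VC-type statement, constants omitted, not tied to any one technique below): with probability at least $1-\delta$ over $m$ i.i.d. samples, simultaneously for all $f$ in a class $H$ of VC dimension $d$,
$$R(f) \;\le\; \hat{R}(f) + O\!\left(\sqrt{\frac{d\log(m/d) + \log(1/\delta)}{m}}\right),$$
so any technique that shrinks the capacity term (VC dimension, a norm ball, Rademacher complexity, ...) directly tightens the bound.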
Almost-everywhere algorithmic stability and generalization error
Generalization in deep learning
Generalization ability of Boolean functions implemented in feedforward neural networks
Introduction to supervised learning theory
One-shot learning, Zero-shot learning
Entropy dependence of generalization in the matrix-map task, learned via Logistic regression. Should I try the Perceptron algorithm as well?
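A minimal sketch of that swap (my own illustration on stand-in binary data, not the actual matrix-map experiment): sklearn's LogisticRegression and Perceptron share the same fit/score interface, so rerunning the comparison with a perceptron is essentially a one-line change.

import numpy as np
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data (assumption): binary inputs, labels from a thresholded linear rule.
X = rng.integers(0, 2, size=(2000, 25))
w = rng.integers(0, 2, size=25)
y = (X @ w >= w.sum() / 2).astype(int)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

# Same loop works for both models; only the estimator object changes.
for model in (LogisticRegression(max_iter=1000), Perceptron(max_iter=1000)):
    acc = model.fit(Xtr, ytr).score(Xte, yte)
    print(type(model).__name__, "test accuracy:", round(acc, 3))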
[Btw, side thought. For the 0/1 matrices, the only explanation I can think of for what we see is that the VC dimension of the set of epsilon-bad functions is higher for maps with medium entropies (intuitively: there are more functions which predict wrongly when the true function has medium entropy than when it has either high or low entropy).]
After looking back at some learning theory, I think this pattern can be understood if, for some reason, there are more functions (in the class of matrix-map functions) which disagree significantly with the functions of row entropy 0.9 than with other functions. In technical jargon: the VC dimension of the set of epsilon-bad functions relative to functions with entropy 0.9 is larger than for other functions.
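For reference, the standard realizable-case VC bound this intuition leans on, stated directly for the set $B_\epsilon$ of epsilon-bad functions (those whose true error against the target exceeds $\epsilon$), with $d$ the VC dimension of $B_\epsilon$, $\Pi_{B_\epsilon}$ its growth function, and $m$ the sample size:
$$\Pr\big[\exists f \in B_\epsilon:\ \hat{R}_S(f)=0\big] \;\le\; 2\,\Pi_{B_\epsilon}(2m)\,2^{-\epsilon m/2} \;\le\; 2\left(\frac{2em}{d}\right)^{d} 2^{-\epsilon m/2} \quad (2m \ge d).$$
If $d$ is larger for targets with row entropy around 0.9, the guarantee is weakest exactly there, which would be consistent with the dip in observed generalization at medium entropies.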
See WWIS, emails, etc.
NIPS 2017: Variance-based Regularization with Convex Objectives – optimal bias/variance tradeoff.