Comments on why deep learning works: perspectives from theoretical chemistry

cosmos 4th November 2016 at 2:43pm

See discussion in comments here and here

Following on from the comments at https://charlesmartin14.wordpress.com/2016/10/21/improving-rbms-with-physical-chemistry/#comment-1499, which are more relevant here.

“This is the standard ‘picture’. Now this may or may not be a good model for what is going on.” Ok, I see. Do you know of any other proposed alternatives to the standard “picture” of how the energy landscape corresponds to overfitting? Hm, I suppose the paper where “It is suggested that having too many Hidden nodes–at a fixed Temp.–leads to a glassy state prone to overtraining small data sets” may offer a way of looking at overfitting as a glassy state, but I haven’t finished reading it yet.

Hm, thinking a bit more: if the spin-glass state has minima which are quite deep, then getting stuck in them would coincide with overfitting/small training error. I suppose this could be the kind of overfitting you get when there is no regularization (or it isn’t working properly), while the overfitting you get by not early-stopping is related to the global minima in the funnel. Though I think I’m now seeing why the whole thing could be more complicated.

Anyway, I think I’m starting to get how regularization is a proxy for temperature. Basically, regularization either removes bad minima by changing the free-energy landscape, or helps the system escape from them by adding fluctuations, like dropout does. Actually, I think even dropout can be seen as changing the energy landscape, since adding fluctuations is like increasing temperature, which makes entropy matter more (writing the free energy as F = U - TS). Does this sound right?
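To spell out that last step for myself (just textbook statistical mechanics, nothing taken from the paper above): if a basin b of the landscape has depth U_b and entropy S_b (roughly the log of its width/volume), its equilibrium occupancy is governed by the free energy F_b = U_b - T S_b:

```latex
% Equilibrium occupancy of basin b, taking k_B = 1:
P(b) \;\propto\; e^{-F_b/T}
     \;=\; e^{-(U_b - T S_b)/T}
     \;=\; e^{-U_b/T}\, e^{S_b}
```

Raising T shrinks the e^{-U_b/T} advantage of deep, narrow minima while the entropic factor e^{S_b} stays the same, so wide basins win out. If dropout-style fluctuations really do act like a higher effective temperature, that would be the mechanism by which they make entropy “matter more”.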

This last thought made me remember the “survival of the flattest” idea in evolution ( http://www.nature.com/nature/journal/v412/n6844/abs/412331a0.html ). Say we have a wide funnel: even if it isn’t that deep, a system with a high mutation rate/temperature will actually end up there rather than in an isolated minimum, even if the latter is deeper (say, one corresponding to overtraining)!
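A toy numerical version of this intuition (my own sketch on a made-up 1D landscape, not taken from the Nature paper or the blog discussion): start a Metropolis sampler inside a deep, narrow minimum and watch how much time it spends near a wide, shallow one as the temperature is raised.

```python
import numpy as np

# Toy 1D "loss landscape" (invented for illustration): a deep narrow well at
# x = -2, a wide shallow well at x = +2, plus a weak confining term so the
# walker cannot wander off to infinity.
def energy(x):
    deep_narrow = -4.0 * np.exp(-((x + 2.0) ** 2) / (2 * 0.2 ** 2))
    wide_shallow = -2.0 * np.exp(-((x - 2.0) ** 2) / (2 * 1.5 ** 2))
    return deep_narrow + wide_shallow + 0.02 * x ** 2

def fraction_in_wide_basin(T, n_steps=200_000, step=0.5, seed=0):
    """Metropolis sampling at temperature T, started inside the deep narrow well."""
    rng = np.random.default_rng(seed)
    x = -2.0              # start at the deep, isolated minimum ("overtrained" state)
    hits = 0
    for _ in range(n_steps):
        x_new = x + rng.normal(scale=step)
        if rng.random() < np.exp(-(energy(x_new) - energy(x)) / T):
            x = x_new
        hits += x > 0.0   # crude label: right half-line = wide shallow basin
    return hits / n_steps

for T in (0.2, 0.5, 1.0, 2.0):
    print(f"T = {T:3.1f}: fraction of time near the wide shallow basin "
          f"= {fraction_in_wide_basin(T):.2f}")
```

At low T the sampler stays trapped in the deep isolated minimum it started in; at higher T (more fluctuations / a higher “mutation rate”) it escapes and spends more and more of its time around the wide, flat basin even though that basin is shallower.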