<aside> 📎 Jialin LU, Feb 2020. This was presented at the group meeting of Ester's lab.
</aside>
Basically, this is written by an outsider who does not work on Bayesian methods but somehow got volunteered to survey and present the pros and cons of Bayesian Deep Learning. I will not go into the math; I will only discuss the ideas intuitively.
Please see the typeset PDF if you prefer reading this as a short paper.
This is a supplementary version in PDF.
This is the corresponding slide for the presentation
<aside> 💡 A reminder: if you are not into the long blog post, you can first look at the supplementary PDF above, which is self-contained and should convey the main idea.
</aside>
Why I am doing this
I barely know deep learning, since I do not work on it, and I know even less about Bayesian methods.
But what happened is that at NeurIPS 2019 there was a tutorial on Bayesian Deep Learning by Emtiyaz Khan (RIKEN, Japan), which I actually failed to attend.
And later, some lab members and Dear Martin seemed to agree that
"wow bayesian deep learning is kind of cool"
Then I somehow got volunteered to do this presentation.
Lab mates at NeurIPS 2019 (I am the one on the far right, wearing the volunteer shirt)
<aside> 💡 There will be no math; I will try to convey only the intuition.
</aside>
But anyway, I read some papers and organized them into a short survey of Bayesian Deep Learning. The outline of this post is as follows.
First, I give the motivations for combining Bayesian methods with deep learning.
Then, in Part 2, I introduce the main theme of approximating the posterior distribution over a neural network's parameters and discuss two technical approaches for tackling it:
namely, the variational approximation and the interpolation-based approximation (a term I use to refer to Stochastic Weight Averaging and related methods).
In Part 3 I show, by referring to a simple experiment, that BDL is a little frustrating in practice and does not really work yet.
Specifically, I use Deep Ensembles as a simple baseline and discuss why it works (a minimal sketch is given below).
I will also suggest why, for a fair comparison, we should use multi-mode variational inference (such as a mixture of Gaussians) and multi-trajectory interpolation. This is because the true posterior of a DNN is so complicated that it certainly has multiple modes (high-performing local minima, as in the lottery ticket hypothesis).
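To make the Deep Ensembles baseline concrete, here is a minimal sketch (my own toy illustration in PyTorch, not the actual experiment from Part 3): train K copies of the same network from different random initializations and average their predictive distributions, which is a crude way of covering several modes of the posterior.

```python
# Toy sketch of Deep Ensembles (illustrative only; architecture and data are made up).
import torch
import torch.nn as nn

def make_net():
    # A small classifier; any network works the same way.
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

def train(net, x, y, steps=200):
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(net(x), y).backward()
        opt.step()
    return net

# Toy data stands in for a real dataset.
x, y = torch.randn(512, 20), torch.randint(0, 3, (512,))

# K independent runs; each member typically lands in a different local minimum (mode).
ensemble = [train(make_net(), x, y) for _ in range(5)]

@torch.no_grad()
def predict(x_test):
    # Averaging the softmax outputs acts as a rough multi-mode posterior predictive.
    probs = torch.stack([net(x_test).softmax(dim=-1) for net in ensemble])
    return probs.mean(dim=0)
```

The point of the sketch is only that the ensemble members, unlike a single variational or weight-averaged approximation, each occupy a different mode, which is why it serves as the baseline in Part 3.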
I will then end with some more personal opinions
There are mainly two pieces of advice, or possible directions of research we could pursue.
Bayesian Learning is great, deep learning is also great.