No single paper stands out, and talking to people I realize that different researchers are impressed by different contributions, so my choice of the advances below is very subjective:
* the Batch Normalization paper is exciting because of the impact it has already had in training numerous architectures, and because it has been adopted as a standard
* the Ladder Networks paper is exciting because it is bringing back unsupervised learning ideas (here, a particularly interesting stack of denoising autoencoders) into the competition with straight supervised learning, especially in a semi-supervised contex...
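To make the batch-normalization idea above concrete, here is a minimal NumPy sketch of the core transformation from the paper (training-mode only; the running statistics used at inference time are omitted, and the toy data is illustrative):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch (axis 0),
    # then apply the learned scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy mini-batch: 4 examples, 3 features.
x = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [3., 6., 9.],
              [4., 8., 12.]])
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# Each column of y now has (approximately) zero mean and unit variance.
```

In the full method, `gamma` and `beta` are learned by backpropagation, which lets the network recover the identity transform if normalization turns out to be harmful for a given layer.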
Research is by definition exploratory, which means that (a) we do not know what will work and (b) we need to explore many paths; we need a lot of diversity of research directions in the scientific community. So I can only tell you about my current gut feelings and visions of where I see important challenges and opportunities that appeal to my personal aesthetics and instincts. Here are some elements of this:
Contrary to what some people think, I believe that we already have a good basic understanding of the fundamentals of why deep learning works, e.g.,
* We understand that distributed representations, depth, and elements of the convolutional architecture and recurrent architectures correspond to preferences in the space of functions (or informally, priors), and we have theory explaining why some of these preferences can buy an important (sometimes exponential) statistical advantage (in the sense of needing less data to achieve some level of accuracy); more details in my book, pointing to some ...
If we only use the reinforcement signal to guide training, then I agree with Yann LeCun that it is the cherry on the cake. Even worse: when using a global reinforcement signal that is not a known differentiable function of the representations (which is typically the case), there is a serious scaling problem in terms of the number of hidden units (or action dimensions) that can be trained with respect to that signal. The number of examples, random samples or trials of actions may have to grow at least linearly with the number of units in order to provide credit assignment of quality comparab...
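A toy sketch of the credit-assignment issue described above, assuming a factorized Bernoulli policy and the standard score-function (REINFORCE) estimator: the reward is a single scalar for the whole action vector, so every unit must extract its own credit from that one number, and the variance of the estimate forces the number of trials to grow with the number of units. The reward function here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_gradient(theta, n_samples):
    # Score-function gradient estimate for a factorized Bernoulli policy.
    p = 1.0 / (1.0 + np.exp(-theta))     # firing probability of each unit
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        a = (rng.random(theta.shape) < p).astype(float)  # sample actions
        r = a.mean()                     # ONE global scalar reward
        grad += r * (a - p)              # all units share the same credit signal
    return grad / n_samples

g = reinforce_gradient(np.zeros(5), n_samples=2000)
```

Contrast this with backpropagation, where each unit receives its own exact partial derivative from a single (differentiable) evaluation of the loss.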
Like many of those who did research on neural networks in the early days (including my colleagues Geoff Hinton and Yann LeCun), I believe that we have a beautiful opportunity to learn something useful for building AI when we consider what is known about the brain, and this becomes more and more true as neuroscientists are collecting more and more data about the brain. This belief is associated with the reverse idea, that in order to really understand the core reasons why brains allow us to be intelligent, we need to construct a "machine learning" interpretation of what is happening in a bra...
It depends what you mean by deep learning. If you mean the current algorithms we know then the answer is very probably yes. But of course deep learning continues to evolve as research on this topic is thriving, and there is a clear trend to expand the realm of applications of deep learning. Neural networks used to be mostly successful for pattern recognition tasks, with phoneme and object recognition being good examples of that. However, we are seeing more and more work expanding into more classical AI areas such as reasoning, knowledge representation, and manipulating symbolic data structu...
This expands on an earlier question.
Every researcher has their opinion on this, which is a good thing. Here are some I see:
If it's hype, it's exaggeration. The exaggeration exists; I have seen it. It is there when someone presents this body of work as putting us much closer to human-level intelligence than we really are, often relying on the mental images many people have built of AI from movies and science fiction.
In my career, I have often had the thought that humans are usually too greedy. We spend too much effort on short-term objectives, and we would get more out of our effort if we tilted the balance towards longer-term goals. And that means accepting that there are still many fundame...
Like many things, competitive machine learning is good in the right amount. It is great to motivate some (especially new) students, those who like to compete. It makes them really learn the practice of machine learning, which is not something you can learn by only reading papers. Benchmarks also play an important role in drawing our attention to new methods that outperform the earlier state of the art. But they should *not* be used to discard research that does not beat the benchmark; otherwise we risk getting stuck in incremental research. This mindset has killed innovation in some fields I k...
TensorFlow is a more direct competitor to Theano, built around the same basic idea of constructing and manipulating a computational graph that symbolically represents the numerical computation to be performed. However, it needs more work, and Google seems very committed to improving it and making it a useful tool for all. We'll see how things move and where students and researchers choose to go. I'm proud of what we accomplished with Theano and glad to see that Google is building something even better along similar lines, but Theano is not a religion for me. I want to support the tools which will wor...
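The "build a symbolic graph, then run it" idea shared by Theano and TensorFlow can be caricatured in a few lines of plain Python (a toy interpreter for illustration only, not either library's actual API):

```python
# Expressions become nodes in a graph; nothing is computed until evaluation.
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name
    def __add__(self, other): return Node('add', (self, other))
    def __mul__(self, other): return Node('mul', (self, other))

def variable(name):
    return Node('var', name=name)

def evaluate(node, env):
    # Recursively evaluate the symbolic graph given concrete inputs.
    if node.op == 'var':
        return env[node.name]
    args = [evaluate(i, env) for i in node.inputs]
    if node.op == 'add':
        return args[0] + args[1]
    if node.op == 'mul':
        return args[0] * args[1]

x, y = variable('x'), variable('y')
z = x * y + x                            # builds a graph; no arithmetic yet
result = evaluate(z, {'x': 3, 'y': 4})   # → 15
```

Because the whole computation is available as a data structure before it runs, both libraries can differentiate it symbolically, optimize it, and compile it for GPUs, which is what makes the approach so powerful for deep learning.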
I've had strong intuitions about the appeal of neural networks since the early days of my graduate studies, fed by the powerful ideas that people like David Rumelhart and Geoff Hinton impressed on me. In the mid to late 90's, when the machine learning community started to turn their heads away from neural nets, these intuitions brought me to explore why and how neural networks had the potential to bypass the curse of dimensionality, which I considered (and still do) to be a central challenge for machine learning. This led first to a paper with my brother Samy (at NIPS'1999, Modeling High-Di...
Unsupervised pre-training remains heavily used in natural language processing, where we use a very large unlabeled corpus of text to pre-train a representation for words, and then use or fine-tune these pre-trained representations on smaller labeled datasets. However, we have known for many years (starting with the ICML'2008 paper with Hugo Larochelle on Classification using Discriminative Restricted Boltzmann Machines) that it is generally better to jointly train the parameters with respect to a combination of the supervised and unsupervised objectives. A recent success of this kind...
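The joint training described above can be sketched as a single objective combining a supervised cross-entropy term and an unsupervised reconstruction term computed from a shared representation. This is a minimal NumPy sketch; the architecture and the weighting `lam` are illustrative assumptions, not those of the cited paper:

```python
import numpy as np

def joint_loss(x, y_onehot, W_enc, W_dec, W_clf, lam=0.5):
    # Shared representation h feeds both a reconstruction (unsupervised)
    # and a classifier (supervised); both objectives train W_enc jointly.
    h = np.tanh(x @ W_enc)
    recon = h @ W_dec
    unsup = np.mean((recon - x) ** 2)          # reconstruction error
    logits = h @ W_clf
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    sup = -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
    return sup + lam * unsup

# Toy data: 8 examples, 10 input features, 3 classes, 5 hidden units.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 10))
y = np.eye(3)[rng.integers(0, 3, size=8)]
loss = joint_loss(x, y,
                  W_enc=rng.normal(size=(10, 5)),
                  W_dec=rng.normal(size=(5, 10)),
                  W_clf=rng.normal(size=(5, 3)))
```

Minimizing this combined loss by gradient descent updates the shared encoder with both signals at once, instead of freezing the unsupervised features before the supervised phase begins.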