I think that, in general, weather forecasting is a problem with a well-developed underlying theory based on physics, and that theory is the starting point for models in the field. A commonly used model is WRF (http://www.wrf-model.org/index.php), which is used by the National Weather Service as well as in other public and private applications. An important component of this model is data assimilation, where model forecasts are reconciled with real observations. The approach used in that part of weather forecasting is in many ways similar to machine learning, and many techniques are likely transferable (and probably are already being transferred by some research groups).
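To make the idea of reconciling a forecast with observations concrete, here is a minimal sketch of a linear-Gaussian (Kalman-style) analysis update. It is not how WRF implements data assimilation; the function name `analysis_update`, the two-point state, and all of the numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def analysis_update(x_b, B, y, H, R):
    """Blend a background forecast x_b with observations y.

    x_b: background (forecast) state, B: its error covariance,
    y: observations, H: linear map from state to observation space,
    R: observation error covariance. All values are illustrative.
    """
    # Innovation: mismatch between the observations and the forecast
    innovation = y - H @ x_b
    # The gain weighs forecast uncertainty against observation uncertainty
    S = H @ B @ H.T + R
    K = B @ H.T @ np.linalg.inv(S)
    # Analysis state and its (reduced) error covariance
    x_a = x_b + K @ innovation
    A = (np.eye(len(x_b)) - K @ H) @ B
    return x_a, A

# Hypothetical example: temperatures (K) at two grid points, only the first observed
x_b = np.array([280.0, 285.0])
B = np.array([[4.0, 2.0],
              [2.0, 4.0]])      # assumed correlated forecast errors
H = np.array([[1.0, 0.0]])      # the observation sees grid point 0 only
y = np.array([282.0])
R = np.array([[1.0]])

x_a, A = analysis_update(x_b, B, y, H, R)
print(x_a)
```

Because the forecast errors at the two points are assumed to be correlated, the single observation also nudges the unobserved point. That is the sense in which the step resembles statistical learning: the correction is driven by the data, weighted by estimated uncertainties.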
Largely driven by the widespread belief that ensemble-based dynamical forecast systems offer the best possible predictions, the modern meteorological community has mostly abandoned purely statistical/ML approaches. There are notable exceptions, including the reforecast project (http://www.esrl.noaa.gov/psd/for... /reforecast2/index.html), constructed analogs (http://www.cpc.ncep.noaa.gov/pro... people/wd51hd/), linear inverse modeling (http://www.esrl.noaa.gov/psd/for... sstlim/for4gl.html), forecasting based on canonical correlation analysis (CCA), and more.
Perhaps the most widely applied techniques in meteorology are SVD/PCA for dimensionality reduction; clustering is also widely used for post-processing ensemble output. Multivariate regression is used for post-processing grid-point-level forecast output (MOS, or model output statistics, techniques).
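As a rough illustration of two of the techniques mentioned above, the sketch below runs PCA via SVD on a synthetic gridded field and then fits a MOS-style linear regression that corrects a raw forecast against observations. All data are randomly generated, and the variable names, grid size, and bias coefficients are assumptions made for the example rather than anything taken from an operational system.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- PCA via SVD on a synthetic field (time x grid points) ---
n_time, n_grid = 200, 50
pattern = np.sin(np.linspace(0, np.pi, n_grid))   # one dominant spatial pattern
amplitude = rng.normal(size=n_time)               # its time-varying amplitude
field = np.outer(amplitude, pattern) + 0.1 * rng.normal(size=(n_time, n_grid))

# Remove the time mean at each grid point, then decompose the anomalies
anomalies = field - field.mean(axis=0)
U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
explained = s**2 / np.sum(s**2)   # fraction of variance per component

k = 2
pcs = U[:, :k] * s[:k]   # principal-component time series, shape (n_time, k)
eofs = Vt[:k]            # spatial patterns, shape (k, n_grid)

# --- MOS-style regression: correct a biased raw forecast ---
raw_forecast = rng.normal(loc=285.0, scale=5.0, size=n_time)
# Synthetic "truth": an assumed linear relation to the raw forecast plus noise
observed = 1.5 + 0.9 * raw_forecast + rng.normal(scale=0.5, size=n_time)

X = np.column_stack([np.ones(n_time), raw_forecast])   # intercept + predictor
coef, *_ = np.linalg.lstsq(X, observed, rcond=None)
corrected = X @ coef

rmse_raw = np.sqrt(np.mean((raw_forecast - observed) ** 2))
rmse_mos = np.sqrt(np.mean((corrected - observed) ** 2))
print(explained[:k], coef, rmse_raw, rmse_mos)
```

In practice a MOS regression would use several model output variables as predictors and would be evaluated on data held out from the fit; a single predictor and an in-sample fit are used here only to keep the example short.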
For future work, it is likely that seasonal and subseasonal processes such as the Madden-Julian Oscillation (MJO), and long-duration low-frequency anomalies such as atmospheric blocking patterns and index cycles, are amenable to ML-based prediction, as are the spatial and temporal patterns and evolution of predictand fields (e.g., temperature and rainfall) associated with some of these processes. The noise in the full ocean-atmosphere-land system is significant, though, and can make extracting coherent signals very challenging. Still, the inherent limit of meaningful predictive skill attainable from dynamical modeling is about 2-3 weeks, so phenomena with time scales beyond that horizon may be the most profitable targets for ML-based prediction techniques.