I've gone through the hundreds of ICML 2016 papers and curated a subset that looks interesting to me.
In no particular order:
Faster Convex Optimization: Simulated Annealing with an Efficient Universal Barrier
Jacob Abernethy,
Elad Hazan
[abs]
[pdf]
[supplementary]
Variance Reduction for Faster Non-Convex Optimization
Zeyuan Allen-Zhu,
Elad Hazan
[abs]
[pdf]
Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives
Zeyuan Allen-Zhu,
Yang Yuan
[abs]
[pdf]
Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling
Zeyuan Allen-Zhu,
Zheng Qu,
Peter Richtarik,
Yang Yuan
[abs]
[pdf]
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
Dario Amodei,
Rishita Anubhai,
Eric Battenberg,
Carl Case,
Jared Casper,
Bryan Catanzaro,
JingDong Chen,
Mike Chrzanowski,
Adam Coates,
Greg Diamos,
Erich Elsen,
Jesse Engel,
Linxi Fan,
Christopher Fougner,
Awni Hannun,
Billy Jun,
Tony Han,
Patrick LeGresley,
Xiangang Li,
Libby Lin,
Sharan Narang,
Andrew Ng,
Sherjil Ozair,
Ryan Prenger,
Sheng Qian,
Jonathan Raiman,
Sanjeev Satheesh,
David Seetapun,
Shubho Sengupta,
Chong Wang,
Yi Wang,
Zhiqian Wang,
Bo Xiao,
Yan Xie,
Dani Yogatama,
Jun Zhan,
Zhenyao Zhu
[abs]
[pdf]
On the Iteration Complexity of Oblivious First-Order Optimization Algorithms
Yossi Arjevani,
Ohad Shamir
[abs]
[pdf]
[supplementary]
Black-box Optimization with a Politician
Sebastien Bubeck,
Yin Tat Lee
[abs]
[pdf]
Importance Sampling Tree for Large-scale Empirical Expectation
Olivier Canevet,
Cijo Jose,
Francois Fleuret
[abs]
[pdf]
[supplementary]
CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy
Ran Gilad-Bachrach,
Nathan Dowlin,
Kim Laine,
Kristin Lauter,
Michael Naehrig,
John Wernsing
[abs]
[pdf]
Solving Ridge Regression using Sketched Preconditioned SVRG
Alon Gonen,
Francesco Orabona,
Shai Shalev-Shwartz
[abs]
[pdf]
[supplementary]
Variance-Reduced and Projection-Free Stochastic Optimization
Elad Hazan,
Haipeng Luo
[abs]
[pdf]
[supplementary]
On Graduated Optimization for Stochastic Non-Convex Problems
Elad Hazan,
Kfir Yehuda Levy,
Shai Shalev-Shwartz
[abs]
[pdf]
[supplementary]
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang,
Lihong Li
[abs]
[pdf]
[supplementary]
Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning
Xingguo Li,
Tuo Zhao,
Raman Arora,
Han Liu,
Jarvis Haupt
[abs]
[pdf]
[supplementary]
A Variational Analysis of Stochastic Gradient Algorithms
Stephan Mandt,
Matthew Hoffman,
David Blei
[abs]
[pdf]
[supplementary]
Stochastic Variance Reduction for Nonconvex Optimization
Sashank J. Reddi,
Ahmed Hefny,
Suvrit Sra,
Barnabas Poczos,
Alex Smola
[abs]
[pdf]
[supplementary]
A Superlinearly-Convergent Proximal Newton-type Method for the Optimization of Finite Sums
Anton Rodomanov,
Dmitry Kropotov
[abs]
[pdf]
[supplementary]
SDCA without Duality, Regularization, and Individual Convexity
Shai Shalev-Shwartz
[abs]
[pdf]
Training Neural Networks Without Gradients: A Scalable ADMM Approach
Gavin Taylor,
Ryan Burmeister,
Zheng Xu,
Bharat Singh,
Ankit Patel,
Tom Goldstein
[abs]
[pdf]