I've gone through the hundreds of ICML 2016 papers and curated a subset that looks interesting to me.
In no particular order:

Faster Convex Optimization: Simulated Annealing with an Efficient Universal Barrier

Jacob Abernethy,
Elad Hazan

[abs]
[pdf]
[supplementary]

Variance Reduction for Faster Non-Convex Optimization

Zeyuan Allen-Zhu,
Elad Hazan

[abs]
[pdf]

Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives

Zeyuan Allen-Zhu,
Yang Yuan

[abs]
[pdf]

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling

Zeyuan Allen-Zhu,
Zheng Qu,
Peter Richtarik,
Yang Yuan

[abs]
[pdf]

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Dario Amodei,
Rishita Anubhai,
Eric Battenberg,
Carl Case,
Jared Casper,
Bryan Catanzaro,
JingDong Chen,
Mike Chrzanowski,
Adam Coates,
Greg Diamos,
Erich Elsen,
Jesse Engel,
Linxi Fan,
Christopher Fougner,
Awni Hannun,
Billy Jun,
Tony Han,
Patrick LeGresley,
Xiangang Li,
Libby Lin,
Sharan Narang,
Andrew Ng,
Sherjil Ozair,
Ryan Prenger,
Sheng Qian,
Jonathan Raiman,
Sanjeev Satheesh,
David Seetapun,
Shubho Sengupta,
Chong Wang,
Yi Wang,
Zhiqian Wang,
Bo Xiao,
Yan Xie,
Dani Yogatama,
Jun Zhan,
Zhenyao Zhu

[abs]
[pdf]

On the Iteration Complexity of Oblivious First-Order Optimization Algorithms

Yossi Arjevani,
Ohad Shamir

[abs]
[pdf]
[supplementary]

Black-box Optimization with a Politician

Sebastien Bubeck,
Yin Tat Lee

[abs]
[pdf]

Importance Sampling Tree for Large-scale Empirical Expectation

Olivier Canevet,
Cijo Jose,
Francois Fleuret

[abs]
[pdf]
[supplementary]

CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy

Ran Gilad-Bachrach,
Nathan Dowlin,
Kim Laine,
Kristin Lauter,
Michael Naehrig,
John Wernsing

[abs]
[pdf]

Solving Ridge Regression using Sketched Preconditioned SVRG

Alon Gonen,
Francesco Orabona,
Shai Shalev-Shwartz

[abs]
[pdf]
[supplementary]

Variance-Reduced and Projection-Free Stochastic Optimization

Elad Hazan,
Haipeng Luo

[abs]
[pdf]
[supplementary]

On Graduated Optimization for Stochastic Non-Convex Problems

Elad Hazan,
Kfir Yehuda Levy,
Shai Shalev-Shwartz

[abs]
[pdf]
[supplementary]

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Nan Jiang,
Lihong Li

[abs]
[pdf]
[supplementary]

Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning

Xingguo Li,
Tuo Zhao,
Raman Arora,
Han Liu,
Jarvis Haupt

[abs]
[pdf]
[supplementary]

A Variational Analysis of Stochastic Gradient Algorithms

Stephan Mandt,
Matthew Hoffman,
David Blei

[abs]
[pdf]
[supplementary]

Stochastic Variance Reduction for Nonconvex Optimization

Sashank J. Reddi,
Ahmed Hefny,
Suvrit Sra,
Barnabas Poczos,
Alex Smola

[abs]
[pdf]
[supplementary]

A Superlinearly-Convergent Proximal Newton-type Method for the Optimization of Finite Sums

Anton Rodomanov,
Dmitry Kropotov

[abs]
[pdf]
[supplementary]

SDCA without Duality, Regularization, and Individual Convexity

Shai Shalev-Shwartz

[abs]
[pdf]

Training Neural Networks Without Gradients: A Scalable ADMM Approach

Gavin Taylor,
Ryan Burmeister,
Zheng Xu,
Bharat Singh,
Ankit Patel,
Tom Goldstein

[abs]
[pdf]