Publications & Tech Reports

### Conference Publications

A Simple Practical Accelerated Method for Finite Sums

[PDF]
[Code]
We describe a novel optimization method for finite sums (such as empirical risk minimization problems) building on the recently introduced SAGA method. Our method achieves an accelerated convergence rate on strongly convex smooth prob- lems. Our method has only one parameter (a step size), and is radically simpler than other accelerated methods for finite sums. Additionally it can be applied when the terms are non-smooth, yielding a method applicable in many areas where operator splitting methods would traditionally be applied.

@ARTICLE{adefazio-nips2016,
author = {Aaron Defazio},
title = {A Simple Practical Accelerated Method for Finite Sums},
journal = {Advances in Neural Information Processing Systems 29 (NIPS 2016)},
year = {2016}
}

Non-Uniform Stochastic Average Gradient Method for
Training Conditional Random Fields

[PDF]
We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs). We describe a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearly-convergent stochastic gradient method, propose a non-uniform sampling scheme that substantially improves practical performance, and analyze the rate of convergence of the SAGA variant under non-uniform sampling. Our experimental results reveal that our method often significantly outperforms existing methods in terms of the training objective, and performs as well or better than optimally-tuned stochastic gradient methods in terms of test error.

@ARTICLE{mschmidt-aistats2015,
author = {Mark Schmidt and Reza Babanezhad and Mohamed Osama Ahmed and
Aaron Defazio and Ann Clifton and Anoop Sarkar
},
title = {Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields},
journal = {18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015)},
year = {2015}
}

SAGA: A Fast Incremental Gradient Method With Support for
Non-Strongly Convex Composite Objectives

[PDF]
In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates. SAGA improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser. Unlike SDCA, SAGA supports non-strongly convex problems directly, and is adaptive to any inherent strong convexity of the problem. We give experimental results showing the effectiveness of our method.

@ARTICLE{adefazio-nips2014,
author = {Aaron Defazio and Francis Bach and Simon Lacoste-Julien},
title = {SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly
Convex Composite Objectives},
journal = {Advances in Neural Information Processing Systems 27 (NIPS 2014)},
year = {2014}
}

Finito: A Faster, Permutable Incremental Gradient Method
for Big Data Problems

[PDF]
[Appendix]
Recent advances in optimization theory have shown that smooth strongly convex finite sums can be minimized faster than by treating them as a black box "batch" problem. In this work we introduce a new method in this class with a theoretical convergence rate four times faster than existing methods, for sums with sufficiently many terms. This method is also amendable to a sampling without replacement scheme that in practice gives further speed-ups. We give empirical results showing state of the art performance.

@ARTICLE{adefazio-icml2014,
author = {Aaron Defazio and Tiberio Caetano and Justin Domke},
title = {Finito: A Faster, Permutable Incremental Gradient Method for Big
Data Problems},
journal = {The 31st International Conference on Machine Learning (ICML 2014)},
year = {2014}
}

A Convex Formulation for Learning Scale-Free Networks
via Submodular Relaxation

[PDF]
[Appendix]
[Code]
A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovasz extension to obtain a convex relaxation. For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinfomatics dataset.

@ARTICLE{adefazio-nips2012,
author = {Aaron Defazio and Tiberio Caetano},
title = {A Convex Formulation for Learning Scale-Free Networks via Submodular
Relaxation},
journal = {Advances in Neural Information Processing Systems 25 (NIPS 2012)},
year = {2012}
}

A Graphical Model Formulation of Collaborative Filtering
Neighbourhood Methods

with Fast Maximum Entropy Training

[PDF]
with Fast Maximum Entropy Training

Item neighbourhood methods for collaborative filtering learn a weighted graph over the set of items, where each item is connected to those it is most similar to. The prediction of a user's rating on an item is then given by that rating of neighbouring items, weighted by their similarity. This paper presents a new neighbourhood approach which we call item fields, whereby an undirected graphical model is formed over the item graph. The resulting prediction rule is a simple generalization of the classical approaches, which takes into account non-local information in the graph, allowing its best results to be obtained when using drastically fewer edges than other neighbourhood approaches. A fast approximate maximum entropy training method based on the Bethe approximation is presented which utilizes a novel decomposition into tractable sub-problems. When using precomputed sufficient statistics on the Movielens dataset, our method outperforms maximum likelihood approaches by two orders of magnitude.

@ARTICLE{adefazio-icml2012,
author = {Aaron Defazio and Tiberio Caetano},
title = {A Graphical Model Formulation of Collaborative Filtering Neighbourhood
Methods with Fast Maximum Entropy Training},
journal = {The 29th International Conference on Machine Learning (ICML 2012)},
year = {2012}
}

### Tech Reports

A Comparison of Learning Algorithms on the Arcade Learning Environment

[PDF]
Reinforcement learning agents have traditionally been evaluated on small toy problems. With advances in computing power and the advent of the Arcade Learning Environment, it is now possible to evaluate algorithms on diverse and difficult problems within a consistent framework. We discuss some challenges posed by the arcade learning environment which do not manifest in simpler environments. We then provide a comparison of model-free, linear learning algorithms on this challenging problem set.

@TECHREPORT{adefazio-rl2014,
author = {Aaron Defazio and Thore Graepel},
title = {A Comparison of Learning Algorithms on the Arcade Learning Environment},
institution = {Australian National University},
year = {2014}
}

### Articles

Linear programming in low dimensions is easy

[PDF]
### PhD Thesis

New Optimisation Methods for Machine Learning

[PDF]
Supervised by Tiberio Caetano
In this work we introduce several new optimisation methods for problems in machine learning. Our algorithms broadly fall into two categories: optimisation of finite sums and of graph structured objectives. The finite sum problem is simply the minimisation of objective functions that are naturally expressed as a summation over a large number of terms, where each term has a similar or identical weight. Such objectives most often appear in machine learning in the empirical risk minimisation framework in the non-online learning setting. The second category, that of graph structured objectives, consists of objectives that result from applying maximum likelihood to Markov random field models. Unlike the finite sum case, all the non-linearity is contained within a partition function term, which does not readily decompose into a summation.

For the finite sum problem, we introduce the Finito and SAGA algorithms, as well as variants of each. The Finito algorithm is best suited to strongly convex problems where the number of terms is of the same order as the condition number of the problem. We prove the fast convergence rate of Finito for strongly convex problems and demonstrate its state-of-the-art empirical performance on 5 datasets.

The SAGA algorithm we introduce is complementary to the Finito algorithm. It is more generally applicable, as it can be applied to problems without strong convexity, and to problems that have a non-differentiable regularisation term. In both cases we establish strong convergence rate proofs. It is also better suited to sparser problems than Finito. The SAGA method has a broader and simpler theory than any existing fast method for the problem class of finite sums, in particular it is the first such method that can provably be applied to non-strongly convex problems with non-differentiable regularisers without introduction of additional regularisation.

For graph-structured problems, we take three complementary approaches. We look at learning the parameters for a fixed structure, learning the structure independently, and learning both simultaneously. Specifically, for the combined approach, we introduce a new method for encouraging graph structures with the “scale-free” property. For the structure learning problem, we establish SHORTCUT, a O(n^2.5) expected time approximate structure learning method for Gaussian graphical models. For problems where the structure is known but the parameters unknown, we introduce an approximate maximum likelihood learning algorithm that is capable of learning a useful subclass of Gaussian graphical models.

Our thesis as a whole introduces a new suit of techniques for machine learning practitioners that increases the size and type of problems that can be efficiently solved. Our work is backed by extensive theory, including proofs of convergence for each method discussed.

@PHDTHESIS{adefazio-thesis2014,
author = {Aaron Defazio},
title = {New Optimisation Methods for Machine Learning},
school = {Australian National University},
year = {2014},
note = {http://www.aarondefazio.com/pubs.html}
}

### Honours Research

Network Topology Tomography

[PDF]
Supervised by Tiberio Caetano
@MASTERSTHESIS{adefazio-honours2010,
author = {Aaron Defazio},
title = {Network Topology Tomography},
school = {Australian National University (ANU)},
year = {2010},
type = {Undergraduate Honors Thesis}
}