Small proof: $L$-smoothness definitions and equivalences

tl;dr In a former version of this post, I conjectured that while $L$-smoothness and convexity were not equivalent in definition, they were tied together in some of the "equivalent" (not actually equivalent) versions of their statements. To clarify, my former statement: "Convexity and smoothness are totally independent concepts and should not be confused with each other" … Continue reading Small proof: $L$-smoothness definitions and equivalences
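Since the excerpt claims convexity and $L$-smoothness are independent, here is a quick numerical illustration (my own examples, not from the post): $\sin$ is $1$-smooth but nonconvex, while $x^4$ is convex but not globally $L$-smooth.

```python
import numpy as np

# Hedged illustration (my own examples, not from the post) that convexity
# and L-smoothness are independent properties:
#   f(x) = sin(x):  |f''(x)| = |sin(x)| <= 1, so f is 1-smooth, yet f''
#                   goes negative, so f is not convex.
#   g(x) = x**4:    g''(x) = 12 x^2 >= 0, so g is convex, yet g'' is
#                   unbounded on R, so g is not L-smooth for any L.

xs = np.linspace(-10.0, 10.0, 10001)

f2 = -np.sin(xs)      # second derivative of sin(x)
g2 = 12.0 * xs**2     # second derivative of x**4

print(np.abs(f2).max() <= 1.0)   # curvature bounded by 1: 1-smooth
print(f2.min() < 0.0)            # curvature goes negative: not convex
print(g2.min() >= 0.0)           # nonnegative curvature: convex
print(g2.max())                  # curvature grows without bound: not L-smooth
```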

Small proof: The gradient norms in gradient descent descend.

I've been racking my brain lately over whether there is an appropriate place to put "small proofs", i.e. results that aren't trivial and that I can't find anywhere, but that aren't cool enough to inspire a whole new paper. For now, I think I'll put them here. Background We investigate $\underset{x}{\min}\, f(x)$ using gradient descent $$ x^{(k+1)} = … Continue reading Small proof: The gradient norms in gradient descent descend.
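As a hedged numerical sketch of the title claim (the quadratic and step size are my own choices, not from the post), gradient descent with step $1/L$ on a convex quadratic does produce decreasing gradient norms:

```python
import numpy as np

# Hedged numerical sketch (my own construction, not from the post):
# run gradient descent x^{(k+1)} = x^{(k)} - (1/L) grad f(x^{(k)}) on the
# convex quadratic f(x) = 0.5 x^T A x and track the gradient norms.
# Here grad f(x) = A x, and L is the largest eigenvalue of A.

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) @ Q.T  # symmetric PSD, L = 5
L = 5.0

x = rng.standard_normal(5)
norms = []
for _ in range(50):
    g = A @ x                          # gradient of f at x
    norms.append(float(np.linalg.norm(g)))
    x = x - (1.0 / L) * g              # one gradient descent step

# Since g^{(k+1)} = (I - A/L) g^{(k)} and the eigenvalues of I - A/L lie
# in [0, 1), each step shrinks the gradient norm.
print(norms[0] > norms[-1])
```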

Newton’s method II: Self-concordant functions

tl;dr: Self-concordance: definition, some examples, some interesting properties. Proof of Newton's rates left as an exercise for the reader 🙂 An interesting generalization of strongly convex functions is the class of self-concordant functions, which can be defined as $$ |D^3 f(x)[u,u,u]|\leq 2M\|u\|_{\nabla^2 f(x)}^3 $$ Here, the notation $$D^3 f(x) [u,v,w] = \sum_{i,j,k}\frac{\partial^3 f(x)}{\partial x_i \partial x_j \partial x_k} \, … Continue reading Newton’s method II: Self-concordant functions
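As a sanity check on the definition (my own worked example, assuming the standard scaling with exponent $3$ on the local norm $\|u\|_{\nabla^2 f(x)}$), the barrier $f(x) = -\log x$ is self-concordant with $M = 1$: in one dimension $f''(x) = 1/x^2$ and $f'''(x) = -2/x^3$, so $|f'''(x)| = 2\,f''(x)^{3/2}$ exactly.

```python
import numpy as np

# Hedged 1-D check (my addition, not from the post): f(x) = -log(x)
# satisfies the self-concordance bound |f'''(x)| <= 2 M f''(x)**(3/2)
# with M = 1, and in fact with equality for every x > 0.

xs = np.linspace(0.1, 10.0, 1000)
f2 = 1.0 / xs**2     # f''(x) for f(x) = -log(x)
f3 = -2.0 / xs**3    # f'''(x)

print(np.allclose(np.abs(f3), 2.0 * f2**1.5))
```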

Newton’s method I: Quadratic convergence rate

The next couple of posts will focus on our favorite second-order method: Newton's method. I've been going through the material, partly as "review" (in quotes because I ended up learning a lot of new things) and partly to develop some intuition as to when acceleration-by-leveraging-second-ordery-info might actually help. This first post will be super short, and … Continue reading Newton’s method I: Quadratic convergence rate

Convergence proofs II: (simple) estimate sequences

Post by Yifan Sun tl;dr: Estimate sequences are a very interesting way of isolating the problem of finding rates over a wide family of unconstrained problems, and they can also be used to generate accelerated methods. However, it's unclear whether they can easily be used on non-NAG-like methods, at least when restricted to quadratic functions. The … Continue reading Convergence proofs II: (simple) estimate sequences

Convergence proofs I: quadratic approximations

Post by Yifan Sun, Jan 8, 2021 Convergence proofs are the bane of every optimizer's existence. They are tricky, extremely long, and mostly unintuitive. Additionally, they tend to be tribal; there are "families" of convergence proof techniques that seem to cover a particular type of method under a particular class of functions, and while efforts … Continue reading Convergence proofs I: quadratic approximations