Small proof: $L$-smoothness definitions and equivalences

tl;dr In a former version of this post, I conjectured that while $L$-smoothness and convexity were not equivalent in definition, they were tied in some of the ``equivalent" (not equivalent) versions of their statements. To clarify, my former statement: "Convexity and smoothness are totally independent concepts and should not be confused with each other " … Continue reading Small proof: $L$-smoothness definitions and equivalences

Small proof: The gradient norms in gradient descent descends.

I've been racking my brain lately if there is an appropriate place to insert "small proofs", e.g. stuff that isn't trivial, I can't find it anywhere, but isn't cool enough to inspire a whole new paper. For now, I think I'll put it here. Background We investigate $\underset{x}{\min}\, f(x)$ using gradient descent $$ x^{(k+1)} = … Continue reading Small proof: The gradient norms in gradient descent descends.

Newton’s method II : Self concordant functions

tl;dr: Self-concordance. definition, some examples, some interesting properties. Proof of Newton's rates left as exercise for reader 🙂 An interesting generalization of strongly convex functions are these self-concordant functions, which can be defined as $$ |D^3 f(x)[u,u,u]|\leq 2M\|u\|_{\nabla^2 f(x)}^2 $$ Here, the notation $$D^3 f(x) [u,v,w] = \sum_{i,j,k}\frac{\partial^3 f(x)}{\partial x_i \partial x_j \partial x_k} \, … Continue reading Newton’s method II : Self concordant functions

Convergence proofs IV: My journey with automatic proof generators

Post by Yifan Sun In comparison to past posts, this post is really about some (relatively) recent work by a bunch of people ([1],[2],[3] to name a few). I therefore tried to spend a lot less time detailing how these things work, as these authors have posted their own excellent blog posts and presentations; however, … Continue reading Convergence proofs IV: My journey with automatic proof generators

Newton’s method I: Quadratic convergence rate

The next couple posts will focus on our favorite second order method: Newton's method. I've been going through them, partly as "review" (in quotes because I ended up learning a lot of new things) and partly to develop some intuition as to when acceleration-by-leveraging-second-ordery-info might actually help. This first post will be super short, and … Continue reading Newton’s method I: Quadratic convergence rate

Convergence proofs III: Continuous time interpretation

Post by Yifan Sun I have recently been caught up in the flurry of analyzing popular convex optimization problems as discretizations of continuous ODEs, with the objective of simplifying convergence analysis and constructing better methods. This "flurry" is by no means recent, with a somewhat "but we always knew this" vibe, and is by no … Continue reading Convergence proofs III: Continuous time interpretation

Convergence proofs II: (simple) estimate sequences

Post by Yifan Sun tl;dr: Estimate sequences are a very interesting way of isolating the problem of finding rates over a wide family of unconstrained problems, and can also be used to generate accelerated methods. However it's unclear whether they can easily be used on non-NAG-like methods, at least when restricted to quadratic functions. The … Continue reading Convergence proofs II: (simple) estimate sequences