Video time! Discretization methods on quadratic and logistic regression

Hi all, this is another "tiny project" that I've been wanting to do for a while. Lately, I've been somewhat obsessed with method flows and discretization methods. While we can sit here and write integrals until our faces turn blue, I think it's time to just simulate some stuff, and just see what happens. In …

Small proof: $L$-smoothness definitions and equivalences

tl;dr: In a former version of this post, I conjectured that while $L$-smoothness and convexity were not equivalent in definition, they were tied in some of the "equivalent" (not equivalent) versions of their statements. To clarify, my former statement: "Convexity and smoothness are totally independent concepts and should not be confused with each other" …

Small proof: The gradient norms in gradient descent descends.

I've been racking my brain lately over whether there is an appropriate place to put "small proofs", i.e. stuff that isn't trivial, that I can't find anywhere, but that isn't cool enough to inspire a whole new paper. For now, I think I'll put them here. Background: we investigate $\underset{x}{\min}\, f(x)$ using gradient descent $$ x^{(k+1)} = …

Newton’s method II : Self concordant functions

tl;dr: Self-concordance: definition, some examples, some interesting properties. Proof of Newton's rates left as an exercise for the reader 🙂 An interesting generalization of strongly convex functions is the class of self-concordant functions, which can be defined by $$ |D^3 f(x)[u,u,u]|\leq 2M\|u\|_{\nabla^2 f(x)}^3 $$ Here, the notation $$D^3 f(x) [u,v,w] = \sum_{i,j,k}\frac{\partial^3 f(x)}{\partial x_i \partial x_j \partial x_k} \, …

Convergence proofs IV: My journey with automatic proof generators

Post by Yifan Sun. In comparison to past posts, this post is really about some (relatively) recent work by a bunch of people ([1],[2],[3] to name a few). I therefore tried to spend a lot less time detailing how these things work, as these authors have posted their own excellent blog posts and presentations; however, …

Newton’s method I: Quadratic convergence rate

The next couple of posts will focus on our favorite second-order method: Newton's method. I've been going through them, partly as "review" (in quotes because I ended up learning a lot of new things) and partly to develop some intuition as to when acceleration-by-leveraging-second-ordery-info might actually help. This first post will be super short, and …

Convergence proofs III: Continuous time interpretation

Post by Yifan Sun. I have recently been caught up in the flurry of analyzing popular convex optimization methods as discretizations of continuous-time ODEs, with the objective of simplifying convergence analysis and constructing better methods. This "flurry" is by no means recent, with a somewhat "but we always knew this" vibe, and is by no …