Projected gradient descent, revisited

It turns out that some of my previous statements on PGD may not have been, well, correct. At first I thought it was just a few typos, but as I studied my notes further, I realized there may be some fundamental flaws. So this blog post revisits some ideas and hopefully irons out some past … Continue reading Projected gradient descent, revisited →

Video time! Discretization methods on quadratic and logistic regression

Hi all, this is another "tiny project" that I've been wanting to do for a while. Lately, I've been somewhat obsessed with method flows and discretization methods. While we can sit here and write integrals until our faces turn blue, I think it's time to just simulate some stuff, and just see what happens. In … Continue reading Video time! Discretization methods on quadratic and logistic regression →

Some proofs of continuous time things

In trying to understand how continuous time interpretations can help optimization research, I find myself doing a lot of "recreating the wheel", but in a nontrivial way; I am getting these proofs that I'm sure are standard knowledge, but it's not like we learned them back when we were kids. So I'm going to try … Continue reading Some proofs of continuous time things →

A fun convex set fact

This is actually not a new thing at all, but a thing I played with a couple years ago, and I ended up using in a numerical optimization lecture to try to impress some undergrads. They were not impressed. Since then I've tuned it a bit and just kind of take it for granted now, … Continue reading A fun convex set fact →

Small proof: $L$-smoothness definitions and equivalences

tl;dr In a former version of this post, I conjectured that while $L$-smoothness and convexity were not equivalent in definition, they were tied in some of the ``equivalent" (not equivalent) versions of their statements. To clarify, my former statement: "Convexity and smoothness are totally independent concepts and should not be confused with each other " … Continue reading Small proof: $L$-smoothness definitions and equivalences →

Small proof: The gradient norms in gradient descent descends.

I've been racking my brain lately if there is an appropriate place to insert "small proofs", e.g. stuff that isn't trivial, I can't find it anywhere, but isn't cool enough to inspire a whole new paper. For now, I think I'll put it here. Background We investigate $\underset{x}{\min}\, f(x)$ using gradient descent $$ x^{(k+1)} = … Continue reading Small proof: The gradient norms in gradient descent descends. →

Achievable online predictions

Hi all, it's been a while! I took a step back from this blog to focus on teaching and research and such, but now that the last class is over for the fall, time to get back to this! Today's topic is now focused more on learning theory, which is an area I'm trying to … Continue reading Achievable online predictions →

Newton’s method II : Self concordant functions

tl;dr: Self-concordance. definition, some examples, some interesting properties. Proof of Newton's rates left as exercise for reader 🙂 An interesting generalization of strongly convex functions are these self-concordant functions, which can be defined as $$ |D^3 f(x)[u,u,u]|\leq 2M\|u\|_{\nabla^2 f(x)}^2 $$ Here, the notation $$D^3 f(x) [u,v,w] = \sum_{i,j,k}\frac{\partial^3 f(x)}{\partial x_i \partial x_j \partial x_k} \, … Continue reading Newton’s method II : Self concordant functions →

Convergence proofs IV: My journey with automatic proof generators

Post by Yifan Sun In comparison to past posts, this post is really about some (relatively) recent work by a bunch of people ([1],[2],[3] to name a few). I therefore tried to spend a lot less time detailing how these things work, as these authors have posted their own excellent blog posts and presentations; however, … Continue reading Convergence proofs IV: My journey with automatic proof generators →

Newton’s method I: Quadratic convergence rate

The next couple posts will focus on our favorite second order method: Newton's method. I've been going through them, partly as "review" (in quotes because I ended up learning a lot of new things) and partly to develop some intuition as to when acceleration-by-leveraging-second-ordery-info might actually help. This first post will be super short, and … Continue reading Newton’s method I: Quadratic convergence rate →