Nonuniform smoothness: The curvature that vanishes at the finish line

This blog post is about our recently accepted ICML paper: Convergence of Steepest Descent and Adam under Non-Uniform Smoothness, by Sharan Vaswani, Yifan Sun, and Reza Babanezhad Harikandeh. Why does it always feel like key, fundamental optimization results in machine learning aren't that predictive in practice? For example, our rates on Nesterov-accelerated gradient descent … Continue reading Nonuniform smoothness: The curvature that vanishes at the finish line →

Implicit bias: Easy methods on convex problems still play favorites

Ever since we were tiny little optimization children, we've been told some nursery rhyme to the tune of "convex problems are easy, nonconvex problems are hard". The reason is that convex problems can be solved in polynomial time (that is, under the assumption that the number of variables and constraints is also polynomial). On the other hand, … Continue reading Implicit bias: Easy methods on convex problems still play favorites →

Nonuniform smoothness part 2: Reduction to a toy problem

This post is a "companion piece" to our ICML submission, covered in the previous post. I wrote this up in a much more detailed technical report as well, for fun. If you read our ICML paper, you'll find yourself transported on an epic journey through a new paradigm of thinking of functions, not as just … Continue reading Nonuniform smoothness part 2: Reduction to a toy problem →

This post is kind of (p)-hacky.

What is a p-value? It's what we use to say "my experiment isn't full of crap". Specifically, if I want to prove a drug is effective, then I take my data, compute a p-value, and if it's below some acceptable value (say, 0.05) I get to say that my drug was effective. Similarly, I can … Continue reading This post is kind of (p)-hacky. →

I’m almost sure this converges (to a random variable)

It's always nice to have friends who encourage you to revisit old concepts that you saw as a student and wrote off at the time with "yeah, there's no way I can ever understand that", but are now encouraged to confront a second time. So today's post is going to be … Continue reading I’m almost sure this converges (to a random variable) →

What does it mean to be subgaussian?

Many of us who have visited the steps of a concentration inequality from time to time are probably familiar with the term "subgaussian". We know it means something like "variance", in that if the variance of a random variable is 0, then it is also subgaussian with constant 0. We also know it … Continue reading What does it mean to be subgaussian? →

Spectral graph theory: deriving effective resistance

There are some themes that you hear about here and there for years. You know they're cool, you know they're probably profound, but you've never really taken them seriously. Then, one day, you decide to take a deeper look at one, and you find yourself sucked into a black hole of mathematical revelations. To me, … Continue reading Spectral graph theory: deriving effective resistance →

Tangoing with nonsymmetric contraction matrices: Understanding Gelfand’s formula and its implications on linear convergence rates

I have finally reached the point in my life where I have the misfortune of meeting not one, but two nonsymmetric "contraction" matrices, and have had to try to understand why, even after all the hard work of proving that a matrix's largest eigenvalue satisfies $\lambda_{\max}\leq \rho <1$, it does not mean that … Continue reading Tangoing with nonsymmetric contraction matrices: Understanding Gelfand’s formula and its implications on linear convergence rates →

High dimensional mean value theorem.

Here's a new "overthinking it" issue for you. For a while now, I've been relying on the high-dimensional version of a Taylor series approximation result to help me flip variables and gradients. That is, for 1-D functions with continuous Hessians everywhere, the mean value theorem says that there always exists a $z$ where $x\leq z … Continue reading High dimensional mean value theorem. →

We need to talk about Adam

At some point in every optimist's life, we are going to be asked the age-old question: "How does your method compare with Adam?" or, even more annoyingly, "Does this analysis help understand Adam?" I mean, Adam is a nice method that works very well for a very unique set of optimization problems, and not, like, … Continue reading We need to talk about Adam →