Here’s a new “overthinking it” issue for you. For a while now, I’ve been relying on the high-dimensional version of a Taylor series approximation result to help me flip variables and gradients. That is, for a 1-D function $g$ with a continuous second derivative everywhere, the mean value theorem (applied to $g'$) says that for any $x\leq y$ there always exists a $z$ with $x\leq z \leq y$ such that
$$
g''(z) (x-y)=g'(x)-g'(y).
$$
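As a quick sanity check of the 1-D statement, here is a minimal sketch using the hypothetical example $g(t)=t^3$ (so $g'(t)=3t^2$ and $g''(t)=6t$) with $x=0$ and $y=2$; these choices are mine, for illustration only. It finds $z$ by bisection, which assumes the residual changes sign on $[x,y]$. That holds here, but the MVT itself only promises a root, not a sign change.

```python
def bisect(h, lo, hi, tol=1e-12, max_iter=200):
    """Approximate a root of h on [lo, hi], assuming h(lo) and h(hi) differ in sign."""
    h_lo, h_hi = h(lo), h(hi)
    if h_lo == 0:
        return lo
    if h_hi == 0:
        return hi
    if h_lo * h_hi > 0:
        raise ValueError("h must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if h_mid == 0 or hi - lo < tol:
            return mid
        if h_lo * h_mid < 0:
            hi = mid  # the sign change is in [lo, mid]
        else:
            lo, h_lo = mid, h_mid  # the sign change is in [mid, hi]
    return 0.5 * (lo + hi)


# Hypothetical example: g(t) = t**3, so g'(t) = 3 t**2 and g''(t) = 6 t.
def g_prime(t):
    return 3 * t**2


def g_second(t):
    return 6 * t


def mvt_residual(z, x=0.0, y=2.0):
    # Zero exactly when g''(z) (x - y) = g'(x) - g'(y).
    return g_second(z) * (x - y) - (g_prime(x) - g_prime(y))


z = bisect(mvt_residual, 0.0, 2.0)
print(z)  # 1.0
```

Here the residual is linear in $z$, so the very first midpoint $z=1$ is the exact MVT point.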
Without blinking, we extend this to $f:\mathbb R^n\to \mathbb R$ and say that there always exists some $z$ where
$$
\nabla^2 f(z) (x-y) = \nabla f(x) - \nabla f(y).\tag{MVT-hd}
$$
Never mind that the notion of “between $x$ and $y$” doesn’t really make sense. We just love the result, and use it willy-nilly to flip primal and dual solutions based on the properties of $\nabla^2 f(z)$.
But one thing I am realizing as I go through my convex analysis adventure is that everything that is true in high dimensions can be proven using only tools in low dimensions, and by low dimensions, I mean 1-dimensional. So I should be able to prove (MVT-hd) using only facts about 1-D functions.
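First, (MVT-hd) holds at a given $z$ exactly when
$$
u_k^T\nabla^2 f(z) (x-y) =u_k^T( \nabla f(x) - \nabla f(y))
$$
for every $k=1,\dots,n$, where $u_1,u_2,\dots,u_n$ is any set of vectors spanning $\mathbb R^n$ (a vector is zero if and only if its inner product with every element of a spanning set is zero). To make life easy, let’s pick the standard basis, $u_k=e_k$. Then,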
$$
e_k^T\nabla^2 f(z)(x-y) = \sum_{i=1}^n \frac{\partial^2 f(u)}{\partial u_k\partial u_i}\Bigg|_{u=z} \cdot (x_i-y_i)
$$
and
$$
e_k^T(\nabla f(x)-\nabla f(y)) = \frac{\partial f(u)}{\partial u_k}\Bigg|_{u=x}- \frac{\partial f(u)}{\partial u_k}\Bigg|_{u=y}
$$
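To make the indexing concrete, here is the $k=1$ component written out for $n=2$. The Hessian row $e_1^T\nabla^2 f(z)$ mixes both coordinate differences, while the right-hand side involves only the first partial derivative:
$$
\frac{\partial^2 f(u)}{\partial u_1^2}\Bigg|_{u=z}(x_1-y_1)+\frac{\partial^2 f(u)}{\partial u_1\partial u_2}\Bigg|_{u=z}(x_2-y_2) = \frac{\partial f(u)}{\partial u_1}\Bigg|_{u=x}- \frac{\partial f(u)}{\partial u_1}\Bigg|_{u=y}.
$$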
Let us now define some dummy 1-D functions as
$$
g_k(\alpha) = \frac{\partial f(u)}{\partial u_k}\Bigg|_{u=x+\alpha(y-x)}
$$
Then,
$$
e_k^T(\nabla f(x)-\nabla f(y))=g_k(0)-g_k(1)
$$
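Since $g_k(0)$ is the $k$-th partial derivative at $x$ and $g_k(1)$ is the one at $y$, the order of the endpoints is easy to get backwards. Below is a minimal numerical check, assuming the hypothetical test function $f(u)=\tfrac14(u_1^4+u_2^4)$ with $x=(1,0)$ and $y=(2,1)$; the function, the points, and the helper names `grad_f` and `g` are my own illustrative choices.

```python
# Hypothetical test function f(u) = (u1**4 + u2**4) / 4; indices k are 0-based.
def grad_f(u):
    return [u[0] ** 3, u[1] ** 3]


def g(k, alpha, x, y):
    # g_k(alpha): k-th partial derivative of f at x + alpha * (y - x)
    point = [xi + alpha * (yi - xi) for xi, yi in zip(x, y)]
    return grad_f(point)[k]


x, y = [1.0, 0.0], [2.0, 1.0]
for k in range(len(x)):
    grad_diff = grad_f(x)[k] - grad_f(y)[k]  # e_k^T (grad f(x) - grad f(y))
    backward = g(k, 0.0, x, y) - g(k, 1.0, x, y)  # g_k(0) - g_k(1)
    forward = g(k, 1.0, x, y) - g(k, 0.0, x, y)  # g_k(1) - g_k(0)
    print(f"k={k}: grad diff {grad_diff}, g_k(0)-g_k(1) {backward}, g_k(1)-g_k(0) {forward}")
# k=0: grad diff -7.0, g_k(0)-g_k(1) -7.0, g_k(1)-g_k(0) 7.0
# k=1: grad diff -1.0, g_k(0)-g_k(1) -1.0, g_k(1)-g_k(0) 1.0
```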
and
$$
g'_k(\alpha) = \sum_{j=1}^n \frac{\partial^2 f(u)}{\partial u_k\partial u_j}\Bigg|_{u=x+\alpha(y-x)}\cdot (y_j-x_j).
$$
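For completeness, the expression for $g'_k$ is the chain rule applied along the line $\alpha\mapsto x+\alpha(y-x)$, whose $j$-th coordinate has derivative $y_j-x_j$:
$$
g'_k(\alpha) = \sum_{j=1}^n \frac{\partial}{\partial u_j}\left(\frac{\partial f(u)}{\partial u_k}\right)\Bigg|_{u=x+\alpha(y-x)}\cdot\frac{d}{d\alpha}\big(x_j+\alpha(y_j-x_j)\big) = \sum_{j=1}^n \frac{\partial^2 f(u)}{\partial u_k\partial u_j}\Bigg|_{u=x+\alpha(y-x)}\cdot(y_j-x_j),
$$
where writing the mixed partial in either order relies on the Hessian being continuous.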
Now this is very promising, because we can invoke the 1-D MVT on $g_k(\alpha)$ and say that there must exist some $0\leq \bar \alpha \leq 1$ where
$$
g_k(1)-g_k(0)=g'_k(\bar\alpha)\iff e_k^T(\nabla f(x)-\nabla f(y)) = \sum_{j=1}^n \frac{\partial^2 f(u)}{\partial u_k\partial u_j}\Bigg|_{u=x+\bar \alpha(y-x)}\cdot (x_j-y_j).
$$
However, there’s one catch: there’s no reason to assume that this $\bar\alpha$ is the same point for each $k$. So while (MVT-hd) uses a single $z$ for all components, here we may get $n$ different values $\bar \alpha_k$, one for each $g_k$.
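To see the catch concretely, here is a minimal numerical sketch, assuming the hypothetical test function $f(u)=\tfrac14(u_1^4+u_2^4)$ with $x=(1,0)$ and $y=(2,1)$ (illustrative choices of mine, as are the helper names). It solves $g_k(1)-g_k(0)=g'_k(\bar\alpha_k)$ for each $k$ separately by bisection, which assumes the residual changes sign on $[0,1]$ (true for this $f$). The roots can also be found by hand, $\bar\alpha_1=\sqrt{7/3}-1\approx 0.528$ and $\bar\alpha_2=1/\sqrt3\approx 0.577$, so they really are different points on the segment.

```python
import math


# Hypothetical test function f(u) = (u1**4 + u2**4) / 4; indices are 0-based.
def grad_f(u):
    return [u[0] ** 3, u[1] ** 3]


def hess_f(u):
    # Diagonal Hessian, since f is a sum of 1-D terms.
    return [[3 * u[0] ** 2, 0.0], [0.0, 3 * u[1] ** 2]]


def on_segment(alpha, x, y):
    return [xi + alpha * (yi - xi) for xi, yi in zip(x, y)]


def bisect(h, lo, hi, tol=1e-12, max_iter=200):
    """Approximate a root of h on [lo, hi], assuming h(lo) and h(hi) differ in sign."""
    h_lo, h_hi = h(lo), h(hi)
    if h_lo == 0:
        return lo
    if h_hi == 0:
        return hi
    if h_lo * h_hi > 0:
        raise ValueError("h must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if h_mid == 0 or hi - lo < tol:
            return mid
        if h_lo * h_mid < 0:
            hi = mid
        else:
            lo, h_lo = mid, h_mid
    return 0.5 * (lo + hi)


def alpha_bar(k, x, y):
    """Solve g_k(1) - g_k(0) = g'_k(alpha) for alpha in [0, 1]."""
    slope = grad_f(y)[k] - grad_f(x)[k]  # g_k(1) - g_k(0)

    def residual(alpha):
        # g'_k(alpha) = sum_j H_kj(x + alpha (y - x)) (y_j - x_j), minus the slope
        H = hess_f(on_segment(alpha, x, y))
        return sum(H[k][j] * (y[j] - x[j]) for j in range(len(x))) - slope

    return bisect(residual, 0.0, 1.0)


x, y = [1.0, 0.0], [2.0, 1.0]
alphas = [alpha_bar(k, x, y) for k in range(len(x))]
print(alphas)                                    # one alpha_bar per component
print([math.sqrt(7 / 3) - 1, 1 / math.sqrt(3)])  # hand-computed values
```

This only concerns points on the segment from $x$ to $y$; it shows that the 1-D construction hands back a different $\bar\alpha_k$ per component.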
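Of course, for any fixed $u=(u_1,\dots,u_n)\in\mathbb R^n$ (now the $u_k$ are scalar weights rather than basis vectors), you could create a new function $g(\alpha) = \sum_k u_k g_k(\alpha) = u^T\nabla f(x+\alpha(y-x))$ and, following the same analysis, show that there exists some $z=x+\bar\alpha(y-x)$ with $\bar\alpha\in[0,1]$, where $\bar\alpha$ may depend on $u$, such that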
$$
u^T\nabla^2 f(z) (x-y) =u^T( \nabla f(x) - \nabla f(y)).\tag{MVT-hd-proj}
$$
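Continuing with the same hypothetical example ($f(u)=\tfrac14(u_1^4+u_2^4)$, $x=(1,0)$, $y=(2,1)$, and helper names that are all my own assumptions), the sketch below solves $g(1)-g(0)=g'(\bar\alpha)$ for $g(\alpha)=u^T\nabla f(x+\alpha(y-x))$ and a few choices of $u$, again by bisection under the assumption of a sign change on $[0,1]$. For $u=(1,1)$ the root is $\bar\alpha=(\sqrt{156}-6)/12\approx 0.541$, which differs from the values for $u=e_1$ and $u=e_2$, so the point produced by this construction moves with $u$.

```python
import math


# Hypothetical test function f(u) = (u1**4 + u2**4) / 4; indices are 0-based.
def grad_f(u):
    return [u[0] ** 3, u[1] ** 3]


def hess_f(u):
    return [[3 * u[0] ** 2, 0.0], [0.0, 3 * u[1] ** 2]]


def on_segment(alpha, x, y):
    return [xi + alpha * (yi - xi) for xi, yi in zip(x, y)]


def bisect(h, lo, hi, tol=1e-12, max_iter=200):
    """Approximate a root of h on [lo, hi], assuming h(lo) and h(hi) differ in sign."""
    h_lo, h_hi = h(lo), h(hi)
    if h_lo == 0:
        return lo
    if h_hi == 0:
        return hi
    if h_lo * h_hi > 0:
        raise ValueError("h must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if h_mid == 0 or hi - lo < tol:
            return mid
        if h_lo * h_mid < 0:
            hi = mid
        else:
            lo, h_lo = mid, h_mid
    return 0.5 * (lo + hi)


def proj_alpha(u, x, y):
    """Solve g(1) - g(0) = g'(alpha) for g(alpha) = u^T grad f(x + alpha (y - x))."""
    n = len(x)
    d = [y[j] - x[j] for j in range(n)]
    slope = sum(u[k] * (grad_f(y)[k] - grad_f(x)[k]) for k in range(n))

    def residual(alpha):
        # g'(alpha) = u^T H(x + alpha d) d, minus the slope g(1) - g(0)
        H = hess_f(on_segment(alpha, x, y))
        return sum(u[k] * H[k][j] * d[j] for k in range(n) for j in range(n)) - slope

    return bisect(residual, 0.0, 1.0)


x, y = [1.0, 0.0], [2.0, 1.0]
for u in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(u, proj_alpha(u, x, y))
print((math.sqrt(156) - 6) / 12)  # hand-computed value for u = (1, 1)

# Off-segment check for this separable f: z = (sqrt(7/3), 1/sqrt(3)).
z = [math.sqrt(7 / 3), 1 / math.sqrt(3)]
Hz = hess_f(z)
lhs = [sum(Hz[k][j] * (x[j] - y[j]) for j in range(2)) for k in range(2)]
rhs = [grad_f(x)[k] - grad_f(y)[k] for k in range(2)]
print(lhs, rhs)  # equal up to rounding
```

One caution if this is reused in a counterexample search: it only shows that the 1-D construction picks different points on the segment. For this separable $f$, the off-segment point $z=(\sqrt{7/3},1/\sqrt3)$ satisfies (MVT-hd) outright, since $\nabla^2 f(z)(x-y)=(-7,-1)=\nabla f(x)-\nabla f(y)$, so it is not a counterexample to (MVT-hd) itself.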
It turns out that this alone is enough for me to prove most of what I need. However, right now I am hesitant to say that a single $z$ works for all $u$ at once.
I hope to be proven wrong, but in the meantime, I shall begin my search for a counterexample!