Hölder Inequality – Generalisation and Array Power-Means

inequalities, reference-request

Summary: I have (I think) a generalisation of Hölder's inequality for positive reals that I can neither prove nor find references to. Pointers would be appreciated or, indeed, a counterexample. Thank you.

The original enquiry on Mathematics Stack Exchange met with no response, hence the cross-post here.

The rest of this post describes Hölder's inequality for positive reals, the conjectured "array power-means" generalisation, and some experimental evidence for the conjecture.


Hölder's inequality for positive real vectors $a=(a_i)_{i=1,\ldots,n}$ up to, say, $z=(z_i)_{i=1,\ldots,n}$, with weights $\Lambda=(\lambda_a,\ldots,\lambda_z)$ s.t. $\lambda_a+\ldots+\lambda_z=1$, states that
$$
(a_1+\ldots+a_n)^{\lambda_a}\ldots(z_1+\ldots+z_n)^{\lambda_z}
\ge a_1^{\lambda_a}\ldots z_1^{\lambda_z}+\ldots+a_n^{\lambda_a}\ldots z_n^{\lambda_z}
$$

Write those vectors as rows in a matrix, $M$. The LHS of Hölder is obtained by replacing each row with its arithmetic mean ($\text{rowAM}$), taking the $\Lambda$-weighted geometric mean ($\text{colGM}_\Lambda$) of the resulting column vector, then multiplying by $n$. The RHS is obtained by applying $\text{colGM}_\Lambda$ to each column, taking the $\text{rowAM}$ of the resulting row vector, then multiplying by $n$. Concisely, then, Hölder is equivalent to:
$$
\text{colGM}_\Lambda(\ \text{rowAM}(M)\ )\ \ge\ \text{rowAM}(\ \text{colGM}_\Lambda(M)\ )
$$
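For concreteness, here is a small numerical sketch of this reformulation in Python (the helper names rowAM and colGM are mine, and the check holds only up to floating-point rounding):

from math import prod
from random import random

def rowAM(M):
    '''Arithmetic mean of each row, returned as a list of row means.'''
    return [sum(row) / len(row) for row in M]

def colGM(M, lam):
    '''Lambda-weighted geometric mean down each column, returned as a row.'''
    rows = range(len(M))
    return [prod(M[r][c] ** lam[r] for r in rows) for c in range(len(M[0]))]

nrows, ncols = 4, 5                      # 4 vectors a..d, each of length n = 5
M = [[random() + 0.1 for _ in range(ncols)] for _ in range(nrows)]
lam = [1 / nrows] * nrows                # any weights summing to 1

lhs = prod(m ** l for m, l in zip(rowAM(M), lam)) * ncols   # colGM_L(rowAM(M)), times n
rhs = (sum(colGM(M, lam)) / ncols) * ncols                  # rowAM(colGM_L(M)), times n
assert lhs >= rhs                        # Hölder, in the matrix form above
print(lhs, '>=', rhs)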

Motivated by the AM-GM inequality, we might say very informally that "big mean followed by little mean beats little mean followed by big mean", where the big mean is AM along the rows and the little mean is weighted GM down the columns.


The generalisation is that the "big mean" and "little mean" need not be AM and weighted GM; they can be any pair of weighted power means, provided the "big" (row) exponent is at least the "little" (column) exponent. The weights appear not to matter: the two means may carry arbitrary, and different, weight vectors. The only evidence I have, however, is experimental.

Formally, define the $\Lambda$-weighted $k$-power mean $\mathcal{P}_\Lambda^k$, with $\Lambda=(\lambda_i)$, by its action on a vector of positive reals $x=(x_i)$ thus:
$$
\mathcal{P}_\Lambda^k(x) = \left( \sum_i \lambda_i {x_i}^k \right)^{1/k}, \quad k\ne 0,
\quad\text{and}\quad
\mathcal{P}_\Lambda^0(x) = \prod_i {x_i}^{\lambda_i}
$$
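To make the definition concrete (and to cover the $k=0$ limit that the experiment code below deliberately omits), a weighted power mean might be written in Python roughly as follows; the function name and the positivity check are my own choices:

from math import exp, log

def power_mean(x, k, w):
    '''Lambda-weighted k-power mean of positive reals x with weights w summing to 1.
    k == 0 is the weighted geometric mean, the limit of the k != 0 formula.'''
    assert len(x) == len(w) and all(xi > 0 for xi in x)
    if k == 0:
        return exp(sum(wi * log(xi) for wi, xi in zip(w, x)))
    return sum(wi * xi ** k for wi, xi in zip(w, x)) ** (1 / k)

# e.g. power_mean([1, 2, 4], 0, [1/3, 1/3, 1/3]) is (1*2*4)**(1/3) = 2.0, up to rounding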

The claim is then that for arbitrary weight-vectors $\Lambda_1$ and $\Lambda_2$ and exponents $k_2\ge k_1$
$$
\text{col}\mathcal{P}_{\Lambda_1}^{k_1}(\ \text{row}\mathcal{P}_{\Lambda_2}^{k_2}(M)\ )
\ \ge\
\text{row}\mathcal{P}_{\Lambda_2}^{k_2}(\ \text{col}\mathcal{P}_{\Lambda_1}^{k_1}(M)\ )
$$

A trivial point in favour of the claim is that we have equality when $k_1=k_2$, irrespective of the potentially different weights $\Lambda_1$ and $\Lambda_2$.
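For what it's worth, writing $m_{ij}$ for the entries of $M$ (rows $i$ weighted by $\Lambda_1$, columns $j$ weighted by $\Lambda_2$), the equality for $k_1=k_2=k\ne0$ is immediate: both orderings collapse to the same double sum,
$$
\text{col}\mathcal{P}_{\Lambda_1}^{k}(\ \text{row}\mathcal{P}_{\Lambda_2}^{k}(M)\ )
= \left( \sum_i \lambda_{1,i} \sum_j \lambda_{2,j}\, m_{ij}^{k} \right)^{1/k}
= \text{row}\mathcal{P}_{\Lambda_2}^{k}(\ \text{col}\mathcal{P}_{\Lambda_1}^{k}(M)\ ),
$$
and the $k=0$ case works the same way with products in place of sums.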


Python 3 code to demonstrate the conjectured inequality follows. Alter the final line to change the exponents in the two power means and the shape of the matrix. It's set up here to fail with what I assume is merely loss of precision (the exponents are deliberately close and the matrix small). I haven't yet seen it produce a violation that genuinely invalidates the inequality.

from random import random

def PM(x,p,w):
    '''Weighted power mean.  p==0 (GM) not implemented.'''
    assert len(x) == len(w)
    return sum(ww*xx**p for ww,xx in zip(w,x)) ** (1/p)

def rand_wts(n):
    '''Random weights.  They don't need to sum to 1.'''
    return [random() for _ in range(n)]

def doit(H_p,V_p,nrows,ncols):
    '''Run the experiment on prescribed powers and matrix shape.'''
    assert H_p > V_p, 'programmer must supply H_p > V_p'

    # horizontal and vertical power-mean operations
    def big_H(wts,M):
        return [ [PM(m,H_p,wts)] for m in M ]
    def small_V(wts,M):
        J = range(len(M[0]))
        return [[ PM([m[j] for m in M],V_p,wts) for j in J ]]

    # loop "forever"
    for trial in range(1,10**10):

        # random weights, random matrix
        H_w = rand_wts(ncols)
        V_w = rand_wts(nrows)
        M = [ [random() for _ in range(ncols)] for _ in range(nrows) ]

        # power-mean reduction, both orderings
        hv = big_H(H_w, small_V(V_w, M ) )[0][0]
        vh = small_V(V_w, big_H(H_w, M ) )[0][0]

        # big-then-small (vh) should win; if not, report the error
        if hv > vh:
            print(f'\nFailed on trial {trial} with M =')
            for m in M:
                print(f'  {m}')
            print(f'and H_w = {H_w}')
            print(f'and V_w = {V_w}')
            print(f'\nH(V(M)) = {hv} > {vh} = V(H(M))')
            print(f'H(V(M))/V(H(M)) = {hv/vh}\n')
            exit(1)

        # reassure the user that the experiment is actually running
        if trial>1000 and trial&(trial-1) == 0:
            print(f'OK after {trial} trials ...')

# should fail after a few seconds, only(?) due to precision issues
doit( 4.00000001,4, 3,2 )
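To check whether a reported failure is really just rounding, one option (a sketch only, using the standard-library decimal module at higher precision; PM_hi and recheck are hypothetical helpers) is to recompute both orderings for the offending M, H_w, V_w along these lines:

from decimal import Decimal, getcontext

getcontext().prec = 60                   # work to ~60 significant digits

def PM_hi(x, p, w):
    '''Weighted power mean in Decimal arithmetic (p != 0).'''
    p = Decimal(p)
    s = sum(Decimal(ww) * Decimal(xx) ** p for ww, xx in zip(w, x))
    return s ** (1 / p)

def recheck(M, H_p, V_p, H_w, V_w):
    '''Recompute H(V(M)) and V(H(M)) for one reported failure.'''
    cols = [PM_hi([row[j] for row in M], V_p, V_w) for j in range(len(M[0]))]
    rows = [PM_hi(row, H_p, H_w) for row in M]
    hv = PM_hi(cols, H_p, H_w)           # H(V(M))
    vh = PM_hi(rows, V_p, V_w)           # V(H(M))
    return hv, vh, hv <= vh              # expect True if the failure was only rounding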

Best Answer

This inequality is known. Indeed, inequality (1.1) by Kwapień and Szulga states that $$\Big(\int_S\Big(\int_T|f(s,t)|^q\mu(dt)\Big)^{p/q}\nu(ds)\Big)^{1/p} \\ \le \Big(\int_T\Big(\int_S|f(s,t)|^p\nu(ds)\Big)^{q/p}\mu(dt)\Big)^{1/q}$$ if $0<q<p$, where $\mu$ and $\nu$ are measures.

The case $p<q<0$ is obtained from this by replacing $f$ (which is without loss of generality $>0$) by $1/f$, at the same time replacing $p,q$ by $-p,-q$.

The case $q=0$ is now obtained by the limit transition as $q\to0$, which then yields the case $q<0<p$.
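To recover the array statement from (1.1): writing $m_{ij}$ for the entries of $M$ (rows $i$, columns $j$), take $T$ to be the row-index set with $\mu(\{i\})=\lambda_{1,i}$, $S$ the column-index set with $\nu(\{j\})=\lambda_{2,j}$, $f(j,i)=m_{ij}$, and $q=k_1$, $p=k_2$. The iterated integrals are then iterated weighted power means, and for $0<k_1<k_2$ the displayed inequality reads
$$
\Big(\sum_{j}\lambda_{2,j}\Big(\sum_{i}\lambda_{1,i}\, m_{ij}^{k_1}\Big)^{k_2/k_1}\Big)^{1/k_2}
\ \le\
\Big(\sum_{i}\lambda_{1,i}\Big(\sum_{j}\lambda_{2,j}\, m_{ij}^{k_2}\Big)^{k_1/k_2}\Big)^{1/k_1},
$$
which is exactly $\text{row}\mathcal{P}_{\Lambda_2}^{k_2}(\ \text{col}\mathcal{P}_{\Lambda_1}^{k_1}(M)\ ) \le \text{col}\mathcal{P}_{\Lambda_1}^{k_1}(\ \text{row}\mathcal{P}_{\Lambda_2}^{k_2}(M)\ )$; the remaining exponent ranges are covered by the cases above.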
