Product of Expectations (4/365)

Suppose you invent a function F that operates on some domain D. An incredibly fruitful exercise in mathematics is to try to see whether F distributes in some way over some operator in the domain. Consider these familiar examples.

  1. (a+b)c = ac + bc; multiplication distributes over addition
  2. P(A \cap B) = P(A)P(B); probability distributes over the intersection of events when A and B are independent
  3. g(xy)g^{-1} = (gxg^{-1})(gyg^{-1}); conjugation by g in a group distributes over the product

Sometimes, you are only fortunate enough to get an inequality:

  1. \|x + y\| \le \|x\| + \|y\|; the triangle inequality, which says that the straight-line distance between two points is never longer than a detour through a third point. Here the norm is distributed over addition.
  2. H(X,Y) \le H(X) + H(Y); the joint entropy of X and Y is at most the sum of their individual entropies. A variation on the above.
  3. \psi(E(X)) \le E(\psi(X)); this is Jensen's inequality, where \psi is a convex function and E(X) is the expectation of a random variable X. It is incredibly useful when it comes to maximizing the likelihood of the data in machine learning. (A quick numerical check of this appears right after the list.)
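
As a sanity check of that last inequality, here is a minimal Monte Carlo sketch in Python. The choice of \psi(x) = e^x, the normal distribution, and the sample size are arbitrary assumptions made purely for illustration:

    import numpy as np

    # Monte Carlo sketch of Jensen's inequality with the convex function
    # psi(x) = exp(x): psi(E[X]) should not exceed E[psi(X)].
    # Distribution and sample size are arbitrary illustrative choices.
    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, scale=1.0, size=100_000)

    lhs = np.exp(x.mean())       # psi(E[X])
    rhs = np.exp(x).mean()       # E[psi(X)]
    print(lhs, rhs, lhs <= rhs)  # prints ... True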

Why all this motivation? Well, there is one inequality that everyone will have seen many times, and one whose proof I have constantly failed to remember or to rederive. That is the Cauchy-Schwarz inequality. In the case of probability (where it is also known as the Cauchy-Bunyakovskii inequality) it is the following

(E|XY|)^2 \le EX^2 EY^2
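
Before unpacking it, here is a quick numeric sketch of the inequality in Python. The particular distributions are arbitrary assumptions, and X and Y are deliberately made dependent, since the inequality requires no independence:

    import numpy as np

    # Monte Carlo sketch of (E|XY|)^2 <= E[X^2] E[Y^2].
    # X and Y are deliberately correlated: no independence is needed.
    rng = np.random.default_rng(1)
    x = rng.normal(size=100_000)
    y = 0.5 * x + rng.normal(size=100_000)

    lhs = np.mean(np.abs(x * y)) ** 2    # (E|XY|)^2
    rhs = np.mean(x**2) * np.mean(y**2)  # E[X^2] E[Y^2]
    print(lhs, rhs, lhs <= rhs)          # prints ... True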

Now, what’s this all about? I motivated this post by saying that it’s often useful to investigate whether some interesting function (in this case E) distributes over other operators. We already know that E is a linear operator, and hence we get

E(X + Y) = E(X) + E(Y)

But can we get something if we multiply E(X)E(Y)? Let’s try. Consider squaring an expectation: E(X)E(X). When you expand this out you will see that it is an expectation over the Cartesian product of the underlying distribution with itself, so it cannot be directly related to E(X^2). So, what will work? How about this: E(X^2)E(X^2)? This is equal to (E(X^2))^2. Though trivial, we seem to be getting somewhere once we look at the expectation of the square of the random variable. The general case to aim for is then E(X^2)E(Y^2).
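
To make the Cartesian-product remark concrete, here is a small Python sketch: averaging x_i x_j over all pairs (i, j) of samples recovers (E(X))^2, while averaging over the diagonal alone gives E(X^2), and the two genuinely differ. The sample size and distribution are arbitrary choices:

    import numpy as np

    # E[X]E[X] is an expectation over the product distribution:
    # averaging x_i * x_j over ALL pairs (i, j) recovers (E[X])^2,
    # while averaging x_i^2 (the diagonal only) gives E[X^2].
    rng = np.random.default_rng(2)
    x = rng.normal(loc=1.0, scale=2.0, size=2_000)

    all_pairs = np.outer(x, x).mean()  # ~ (E[X])^2 ~ 1
    diagonal = np.mean(x**2)           # ~ E[X^2] ~ 5 (differs by Var(X))
    print(all_pairs, x.mean()**2, diagonal)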

Unfortunately, the proof in the book is not very intuitive or revealing, and I may come back to this later when I find one that is. But the proof is as follows

\begin{aligned}
&(a-b)^2 \ge 0 \implies a^2 + b^2 \ge 2ab \\
&\text{Suppose } EX^2 > 0,\ EY^2 > 0 \\
&\text{Let } \bar{X} = \frac{X}{\sqrt{EX^2}},\quad \bar{Y} = \frac{Y}{\sqrt{EY^2}} \\
&\text{Since } \bar{X}^2 + \bar{Y}^2 \ge 2|\bar{X}\bar{Y}| \text{ pointwise, taking expectations gives} \\
&2 = E\bar{X}^2 + E\bar{Y}^2 \ge 2E|\bar{X}\bar{Y}| \\
&\text{Therefore } E|\bar{X}\bar{Y}| = \frac{E|XY|}{\sqrt{EX^2\, EY^2}} \le 1 \\
&\text{Hence } (E|XY|)^2 \le EX^2\, EY^2
\end{aligned}
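
The normalization step can itself be checked numerically in Python; again the distributions are arbitrary assumptions, and X and Y are dependent on purpose:

    import numpy as np

    # Numeric check of the proof's normalization trick:
    # xbar = X / sqrt(E[X^2]) satisfies E[xbar^2] = 1, and the
    # pointwise bound xbar^2 + ybar^2 >= 2|xbar*ybar| then forces
    # E|xbar*ybar| <= 1.
    rng = np.random.default_rng(3)
    x = rng.normal(size=100_000)
    y = x + rng.normal(size=100_000)  # any dependent pair will do

    xbar = x / np.sqrt(np.mean(x**2))
    ybar = y / np.sqrt(np.mean(y**2))

    print(np.mean(xbar**2), np.mean(ybar**2))  # both ~ 1 by construction
    print(np.mean(np.abs(xbar * ybar)))        # <= 1, as the proof says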
