The Cauchy-BUNYAKOVSKY-Schwarz inequality

Episode #2583 of “By the law of large numbers (of Russian mathematicians), your theorem has been proved by some Russian before you even thought of it”:

Viktor Bunyakovsky worked on Number Theory as well as geometry, mechanics and hydrostatics. He discovered the Cauchy-Schwarz inequality 25 years before Cauchy or Schwarz.

Published in:  on April 10, 2007 at 8:41 am Comments (16)

Measure concentration

I’ve been preparing a lecture on the Johnson-Lindenstrauss Lemma and dove deep into the beautiful world of measure concentration. Matousek’s chapter on the topic is excellent in this regard, but I found that the flow of the results (from measure concentration towards the JL Lemma) was not to my taste either personally or pedagogically (it’s possible to discuss the lemma algorithmically using elementary proofs without having to uncover the deep measure concentration machinery chugging below).

In that respect, Alexander Barvinok’s lecture notes on measure concentration prove to be quite useful. He starts from simple measure concentration on the sphere, builds up to the JL Lemma, and then carries on forward into Levy’s theorem, Dvoretsky’s theorem and the Brunn-Minkowsky inequality. He starts with a good intuitive definition of measure concentration:

Any “reasonable” function defined on a “large” probability space takes values close to its average with high probability

which nicely ties it together (in my mind) with other large deviation results in information theory as well (he also has a section on rate distortion and Cramer’s work).

Another useful nugged from Barvinok’s notes is a thorough drilling on the use of the “Laplace trick”: the technique of taking the exponential of a random variable and applying the Markov inequality in order to prove tail bounds (the standard proof of the Chernoff bound is one example of this).

Published in:  on at 8:25 am Comments (6)

Inventor of ‘idf’ passes on

Karen Sparck-Jones, professor at Cambridge University, winner of the Lovelace Medal and the Allen Newell Award (among other awards), died today. She was a pioneer in NLP and information retrieval, and is also the inventor of one of the most fundamental notions in information retrieval: the concept of IDF (or inverse document frequency).

Information retrieval methods are at the heart of web search today, even in the age of link structures and PageRank. One of the most elementary and yet most enduring ideas in information retrieval is the concept of a document as a “bag of words”; namely that even if we encoded a document as a (weighted) set of terms occuring in it, we can extract meaningful information from it. The ‘bag-of-words’ model itself dates back to at least to Gerald Salton’s IR book in 1968; it is probably much older.

Once you think of documents as bags of “terms”, you need a way of weighting these terms in a way that allows “similar” documents to have a similar set of weights. Term frequency (TF) is one way of doing this: weight a term with a number proportional to how often it appears. This makes some sense; a document about soccer might use the word ‘football’ or ’soccer’ fairly often, as compared with a document on cricket.

However, words like ‘a’, ‘an’ and ‘the’ also appear very often in a document. How does one prevent these terms from washing away any semantic context one might have ? One idea is to weight the term frequency by something called ‘Inverse Document Frequency’ (IDF). For a given collection of documents, count in how many documents a fixed term appears. This number is the term’s document frequency. Clearly, a term that appears in all documents can’t be a useful distinguisher of a document and so the larger this number is, the less relevant the term. Therefore, by multiplying the term frequency by a function inversely related to the document frequency (if p is the fraction of documents a term appears in, the IDF is often set equal to log(1/p)), you get a more accurate estimate of the importance of the term.

This to me is a very modern way of dealing with documents, and yet the idea itself was proposed by Karen Sparck-Jones back in 1972, well before we could imagine search engines, and fast computers that could crunch these numbers in a jiffy. I’m not an expert on IR methods, but I dare say that a simple TF-IDF computation is, even today, a first preprocessing step when organizing document collections.

(Via Hal Daume)

Published in:  on April 4, 2007 at 10:57 pm Leave a Comment

Generalized(?) Dyck Paths

A Dyck path is a useful metaphor for dealing with Catalan numbers. Suppose you are walking on an n X n grid, starting at (0,0). You can make a horizontal move to the right or a vertical move upwards. A Dyck path is a path in this grid where

  • the path ends at (n,n)
  • the path may touch (but never goes over) the main diagonal

The number of Dyck paths is precisely the Catalan number Cn-1. One way of seeing this is that all Dyck paths that touch the diagonal at some interior point can be written as the concatenation of two Dyck paths of shorter length (one from the origin to the point, and one from the point to (n,n)). Dyck paths are also related to counting binary trees, and strings of balanced parentheses (called Dyck languages).

People have studied “generalized” Dyck paths, where the grid is now rectangular (n X m), and the step lengths are appropriately skewed. However, what I’m interested in is the following seemingly simple generalization:

Let a (n,k)-Dyck path be a Dyck path with the modification that the path, instead of ending at (n,n), ends at (n,k) (k <=n). What is the number of (n,k)-Dyck paths ?

It seems like this should have been looked at, but I’ve been unable so far to find any reference to such a structure. I was wondering if readers had any pointers ? Note that this number is at most Cn-1 since any such path can be completed into a unique Dyck path.

Published in:  on April 3, 2007 at 5:09 pm Comments (6)