Ah, the good old days…

BoingBoing points us to a dictionary of swearing, in every language imaginable. Reading the Hindi section brings back memories of the good ol’ days at IIT Kanpur: I can assure you that every phrase on the list could be heard almost every day. In fact, we spent the first month of freshman year receiving instruction in all possible variations on these insults (being IITans, we were quick learners ;) )

[ brief pause while I am overcome with nostalgia...]

It should come as no surprise to the Indians who read this that the Punjabi section was one of the top swear picks. Yiddish also seems like a good choice, but I am surprised at the inclusion of French in the top picks list.

Tags:

Published in: on February 28, 2005 at 5:29 pm Comments (4)

The Road to Reality

Roger Penrose’s new book, The Road To Reality, is out. Peter Woit has a detailed review of the book (I am waiting for mine to show up), but in the meantime, consider this amusing quote from the NYT review:

The perfect reader for ”The Road to Reality,” I fantasized, would be someone comfortable traversing the highest planes of abstract reasoning, yet who had missed some of the most important landmarks of scientific history — a being, perhaps, from another place and time.

I don’t know. It sounds an awful lot like a mathematician to me.

Published in: on at 8:53 am Comments (4)

SODA Outtakes: Market Equilibria

I know I know. By the time I am done with SODA outtakes, it will be time for SODA 2006. This is why we need more algorithms bloggers ! Machine learning blogs appear to be popping up faster than you can say ‘expectation-maximization’, but there is no corresponding growth spurt in algorithms. Sigh….

This outtake is much easier, because there is a well-written survey I can point to. The topic is market equilibria, or the problem of determining when equilibrium sets in when traders exchange good at various prices, and how fast this can be determined.

More formally, the problem is: given an initial allocation of goods to traders (a “supply” vector, where each “dimension” represents a single commodity), and a “preference” function for each trader (I have a high affinity for gold and cookies, you have an aversion to silver, but really like beer), determine a set of prices for goods, and a “demand” vector for each trader so that if each trader sells what they have and buys what they want (at the given prices), their preference is maximized.

Here, the preference function component for any commodity is a concave function of its amount. Note that you want a concave function in order to model “diminishing returns”. In addition, having concave utilities helps ensure unique optimal solutions (because the average of two solutions has better utility).

There are also sanity conditions (you can only buy with whatever money you have from selling, and the total amount of goods is fixed – this does not model producer economies).

This SIGACT News survey by Codenotti, Pemmaraju and Varadarajan lays out the terrain of market equilbria very nicely, outlining the history, the techniques (some beautiful methods) and what is currently known.

It’s a good example of an area that has been studied extensively (in economics) but where the computer science angle brings in questions that were not necessarily of concern earlier: issues of computational feasibility, efficiency, approximations that are characteristic of the algorithmic way of approaching a problem.

Tags: , ,

Published in: on at 6:11 am Comments (2)

Numb3rs Review: "Sabotage"

A nice thrilling episode, with very little math (but see below). As always, major spoiler alert.

Plot summary: Series of trains are derailed. At each site, a paper containing the same set of numbers is left behind. Everyone thinks it’s a code, but it actually isn’t (at least not in the traditional sense). Some good old sleuthing yields the answer.

It was funny when the NTSB official asks Charlie if he knows any cryptography. Although technically isn’t cryptanalysis what the code breakers do ? Larry the loony physicist had some hilarious quotes:

  • “Our evening was primal, in the carnal sense, not the mathematical.”
  • “The math department, the least libidinous place on campus”

The last fact is sadly not too far from the truth, but wait ! Mathematicans are cool (via Mathpuzzle)

The only real math nuggets came at the end, when Charlie was trying to explain to the female FBI agent that math appears in nature. His example: The Fibonacci series ! Bzzttt…

As it turns out, Rudbeckia Hirta just today posted a note about the exaggerated claims concerning the Fibonacci series; that the Greeks referred to the “golden ratio” (no), that the pyramids and the Parthenon somehow magically encode this ratio (only if you round off heavily), and even how the ratio is supposed to be aesthetically pleasing (no evidence). Keith Devlin has more on this.

Oh well: if Dan Brown can do it…

Tags: ,

p.s I understand that we need some kind of romantic interest to give the series some spice (although CSI and Law and Order manage perfectly well without it), but why on earth are they setting up an advisor-student relationship ? Allegedly at some distant point in the past this used to be fairly common in the humanities, but nowadays ? My only question is: how long before this becomes a “Big Issue” in the academic blogosphere ?

Published in: on February 26, 2005 at 6:39 am Comments (28)

Meta: recent comments

I added a new feature so you can see the recent comments posted on the site (this is handy if you posted a note and are looking to see if there was a response). It’s on the left, below the “Previous Posts” section.

Comments were broken for a while because I forgot to update the hack that allows me comments in the first place. They should be working again.

Published in: on February 25, 2005 at 7:01 pm Leave a Comment

SODA Outtakes: Lloyd’s algorithm and analysing heuristics

Uri Feige’s invited talk at SODA 05 was on ‘Rigorous Analysis of Heuristics for NP-hard Problems‘. His main argument (at least in the initial portion of his talk) was that it’s a good idea to analyze known heuristics for problems, because they work well in practice, and a rigorous analysis will shed light on why.

[Aside: in the theoryCS community, a "heuristic" usually refers to a procedure that has no provable performance guarantees; this could mean either no clear running time analysis, or no clear quality of optimization guarantee. The term "algorithm" is usually reserved for a procedure that has well-defined steps and ends in a well-defined state after a definite (finite) number of steps. One will occasionally find the term 'algorithm' used to refer to a heuristic in non-theory areas; beware !]

Doing such analysis, although maybe not as “sexy” as coming up with a new algorithm, is important also for “sociological” reasons. Although it might technically be correct to argue that a particular heuristic is not reliable and thus should not be used, the resolute theoretician is often faced with a mountain of empirical evidence in support of said heuristic, compared with the possibly complex and hard-to-understand algorithm that she proposes.

Some of the best known heuristics show up in clustering, especially since this is a problem that transcends many different application domains, including areas that are not as familiar with theoretical algorithmic methods. One of the best known methods for clustering with k centers is Lloyd’s algorithm (or the k-means algorithm), which works essentially as a discrete variant of expectation-maximization. The algorithm itself is very simple:

  1. Choose k “centers”
  2. Assign each point to its nearest center.
  3. Compute the k centroids of points assigned to a given center. These are the new k “centers”.
  4. Go back to step 2.

It is known that Lloyd’s algorithm will hit a local minimum fairly quickly (though not a global one: k-means is NP-hard). However, there is no analysis of how quickly this happens. In a SODA 2005 paper, Har-Peled and Sadri examine the complexity of k-means. They show that:

  1. In 1D, and in d-D (on the grid), the number of iterations of the k-means algorithm is polynomial in the number of points, the dimension, and the “spread” (the ratio of max distance to min-distance).
  2. Even in 1D, there are example point sets on which the k-means algorithm takes a linear number of iterations to converge.

What they also show is that slight modifications to the k-means heuristic result in algorithms that converge in time independent of the dimension (and with much better upper bounds). The simpler of their two schemes can be described as follows: in Step 2 of the original approach, merely move one “misclassified” point to its true cluster, rather than all.

They go on to present empirical evidence in support of their claim that these algorithms generate similar quality answers (in similar time) to k-means. This part is still a work in progress: there are very fast implementations of k-means that they don’t compare against, (but the data structures might be used to speed up their approach as well).

It is not clear to me that their paper answers to any degree the question of why Lloyd’s algorithm works as well as it appears to. However, the fact that they can prove bounds for their alternate algorithms suggests that maybe this is a line of attack to take when analyzing Lloyd’s method. These alternates are also quite simple, and so could conceivably be used as k-means replacements if one wished guaranteed running time bounds.

One final point: the authors have their code available online. Would that more would do the same !

Published in: on at 9:37 am Comments (10)

CiteULike

Yaroslav Bulatov, who runs one of the new machine learning blogs, points us to CiteULike, a very intriguing site. Some of you may have used web-based bookmark sites like Furl or del.icio.us, and may have even used the “social bookmarking” aspect of these sites.

CiteULike is a similar site, for archiving papers. Here’s how it works. If you find a paper you are interested in, you click on a bookmark link, and the paper is added to a web-based collection under your name. However, what makes this unique is that if the paper link is on a “well known” site like the ACM DL, Citeseer, arxiv, or many other places, bibliographic information is automatically added to the entry (this is because the server parses pages etc and extracts the necessary bibliographic information)

Subsequently, when you want to extract bib entries for all these papers, one click gets you a .bib file for all the papers you want. This is very handy (for example) when you are doing a literature search and reading/downloading papers by the dozen.

You can also associate tags with the entries if you choose to. This gets into the “social bookmarking” aspect of CiteULike: the idea being that if you tag a bunch of papers as being related to “artgallery”, and so does someone else, you will learn what the other person has found and vice versa. Personally, I am either too oldfashioned or too clueless to have made effective use of this feature, but the archiving/.bib features above are enough for me try it out.

Like any respectable web-based service, there are RSS feeds for any view of the collection (by tag, by person, by personal tag), and there are also watch lists, so if you want to keep track of when a new page (tag or person) adds a new paper, you can do so.

Here’s my paper feed. It will be an interesting experiment.

Published in: on February 23, 2005 at 5:03 am Leave a Comment

If you think your committee is rough on you…

From Jim Holt’s New Yorker article on Einstein and Gödel:

Einstein’s conclusions were the product of pure thought, proceeding from the most austere assumptions about nature. In the century since he derived them, they have been precisely confirmed by experiment after experiment. Yet his June, 1905, paper on relativity was rejected when he submitted it as a dissertation. (He then submitted his April paper, on the size of atoms, which he thought would be less likely to startle the examiners; they accepted it only after he added one sentence to meet the length threshold.)

The article dwells at length on Gödel’s interpretation of time (something I had mentioned a while ago). Read the whole thing: it’s fascinating.

(HT: Libertarianoid)

Tags: , , ,

Published in: on February 22, 2005 at 8:30 pm Leave a Comment

A great new resource for molecular modelling

From WorldChanging:

ZINC — which, in the free software tradition of recursive acronyms, stands for ZINC Is Not Commercial — is a free database of compounds for “virtual screening.” That is, ZINC provides 3D models of chemical compounds in a standard “docking” format used in chemistry and biochemistry software, allowing researchers to assemble and test new chemical compositions on their computers.

This is really great. When I was doing my Ph.D, I mucked around with molecular structures for docking and matching purposes, and I had accces only to structures that our collaborators at Pfizer provided us. Having such a large database to play with will really help geometry folks who do molecular modelling, not to mention all the medical researchers who probably already use this.

Published in: on February 21, 2005 at 10:34 pm Leave a Comment

Things we take for granted..

Richard Hofstadter’s classic book Anti-intellectualism in American Life is a fascinating sweep through American history that provides a good answer for those who wail ‘How could someone as (apparently) stupid as Bush get elected ?’. Another book, The Development of Academic Freedom in the United States with Walter P. Metzger, should be at the top of my list of reading, based solely on this review by Scott McLemee. An excerpt from the review (emphasis mine):

Condensing 500 pages into five paragraphs is a fool’s errand, but here goes anyway.

The belief that only the community of scholars has the final authority to determine what counts as valid research or permissible speech has deep roots in the history of the university, going all the way back to its origins in medieval Europe. But it was extremely slow to develop in colonial and antebellum America, which had few institutions of higher learning that were anything but outgrowths of religious denominations.

How we take certain things for granted…

Published in: on at 6:38 pm Leave a Comment