<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>The Geomblog &#187; research</title>
	<atom:link href="http://geomblog.wordpress.com/category/research/feed/" rel="self" type="application/rss+xml" />
	<link>http://geomblog.wordpress.com</link>
	<description>Ruminations on computational geometry, algorithms, theoretical computer science and life</description>
	<lastBuildDate>Wed, 05 Sep 2007 21:35:00 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='geomblog.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/c91decf9059ec331b1235299c98453a4?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>The Geomblog &#187; research</title>
		<link>http://geomblog.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://geomblog.wordpress.com/osd.xml" title="The Geomblog" />
		<item>
		<title>NP hardness of Euclidean k-median clustering</title>
		<link>http://geomblog.wordpress.com/2007/09/03/np-hardness-of-euclidean-k-median-clustering/</link>
		<comments>http://geomblog.wordpress.com/2007/09/03/np-hardness-of-euclidean-k-median-clustering/#comments</comments>
		<pubDate>Mon, 03 Sep 2007 21:49:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[clustering]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/09/03/np-hardness-of-euclidean-k-median-clustering/</guid>
		<description><![CDATA[Suppose you&#8217;re given a metric space (X, d) and a parameter k, and your goal is to find k &#8220;clusters&#8221; such that the sum of cluster costs is minimized. Here, the cost of a cluster is the sum (over all points in the cluster) of their distance to the cluster &#8220;center&#8221; (a designated point).
This is [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=geomblog.wordpress.com&blog=1016853&post=819&subd=geomblog&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Suppose you&#8217;re given a metric space (X, d) and a parameter k, and your goal is to find k &#8220;clusters&#8221; such that the sum of cluster costs is minimized. Here, the cost of a cluster is the sum (over all points in the cluster) of their distance to the cluster &#8220;center&#8221; (a designated point).</p>
<p>This is the well-known k-median problem (which differs from the also-popular k-means problem by the use of distances, rather than squares of distances). In a general metric space, the k-median problem is known to be NP-hard, as well as being hard to approximate to within an arbitrary constant factor. The current best known approximation ratio for k-median is <strike>4, due to <a href="http://citeseer.ist.psu.edu/250032.html">Charikar and Guha</a></strike> 3 + eps, via a local search heuristic due to <a href="http://citeseer.ist.psu.edu/cache/papers/cs/29674/http:zSzzSzwww.cse.iitd.ernet.inzSz%7ErohitkzSzresearchzSzlsearch-journal.pdf/arya02local.pdf">Arya, Garg, Khandekar, Meyerson, Munagala and Pandit</a> (thanks, Chandra).</p>
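<p>To make the difference between the two objectives concrete, here is a minimal Python sketch. It assumes the centers are already chosen and that each point is charged to its nearest center; the function name <code>assign_cost</code> and the sample points are made up for illustration.</p>

```python
import math

def assign_cost(points, centers, power=1):
    """Sum over all points of (distance to the nearest center) ** power.

    power=1 gives the k-median objective, power=2 the k-means objective.
    """
    total = 0.0
    for p in points:
        nearest = min(math.dist(p, c) for c in centers)
        total += nearest ** power
    return total

# Illustrative data: four points in the plane and k = 2 centers.
points = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0), (10.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]

k_median_cost = assign_cost(points, centers, power=1)  # 0 + 3 + 4 + 0
k_means_cost = assign_cost(points, centers, power=2)   # 0 + 9 + 16 + 0
print(k_median_cost, k_means_cost)
```

<p>This only evaluates a given choice of centers. The hard part of either problem is the search over all possible choices, which the sketch does not attempt.</p>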
<p>If the underlying metric is the Euclidean metric, then the problem complexity changes: in fact a PTAS for the Euclidean k-median can be obtained, due to the results of <a href="http://doi.acm.org/10.1145/276698.276718">Arora, Raghavan and Rao</a>, and then <a href="http://citeseer.ist.psu.edu/690811.html">Kolliopoulos and Rao</a> (who provide an almost-linear time algorithm). But as far as I am aware, there is still no NP-hardness proof for the Euclidean k-median problem, and I&#8217;d be interested in knowing if I am wrong here.</p>
<p>Note that the related problem of Euclidean k-means is known to be NP-hard from an observation by <a href="http://citeseer.ist.psu.edu/drineas99clustering.html">Drineas, Frieze, Kannan, Vempala and Vinay</a>.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/09/03/np-hardness-of-euclidean-k-median-clustering/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
		<item>
		<title>&quot;Data Mining&quot; = voodoo science ?</title>
		<link>http://geomblog.wordpress.com/2007/09/03/data-mining-voodoo-science/</link>
		<comments>http://geomblog.wordpress.com/2007/09/03/data-mining-voodoo-science/#comments</comments>
		<pubDate>Mon, 03 Sep 2007 08:56:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[data-mining]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/09/03/data-mining-voodoo-science/</guid>
		<description><![CDATA[On the Statistical Modeling blog, Aleks Jakulin has a rant on the virtues of data mining:

I view data analysis as summarization: use the machine to work with large quantities of data that would otherwise be hard to deal with by hand. I am also curious about what would the data suggest, and open to suggestions. [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=geomblog.wordpress.com&blog=1016853&post=816&subd=geomblog&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>On the Statistical Modeling blog, Aleks Jakulin has a <a href="http://www.stat.columbia.edu/%7Ecook/movabletype/archives/2007/08/a_rant_on_the_v.html">rant on the virtues of data mining</a>:
</p>
<blockquote><p>I view data analysis as summarization: use the machine to work with large quantities of data that would otherwise be hard to deal with by hand. I am also curious about what would the data suggest, and open to suggestions. Automated model selection can be used to list a few hypotheses that stick out of the crowd: I was not using model selection to select anything, but merely to be able to quantify how much a hypothesis sticks out from the morass of the null. </p>
<p>  The response from several social scientists has been rather unappreciative along the following lines: &#8220;Where is your hypothesis? What you&#8217;re doing isn&#8217;t science! You&#8217;re doing DATA MINING !&#8221;</p></blockquote>
<p>I had almost the same reaction a while back when I was visiting JPL: the climatologists there were horrified at the idea of trolling for patterns in climate data, and, to a person, asked me the dreaded &#8216;But what is the science question?&#8217; question. Of course, given the general hot-potato-ness of climatology right now, one might sympathize with their skittishness.</p>
<p>Data mining is a tricky area to work in, and <a href="http://geomblog.blogspot.com/2007/01/puzzles-and-mysteries.html">I&#8217;ve discussed this problem earlier</a>. It&#8217;s a veritable treasure-chest of rich algorithmic problems, especially in high dimensional geometry, and especially over large data sets. However, it&#8217;s often difficult to get a sense of forward progress, especially since the underlying analysis questions often seem like elaborate fishing expeditions.</p>
<p>In that context, the distinction Aleks makes between confirmatory data analysis (check if the data validates or invalidates a hypothesis) and exploratory data analysis (play with the data to create a non-uniform distribution on plausible hypotheses) is quite helpful. It also emphasizes the interactive and very visual nature of data mining; interactive tools and visualizations are as important as the underlying analysis tools.</p>
<p><span style="font-weight:bold;">Update</span>: <a href="http://www.columbia.edu/%7Echw2/">Chris Wiggins</a> points me to some of the earlier references to &#8216;data mining&#8217;. One of the most vituperative is a <a href="http://links.jstor.org/sici?sici=0034-6535%28198302%2965%3A1%3C1%3ADM%3E2.0.CO%3B2-7">paper by Michael Lovell</a> in 1983 in The Review of Economics and Statistics. This paper drips with scorn for &#8216;data miners&#8217;, but makes a point that is at the very least worthy of consideration: namely that because of the large dimensionality of the space of hypotheses that a data mining application typically explores (here couched in terms of explanatory variables for a regression), patterns with apparently low p-values might not actually be that significant (or stated another way, in high dimensional spaces, there are many seemingly rare patterns that aren&#8217;t that rare).</p>
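<p>The point about large hypothesis spaces can be illustrated with a standard back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the candidate hypotheses are tested independently at the same significance level and that none of them is actually true. The function name and the numbers are mine, not taken from Lovell&#8217;s paper.</p>

```python
def chance_of_spurious_hit(alpha, num_hypotheses):
    """Probability that at least one of num_hypotheses independent tests
    of true null hypotheses comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_hypotheses

# How quickly a 'rare' pattern stops being rare as the search space grows.
for m in (1, 5, 20, 100):
    print(m, round(chance_of_spurious_hit(0.05, m), 3))
```

<p>Under these assumptions, trying 20 candidate explanatory variables at the 5% level already gives better-than-even odds of finding at least one that looks significant by chance alone.</p>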
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/09/03/data-mining-voodoo-science/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
		<item>
		<title>Streaming Summer School Report</title>
		<link>http://geomblog.wordpress.com/2007/08/28/streaming-summer-school-report/</link>
		<comments>http://geomblog.wordpress.com/2007/08/28/streaming-summer-school-report/#comments</comments>
		<pubDate>Tue, 28 Aug 2007 20:08:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[community]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/08/28/streaming-summer-school-report/</guid>
		<description><![CDATA[(Ed: This post is by Piotr Indyk, who is always willing to be your near neighbor)
Greetings from the Kingdom of Denmark! The country of Vikings, meatballs, and football teams that just refuse to win hosted the Summer School on Data Stream Algorithms last week (August 20-23). The school was organized under the banner of [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=geomblog.wordpress.com&blog=1016853&post=813&subd=geomblog&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>(<span style="font-style:italic;">Ed: This post is by Piotr Indyk, who is always willing to be your near neighbor</span>)</p>
<p>Greetings from the Kingdom of Denmark! The country of Vikings, <a href="http://en.wikipedia.org/wiki/Frikadeller">meatballs</a>, and <a href="http://uk.eurosport.yahoo.com/27082007/58/international-ireland-romp-win-denmark.html">football teams that just refuse to win</a> hosted the <a href="http://www.madalgo.au.dk/html_sider/2_5_Events/SS2007/FrontPage_SS2007.html">Summer School on Data Stream Algorithms </a>last week (August 20-23). The school was organized under the banner of <a href="http://www.madalgo.au.dk/">MADALGO,</a> a new research center dedicated to MAssive DAta ALGOrithms, set up at Aarhus University. The <a href="http://www.madalgo.au.dk/html_sider/2_5_Events/OpeningEvent2007/OpeningEvent_2007.html">inauguration ceremony</a> for the center took place on August 24, with several people giving invited lectures.</p>
<p><a href="http://mysliceofpizza.blogspot.com/">Muthu</a> (one of the invited lecturers) has covered the inauguration ceremony, so I will skip the detailed description. Suffice it to say, it was a pleasure to see that the Danish Research Foundation (or as the locals like to say, Grundforskningsfond) is eager to support an algorithmic research center with a budget of roughly $10M over 5 years, while its US counterpart spends about $7M per year for the entire Theory of Computing program. Did I mention that the population of Denmark is roughly 2% of that of the US ?</p>
<p>Anyway, back to the summer school. We had 70+ participants altogether, including 5 lecturers. The school covered the following topics:
<ul>
<li>The dynamic <a href="http://www.cis.upenn.edu/%7Esudipto/">Sudipto Guha </a>gave two lectures. The first lecture was on algorithms for clustering. Massive amounts of data were clustered, including metric data, graph data, and a few careless participants sitting in the first row. In the second lecture, Sudipto covered the &#8220;random stream model&#8221;, where the elements are assumed to be arriving in a random order, which circumvents the usual worst-case paranoia.</li>
</ul>
<ul>
<li>The twin duo of <a href="http://www.almaden.ibm.com/cs/people/jayram/">T.S. Jayram </a>and <a href="http://research.yahoo.com/%7Eravikumar">Ravi Kumar</a> covered lower bounds: communication complexity, information complexity, and generally &#8220;everything you wanted to know but were afraid to ask&#8221;. It was the first time I have seen the details of the linear-space lower bound for estimating the L_infty distance, and I am happy to report that I understood everything, or at least that is what I thought at the time.  Jayram and Ravi have also occasionally ventured into the land of upper bounds, covering algorithms for the longest increasing sequences and probabilistic data streams.</li>
</ul>
<ul>
<li>The scholarly <a href="http://www.eecs.umich.edu/%7Emartinjs/">Martin Strauss</a> gave an overview of the algorithms for finding frequent elements, heavy hitters (sometimes on steroids) and their more recent versions used in <a href="http://www.dsp.ece.rice.edu/cs/">compressed sensing</a>.</li>
</ul>
<ul>
<li>I covered the basic upper bounds for the L_p norm/frequency moments estimation, as well as the algorithms for geometric data (clustering, MST, matching), notably those based on core-sets. The latter topic was originally supposed to be covered by <a href="http://valis.cs.uiuc.edu/%7Esariel/">Sariel Har-Peled</a>; however, the <strike> dark forces </strike>highly enlightened and insightful geniuses of the INS [Sariel's corrections] have jeopardized his plans. I guess the force was not strong enough with this one&#8230;</li>
</ul>
<p>We also had an open problem session. Some of the problems were copy-pasted from the <a href="http://www.cse.iitk.ac.in/users/sganguly/data-stream-probs.pdf">&#8220;Kanpur list&#8221;</a>, but a few new problems were posed as well. The list will be available shortly on the school website, so sharpen your pencils, prepare your napkins, pour some coffee, and &#8230; give all of this to your students!</p>
<p>The lecture slides are also <a href="http://www.madalgo.au.dk/html_sider/2_5_Events/SS2007/Course_material.html">available on-line</a>. If you spot any typos, let the lecturers know.</p>
<p>Overall, I think the school was a success, perhaps with the notable exception of the weather: it started to rain promptly after the school began, and it stopped when the school ended.  One has to admire the timing though.</p>
<p><a href="http://geomblog.blogspot.com/2007/06/socg-2007-magic-hausdorff-lions-have.html">SOCG 2009 will be held in Aarhus</a>. See you then!</p>
<p>(<span style="font-style:italic;">Ed: But what about the beer report ?</span>)</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/08/28/streaming-summer-school-report/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
		<item>
		<title>Things that make you pull your hair out in despair</title>
		<link>http://geomblog.wordpress.com/2007/08/06/things-that-make-you-pull-your-hair-out-in-despair/</link>
		<comments>http://geomblog.wordpress.com/2007/08/06/things-that-make-you-pull-your-hair-out-in-despair/#comments</comments>
		<pubDate>Mon, 06 Aug 2007 06:40:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[miscellaneous]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/08/06/things-that-make-you-pull-your-hair-out-in-despair/</guid>
		<description><![CDATA[I was recently at AT&#38;T visiting for a few weeks, and I was lucky enough to catch a talk by Amit Chakrabarti on lower bounds for multi-player pointer jumping. A complexity class that figured prominently in his talk was the class ACC0, which consists of constant depth, unbounded fanin, poly sized circuits with AND, OR, [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=geomblog.wordpress.com&blog=1016853&post=809&subd=geomblog&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I was recently at AT&amp;T visiting for a few weeks, and I was lucky enough to catch a talk by <a href="http://www.cs.dartmouth.edu/%7Eac/">Amit Chakrabarti</a> on <a href="http://www.cs.dartmouth.edu/%7Eac/Pubs/ccc07-mpj.pdf">lower bounds for multi-player pointer jumping</a>. A complexity class that figured prominently in his talk was the class <a href="http://qwiki.caltech.edu/wiki/Complexity_Zoo#acc0">ACC<sup>0</sup></a>, which consists of constant depth, unbounded fanin, poly sized circuits with AND, OR, NOT and MOD m gates, for all m.</p>
<p>Suppose we make our life simple by fixing m to be a <span style="font-style:italic;">single prime</span>, like 3. It turns out that the corresponding class <a href="http://qwiki.caltech.edu/wiki/Complexity_Zoo#ac0m">AC<sup>0</sup>[m]</a> is strictly contained within <a href="http://qwiki.caltech.edu/wiki/Complexity_Zoo#nc1">NC<sup>1</sup></a>: this arises from results in the 80s by <a href="http://qwiki.caltech.edu/wiki/Zooref#raz87">Razborov</a> and <a href="http://qwiki.caltech.edu/wiki/Zooref#smo87">Smolensky</a>. Now suppose that instead of picking a prime integer, we choose a number like 6, which is the product of distinct primes. We do not even know whether AC<sup>0</sup>[6] = <a href="http://qwiki.caltech.edu/wiki/Complexity_Zoo#np">NP</a> or not !</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/08/06/things-that-make-you-pull-your-hair-out-in-despair/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
		<item>
		<title>Saving lives with exact algorithms</title>
		<link>http://geomblog.wordpress.com/2007/08/01/saving-lives-with-exact-algorithms/</link>
		<comments>http://geomblog.wordpress.com/2007/08/01/saving-lives-with-exact-algorithms/#comments</comments>
		<pubDate>Wed, 01 Aug 2007 22:03:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/08/01/saving-lives-with-exact-algorithms/</guid>
		<description><![CDATA[It&#8217;s not often you get to say this in a paper:
We aim for exact algorithms [because] &#8230; any loss of optimality could lead to unnecessary patient deaths.
Anyone who&#8217;s gone through an algorithms class will at some point hear about stable marriage algorithms, and how the method is used to match hospitals and newly minted MDs [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=geomblog.wordpress.com&blog=1016853&post=808&subd=geomblog&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>It&#8217;s not often you get to say this in a paper:</p>
<blockquote><p>We aim for exact algorithms [because] &#8230; any loss of optimality could lead to unnecessary patient deaths.</p></blockquote>
<p>Anyone who&#8217;s gone through an algorithms class will at some point hear about <a href="http://en.wikipedia.org/wiki/Stable_marriage_problem">stable marriage algorithms</a>, and how the method is used to <a href="http://www.nrmp.org/res_match/about_res/algorithms.html">match hospitals and newly minted MDs</a> looking for residencies.</p>
<p>Consider now the far more serious problem of matching kidney transplant candidates to potential donors.  Because transplant lists are long, and cadaver donors are few,  <a href="http://www.matchingdonors.com/life/index.cfm">marketplaces</a> matching healthy donors to recipients <a href="http://www.paireddonation.org/">have sprung up in the US</a>. For complicated ethical reasons (which are not without <a href="http://www.vpostrel.com/weblog/archives/002165.html">controversy</a>),  such  exchanges are not made for money, and are viewed as gifts.</p>
<p>So what happens if a donor kidney doesn&#8217;t match the potential recipient ? Ordinarily, nothing. Suppose however that there was another donor-recipient pair with a similar mismatch, and if only the donors were swapped, both transplants could go through ? What about if a 3-way cycle of matchings could be found ? This is called a &#8216;market clearing&#8217;, and is the subject of a <a href="http://www.cs.cmu.edu/%7Edabraham/papers/abs07.pdf">paper by Abraham, Blum and Sandholm</a> to appear in <a href="http://stiet.si.umich.edu/ec07/">EC</a>.</p>
<p>I&#8217;ll get into the problem statement in a second. What&#8217;s far more important is that <span style="font-style:italic;">the results of this paper have already been used to find potential transplant matches where none existed</span> ! The <a href="http://www.paireddonation.org/">Alliance for Paired Donation</a> has been using this method <a href="http://www.cmu.edu/news/archive/2007/June/june11_kidney.shtml">since last December</a> for finding potential matches, and has already found matches that prior methods would have missed. This is an incredible achievement: working on a problem abstracted from a real life-or-death scenario, and actually taking the results back to the source and making a difference.</p>
<p>Technically, the problem is easily stated. Given a directed graph with weights on edges, and a parameter L, find a maximum weight collection of disjoint cycles, each of length at most L. Vertices are agents with items (in this case, transplant recipients with donors). The weight on an edge represents the utility to the source of obtaining the sink&#8217;s item (a donor transfer). The L-constraint reflects reality, in that all such transplants would have to be performed simultaneously (to ensure that all donors go through), and it&#8217;s not feasible to perform more than a few (typically 3) of these transplants simultaneously.</p>
<p>The line I quoted at the beginning of the post comes as part of an explanation as to why they want exact algorithms for the problem (it&#8217;s NP-hard for L &gt;= 3). The technical contributions include finding a way to run an integer-linear program at scale for graphs with hundreds of thousands of nodes.</p>
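<p>For intuition about the problem statement, here is a brute-force Python sketch that works only on toy instances. It is emphatically not the integer-programming approach of the paper. It simply enumerates all cycles of length at most L and searches over vertex-disjoint subsets. The function names and the example graph are made up for illustration.</p>

```python
from itertools import permutations

def short_cycles(weights, max_len):
    """List simple directed cycles with 2..max_len vertices.

    weights maps an edge (u, v) to its weight. Each cycle is returned once,
    as (vertex_tuple, total_weight), rotated so its smallest vertex is first.
    """
    vertices = sorted({v for edge in weights for v in edge})
    cycles = []
    for size in range(2, max_len + 1):
        for combo in permutations(vertices, size):
            if combo[0] != min(combo):
                continue  # skip non-canonical rotations of the same cycle
            edges = list(zip(combo, combo[1:] + combo[:1]))
            if all(e in weights for e in edges):
                cycles.append((combo, sum(weights[e] for e in edges)))
    return cycles

def best_exchange(weights, max_len):
    """Exhaustively pick vertex-disjoint cycles of maximum total weight."""
    cycles = short_cycles(weights, max_len)
    best = (0, [])

    def search(start, used, total, chosen):
        nonlocal best
        if total > best[0]:
            best = (total, list(chosen))
        for i in range(start, len(cycles)):
            verts, w = cycles[i]
            if used.isdisjoint(verts):
                chosen.append(verts)
                search(i + 1, used | set(verts), total + w, chosen)
                chosen.pop()

    search(0, set(), 0, [])
    return best

# Toy instance: an edge u -> v means u's recipient can use v's donor.
toy = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 1,
       ("C", "D"): 1, ("D", "C"): 1}
print(best_exchange(toy, 2))  # only the 2-way swap between C and D is allowed
print(best_exchange(toy, 3))  # the 3-way cycle A, B, C becomes available
```

<p>On the toy graph, raising L from 2 to 3 lets three transplants go through instead of two, which is the effect the length bound trades off against the practical difficulty of simultaneous surgeries. The enumeration is exponential, so this is only a way to see the objective at work.</p>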
<p>(Via <a href="http://www.nsf.gov/news/newsletter/jul_07/index.jsp">NSF Current</a>)</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/08/01/saving-lives-with-exact-algorithms/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
		<item>
		<title>A counting gem</title>
		<link>http://geomblog.wordpress.com/2007/07/26/a-counting-gem/</link>
		<comments>http://geomblog.wordpress.com/2007/07/26/a-counting-gem/#comments</comments>
		<pubDate>Thu, 26 Jul 2007 18:10:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/07/26/a-counting-gem/</guid>
		<description><![CDATA[Consider the following puzzle:
Given 2n items, determine whether a majority element (one occurring n+1 times) exists. You are allowed one pass over the data (which means you can read the elements in sequence, and that&#8217;s it), and exactly TWO  units of working storage.
The solution is incredibly elegant, and dates back to the early 80s. [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=geomblog.wordpress.com&blog=1016853&post=806&subd=geomblog&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Consider the following puzzle:</p>
<blockquote><p>Given 2n items, determine whether a majority element (one occurring n+1 times) exists. You are allowed one pass over the data (which means you can read the elements in sequence, and that&#8217;s it), and exactly TWO  units of working storage.</p></blockquote>
<p>The solution is incredibly elegant, and dates back to the early 80s. I&#8217;ll post the answer in the comments tomorrow, if it hasn&#8217;t been posted already.</p>
<p><span style="font-weight:bold;">Update</span>:  A technical point: the problem is <span style="font-style:italic;">a promise problem</span>, in that you are promised that such an element exists. Or, in the non-promise interpretation, you are not required to return anything reliable if the input does not contain a majority element.</p>
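<p>For readers who want to experiment, here is a hedged sketch of one well-known algorithm that fits the stated constraints: the Boyer-Moore majority vote, which keeps only a candidate and a counter. That this is the solution the post has in mind is an assumption, and the function name <code>majority_candidate</code> is illustrative. In line with the promise-problem caveat, the returned value is only meaningful when a majority element really exists.</p>

```python
def majority_candidate(items):
    """One pass, two units of storage: a candidate and a counter."""
    candidate = None
    count = 0
    for x in items:
        if count == 0:
            # No surviving candidate: adopt the current element.
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            # Pair off one copy of the candidate against a different element.
            count -= 1
    return candidate


# 2n = 6 items, and 3 occurs n + 1 = 4 times.
print(majority_candidate([3, 1, 3, 2, 3, 3]))
```

<p>The sketch works because every decrement cancels one copy of the candidate against one different element, and a true majority element cannot be cancelled out completely.</p>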
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/07/26/a-counting-gem/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
		<item>
		<title>FOCS 07 list out</title>
		<link>http://geomblog.wordpress.com/2007/07/04/focs-07-list-out/</link>
		<comments>http://geomblog.wordpress.com/2007/07/04/focs-07-list-out/#comments</comments>
		<pubDate>Wed, 04 Jul 2007 17:56:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[focs]]></category>
		<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/07/04/focs-07-list-out/</guid>
		<description><![CDATA[The list is here. As I was instructed by my source, let the annual &#8220;roasting of the PC members&#8221; begin !
63 papers were accepted, and on a first look there appears to be a nice mix of topics: it doesn&#8217;t seem as if any one area stands out. Not many papers are online though, from [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>The list is <a href="http://focs2007.org/accepted.txt">here</a>. As I was instructed by my source, let the annual &#8220;roasting of the PC members&#8221; begin !</p>
<p>63 papers were accepted, and on a first look there appears to be a nice mix of topics: it doesn&#8217;t seem as if any one area stands out. Not many papers are online though, from my cursory random sample, so any informed commenting on the papers will have to wait. People who know more about any of the papers are free to comment (even if you&#8217;re the author!). Does anyone know the number of submissions this year ? I heard it was quite high.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/07/04/focs-07-list-out/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
		<item>
		<title>SoCG 2007: Approximate Clustering</title>
		<link>http://geomblog.wordpress.com/2007/06/07/socg-2007-approximate-clustering/</link>
		<comments>http://geomblog.wordpress.com/2007/06/07/socg-2007-approximate-clustering/#comments</comments>
		<pubDate>Thu, 07 Jun 2007 03:01:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[research]]></category>
		<category><![CDATA[socg]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/06/07/socg-2007-approximate-clustering/</guid>
		<description><![CDATA[I was listening to a couple of talks that improve known bounds for various kinds of approximate clustering in high dimensions, and I got to thinking.
One of the revolutions in geometry over the last 10 years has been the development of nontrivial tools for dealing with approximations in high dimensions. This is of course necessitated [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I was listening to a couple of talks that improve known bounds for various kinds of approximate clustering in high dimensions, and I got to thinking.</p>
<p>One of the revolutions in geometry over the last 10 years has been the development of nontrivial tools for dealing with approximations in high dimensions. This is of course necessitated by the curse of dimensionality, and the hardness of most high-D data analysis problems (most exact solutions are exponential in the dimension).  So rather than computing the optimal solution to some problem on n points in d dimensions, we compute a (1+e)-approximation to the optimal solution.</p>
<p>One problem this creates is that every algorithm is now described by a complicated expression involving three parameters (n, d, e). Some algorithms are exponential in 1/e, but polynomial in d. Some are poly in 1/e, and exponential in d. The exponent could have a base of n, or 2, or even d, or 1/e.</p>
<p>In short, it&#8217;s an unholy mess of strictly incomparable methods that trade off different parameters against each other.</p>
<p>This makes life for me, the &#8220;user&#8221; of approximation technology, rather difficult. What I&#8217;d really like to understand are the gadgets that transform one kind of bound to another (and there are many such gadgets: discretization, enumeration, random sampling, etc). But it&#8217;s hard to gather these from the papers directly: these gadgets (the really useful tools) are buried deep inside lemmas, and inside people&#8217;s heads.</p>
<p>What I&#8217;d like to see is some kind of &#8220;taxonomization&#8221; or classification of the different &#8220;tricks of the trade&#8221; in high-dimensional geometric approximation, with some sense of which techniques apply when, and why. In fact, I like this idea so much I might even try to run a seminar on it.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/06/07/socg-2007-approximate-clustering/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
		<item>
		<title>SoCG 2007: Geometric Views of Learning</title>
		<link>http://geomblog.wordpress.com/2007/06/07/socg-2007-geometric-views-of-learning/</link>
		<comments>http://geomblog.wordpress.com/2007/06/07/socg-2007-geometric-views-of-learning/#comments</comments>
		<pubDate>Thu, 07 Jun 2007 01:40:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[research]]></category>
		<category><![CDATA[socg]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/06/07/socg-2007-geometric-views-of-learning/</guid>
		<description><![CDATA[I&#8217;ve been fairly remiss in my SoCG blogging; I&#8217;ll blame it on having session chair duties, and not wanting to lug my laptop around.
The invited talk was by Partha Niyogi from the University of Chicago on &#8216;A Geometric Perspective on Machine Learning&#8217;. You may remember his work from my earlier post on the estimation of [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I&#8217;ve been fairly remiss in my SoCG blogging; I&#8217;ll blame it on having session chair duties, and not wanting to lug my laptop around.</p>
<p>The invited talk was by <a href="http://people.cs.uchicago.edu/%7Eniyogi/">Partha Niyogi</a> from the University of Chicago on &#8216;A Geometric Perspective on Machine Learning&#8217;. You may remember his work from my earlier post on the <a href="http://geomblog.blogspot.com/2007/04/estimating-surface-area-of-polytope.html">estimation of the surface area of a convex body</a> (read the comments). More importantly, he is part of the group that developed a method known as <a href="http://people.cs.uchicago.edu/%7Eniyogi/papersps/BNnips01.pdf">Laplacian Eigenmaps</a> for learning a manifold from a collection of data.</p>
<p>Manifold learning is a new set of problems in machine learning that has interesting connections to algorithms and geometry. The basic problem is as follows. Given a collection of (unlabelled) data inhabiting some high dimensional space, can you determine whether they actually lie on some lower dimensional manifold in this space ?</p>
<p>Why do we care ? The trouble with any data analysis problem in high dimensions is the rather poetically named &#8216;curse of dimensionality&#8217;, which basically says that any interesting data analysis algorithm runs in time exponential in the dimension of the data. For data that lives in 100s of dimensions, this is rather bad news.</p>
<p>However, &#8220;data that lives in 100 dimensions&#8221; is really an artifact of the way we represent data, slapping on dimensions willy-nilly for every attribute that might be relevant. What one often expects is that data doesn&#8217;t really lie in 100 dimensions, but on some lower dimensional manifold within that space. A beautiful example of this was given by Partha in his talk: he described the problem of inferring a sound wave signal generated at one end of a tube by listening in at the other end. By Fourier analysis, you can think of both signals as living in an infinite dimensional space, but suppose we now vary the tube length, for a fixed input signal. Then the output signal varies smoothly along a curve (i.e., a 1-d manifold) in this infinite dimensional space.</p>
<p>&#8220;So what ?&#8221;  one might ask.  The problem is that the standard method of doing data analysis is to translate the problem of interest into some property of the distances between points in the high dimensional space. If the data really lies on some kind of curved surface, then the &#8220;distance in ambient space&#8221; does not reflect the true structure of the data. What we really need is &#8220;distance along the manifold&#8221;, which we could do if we could reconstruct the manifold !</p>
<p>The key idea of the Laplacian Eigenmaps work is this: If you set up an appropriate weighted graph on the data points (where each edge has a weight that decays exponentially with the squared distance between the points) and compute the Laplacian of this graph, you get an approximation that converges (as the data size increases) to the Laplacian of the underlying manifold !! This assumes that the data was sampled uniformly (or mostly uniformly) from the manifold. Moreover, the convergence happens at a rate that depends only on the dimension of the manifold, rather than the dimension of ambient space.</p>
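<p>To make the graph construction concrete, here is a minimal sketch, not the authors&#8217; implementation. It assumes a fully connected graph, heat-kernel weights exp(-||x_i - x_j||^2 / t) with a hypothetical bandwidth parameter <code>t</code>, and the unnormalized Laplacian L = D - W; the function name <code>graph_laplacian</code> is illustrative.</p>

```python
import math


def graph_laplacian(points, t=1.0):
    """Unnormalized graph Laplacian L = D - W with heat-kernel weights.

    Assumptions (not from the post): a fully connected graph and
    W[i][j] = exp(-||x_i - x_j||^2 / t) for a hypothetical bandwidth t.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-sq / t)
    # The degree of each vertex is the sum of its incident edge weights.
    degree = [sum(row) for row in W]
    return [[(degree[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


# Points sampled from a circle, a 1-d manifold sitting in the plane.
samples = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
           for k in range(8)]
L = graph_laplacian(samples, t=0.5)
```

<p>Each row of <code>L</code> sums to zero and the matrix is symmetric, which is what one expects of a graph Laplacian; its eigenvectors are what an embedding step would then use.</p>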
<p>There are many ramifications of this idea, that connect to shape modelling, homology, and even volume estimation. But it reinforces the idea of the Laplacian as a key geometric construct that can be modelled combinatorially in a consistent way.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/06/07/socg-2007-geometric-views-of-learning/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
		<item>
		<title>Faure sequences and quasi-Monte Carlo methods</title>
		<link>http://geomblog.wordpress.com/2007/05/08/faure-sequences-and-quasi-monte-carlo-methods/</link>
		<comments>http://geomblog.wordpress.com/2007/05/08/faure-sequences-and-quasi-monte-carlo-methods/#comments</comments>
		<pubDate>Tue, 08 May 2007 18:10:00 +0000</pubDate>
		<dc:creator>geomblog</dc:creator>
				<category><![CDATA[research]]></category>

		<guid isPermaLink="false">http://geomblog.wordpress.com/2007/05/08/faure-sequences-and-quasi-monte-carlo-methods/</guid>
		<description><![CDATA[Fix a number n, and consider the following recursive procedure to construct a permutation $\sigma(n)$ of the numbers 0..n-1. We&#8217;ll view the permutation as a sequence, so the permutation (0 2 1) maps 0 to 0, 1 to 2 and 2 to 1. When convenient, we&#8217;ll also treat the sequences as vectors, so we can do [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Fix a number n, and consider the following recursive procedure to construct a permutation $\sigma(n)$ of the numbers 0..n-1. We&#8217;ll view the permutation as a sequence, so the permutation (0 2 1) maps 0 to 0, 1 to 2 and 2 to 1. When convenient, we&#8217;ll also treat the sequences as vectors, so we can do arithmetic on them.
<ol>
<li>If n is even, then $\sigma(n) = s_1 \circ s_2$, where $s_1 = 2\sigma(n/2), s_2 = 2\sigma(n/2)+1$.</p>
<p>For example, $\sigma(2) = (0 1)$. $\sigma(4) = (0 2) \circ (1 3) = (0 2 1 3)$</li>
<li>If n is odd, then $\sigma(n) = s_1 \circ (n-1)/2 \circ s_2$, where $s_1$ and $s_2$ are constructed by taking the sequence for $\sigma(n-1)$, incrementing all elements that are at least $(n-1)/2$, and then splitting the sequence into two parts of equal length.
<p>For example, $\sigma(4) = (0 2 1 3)$. Let n = 5. Then incrementing all elements at least 2 gives us $(0 3 1 4)$. Splitting and inserting (2) gives us $\sigma(5) = (0 3 2 1 4)$</li>
</ol>
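<p>The two rules above can be sketched directly as a recursive function. This is a minimal illustration: the name <code>sigma</code> mirrors the notation in the text, and the base case $\sigma(1) = (0)$ is an assumption, since the examples start at n = 2.</p>

```python
def sigma(n):
    """Permutation of 0..n-1 built by the recursive rule described above."""
    if n == 1:
        return [0]  # base case (assumed; the examples in the text start at n = 2)
    if n % 2 == 0:
        half = sigma(n // 2)
        # s_1 = 2*sigma(n/2), s_2 = 2*sigma(n/2) + 1, then concatenate.
        return [2 * x for x in half] + [2 * x + 1 for x in half]
    mid = (n - 1) // 2
    # Odd n: bump every entry of sigma(n-1) that is at least (n-1)/2 ...
    bumped = [x + 1 if x >= mid else x for x in sigma(n - 1)]
    # ... then split into two equal halves and insert (n-1)/2 between them.
    return bumped[:mid] + [mid] + bumped[mid:]


print(sigma(4))  # [0, 2, 1, 3]
print(sigma(5))  # [0, 3, 2, 1, 4]
```

<p>Note that this builds the whole sequence, so it only checks the definition; it does not answer the question of evaluating a single entry quickly.</p>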
<p>Here&#8217;s the question: given n and j, can we return the jth entry of $\sigma(n)$ using fewer than $\log(n)$ operations ? Note that writing down $j$ takes $\log n$ bits already, so the question is couched in terms of operations. For reasons explained later, an amortized bound is not useful (i.e., we can&#8217;t compute the entire sequence first and then pick off elements in constant time).</p>
<p>For the backstory of where this sequence, known as a <a href="http://www.puc-rio.br/marco.ind/quasi_mc.html#Halton">Faure sequence</a>, comes from, read on&#8230;</p>
<p>One of the applications of geometric discrepancy is in what are called quasi-Monte Carlo methods. If you want to estimate the integral of some complicated function over a domain, (<a href="http://graphics.uni-ulm.de/CourseNotesSIG.pdf">for simulation purposes for example</a>), one way of doing this is to sample at random from the domain, and use the function evals at the sampled points to give an estimate of the integral. Typically, the function is expensive to evaluate as well, so you don&#8217;t want to pick too many points.</p>
<p>Of course in practice, you&#8217;re picking random points using a random number generator that is itself based on some pseudorandom generation process. An important area of study, called quasi-Monte Carlo methods, asks if this middleman can be eliminated. In other words, is it possible to generate a <span style="font-style:italic;">deterministic</span> set of points that suffices to approximate the integration ? This is of course the question of finding a low discrepancy set, something that we&#8217;ve used in computational geometry to derandomize geometric algorithms (especially those based on $\epsilon$-net constructions).</p>
<p>There are many ways of constructing good sequences, and a good overview can be found in Chazelle&#8217;s <a href="http://www.amazon.com/Discrepancy-Method-Bernard-Chazelle/dp/0521003571">discrepancy book</a> (<a href="http://www.cs.princeton.edu/%7Echazelle/pubs/book.pdf">read it free here</a>). One of the more well known is due to Halton, and works by what is called radical inversion: given a base p, write down the numbers from 1 to n in base p, and then reverse the digits, and map back to a number less than one. For example, using a base of 2, we can write 6 as 110, which reversed becomes 011, or after adjoining a decimal point, 0.011 = 3/8. This specific sequence, using base 2, is called a <a href="http://mathworld.wolfram.com/vanderCorputSequence.html">van der Corput sequence</a> and is also used to construct a <a href="http://mathworld.wolfram.com/HammersleyPointSet.html">Hammersley point set</a>. For sampling in d-dimensional space, the trick is to let the jth coordinate of the ith point be the ith entry in a Halton sequence using some base $p_j$ (usually a prime).</p>
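<p>Radical inversion is easy to try out. The sketch below uses exact fractions so that the 6 &rarr; 3/8 example can be checked directly; the function names <code>radical_inverse</code> and <code>halton_point</code> are illustrative, not taken from any library.</p>

```python
from fractions import Fraction


def radical_inverse(i, p):
    """Reverse the base-p digits of i around the radix point."""
    value = Fraction(0)
    scale = Fraction(1, p)
    while i != 0:
        i, digit = divmod(i, p)
        # The least significant digit of i becomes the first digit after the point.
        value += digit * scale
        scale /= p
    return value


def halton_point(i, bases):
    """i-th point: coordinate j is the radical inverse of i in base bases[j]."""
    return [radical_inverse(i, p) for p in bases]


print(radical_inverse(6, 2))  # 3/8
```

<p>Calling <code>halton_point(i, [2, 3])</code> for successive values of i gives points in the unit square, one base per coordinate as described above.</p>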
<p>It can be shown that these sequences have low discrepancy; however, they can have &#8220;aliasing&#8221; problems, or repeated patterns, if the primes are not large enough. One way around this problem is by observing that if we ignore the decimal point, all we&#8217;re really doing is constructing a permutation of the numbers from 0 to $p^d-1$ (for a d-digit number in base p). Thus, if we could scramble the permutation further, this might allow us to preserve the discrepancy properties without the annoying aliasing effects. The process described above is a well known scramble called the Faure sequence. It maintains the desired properties of the sequence, and is quite popular in the quasi-MC community.</p>
<p>Note that if we were to preprocess all needed sequences, we&#8217;d need n*d space, where n is the number of samples, and d is the dimension of the space. This is not desirable for large dimensional sampling problems, and hence the question about the direct evaluation of coordinates.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://geomblog.wordpress.com/2007/05/08/faure-sequences-and-quasi-monte-carlo-methods/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6537c0a681d22d4a3f7bf4ce7d209a0f?s=96&#38;d=identicon" medium="image">
			<media:title type="html">geomblog</media:title>
		</media:content>
	</item>
	</channel>
</rss>