I think we need to work on the time complexity of the clustering algorithm.

The current one is of O(n) and works quite well for small datasets but for large ones, we need to work on one that is of O(log n). Clustering via SQL is also a viable option especially if the database server is dedicated and has high speed disks. As for the Hiphop PHP, perhaps it's time for a functional programming stack for PHP to get it "lambdanized"

My 2 cents, 1 penny and a carrot.

On Tue, Dec 28, 2010 at 11:35 AM, Milo van der Linden <milo@...> wrote:
There is also another thing that might be interesting to investigate;
clustering through SQL group by rounding coordinates f.i. to whole
degrees.

For instance: Say you know the viewport is approx 10 by 10 degrees,
grouping by degree would create a cluster grid of 10x10 cluster
points. Doing this directly on the database is probably the fastest.
Of course this will only work well when combined with a bounding box.

It would be good to work out some demo queries. I will digg in to this
in january, currently I am extremely busy getting several budgets
allocated for 2011, january will be a month that gives me more
programming time.


2010/12/28 David Kobia <david@...>:
> This is an issue I've been trying to fix for some time too, and is
> especially noticeable on deployments with +10K reports or on deployments
> with a few thousand points but on shared hosting. The way the clustering
> algorithm works is that it cycles through all the available clusters or
> points each time to determine what to group together. As you zoom in, the
> number of clusters increases even if they're well outside the viewport, and
> the more reports there are in the system, you'll notice the system start to
> take major hits as it performs the clustering calculation on higher zoom
> levels. You can see this with my own deployment at http://crime.mapatl.com
> which has +30K points.
>
> Here are some ways of fixing this issue:
>
> Caching - Turn on caching in the admin (+v2.0). By default the system uses
> filecaching, so each zoom level for each category will create an individual
> json file. This means your system won't have to perform the clustering
> calculation each time. If you can use Xcache or APC caching over File
> caching, you'll notice better performance. Increase the cache expiration to
> a day if possible or 'forever' in cases where the data doesn't really
> change. This way the calculation will only be done once.
> Restrict the calculation only to the viewport - this is written into the
> code but I disabled it because the calculation has to be repeated each time
> you pan (not just zoom) and can become quite cumbersome to the user -
> combined with the next point (cpu) it can be quite smooth.
> More CPU!! This is the easy and expensive way out. If you have a 100K
> reports, you can always throw more firepower at it and perform the
> clustering in record time.
> I'd been working on a python script which perhaps can perform the
> calculations faster than PHP. In my initial tests though, the performance
> wasn't all that much different - I'll put it up somewhere and let some
> python gurus figure it out. PHP functions are both its strength and weakness
> - in a looped calculation like this one, each function is called tens of
> thousands of times and that's what makes the cpu beg for mercy. Bottomline,
> the more points you add to the system, the greater the need for a lower
> level programming language - time for some HipHop PHP?
>
>
> David Kobia
> www.ushahidi.com
> Crowdsourcing Crisis Information
>
> Skype: dkobia
>
>
> On Mon, Dec 27, 2010 at 12:23 PM, Erik Hersman <erik@...> wrote:
>>
>> If anyone is online...
>> John McLear (@johnmclear)  is having an issue where he's got a large data
>> set and the closer he zooms in the longer and longer the points on the map
>> take to show up, even if there aren't that many on that particular map view.
>>  Anyone have an idea what this could be and how to fix it?
>> He was kind enough to make a video of it:
>> http://www.youtube.com/watch?v=Jdj6avFH0Zg
>> Erik Hersman
>>
>> www.ushahidi.com | www.iHub.co.ke
>> www.afrigadget.com | www.whiteafrican.com | @whiteafrican
>
>



~~~~~~~~~~~~~~~~~~~~~~~~~~
List Archive: http://list.ushahidi.com/

Would you like to receive list mail batched in a daily digest instead? Send a message to:
developers-digest-subscribe@...

To remove your address from the list, just send a message to
the address in the "List-Unsubscribe" header of any list
message. If you haven't changed addresses since subscribing,
you can also send a message to:
developers-unsubscribe@...

For addition or removal of addresses, we'll send a confirmation
message to that address. When you receive it, simply reply to it
to complete the transaction.

If you need to get in touch with the human owner of this list,
please send a message to:
developers-owner@...




--
Kind Regards,
Emmanuel Kala

Skype: emmanuel.kala

Judgement comes from experience, experience comes from poor judgement