Find a friend



Via Greg Linden's excellent blog post, I ran across this Google paper (Evaluating Similarity Measures - A Large Scale Study in the Orkut social network [PDF]) detailing how Orkut generates 'related communities'.

The actual findings of the paper do not hold many surprises. Cosine distance works as it should, and it doesn't seem to penalize large communities much. By 'penalizing', I mean that if community A or B is really large, there is less chance of the pair showing up as 'related', even though many Orkut members may belong to both communities.
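To make the cosine measure concrete, here is a minimal sketch in Python. It assumes each community is simply a set of member IDs; the community names, the data and the helper names (cosine_similarity, related_communities) are all made up for illustration, and this is not Orkut's actual implementation.

```python
from math import sqrt


def cosine_similarity(members_a, members_b):
    """Cosine measure between two communities given as sets of member IDs.

    With 0/1 membership vectors this reduces to
    |A intersect B| / sqrt(|A| * |B|).
    """
    if not members_a or not members_b:
        return 0.0
    overlap = len(members_a & members_b)
    return overlap / sqrt(len(members_a) * len(members_b))


def related_communities(target, communities, top_n=3):
    """Rank every other community by its cosine score against `target`."""
    scores = [
        (name, cosine_similarity(communities[target], members))
        for name, members in communities.items()
        if name != target
    ]
    # Highest score first; ties broken by name so the order is stable.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical data: community name -> set of member IDs.
communities = {
    "python": {1, 2, 3, 4},
    "linux": {2, 3, 4, 5},
    "cooking": {4, 6, 7, 8},
    "everyone": set(range(1, 101)),  # a huge catch-all community
}

for name, score in related_communities("python", communities):
    print(name, round(score, 2))
```

In this toy data every member of 'python' is also in 'everyone', yet 'everyone' ranks last because its size inflates the denominator. That is the 'penalty' on large communities described above.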

The paper makes some interesting observations on how much importance to give to whether a user joins a community after viewing it by clicking on it in the 'related communities' section. In retrospect, this is pretty obvious - users are shy of joining communities which may 'embarrass' them somehow (since all users can see what communities a user is part of).

I'm surprised that Google didn't go the extra step and try to find 'related people' using a similar measure. It would be fun to get a list of people you might be interested in talking to, where 'potential interest' is determined by common communities, shared friends, degrees of separation, etc. A true 'find a friend' service. In fact, I remember Orkut's newsletter having something like this long ago - I wonder why they scrapped it.

On a tangential note, I once used the Del.icio.us APIs to try and build a recommendation system ('If you liked this page A, you may also like this page B since other folks who liked A - liked B too'). Unlike Orkut, del.icio.us has too much of a geek/early-adopter leaning. Maybe they will get a healthy influx of normal people due to their Yahoo acquisition.
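The 'folks who liked A liked B too' idea can be sketched as a plain co-occurrence count. This is only an illustration, assuming bookmarks are already loaded into memory as a dict of user name to set of pages; the users, pages and the also_liked helper are hypothetical, and no del.icio.us API calls are shown.

```python
from collections import Counter


def also_liked(page, bookmarks, top_n=3):
    """Return pages most often bookmarked alongside `page`.

    `bookmarks` maps a user name to the set of pages that user saved.
    """
    counts = Counter()
    for pages in bookmarks.values():
        if page in pages:
            # Credit every other page this user saved.
            counts.update(pages - {page})
    # Most co-bookmarked first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Hypothetical data: user -> set of bookmarked pages.
bookmarks = {
    "alice": {"A", "B", "C"},
    "bob": {"A", "B"},
    "carol": {"A", "C", "D"},
    "dave": {"B", "D"},
}

print(also_liked("A", bookmarks))
```

Raw counts like these favour pages that are popular with everyone, so a real system would probably normalize them, for example with the cosine measure discussed earlier.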

Btw, if any of you find this stuff interesting, do check out George Karypis' paper - 'Evaluation of Item-based Top-N Recommendation Algorithms'.

P.S. Orkut must be the only Google service to run on ASP.NET (Writely doesn't count as it is an acquisition). However, I'm not sure whether such a donut-starved server is a good advertisement for Scott Guthrie's team :-)

P.P.S. Before someone points it out, yes, I'm a Microsoft employee running a blog powered by Google's Blogger which is hosted on a Linux/Apache webhost. Why? 'Coz a friend already had some space which I could share easily. And WordPress, the only other option I investigated seriously, had one thing I didn't like - it didn't generate static HTML pages. I know other blog engines do, but I was too lazy to go use one of them. So shoot me :-)




Comments:
Actually, the set of algorithms and factors for collaborative filtering is present in this paper co-authored by Karypis at http://www10.org/cdrom/papers/519/index.html
Going off-topic, I found RACOFI - a system to find related music pretty fascinating (http://www.daniel-lemire.com/fr/abstracts/COLA2003.html)
 
hi Sriram
I am curious about Orkut's friend-linking algorithm. It's amazing that they find a possible link between two profiles via each other's friends. In theory it is an NP-complete problem. Do you have any idea how Orkut does that?
thanks
parbati
 
Hi Sriram, ur article was cool. But have u ever wondered why Orkut has provisions for community owners to add their own "related communities"? I have also come across a few communities without any "related communities". Do u have any ideas about this issue? Anyway, I am not the greatest of investigators. This was just a thought which arose while reading ur article.
Thanx,
Pradeep...
 
hi..i am dipti chowdhury.i find a friend who is really good
 