Writing a Digg-Style Popularity Algorithm - From Scratch
igglo.co.uk — "Not so long ago I was tasked with creating a web site similar to Digg - users voted on records in the database, and those deemed popular were promoted to the front page. But how exactly does such an algorithm work? Let's take a look under the hood..."
- 922 diggs
- bitt3n, on 10/12/2007, -2/+4
hm.. reading this, I wonder if Digg takes into account clickthroughs to the story and to the comments page in addition to Digg votes. That might actually be more relevant than Diggs themselves. You could then rely on Bury to kill false stories (which would get incorrectly bumped up if you put significant stock in clickthrough rates).
- garreh, on 10/12/2007, -11/+26
The main thing Digg needs to change is the commenting system. It is flawed.
Comments are ranked by "Me like this" and "Me don't like this". Therefore if someone doesn't agree with what that person said, it's an instant -1. Instead, comments should be ranked on their content, much like on Slashdot: Insightful, Funny, Troll, etc.
I think Digg is becoming a bit of a joke and I'm afraid to say it.. it's because of troll PS3 vs 360 fanboys, Linux vs Windows fanboys, OSX vs.. need I say more?
Kevin Rose, listen up.
- merreborn, on 10/12/2007, -3/+5
"hm.. reading this, I wonder if Digg takes into account clickthroughs to the story and to the comments page in addition to Digg votes"
At first glance, I don't see any sort of script triggered by a clickthrough. Wouldn't that kind of defeat the concept of digging anyway? If I click through to the comments page/article, and *don't* digg the story, it means _I don't digg it_, so I sure as hell don't want that activity to promote the story!
- tnwake, on 10/12/2007, -4/+7
@garreh: "The main thing Digg needs to change is the commenting system. It is flawed.
Comments are ranked by "Me like this" and "Me don't like this". Therefore if someone doesn't agree with what that person said, it's an instant -1. Instead, comments should be ranked on their content, much like on Slashdot: Insightful, Funny, Troll, etc."
That's what makes digg a social news site instead of an admin-moderated news site, and that's why I like digg more than slashdot. If you don't like it.. hit F6 and type in slashdot.org and hit enter.
- ilyag, on 10/12/2007, -0/+12
Why can't you just go download the source code to Pligg:
http://sourceforge.net/projects/pligg
You can see how it works without spending time making this from scratch.
- manageMyRights, on 10/12/2007, -1/+30
Ignore the speculation. I've obtained the actual Digg algorithm. Here it is:
IF( article ==
Apple
'girl' in title
Wii
Stephen Colbert
Battlestar Galactica
critical of PS3
critical of Microsoft)
THEN
SendToFrontPage()
ELSE
Bury().
- vezquex, on 10/12/2007, -0/+1
What you all forget is that the order of the frontpaged items on digg is static. So, I infer that as soon as a story crosses a threshold of votes (say 40) and frequency of votes (like 5 votes per minute in the last 3 minutes), then it is statically entered as the next front page article.
- sewalsh, on 10/12/2007, -11/+4
you bone kevin rose.
- killerofkiller, on 10/12/2007, -3/+3
http://duggmirror.com
- displaynone, on 10/12/2007, -4/+16
Not so long ago I was tasked with creating a web site similar to Digg - users voted on records in the database, and those deemed “popular” were promoted to the front page. But how exactly does such an algorithm work? Let’s take a look under the hood…
Before we begin, a quick disclaimer: These are my own thoughts on such an algorithm, from scratch. I don’t know how Digg, Reddit and other such sites rank popular items, so this may be quite different.
Records and Votes
We have a database record, whether it be a news story, picture, video, podcast, whatever. And we have voting, a method for users to place a single vote on a particular database record. We could order the “Popular” category by number of votes and be done with it.
Popularity = Votes
However records very quickly become stale, as new submissions are entered and the voting process begins again. If records from last year have 300 votes each, but popular records this year only have 100 or so, they won’t see the front page. So we need to look at time as a factor as well.
Records and Time
Let’s introduce the age of the record as another variable. If the record is newer it should have a higher prominence on the front page, yes?
Popularity = Votes / Record Age
The older the record, the more votes it requires to achieve popularity. But that’s not really fair - if a record takes a little while to receive votes, it doesn’t get the credit it deserves. This is especially a problem if your site doesn’t yet have enough traffic.
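To make the unfairness concrete, here is a minimal Python sketch of the Votes / Record Age formula. The function name, the choice of minutes as the age unit, the one-minute clamp and the sample numbers are all assumptions for illustration; none of them come from Digg or from the project described here.

```python
def popularity_by_record_age(votes: int, record_age_minutes: float) -> float:
    """Popularity = Votes / Record Age, with the age measured in minutes (an assumed unit)."""
    # Clamp the age to at least one minute so a brand-new record does not divide by zero.
    return votes / max(record_age_minutes, 1.0)


# Two hypothetical records with the same 50 votes.
fast_starter = popularity_by_record_age(votes=50, record_age_minutes=60)
slow_starter = popularity_by_record_age(votes=50, record_age_minutes=600)

print(fast_starter)
print(slow_starter)
```

The slow starter scores a tenth of the fast one even if all 50 of its votes arrived in the last hour, because this formula only sees the age of the record, not when the votes were cast.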
So what’s the solution? Let’s take a look at the age of each vote for the record.
Records and Vote Time
To keep the front page fresh, we can give more weight to votes placed recently. That way if a story hasn’t yet received the credit it deserves, it’s still in with a chance if several users notice its value and vote accordingly. So let’s iterate through votes and calculate a popularity score:
Popularity = (V1/A1) + (V2/A2) + … + (Vn/An)
Vn is a vote, and An is the age of that vote (for example, in minutes). If a vote is 60 minutes old, it is worth 1/60th of a vote placed 1 minute ago. All the values of all the votes are added to achieve a popularity score.
This seems to solve the previous problem, but introduces a new one. If a single person votes on a record a year old, his vote will be worth more than 200 votes on a different record posted yesterday. Old material comes back to haunt the front page. So we’re close, but no cigar yet.
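The vote-age formula and the problem it introduces can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every vote counts as V = 1, ages are in minutes, ages are clamped to one minute to avoid dividing by zero, and the 200 votes on yesterday's record are all taken to be about a day old (the worst case for that record). The function name is made up for the example.

```python
def age_weighted_score(vote_ages_minutes):
    """Popularity = (V1/A1) + ... + (Vn/An), where every vote counts as V = 1."""
    # Clamp each age to at least one minute (an assumption) to avoid dividing by zero.
    return sum(1.0 / max(age, 1.0) for age in vote_ages_minutes)


MINUTES_PER_DAY = 24 * 60

# One vote cast a minute ago on a year-old record.
old_record = age_weighted_score([1])

# 200 votes on a record posted yesterday, all cast about a day ago.
yesterday_record = age_weighted_score([MINUTES_PER_DAY] * 200)

print(old_record)
print(yesterday_record)
```

The single fresh vote scores 1, while the 200 day-old votes together score 200/1440, roughly 0.14, so the year-old record would rank higher.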
Let’s take a look again at the age of the record …
Records and Time and Vote Time
If we put together everything we’ve discussed so far, we get something that looks like this:
Popularity = [ (V1/A1) + (V2/A2) + … + (Vn/An) ] / Record Age
It’s a bit of a mouthful, but basically it adds together the weighted votes based on age, then divides that total by the age of the record. It doesn’t impose too much of a time limit on becoming popular, it dampens votes based on age, and prevents old stories from leaping back to the front page. It solves all our problems.
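A minimal Python sketch of the combined formula, under the same kind of assumptions as the prose: every vote counts as V = 1, all ages are in minutes, and ages are clamped to one minute to avoid dividing by zero. The function name and the sample records are hypothetical.

```python
def popularity(vote_ages_minutes, record_age_minutes):
    """Popularity = [ (V1/A1) + ... + (Vn/An) ] / Record Age, with every V = 1."""
    # Ages are clamped to at least one minute (an assumption) to avoid dividing by zero.
    weighted_votes = sum(1.0 / max(age, 1.0) for age in vote_ages_minutes)
    return weighted_votes / max(record_age_minutes, 1.0)


MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY

# A year-old record that has just received a single vote.
old_record = popularity([1], MINUTES_PER_YEAR)

# A record posted yesterday with 200 votes, all about a day old.
yesterday_record = popularity([MINUTES_PER_DAY] * 200, MINUTES_PER_DAY)

print(old_record < yesterday_record)
```

With the extra division by record age, the year-old record with one fresh vote now scores far below the record posted yesterday, which is the behaviour the combined formula is meant to produce.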
Dampening Popularity
Admission: In writing this article, I’ve discovered a more advanced algorithm than used previously on my project. Guess what I’m doing this evening?
But already I can see a problem - I think a dampening effect will need to be introduced to prevent wild jumps of increased popularity and back down again. I will update this post when I’ve had a chance to implement the new algorithm in the wild.
Other Variables
What other variables could we introduce to the algorithm?
* Number of page views - a form of popularity, but far less useful than voting
* Page views versus people who don’t vote - If 60% of readers vote, should it be more popular than if 10% vote? Remember this isn’t the number of votes, it’s the percentage of readers who vote
* Iterative algorithm of amount of time between individual votes
* Voting down as well as up
* Trustworthiness of user who originally submitted the record, maybe based on votes of previous submissions
* If a web URL is involved, maybe use metadata such as Google PageRank, inbound links (Google/Yahoo API), Blogosphere activity, and so forth
- airquotes, on 10/12/2007, -9/+2
Someone should research how you are more likely to get modded down on a monday..
looks like the community has a case of the mondays altogether
- fugularity, on 10/12/2007, -5/+7
No.....No, man, ***** no, man! I believe you get your ass kicked for sayin' something like that, man.
- Sortaburnt, on 10/12/2007, -0/+2
@fugularity
I can't believe no one got the Office Space reference...or thought it was unfunny.
- Sortaburnt, on 10/12/2007, -0/+1
@cnanney
He was getting dugg down...so neither.
- pixelat3d, on 10/12/2007, -2/+2
Interesting article to get you into the mind-set of maybe thinking about a decent page-ranking algorithm, however the math and logic behind it are pretty much fluff = (
- mcduckov, on 10/12/2007, -2/+2
I was thinking that very thing. One could do some REALLY complex stuff with a large evolving database. I suck at math and I could have come up with this "algorithm" in 1/2 an hour.
- graiz, on 10/12/2007, -1/+6
Your algorithm doesn't seem to scale well. The age of a record is constantly changing so every minute/hour/day the 'popularity' variable will change for every record. Do you update every record to calculate popularity? It's likely that the home page uses something different...
Instead of popularity it's easier to compute a digg velocity (the speed at which a story accumulates diggs, or diggs per hour). This can be calculated at the time of the digg. This will likely be a bell curve: as stories get popular they increase speed, and as they get less popular they lose speed. Stories that are moving fast enough make the home page.
- Homunculiheaded, on 10/12/2007, -0/+3
Although after a certain period of time it seems that the 'digg velocity' doesn't matter at all. Take this as an example: http://www.digg.com/tech_news/YouTube.com_-_A_Flickr_type_tool_for_videos
probably the most dugg and never front-paged story on digg. It's the first story submitted on digg about youtube; originally it got something less than 30 votes. Then, years later someone mentioned it in another digg article. People (myself included) tried to digg it up to the front page, but obviously it never got there, although it did get around 100 votes in just a few hours. So after a certain amount of time it seems virtually impossible to digg a story to the front page.
- mcduckov, on 10/12/2007, -0/+1
That really encourages the whole digg-mob mentality. I'd rather see diggs given out on a limited basis to users with good karma and have those diggs subject to metadigging. You'd have to introduce a bit of editorial intervention to keep things from jumping on and off the frontpage but overall you'd then just count diggs for promotion.
- clinko, on 10/12/2007, -1/+2
I'm glad someone is writing about this. I wrote similar code a couple years ago and just thought it was hypocritical that anyone using an algorithm is called a "digg clone" or "ripoff".
I encourage Netscape/Reddit/and other people to present the same data using a new method.
Let's face it, digg is another sort on the same pieces of data we've seen before. At some point this algorithm will become a standard on many websites, much like all blogs sort newest posts at the top, and all comments have newest posts at the bottom.
- arctic, on 10/12/2007, -0/+3
You can't see their scripts.
- Topher06, on 10/12/2007, -1/+2
Don't model Digg, their popularity algorithm is quite obviously broken.
- VhaidraU, on 10/12/2007, -0/+2
Why make a digg-style algorithm from scratch? Why not use open-source pligg?
- HigherLogic, on 10/12/2007, -1/+2
It's a programmer's thing...it's the same reason we like to rewrite sloppy code or naming conventions when it isn't the way we write (VariablesLikeThis compared to variables_like_this). It's a challenge.
- JimDaGeek, on 10/12/2007, -3/+1
How about:
Select * from Votes Order By VoteCount Desc
KISS baby :-)
- lemz, on 10/12/2007, -0/+1
ya... this will obviously work over time...
- HigherLogic, on 10/12/2007, -0/+1
Uh, if you read the article, the popularity of an article does not equal the number of votes it receives...there are plenty of variables to factor in.
- fuxjoey, on 10/12/2007, -1/+1
A guy from Digg should share what's under the hood as well. It doesn't have to be everything, which would reveal secrets.
- Vanadium, on 10/12/2007, -2/+1
I worked on the topic with a different approach. Here is my write-up:
http://vallery.net/2007/03/26/scalable-story-promotion/
- xpose, on 10/12/2007, -3/+3
I already made a digg-like clone. It cost me under 200 dollars. My comment system is better. My page allows nudity and it's about celebrity gossip. Need I say more?
http://www.celebritypwn.com/
- cootetom, on 10/12/2007, -0/+1
or http://www.pligg.com/
- justmy15cents, on 10/12/2007, -0/+1
Just my 15 cents..
This is an exclusive look inside the main CORE of digg!
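<?php
// strpos() returns false when the substring is absent, so compare with !== false.
if (strpos($title, "top 10") !== false || strpos($title, "sex") !== false || strpos($title, "girlfriend") !== false)
{
    MakePopular();
}
else
{
    SendToCNN();
}
- zolushkatykva, on 10/12/2007, -0/+0
Beautiful! What an epatage!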
- supervapio, on 10/10/2007, -1/+1
Quite doubtful. Seems the server is down. http://musiclabs.blogspot.com
- topicnation, on 10/10/2007, -1/+1
Perfect post! Almost all people think so http://cakeguru.blogspot.com