Algorithms - Binary Rating
You have users and they rate stuff on your site. You want to put the highest-rated stuff at the top and the lowest-rated at the bottom, so you need some sort of score to sort by. To keep it simple, you only offer the option to like or dislike an item. But turning those votes into a concrete rating can be surprisingly hard.
The system has ITEMS that are voted on.
Each vote is either LIKE or DISLIKE.
How do you give a concrete rating to each ITEM?
Like percentage is broken when you have a small number of votes.
Score = Like Percentage = (Likes) / (Likes + Dislikes)
ITEM #1: 570 / 580 = 98% score, so it drops 2% because of its 10 dislikes
ITEM #2: 1 / 1 = 100% score from a single like, ranking it above ITEM #1
ITEM #3: 1 / 2 = 50% score, so ratings based on only a few votes are flimsy
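To make the small-sample problem concrete, here is a minimal Ruby sketch that sorts the three items above by like percentage. The helper name like_percentage and the items hash are made up for illustration; only the vote counts come from the examples above.

```ruby
# Like percentage: likes divided by total votes (0.0 when there are no votes).
def like_percentage(likes, dislikes)
  total = likes + dislikes
  return 0.0 if total.zero?

  likes.to_f / total
end

# [likes, dislikes] for the three example items.
items = {
  'ITEM #1' => [570, 10],
  'ITEM #2' => [1, 0],
  'ITEM #3' => [1, 1]
}

# Sort descending by like percentage.
ranked = items.sort_by { |_name, (likes, dislikes)| -like_percentage(likes, dislikes) }

ranked.each do |name, (likes, dislikes)|
  pct = 100 * like_percentage(likes, dislikes)
  puts format('%s: %.1f%% from %d vote(s)', name, pct, likes + dislikes)
end
```

ITEM #2 comes out on top with a single vote, ahead of ITEM #1 with its 580 votes.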
Like percentage is also broken when you have a large number of votes.
Score = Like Percentage = (Likes) / (Likes + Dislikes)
ITEM #1: 600 / 1000 = 60% score and has 200 more likes than dislikes
ITEM #2: 5500 / 10000 = 55% score but has 1000 more likes than dislikes
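A short Ruby sketch of the same two items shows how the like percentage and the net like count (likes minus dislikes) pick different winners. The variable names are hypothetical; the counts are the ones listed above.

```ruby
# Likes and dislikes for the two example items.
items = {
  'ITEM #1' => { likes: 600, dislikes: 400 },
  'ITEM #2' => { likes: 5500, dislikes: 4500 }
}

# Winner when sorting by like percentage.
by_percentage = items.max_by { |_name, v| v[:likes].to_f / (v[:likes] + v[:dislikes]) }.first

# Winner when sorting by likes minus dislikes.
by_net_likes = items.max_by { |_name, v| v[:likes] - v[:dislikes] }.first

puts "Best by like percentage: #{by_percentage}"
puts "Best by net likes:       #{by_net_likes}"
```

The two orderings disagree, which is why neither number alone reads as "quality".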
Just showing likes vs. dislikes looks like a better option, but it is not really one: it has the same problems as above, and the bare numbers are hard to translate into "quality" when you have to take the difference between likes and dislikes into account.
Score should be the lower bound of the Wilson score confidence interval for a Bernoulli parameter. Wait, what?
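Written out, the lower bound looks like this. The symbols are labels chosen here: n is the total number of votes, \hat{p} is the observed fraction of likes, and z is the standard normal quantile at 1 - (1 - confidence) / 2.

```latex
\text{score} =
\frac{\hat{p} + \dfrac{z^2}{2n} - z \sqrt{\dfrac{\hat{p}(1 - \hat{p}) + \dfrac{z^2}{4n}}{n}}}
     {1 + \dfrac{z^2}{n}}
```

Roughly: given the votes seen so far, this is the like fraction the item is at least likely to have at the chosen confidence level. Few votes give a wide interval and therefore a low lower bound.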
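require 'statistics2'

# Lower bound of the Wilson score interval for the fraction of positive ratings.
def ci_lower_bound(positiveRatings, ratingCount, confidence)
  return 0 if ratingCount == 0

  n = ratingCount
  # z-score for the requested two-sided confidence level
  z = Statistics2.pnormaldist(1 - (1 - confidence) / 2)
  # observed fraction of positive ratings
  phat = 1.0 * positiveRatings / n
  (
    phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n)
  ) / (1+z*z/n)
end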
confidence = statistical confidence level
e.g. 0.95 to have a 95% chance that your lower bound is correct.
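As a rough usage sketch, the same lower bound can be applied to the three low-vote items from earlier. This version is an assumption-laden variant, not the snippet above: it uses only the Ruby standard library and fixes z at 1.96, the usual approximation for 95% confidence, instead of computing it with statistics2. The names Z_95 and wilson_lower_bound are hypothetical.

```ruby
# Hypothetical constant: approximate z-score for a two-sided 95% confidence level.
Z_95 = 1.96

# Lower bound of the Wilson score interval for the like fraction.
# likes: number of LIKE votes, total: LIKE + DISLIKE votes.
def wilson_lower_bound(likes, total, z = Z_95)
  return 0.0 if total.zero?

  phat = likes.to_f / total
  spread = z * Math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
  numerator = phat + z * z / (2 * total) - spread
  denominator = 1 + z * z / total
  numerator / denominator
end

# [likes, total votes] for the three low-vote examples.
votes = {
  'ITEM #1' => [570, 580],
  'ITEM #2' => [1, 1],
  'ITEM #3' => [1, 2]
}

# Sort descending by the Wilson lower bound.
ranked = votes.sort_by { |_name, (likes, total)| -wilson_lower_bound(likes, total) }

ranked.each do |name, (likes, total)|
  puts format('%s: %.3f', name, wilson_lower_bound(likes, total))
end
```

With these inputs ITEM #1 scores about 0.97, while the single-like ITEM #2 falls to about 0.21 and ITEM #3 to about 0.09, so the well-tested item now sorts first.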