
Is there a better alternative to the 5-star rating system?

47 Answers
John Ciancutti
I encourage the questioner not to think of ratings systems as better vs. worse. Instead, think of different rating systems as optimizing for different variables. These variables can be somewhat in tension with each other. The specific challenge you’re addressing, and how you define success, will have implications for what rating system will work best for your needs.

In case it isn’t obvious, this answer is entirely just my view. I won’t sprinkle that point throughout my assertions, but please keep it in mind. This answer turned out quite long so I've gone and bolded some bits for a casual perusal.

Think of a rating system as an avenue for dialogue between your users and your service. This involves investment on the part of your users, with some expectation of return for the investment. Understand and help guide this expectation so you can do your best to deliver against it. That will drive more ratings.

So, what are you trying to accomplish with your rating system? What
expectations do your users have for engaging with it? For your users,
here are some non-exclusive examples:

    • They want to reward good content/punish bad content. They want to express their opinion.
    • They want to contribute to the average rating, or the popularity/unpopularity of specific content. They want to edit and help the community.
    • They expect that giving their tastes as input will result in your service improving for them.
    • You’ve made the act of rating fun/game-like, and so for a period of time the act of rating is inherently worthwhile.
    • You’ve set an expectation with your service that rating will provide better/more access to your content.
    • They want to share great content with their social graph.

    For the service, here are some reasons that gathering ratings may be
    worthwhile. Some of these are the same as user motivations, which is
    helpful:

      • Understanding a user’s tastes helps you provide a better, more personalized experience for them.
      • Aggregate user ratings help you sort and filter content for all users.
      • Generally speaking, getting users to engage in taste input deepens their relationship with your service. This is highly correlated, and somewhat causal.
      • Ratings are the key input to recommender prediction systems. If you have one of those, you generally need ratings to power it.
      • Ratings provide a basis for evidence down the road. They can justify why you’re showing your user other content.

      You may want to meet every one of those user motivations (and others), and you may hope to accomplish each of those goals with your service. However, it is best to focus and prioritize within each list.

      Here is an example of the tension I mentioned earlier: in general (but
      depending somewhat on the specific algorithms used), recommender systems benefit from ratings systems with as many choices for the user as possible. A 5 option system is better than a 3 option system. By better, I mean the recommender will make more accurate predictions per rating.

      However, your users will rate less content the more options your rating system provides. You will get more ratings if you give your users fewer choices, so a 10 option system turns out to be worse than a 5 option system. Netflix uses a 5 star rating system. At a couple of points in the product history, Netflix made available or tested half stars.

      One group of members could rate movies and TV shows 1-5 stars, but whole stars only. Another group of members could rate the same movies and shows, on the same scale, but they could rate in half stars as well as full stars. Everything else about the Netflix service was identical for both groups.

      The Netflix members offered half stars rated significantly fewer titles. However, the recommender system Netflix used at the time was able to make more accurate predictions per rating for those members.

      Your users have a certain mental budget they will invest in your rating
      system. The more work you make each decision, the fewer decisions you will get. This is true in many contexts other than rating systems as well.

      Which approach offers more overall utility for a recommender system is quite dependent on the specific algorithms employed to make predictions.

      If your goal is to provide evidence for future suggestions more than
      having perfect predictions and recommendations, you’d generally want to opt for simple systems that motivate broader engagement.

      As Evan Hamilton points out in his answer to this same question, some services struggle to engage users more deeply than a de facto or by-design “thumbs up/thumbs down” model. Such systems can work marvelously well, depending on the service. I would cite Quora as a handy example we’re both familiar with.

      Let me give some general advice when designing rating systems:

      • Don’t follow general advice, such as this list, except maybe as a starting point. Your service is unique (if you’re going to get anywhere), so test, test, test! Be clear on your goals and priorities and experiment.
      • If you want your ratings system to be an important part of your service, set that expectation as your user first begins using your service. Make engaging with your system core to the experience, not a sideline feature. My favorite manifestation of this point is Pandora. You can’t use the Pandora service without knowing they are doing personalization on your behalf. They have no separation between giving taste input and controlling the service.
      • Related, users are more inclined to engage with rating early in their use of your service. If they do engage early, they are more likely to rate later as well. Make sure you make the system prominent in their first experience.
      • Make rating easy, lightweight, and ideally fun. Way, way back in 1999 Netflix was designing its first ratings and recommendation system. I was working on it with a more senior and brilliant engineer. I will call him Stan. Stan was playing with a fun little language that was still maturing at the time, called Javascript. Stan’s insight was that a rating widget shouldn’t involve a server-side refresh of a web page. This is blindingly obvious today but at the time this was amazing stuff. He was experimenting with different metaphors: thermometers, star systems, he played with sound effects and a bunch of other stuff. We eventually landed on a (thankfully silent) 5 star rating scale.
      • Give immediate feedback when a user rates. Remove content they rate low. Show them similar content when they rate something highly.
      • Ask users to rate content as or immediately after they engage with it.
      • Quora shows us how powerful it can be to tie rating content with your social graph. It can provide a strong motivation to rate (“Wow, this content is great, and I want my cohort to be exposed to it”) and powerful evidence for content discovery (“I don’t think of myself as interested in [topic X], but I know [followed Quoran Y] writes and upvotes great answers so I’ll read it”).
• If you want to make ratings core to your service, invest heavily in similars algorithms. Similars algorithms compute which content is similar to which other content. It is straightforward to get decent-to-good with similars, and they tend to be very intuitive for users. So you can easily set up a rating -> similar -> show-with-evidence cycle that is powerful (a rough sketch follows this list).
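To make the similars idea concrete, here is a minimal sketch of one common approach: item-item cosine similarity computed over the users who rated both pieces of content. The data and function names are purely illustrative assumptions, not Netflix's actual algorithm.

```python
import math
from collections import defaultdict

# Illustrative ratings only: user -> {title: stars}.
ratings = {
    "alice": {"Heat": 5, "Ronin": 4, "Amelie": 2},
    "bob":   {"Heat": 4, "Ronin": 5},
    "carol": {"Heat": 2, "Amelie": 5},
}

def item_vectors(ratings):
    """Re-index the data as title -> {user: stars}."""
    items = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for title, stars in user_ratings.items():
            items[title][user] = stars
    return items

def cosine_similarity(a, b):
    """Cosine similarity over the users the two titles have in common."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def similars(title, ratings, top_n=5):
    """Titles most similar to `title`, best first."""
    items = item_vectors(ratings)
    scores = [(other, cosine_similarity(items[title], items[other]))
              for other in items if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

print(similars("Heat", ratings))
```

From there the rating -> similar -> show-with-evidence loop is just: when a user rates a title highly, surface its top similars with the rated title as the stated evidence.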
        James Rubinstein
The OP's question is an interesting one - and one I think about nearly every day of my life. I am actually the product owner for the user surveys on eBay and for part of our human judgment program. As a result, rating systems consume rather a large portion of my waking thought. The following are some of my thoughts on the subject.
1. There are several problems with the conventional 5-star rating system. The first is that you can often end up with "range compression", where most people give a similarly positive (or negative) answer, leaving your average at 4.5 out of 5.
2. Ties. If everything gets an average rating of 4.5 out of 5, then you can't differentiate between items being rated. Not everything can be above average, but a 5-star rating may make it look that way.
3. One of the main reasons for the above is that people only give you feedback if they are motivated to do so. Usually that motivation comes in the form of love or hate. People don't leave feedback about 'meh' experiences. As a result, while the averages suffer from range compression, a deeper look often reveals a bimodal distribution, with two distinct clusters of answers (a small sketch after this list illustrates points 1-3).
        4. No clear objective (or even subjective) standard for rating. See Yelp.com for examples. So many times I read reviews of how great a place was, only to see a 4 star rating given by the same reviewer. What gives? Interestingly, one strategy to solve this issue is to remove any notion of objective ranking and allow people to calibrate internally.
5. Expertise, previous experience, and taste are all an issue; you may not know anything about the subject you are being asked to rate. If I ask you to rate a wheel for a '65 Mustang, then a distributor cap for the same car, would you be able to differentiate? (Note: this applies to basically all rating strategies.)
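To illustrate points 1-3 with made-up numbers: two very different sets of ratings can compress to the same ~4.5 average, and a crude look at the histogram (or just the share of ratings at the extremes) exposes the love-it-or-hate-it shape that the average hides. This is only an illustrative sketch, not eBay data.

```python
from statistics import mean
from collections import Counter

# Made-up ratings for two items whose averages both compress to ~4.5.
item_a = [4] * 55 + [5] * 45      # everyone is mildly pleased
item_b = [1] * 12 + [5] * 88      # love-it-or-hate-it (bimodal)

def summarize(stars):
    counts = Counter(stars)
    # Share of ratings at the extremes: a crude bimodality signal.
    polarization = (counts[1] + counts[5]) / len(stars)
    return {"mean": round(mean(stars), 2),
            "histogram": dict(sorted(counts.items())),
            "polarization": round(polarization, 2)}

print(summarize(item_a))  # mean 4.45, polarization 0.45
print(summarize(item_b))  # mean 4.52, polarization 1.0
```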
        So, given that the 5 star rating system has its flaws, what are some of the alternatives? It depends on the task you are trying to accomplish. If you are trying to elect a mayor in San Francisco, you could choose instant runoff voting http://en.wikipedia.org/wiki/Ins.... You could also ask someone to simply pick their favorite choice from a list of choices (this is a great way to find the top choice of many people, but majority voting suffers from its own drawbacks).

        If you have some noted expert doing the rating, you could give them a very fine-grained rating scale of, say, 100 points - which works well for Wine Aficionado.

        Another option would be to simplify - given that most ratings are strongly positive or negative, you could just have a + or - rating system. One nice feature of that system is that you could end up with an average of 0 (as many people like it as dislike it) or tending positive or negative, telling you that people were mostly favorable or unfavorable at a glance.

        What Glen Maddern is suggesting is adding criteria, which isn't really changing the scale, per se, but using these additional criteria to tease out other effects. Asking people "was this movie good" implies some objective sense of "goodness" while asking "did you like this movie" only asks for your personal opinion. For instance "Live Free or Die Hard" is a terrible movie, but I love it. Glen's rating strategy attempts to bridge that gap.

Finally, what I've been leaning towards recently is a pairwise comparison ranking system. If I were John Ciancutti, this is what I'd be thinking about doing at Netflix. The nice thing about pairwise comparison is that it virtually eliminates ties. Pairwise comparison gives you an infinitely scalable rank-ordered list of preferences. The problem with pairwise comparison is that it can create a combinatorial explosion of pairs to be ranked. Fortunately, by the transitive property, if we rank A>B and B>C, then we know A>C.
        http://en.wikipedia.org/wiki/Pai...
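A minimal sketch of that idea, with made-up titles: record pairwise judgments, close them under transitivity so A>B and B>C implies A>C without asking a third question, and rank each item by how many others it beats. A production system would more likely fit Elo or Bradley-Terry style scores, but the principle is the same.

```python
from itertools import product

# Made-up pairwise judgments: each tuple means "first is preferred to second".
judgments = [("Heat", "Ronin"), ("Ronin", "Speed"), ("Speed", "Gigli")]

def rank_from_pairs(judgments):
    items = {x for pair in judgments for x in pair}
    beats = set(judgments)

    # Transitive closure: if a>b and b>c are known, infer a>c.
    changed = True
    while changed:
        changed = False
        for a, b, c in product(items, repeat=3):
            if (a, b) in beats and (b, c) in beats and (a, c) not in beats:
                beats.add((a, c))
                changed = True

    # Rank by the number of items each one beats (a Copeland-style score).
    wins = {i: sum(1 for j in items if (i, j) in beats) for i in items}
    return sorted(items, key=lambda i: wins[i], reverse=True)

print(rank_from_pairs(judgments))   # ['Heat', 'Ronin', 'Speed', 'Gigli']
```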

Another advantage of pairwise comparison is that it can be fun: check out Matchin on GWAP.com by Luis von Ahn.

There are several other methods one could choose to rank items; hopefully this gives you some introduction. Once again, it's all about the expected outcome of the ranking tool. To decide the best ranking strategy, first you have to decide on your desired outcome.
        David Cole
        One of my favorite alternatives is Jack Cheng's slider for rating tea on Steepster:


        It captures a lot of data with one interaction:

        1. The four sections (there's a third face hidden by that tool tip) provide a quasi 4-star system, keeping it simple.
        2. The actual faces themselves ask for emotional input.
        3. The slider has 100 points, like beverage reviews tend to use, continuing an established practice.
        4. The notches represent other teas the user has rated, so relative ratings are baked in.

        Plus, it's adorable!

        More details here: http://blog.steepster.com/post/2...
        Don Turnbull
        I authored a paper about a rating system I developed (called OpenChoice) and we tried to look at some of the issues related to rating, voting and ranking (mentioning Netflix in fact).

        http://donturn.com/rating-voting...

        Rating, Voting & Ranking: Designing for Collaboration & Consensus

        The OpenChoice system, currently in development, is an open source, open access community rating and filtering service that would improve upon the utility of currently available Web content filters. The goal of OpenChoice is to encourage community involvement in making filtering classification more accurate and to increase awareness in the current approaches to content filtering. The design challenge for OpenChoice is to find the best interfaces for encouraging easy participation amongst a community of users, be it for voting, rating or discussing Web page content. This work in progress reviews some initial designs while reviewing best practices and designs from popular Web portals and community sites.


        …popular community sites feature common interface elements and functionality:
        • Overall voting and rank status easy to read
        • Dynamically updated interaction
        • Thumbnail, abstract or actual content of item on same page as voting interface
        • Rating information for community at large for the item
        • Suggestions or lists for additional items to rate
        • Textual description of (proposed) item category with link to category
        • Links to related and relevant discussions about item (or item category)
• Standard interface objects (where appropriate) to leverage existing Web interaction (e.g. purple & blue link colors, tabbed navigation metaphor, drop-down lists)
        • Show history of ratings or queue of items to vote on
        • Aggregate main page or display element that shows overall community ratings (to encourage virtuous competition for most ratings)
        • Task flow for voting or rating clear with additional interactions not required (e.g. following links)
        …In addition to dynamic voting status, there is some consideration of simplifying the voting to include “allow” vs. “block” ratings only. Design issues such as the colors of the buttons may also overly influence certain votes.
        Paulo Buchsbaum
I have carefully read all the answers. There are many differing opinions, which is normal for such a controversial subject. Some answers assume reviewers who are far more sophisticated than the average person.

For me, the visual slider system proposed in David Cole's response is the smartest, because it is interactive, easy, and works for any kind of audience.

        I would just make two small changes.

1) I would limit the scale so that the value corresponding to a mouse click doesn't represent more than 10 different grades. It's easy to work with 10 different grades when the scale is visual and includes funny face expressions.

2) I would divide the scale into two parts: the first half corresponds to dislike (thumbs down) and the second half to like (thumbs up).

However, a five-star system like Amazon's is not bad. I'm not against half stars (10 grades), since they are not as imposing as a numeric grade.

I think the Holy Grail of reviews is how to compute statistics over the reviews in order to give the end user a sense of the whole:

a) I like the Rotten Tomatoes system, which shows the grade but also shows the percentage of people who liked it, based on dividing the scale into two parts.

b) The simple grade average (like Amazon's) is inevitable because it is simple. Amazon lets you read the reviews for each grade (1 star only, 2 stars only, etc.), which is very useful. Amazon also allows you to sort the written reviews by how other users evaluated them.

c) I think Amazon does a great job of evaluating reviewers. What I think is missing is a weighted statistical grade average, where the weight of each reviewer is based on:

• Number of reviews,
• Reviewer quality (user opinions about the usefulness of their written reviews),
• Range of grades the reviewer uses (if a reviewer only gives 4 to 5 stars, their weight should fall, because they don't spread their grades),
• Amount of written reviews from the reviewer.

In short, it is a kind of "review ranking", playing on the Google "page ranking" system used to sort Google's search results. A rough sketch of such a weighted average follows.
In my opinion, most rating systems in social network apps like YouTube are closer to statistical surveys than to content quality ratings. They answer questions like “How many people like that movie?” better than “How much, or why, do they like it?”. When you see 100k likes, it does not necessarily mean that you will also like the movie or that it is of good quality (as rated by a film expert).

It’s about the context a rating is placed in. In the broad context offered by YouTube, with thousands of visitors, the better accuracy of a 5-star rating probably won’t have a big advantage over the simpler thumbs up/down on average. But in the context of your personal situation you might want more values to work with. In the first case you more or less “vote” for or against an option; in the second you rate multiple arguments. Imagine you were only allowed to use thumbs up/down in your mind when forming an opinion about different options. Don't you maybe like one option much more than another? And don’t you want to express that somehow?

Expressing your opinion was also a goal when planning the rating controls for Choosle: http://choosle.ch/DFIK and http://blog.choosle.ch/2011/05/r... . Five stars allow a quick, rough rating that can be fine-tuned later with a +/- slider. It is still very early to measure the success of this approach, but it seems to make sense in the contexts of many Choosles so far. I have also extended a copy (http://choosle.ch/DYqJ) with thoughts I have learned here, thank you! Maybe another copy can help you to build your context and collect arguments for the optimal rating system from your point of view.

Even though social opinion is difficult to measure accurately, a rough statistical result might still be an important factor in your more private context. It also helps you to be objective, because of the mixture of different points of view. With the statistics filtered somehow (only ratings from like-minded people, ...), the context becomes even more helpful (and less objective ;) ).

With a kind of multi-user rating feature we plan to integrate this social rating aspect into Choosle soon (if only we were able to clone ourselves ;) ). A user should be able to compare his rating with those of selected friends or with a public average (maybe for different groups, if available).
        Stan Zeltner
        Why not imagine a different approach when it comes to ratings and reviews?

5-, 7-, or 10-star systems have a major drawback: they average people's perceptions. I will show you two situations:

100 people give their opinion on a product's quality (a sample). 50 are absolutely satisfied, 50 are absolutely not satisfied. The star representation of that situation is:

Now imagine we have 100 people with mixed feelings. With the star system we get this:


So we have two different situations resulting in the same representation under the star rating system. It's a pity.
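The two situations are easy to check with arithmetic. Assuming "mixed feelings" means everyone lands in the middle at 3 stars, both samples below collapse to exactly the same three-star average, which is the information loss being described:

```python
from statistics import mean

polarized = [5] * 50 + [1] * 50   # 50 absolutely satisfied, 50 absolutely not
lukewarm  = [3] * 100             # 100 people with mixed feelings

print(mean(polarized), mean(lukewarm))   # both are 3 -- the star average cannot tell them apart
```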

A powerful feedback system should absolutely:

• Not average individual voices into a single global group opinion
• Allow navigation through the reviews based on what people are thinking (I want to read the comments from people who are absolutely not satisfied with this item)

At lidoli (like / don't like) we propose different ways to collect and display people's feelings.

First, we use a color scale:


Then we define a set of items to characterize the evaluated object. For instance, we could have 5 items to assess a wine (it's only a sample; purists will forgive me):

        • Mouth
        • Nose
        • Packaging
        • Price
        • Word of mouth
If 50 people give their opinion on 5 items, we obtain a colored matrix. If we sort this matrix, moving the positive feelings to the top and the negative to the bottom, we obtain something like this:


(Sorry, the items are in French.)
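A minimal sketch of how such a matrix could be assembled, assuming a simple 1 (red) to 5 (green) feeling scale and made-up votes. The point is that each item keeps its full column of individual feelings, sorted from positive to negative, rather than being averaged away:

```python
ITEMS = ["Mouth", "Nose", "Packaging", "Price", "Word of mouth"]

# Made-up votes: one dict per person, feelings on a 1 (red) .. 5 (green) scale.
votes = [
    {"Mouth": 5, "Nose": 4, "Packaging": 2, "Price": 1, "Word of mouth": 3},
    {"Mouth": 4, "Nose": 5, "Packaging": 1, "Price": 2, "Word of mouth": 4},
    {"Mouth": 2, "Nose": 5, "Packaging": 3, "Price": 1, "Word of mouth": 5},
]

def sorted_columns(votes, items):
    """For each item, the full column of individual feelings,
    most positive at the top, most negative at the bottom."""
    return {item: sorted((v[item] for v in votes), reverse=True)
            for item in items}

for item, column in sorted_columns(votes, ITEMS).items():
    print(f"{item:15} {column}")
```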

What do we gain here?

• An immediate ranking from pros (top) to cons (bottom)
• Readability of the distribution of opinions
• The ability to use this widget as a remote control to navigate through the reviews (I want the comments from people feeling 'red' about 'Nose')
• The possibility of using the items as search criteria in a qualitative search engine
The same data represented with stars gives this:


We believe that the colored picture is more powerful.

The system can be used for social commerce or for other fields like citizen participation, foresight, and team building.

        We think it is more transparent than the 5 stars rating system.

        Sorry for my english :(
        Brendan Ross
        It's not about the rating system per se, it's about the relationship you can develop with the rating system.

        Two examples: RottenTomatoes & Amazon.

        I can use RottenTomatoes to very accurately determine if I'll like a movie.  I've used it for years, and I know that an action movie should get at least a 60% for me to like it, but a slightly dreary British remake of a 100 year old novel needs a 95% unless it has a director I like, which makes about 10 points of difference.  This is a relationship.

With Amazon, I give credit for a higher number of reviews, and I need at least 4.5 stars, but then I turn to the bad reviews to read first.  For me, the shape of the review curve (is there a mode around 1 star in addition to the mode at 5 stars?) and the quality of the writing in the bad reviews is everything.

        I could go on and on:  for CNET Downloads I look mostly at the quantity of downloads and the text of the negative reviews.

The rating systems that are useless are those that feature ANY kind of simple rating with no text reviews and allow anyone to rate.

The bottom line is that you can't trick an anonymous group of the hoi polloi into providing intelligent advice by varying the number of stars.  You need to either limit ratings to experts (i.e. movie critics) who have a reputation to maintain, or require text.