Sriraman Madhavan, Stanford Statistics | Facebook Engineer

For those unfamiliar: researchers at Stanford University recently wrote a paper describing a classifier that could detect sexual orientation from facial features. Its accuracy (81% for gay men, 74% for lesbians) was better than that of human judges (61% for gay men, 54% for lesbians). This became a huge controversy, and you can imagine why.

The paper doesn’t mention the classifier code being made public. To confirm, I have emailed one of the researchers, and will update this answer if he replies. I’m sure they’re swamped with emails about this.

Let me explain their 47-page paper (even though the question didn’t ask :P), but I won’t be going into the pros, cons and ethics of such research. Feel free to share your opinion on that in the comments.

Step 1: Data and Pre-processing

Public images of 36,630 men and 38,593 women were obtained from a U.S. dating website. About half of them were gay. Sexual orientation was inferred from the gender of the partners they said they were looking for on the website.

Face-detection software called Face++ was used to locate the facial features in each image. Using its output, images with multiple faces, small faces, partially hidden faces and faces that weren’t facing the camera directly were all removed.

We now have 35,326 images of faces, all facing the camera and fully visible.
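To make the filtering step concrete, here is a minimal sketch. It does not call Face++; it assumes the detector output has already been boiled down to one record per image, and the field names (`num_faces`, `face_width_px`, `occluded`, `yaw_deg`) and thresholds are my own hypothetical choices, not values from the paper.

```python
def keep_face(det, min_width_px=40, max_yaw_deg=15.0):
    """Return True if a detection record describes one usable frontal face.

    The record fields and the default thresholds are illustrative assumptions.
    """
    return (
        det["num_faces"] == 1                      # drop images with several faces
        and det["face_width_px"] >= min_width_px   # drop small faces
        and not det["occluded"]                    # drop partially hidden faces
        and abs(det["yaw_deg"]) <= max_yaw_deg     # drop faces turned away
    )


# Hypothetical detector output for five images.
detections = [
    {"id": "a", "num_faces": 1, "face_width_px": 120, "occluded": False, "yaw_deg": 3.0},
    {"id": "b", "num_faces": 2, "face_width_px": 110, "occluded": False, "yaw_deg": 0.0},
    {"id": "c", "num_faces": 1, "face_width_px": 20, "occluded": False, "yaw_deg": 1.0},
    {"id": "d", "num_faces": 1, "face_width_px": 150, "occluded": True, "yaw_deg": 2.0},
    {"id": "e", "num_faces": 1, "face_width_px": 130, "occluded": False, "yaw_deg": 40.0},
]

kept_ids = [d["id"] for d in detections if keep_face(d)]
print(kept_ids)
```

Only image "a" passes all four checks here; each of the others fails exactly one of them.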

Step 2: Representing Facial Features

In simple terms, each face was converted into a list of 4,096 scores representing the facial features. This was done using a widely employed neural network called VGG-Face. However, these scores are not interpretable, i.e., we cannot say that the 126th score corresponds to the nose shape, and so on.

Now, we have 4,096 numbers for each of the 35,326 images. Finally, those 4,096 numbers were reduced to 500 numbers for each image, using a dimensionality reduction technique called Singular Value Decomposition. Don’t worry, we’re not losing a lot of information.
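Here is a rough sketch of what reducing features with SVD looks like, using NumPy on random stand-in data (200 rows of 64 numbers reduced to 10, instead of 4,096 reduced to 500). The function name `reduce_with_svd` and the synthetic data are my own assumptions; this is not the authors' code.

```python
import numpy as np


def reduce_with_svd(features, n_components):
    """Project rows of `features` onto the top singular directions.

    Returns the reduced matrix and the fraction of squared singular
    values ("energy") that the kept components account for.
    """
    # Thin SVD: features == U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(features, full_matrices=False)
    reduced = U[:, :n_components] * S[:n_components]
    kept_energy = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return reduced, kept_energy


rng = np.random.default_rng(0)
# Stand-in data: 10 hidden factors mixed into 64 columns, plus a little noise.
latent = rng.normal(size=(200, 10))
mixing = rng.normal(size=(10, 64))
features = latent @ mixing + 0.01 * rng.normal(size=(200, 64))

reduced, kept_energy = reduce_with_svd(features, n_components=10)
print(reduced.shape, round(float(kept_energy), 4))
```

Because the stand-in data really has only 10 underlying factors, the 10 kept components retain almost all of the energy. That is the sense in which a reduction like this need not lose much information.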

Step 3: Training a Classifier

We now have 500 numbers representing each image, as well as the sexual orientation of the person in that image. So, a simple classifier called Logistic Regression was built to use those 500 numbers as features to predict the sexual orientation. This is where the 81% accuracy for gay men and 74% accuracy for lesbians come from.
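For readers who haven’t seen logistic regression, here is a minimal sketch trained with plain gradient descent in NumPy. Everything in it is an assumption for illustration: the data is synthetic, there are 5 features standing in for the 500, and the training loop, learning rate and train/test split are mine rather than the paper's.

```python
import numpy as np


def train_logistic_regression(X, y, lr=0.1, steps=500):
    """Fit weights and a bias by gradient descent on the mean log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient w.r.t. weights
        b -= lr * np.mean(p - y)                 # gradient w.r.t. bias
    return w, b


def predict(X, w, b):
    """Predict label 1 when the estimated probability is above 0.5."""
    return (X @ w + b > 0).astype(int)


rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))                    # synthetic features
true_w = np.array([2.0, -1.0, 0.5, 0.0, 1.5])    # hidden rule that generates labels
y = (X @ true_w + 0.5 * rng.normal(size=400) > 0).astype(int)

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = train_logistic_regression(X_train, y_train)
accuracy = np.mean(predict(X_test, w, b) == y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The important habit is measuring accuracy on examples the model was not trained on, which is why the last 100 rows are held out.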

Step 4: Which facial areas were important?

Now, we need to figure out which facial features were most ‘important’ for the classifier. Remember, those 500 numbers are not interpretable. So, the study looked at how much the classification outcome changes when we ‘mask’ certain facial areas. In the paper’s heat maps, red marks the most informative areas, which were:

For men: nose, eyes, eyebrows, cheeks, hairline, and chin.
For women: nose, mouth corners, hair, and neckline.
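The masking idea can be sketched as follows: cover one patch of the image at a time, re-run the model, and record how much the output moves. This is a generic occlusion test under my own assumptions, not the paper's procedure; `toy_predict` is a hypothetical stand-in model that only looks at one region, so we know what the heat map should find.

```python
import numpy as np


def occlusion_map(image, predict, patch=4, fill=0.0):
    """Return a grid of |score change| when each patch is masked out."""
    base = predict(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            masked = image.copy()
            masked[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = fill
            heat[i, j] = abs(base - predict(masked))
    return heat


def toy_predict(image):
    """Hypothetical model: its score depends only on rows 4-7, columns 8-11."""
    return float(image[4:8, 8:12].mean())


image = np.ones((16, 16))
heat = occlusion_map(image, toy_predict)
hottest = np.unravel_index(heat.argmax(), heat.shape)
print(hottest)
```

Masking any patch the toy model ignores leaves its score unchanged, so the only non-zero cell in `heat` is the one covering the region it actually uses.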

Step 5: Facial differences between gay and straight people

This is slightly controversial. The study looked at a bunch of faces most likely to be tagged gay and least likely to be tagged gay, and created a “composite” face for gay and straight men and women.
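As a very crude illustration of the composite idea, the sketch below averages, pixel by pixel, the k highest-scored and k lowest-scored images. A plain pixel average is my simplifying assumption (it ignores any alignment of the faces), and the tiny 2x2 "images" and scores are made up.

```python
import numpy as np


def composite_faces(images, scores, k):
    """Average the k highest-scored and the k lowest-scored images."""
    order = np.argsort(scores)                  # indices from lowest to highest score
    lowest = images[order[:k]].mean(axis=0)
    highest = images[order[-k:]].mean(axis=0)
    return highest, lowest


# Six made-up 2x2 "images"; image i is filled with the value i.
images = np.stack([np.full((2, 2), float(i)) for i in range(6)])
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7])  # hypothetical classifier scores

highest, lowest = composite_faces(images, scores, k=2)
print(highest[0, 0], lowest[0, 0])
```

Here the two highest scores belong to images 1 and 3 and the two lowest to images 0 and 2, so the two composites are the averages of those pairs.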

It was observed that gay men had narrower jaws, larger foreheads and longer noses than heterosexual men, while lesbians had larger jaws and smaller foreheads than heterosexual women.

The results suggest that gay faces tend to be gender-atypical. This is consistent with the prenatal hormone theory (PHT) of sexual orientation, which predicts the existence of links between facial appearance and sexual orientation.

You can read the full paper here: https://psyarxiv.com/hv28a/

<Edit>

Some limitations of the study, which are acknowledged or addressed in the authors’ notes (which I highly recommend reading):

  • Images are of openly gay people. It is possible that gay individuals with more discernibly gender-atypical faces are more likely to “come out.”
  • Images include only white individuals from the U.S., so the results may not carry over to other groups.
  • Facial features in images from the dating website may have been biased in ways the researchers didn’t account for.

</Edit>

P.S. Please don’t report me or downvote the answer just because you didn’t like the research. Don’t shoot the messenger. I was just being straight with y’all. :P

About the Author

Sriraman Madhavan


Carbon-based life form
Data Engineer at Facebook, 2018-present
M.S. Statistics, Stanford University, graduated 2018
Lives in Menlo Park, CA
Knows Tamil
Top Writer 2018