To those unfamiliar: Researchers at Stanford University recently wrote a paper describing how they built a classifier that could detect sexual orientation from facial features, whose accuracy (81% for gay men, 74% for lesbians) beat that of human judges (57% and 58%, respectively). And this became a huge controversy — you can imagine why.
The paper doesn’t mention the classifier code being made public. To confirm, I have emailed one of the researchers, and will update this answer if he replies. I’m sure they’re swamped with emails about this.
Let me explain their 47-page paper (even though the question didn’t ask :P), but I won’t be going into the pros, cons and ethics of such research. Feel free to comment your opinion on that.
Step 1: Data and Pre-processing
Public images of 36,630 men and 38,593 women were obtained from a U.S. dating website; half of them were gay. Sexual orientation was inferred from the gender of the partners each user was looking for on the website.
Face-detection software called Face++ was used to locate facial features in the images. Images with multiple faces, small faces, partially hidden faces, or faces that weren’t looking directly at the camera were then removed.
We now have 35,326 images of faces, all facing the camera and fully visible.
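The filtering in Step 1 can be sketched as a simple predicate over face-detection results. This is not the authors’ code, and the field names (`num_faces`, `face_area_frac`, `yaw_degrees`, `occluded`) are hypothetical stand-ins; Face++ returns its own JSON schema, which is not reproduced here.

```python
# Hedged sketch of Step 1's filtering. Field names are hypothetical,
# not the actual Face++ response format.

def keep_image(det, min_area_frac=0.05, max_yaw=15.0):
    """Keep only images with exactly one large, unoccluded, frontal face."""
    return (
        det["num_faces"] == 1
        and det["face_area_frac"] >= min_area_frac   # face not too small
        and abs(det["yaw_degrees"]) <= max_yaw       # roughly facing the camera
        and not det["occluded"]                      # fully visible
    )

# Toy detection results: only the first one should survive the filter.
detections = [
    {"num_faces": 1, "face_area_frac": 0.20, "yaw_degrees": 3.0,  "occluded": False},
    {"num_faces": 2, "face_area_frac": 0.20, "yaw_degrees": 3.0,  "occluded": False},
    {"num_faces": 1, "face_area_frac": 0.01, "yaw_degrees": 3.0,  "occluded": False},
    {"num_faces": 1, "face_area_frac": 0.20, "yaw_degrees": 40.0, "occluded": False},
]
kept = [d for d in detections if keep_image(d)]
```

The thresholds here are made up for illustration; the paper describes the criteria qualitatively, not as exact numbers.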
Step 2: Representing Facial Features
In simple terms, each face was converted into a list of 4,096 scores representing the facial features. This was done using a widely employed neural network called VGG-Face. However, these scores are not interpretable, i.e., we cannot say that the 126th score corresponds to the nose shape, and so on.
Now, we have 4,096 numbers for each of the 35,326 images. Finally, those 4,096 numbers were reduced to 500 numbers per image, using a dimensionality reduction technique called Singular Value Decomposition. Don’t worry, we’re not losing much information.
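The reduction in Step 2 can be sketched with scikit-learn’s `TruncatedSVD` (one common SVD-based reducer; the paper’s exact implementation may differ). Random vectors stand in for real VGG-Face descriptors here, so only the shapes are meaningful:

```python
# Hedged sketch of Step 2's 4,096 -> 500 reduction via SVD.
# Random vectors stand in for real VGG-Face embeddings.
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(600, 4096))  # stand-in: 600 faces, 4,096 scores each

svd = TruncatedSVD(n_components=500, random_state=0)
reduced = svd.fit_transform(embeddings)    # each face is now 500 numbers
```

With real embeddings (which are highly correlated, unlike this random data), the top 500 components retain most of the variance, which is why little information is lost.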
Step 3: Training a Classifier
We now have 500 numbers representing each image, along with the sexual orientation of the person pictured. So, a simple classifier called Logistic Regression was trained on those 500 numbers as features to predict sexual orientation. This is where the 81% figure for gay men and the 74% figure for lesbians come from.
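Step 3 can be sketched in a few lines of scikit-learn. This is not the authors’ code: synthetic features stand in for the 500 SVD scores, and the labels are generated from a hidden linear rule so that a linear classifier has something to learn.

```python
# Hedged sketch of Step 3: logistic regression on 500 features.
# Synthetic data stands in for the real SVD-reduced face descriptors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 500))           # stand-in for 500 SVD scores per face
w = rng.normal(size=500)                   # hidden linear rule generating labels
y = (X @ w + rng.normal(scale=5.0, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)                # held-out accuracy
```

Logistic regression just learns one weight per feature, so with only 500 inputs it trains in seconds; the heavy lifting in the paper is all in the VGG-Face feature extraction.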
Step 4: Which facial areas were important?
Now, we need to figure out what facial features were most ‘important’ for the classifier. Remember, those 500 numbers are not interpretable. So, the study looked at how much the classification outcome changes when we ‘mask’ certain facial areas. And the results look like this, with red representing the most informative features:
For men: nose, eyes, eyebrows, cheeks, hairline, and chin.
For women: nose, mouth corners, hair, and neckline.
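The masking idea in Step 4 is a form of occlusion-based importance. Here is a hedged toy sketch, not the paper’s actual procedure: a classifier is trained directly on tiny synthetic “images” whose signal lives in one known region, then each region is zeroed out and the average shift in predicted probability is measured. The region names and sizes are invented for illustration.

```python
# Hedged sketch of occlusion-based importance (Step 4's masking idea).
# Tiny synthetic "images" with the signal planted in one known region.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, h, w = 200, 8, 8
X = rng.normal(size=(n, h * w))                 # 200 flattened 8x8 "images"
y = (X[:, 0] + X[:, 1] > 0).astype(int)         # signal lives in the first two pixels

clf = LogisticRegression(max_iter=1000).fit(X, y)
base = clf.predict_proba(X)[:, 1]               # unmasked predictions

# Hypothetical "facial areas": pixel indices to mask out.
regions = {"top-left": [0, 1], "bottom-right": [62, 63]}
importance = {}
for name, idx in regions.items():
    Xm = X.copy()
    Xm[:, idx] = 0.0                            # "mask" the region
    # Importance = how much the prediction moves when the region is hidden.
    importance[name] = np.abs(clf.predict_proba(Xm)[:, 1] - base).mean()
```

Masking the region the classifier actually relies on shifts its output a lot; masking an irrelevant region barely moves it. The paper’s red heat maps are the face-image analogue of this comparison.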
Step 5: Facial differences between gay and straight people
This is slightly controversial. The study looked at the faces most likely and least likely to be tagged gay, and created a “composite” face each for gay and straight men and women.
It was observed that gay men had narrower jaws, larger foreheads and longer noses than heterosexual men, while lesbians had larger jaws and smaller foreheads than heterosexual women.
The results suggest that gay faces tend to be gender-atypical, which is consistent with the prenatal hormone theory (PHT) of sexual orientation, which predicts links between facial appearance and sexual orientation.
You can read the full paper here: https://psyarxiv.com/hv28a/
<Edit>
Some limitations of the study are acknowledged or addressed in the authors’ notes here (which I highly recommend reading).
</Edit>
P.S. Please don’t report me or downvote the answer just because you didn’t like the research. Don’t shoot the messenger. I was just being straight with y’all. :P