Making Decisions with Classifiers

TL;DR: How to balance false positive rates and false negative rates is a problem that confronts anyone who does anything with machine learning, or even prediction more generally. In this post, I explore how to make an optimal choice of FPR and FNR given your relative valuation of a false positive and a false negative. I found it surprising that the optimal choice is determined not only by your preferences over false positives and false negatives, but also by how common the positive class is. Specifically, I show that if \(\pi\) is the prevalence of the positive class and \(u_{TP}, u_{TN}, u_{FP}, u_{FN}\) are the utilities associated with True Positives, True Negatives, False Positives, and False Negatives, then you should choose the point on the ROC curve that satisfies \[\text{ROC Slope} = \frac{1-\pi}{\pi} \frac{u_{TN}-u_{FP}}{u_{TP}-u_{FN}}\]
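To make the slope condition concrete, here is a minimal Python sketch. It is not code from the post: the function names, the utility values, and the toy ROC curve \(\text{TPR} = \sqrt{\text{FPR}}\) are all assumptions chosen for illustration. It computes the target slope from the formula above, then checks it against a brute-force search for the ROC point with the highest expected utility.

```python
import math


def target_roc_slope(prevalence, u_tp, u_tn, u_fp, u_fn):
    """Slope the ROC curve should have at the chosen operating point."""
    return (1 - prevalence) / prevalence * (u_tn - u_fp) / (u_tp - u_fn)


def expected_utility(fpr, tpr, prevalence, u_tp, u_tn, u_fp, u_fn):
    """Average utility per prediction at the operating point (fpr, tpr)."""
    from_positives = prevalence * (tpr * u_tp + (1 - tpr) * u_fn)
    from_negatives = (1 - prevalence) * (fpr * u_fp + (1 - fpr) * u_tn)
    return from_positives + from_negatives


def best_operating_point(roc_points, prevalence, u_tp, u_tn, u_fp, u_fn):
    """Brute-force search over (fpr, tpr) pairs for the highest expected utility."""
    return max(
        roc_points,
        key=lambda p: expected_utility(p[0], p[1], prevalence, u_tp, u_tn, u_fp, u_fn),
    )


# Hypothetical inputs: 20% prevalence and made-up utilities.
prevalence = 0.2
utilities = dict(u_tp=1.0, u_tn=0.0, u_fp=-1.0, u_fn=-1.0)

# Toy ROC curve (an assumption, not a real classifier): TPR = sqrt(FPR).
roc_points = [(i / 10000, math.sqrt(i / 10000)) for i in range(10001)]

slope = target_roc_slope(prevalence, **utilities)
best_fpr, best_tpr = best_operating_point(roc_points, prevalence, **utilities)

# On this toy curve the slope at a given FPR is 1 / (2 * sqrt(FPR)),
# so the slope at the brute-force optimum should match the formula.
slope_at_best = 1 / (2 * math.sqrt(best_fpr))
print(slope, best_fpr, best_tpr, slope_at_best)
```

Under these assumptions the two approaches should agree up to the grid resolution: the point found by direct search is the one where the curve's slope equals the target. Raising the prevalence or the cost of a false negative lowers the target slope, which pushes the operating point toward higher FPR and TPR.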

Projects

Radlibrary: Radlibrary is an R package for querying the Facebook Ad Library API. This is useful for researchers who want data on political ads.

Queries: Queries is a package that I use for templating SQL queries.

In 2020, we’re doing our own websites

I’ve written a lot in other places, mostly Medium, but I think in 2020 we’re doing our own websites. Blogdown has gotten so good that it feels like a big waste of time to write posts in RMarkdown and then translate them over to Medium, as I’ve been doing.

Radlibrary

My main job at Facebook is finding and getting rid of hate speech on the platform, but Radlibrary is a side project that I maintain. Radlibrary is an R package for querying the Facebook Ad Library API and transforming the results into tidy data frames suitable for statistical analysis.