Gender classification and calculating precision, recall, and f1-score to compare two different classification methods
This is an illustrative example of how we can compare gender classification predicted by a machine, algorithm, or method with the reported gender.
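To make this comparison, we calculate precision, recall, and F1-score: precision and recall describe how correct and how complete the predictions are relative to the reported gender, and the F1-score combines the two into a single number for relative comparison.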
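For reference, the standard definitions of these three measures are written out below. They assume that one gender category is treated as the positive class (an assumption made here for illustration), with TP, FP, and FN counting the true positives, false positives, and false negatives of the predictions against the reported gender.

```latex
\[
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]
```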
Here, I create a toy example with some fake data and show the calculation in Python in two ways: by writing out the formulas directly, and by using the Scikit-Learn implementation, which provides convenient functions for the same calculation.
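As a minimal sketch of the hand-written part of the calculation, the snippet below uses only the Python standard library. The ten labels, the choice of "female" as the positive class, and the helper name `precision_recall_f1` are all invented for illustration and are not taken from the accompanying script.

```python
# Fake data for illustration only: reported gender vs. gender predicted by a method.
reported = ["female"] * 5 + ["male"] * 5
predicted = ["female", "female", "female", "male", "male",
             "female", "male", "male", "male", "male"]


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive.

    Hypothetical helper written for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(reported, predicted, positive="female")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these labels there are 3 true positives, 1 false positive, and 2 false negatives. The functions `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics` accept the same two lists together with `pos_label="female"` and should give the same values.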
The example is available as a Python script with in-line comments describing the steps in 2_illustrative_example_precision_recall.py, as a Jupyter notebook in 2_illustrative_example_precision_recall.ipynb, and as exports of that notebook to HTML in 2_illustrative_example_precision_recall.html and to PDF in 2_illustrative_example_precision_recall.pdf.