Compute Fairness Metrics for Binary Classification
get_fairness_metrics.Rd
Computes a comprehensive set of fairness metrics for binary classification models, disaggregated by a sensitive attribute (e.g., race, gender). Optionally, conditional fairness can be evaluated using a second attribute and a specified condition. The function also computes corresponding performance metrics used in the fairness calculations.
Usage
get_fairness_metrics(
data,
outcome,
group,
group2 = NULL,
condition = NULL,
probs,
confint = TRUE,
cutoff = 0.5,
bootstraps = 2500,
alpha = 0.05,
digits = 2
)
Arguments
- data
A data frame containing the outcome, group, and predicted probabilities.
- outcome
The name of the column containing the true binary outcome.
- group
The name of the column representing the sensitive attribute (e.g., race, gender).
- group2
Optional. The name of a secondary group variable used for conditional fairness analysis; supply this (together with condition) if conditional statistical parity is desired.
- condition
Optional; used together with group2 when conditional statistical parity is desired. If group2 is categorical, supply a character string naming the level(s) to condition on. If group2 is continuous, supply a character string containing a comparison operator and a threshold value (e.g. "<50", ">50", "<=50", ">=50"). Both forms are illustrated in the sketch after this argument list.
- probs
The name of the column with predicted probabilities.
- confint
Logical indicating whether to calculate confidence intervals.
- cutoff
Numeric threshold for classification. Default is 0.5.
- bootstraps
Number of bootstrap samples. Default is 2500.
- alpha
Significance level for confidence intervals. Default is 0.05.
- digits
Number of digits to round the metrics to. Default is 2.
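As an illustration of the two forms of condition, here is a minimal sketch on synthetic data (the columns y, gender, age, site, and pred are invented for this illustration and are not part of the package; confint = FALSE simply skips the bootstrap intervals):
library(fairmetrics)
set.seed(1)
# Synthetic data: binary outcome, sensitive attribute, and two possible
# secondary variables (one continuous, one categorical).
toy <- data.frame(
  y      = rbinom(200, 1, 0.3),
  gender = sample(c("Male", "Female"), 200, replace = TRUE),
  age    = runif(200, 20, 90),
  site   = sample(c("A", "B"), 200, replace = TRUE),
  pred   = runif(200)
)
# Continuous secondary variable: condition is an operator plus a threshold.
get_fairness_metrics(data = toy, outcome = "y", group = "gender",
  group2 = "age", condition = ">=60", probs = "pred", confint = FALSE)
# Categorical secondary variable: condition is the level to condition on.
get_fairness_metrics(data = toy, outcome = "y", group = "gender",
  group2 = "site", condition = "A", probs = "pred", confint = FALSE)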
Value
A list containing:
- performance
Data frame with performance metrics by group.
- fairness
Data frame with computed fairness metrics and optional confidence intervals.
Details
The results are returned as a list of two data frames:
- performance: Contains performance metrics (e.g., TPR, FPR, PPV) by group.
- fairness: Contains group-level fairness metrics (e.g., disparities or ratios), with confidence intervals if requested.
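The two components can be accessed directly from the returned list. For instance, using the objects constructed in the Examples section below (a usage sketch; res is an arbitrary name):
res <- get_fairness_metrics(
  data = test_data, outcome = "day_28_flg", group = "gender",
  probs = "pred", cutoff = 0.41
)
res$performance # performance metrics by group
res$fairness    # fairness metrics, with confidence intervals if confint = TRUE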
Fairness Metrics Included:
Statistical Parity: Difference in positive prediction rates across groups.
Conditional Statistical Parity (if group2 and condition are specified): Parity conditioned on a second group and value.
Equal Opportunity: Difference in true positive rates (TPR) across groups.
Predictive Equality: Difference in false positive rates (FPR) across groups.
Balance for Positive Class: Difference in average predicted probabilities among individuals whose true outcome is positive, across groups.
Balance for Negative Class: The same comparison among individuals whose true outcome is negative.
Positive Predictive Parity: Difference in positive predictive values (precision) across groups.
Negative Predictive Parity: Difference in negative predictive values across groups.
Brier Score Parity: Difference in Brier scores across groups.
Overall Accuracy Parity: Difference in overall accuracy across groups.
Treatment Equality: Comparison of the ratio of false negatives to false positives across groups.
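As a concrete illustration of the first metric, the statistical parity difference can be reproduced by hand from the positive prediction rates. This is a sketch using the test_data object built in the Examples below; whether the package thresholds with > or >= at the cutoff is an implementation detail assumed here:
# Classify at the cutoff, then compare positive prediction rates by group.
pred_class <- as.integer(test_data$pred >= 0.41)
rates <- tapply(pred_class, test_data$gender, mean)
rates                                    # per-group positive prediction rates
unname(rates["Female"] - rates["Male"])  # statistical parity difference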
Examples
# \donttest{
library(fairmetrics)
library(dplyr)
library(randomForest)
library(magrittr)
data("mimic_preprocessed")
set.seed(123)
train_data <- mimic_preprocessed %>%
dplyr::filter(dplyr::row_number() <= 700)
# Fit a random forest model
rf_model <- randomForest::randomForest(factor(day_28_flg) ~ ., data = train_data, ntree = 1000)
# Test the model on the remaining data
test_data <- mimic_preprocessed %>%
dplyr::mutate(gender = ifelse(gender_num == 1, "Male", "Female")) %>%
dplyr::filter(dplyr::row_number() > 700)
test_data$pred <- predict(rf_model, newdata = test_data, type = "prob")[, 2]
# Fairness evaluation
# We will use gender as the sensitive attribute and day_28_flg as the outcome.
# We choose cutoff = 0.41 so that the overall FPR is around 5%.
# Get Fairness Metrics
get_fairness_metrics(
data = test_data,
outcome = "day_28_flg",
group = "gender",
group2 = "age",
condition = ">=60",
probs = "pred",
confint = TRUE,
cutoff = 0.41,
alpha = 0.05
)
#> $performance
#> Metric GroupFemale GroupMale
#> 1 Positive Prediction Rate 0.17 0.08
#> 2 Positive Prediction Rate 0.34 0.21
#> 3 False Negative Rate 0.38 0.62
#> 4 False Positive Rate 0.08 0.03
#> 5 Avg. Predicted Prob. 0.46 0.37
#> 6 Avg. Predicted Prob. 0.15 0.10
#> 7 Positive Predictive Value 0.62 0.66
#> 8 Negative Predictive Value 0.92 0.90
#> 9 Brier Score 0.09 0.08
#> 10 Accuracy 0.87 0.88
#> 11 (False Negative)/(False Positive) Ratio 1.03 3.24
#>
#> $fairness
#> Metric Difference 95% Diff CI Ratio 95% Ratio CI
#> 1 Statistical Parity 0.09 [0.05, 0.13] 2.12 [1.48, 3.05]
#> 2 Conditional Statistical Parity 0.13 [0.05, 0.21] 1.62 [1.18, 2.22]
#> 3 Equal Opportunity -0.24 [-0.39, -0.09] 0.61 [0.44, 0.86]
#> 4 Predictive Equality 0.05 [0.02, 0.08] 2.67 [1.39, 5.1]
#> 5 Balance for Positive Class 0.09 [0.04, 0.14] 1.24 [1.09, 1.41]
#> 6 Balance for Negative Class 0.05 [0.03, 0.07] 1.50 [1.29, 1.74]
#> 7 Positive Predictive Parity -0.04 [-0.21, 0.13] 0.94 [0.72, 1.23]
#> 8 Negative Predictive Parity 0.02 [-0.15, 0.19] 1.02 [0.79, 1.33]
#> 9 Brier Score Parity 0.01 [-0.01, 0.03] 1.12 [0.88, 1.43]
#> 10 Overall Accuracy Parity -0.01 [-0.05, 0.03] 0.99 [0.94, 1.04]
#> 11 Treatment Equality -2.21 [-4.5, 0.08] 0.32 [0.15, 0.69]
#>
# }
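The cutoff of 0.41 used above targets an overall false positive rate of about 5%. One way to derive such a threshold (a sketch, not necessarily how the value in this example was obtained) is to take the 95th percentile of the predicted probabilities among the true negatives:
# Among true negatives, ~5% of predicted probabilities exceed their 95th
# percentile, so using that quantile as the cutoff gives an FPR near 0.05.
neg_probs <- test_data$pred[test_data$day_28_flg == 0]
quantile(neg_probs, 0.95)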