
Computes a comprehensive set of fairness metrics for binary classification models, disaggregated by a sensitive attribute (e.g., race, gender). Optionally, conditional fairness can be evaluated using a second attribute and a specified condition. The function also computes corresponding performance metrics used in the fairness calculations.

Usage

get_fairness_metrics(
  data,
  outcome,
  group,
  group2 = NULL,
  condition = NULL,
  probs,
  confint = TRUE,
  cutoff = 0.5,
  bootstraps = 2500,
  alpha = 0.05,
  digits = 2
)

Arguments

data

A data frame containing the outcome, group, and predicted probabilities.

outcome

The name of the column containing the true binary outcome.

group

The name of the column representing the sensitive attribute (e.g., race, gender).

group2

The name of a secondary group variable used for conditional fairness analysis. Only required if conditional statistical parity is desired.

condition

Required together with group2 when conditional statistical parity is desired. If the conditioning variable is categorical, supply a character value giving the level(s) to condition on. If the conditioning variable is continuous, supply a character string containing a comparison operator and the threshold used to dichotomize the variable (e.g. "<50", ">50", "<=50", ">=50"). See the sketch following the argument list.

probs

The name of the column with predicted probabilities.

confint

Logical indicating whether to calculate confidence intervals. Default is TRUE.

cutoff

Numeric threshold for classification. Default is 0.5.

bootstraps

Number of bootstrap samples. Default is 2500.

alpha

Significance level for confidence intervals. Default is 0.05.

digits

Number of digits to round the metrics to. Default is 2.
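
To illustrate how group2 and condition work together, the hypothetical calls below show the two forms the condition argument can take. The data frame df, the outcome y, and the conditioning column service_unit with level "ICU" are placeholders for illustration only, not columns of any shipped dataset.

# Categorical conditioning variable: condition names the level to condition on
get_fairness_metrics(
  data = df, outcome = "y", group = "gender",
  group2 = "service_unit", condition = "ICU",
  probs = "pred"
)

# Continuous conditioning variable: condition is an operator plus a threshold
get_fairness_metrics(
  data = df, outcome = "y", group = "gender",
  group2 = "age", condition = ">=50",
  probs = "pred"
)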

Value

A list containing:

performance

Data frame with performance metrics by group.

fairness

Data frame with computed fairness metrics and optional confidence intervals.

Details

The results are returned as a list of two data frames:

  • performance: Contains performance metrics (e.g., TPR, FPR, PPV) by group.

  • fairness: Contains group-level fairness metrics (e.g., disparity differences and ratios) and, if requested, their confidence intervals.

Fairness Metrics Included:

  • Statistical Parity: Difference in positive prediction rates across groups (see the sketch after this list).

  • Conditional Statistical Parity (if group2 and condition are specified): Parity conditioned on a second group and value.

  • Equal Opportunity: Difference in true positive rates (TPR) across groups.

  • Predictive Equality: Difference in false positive rates (FPR) across groups.

  • Balance for Positive Class: Difference in average predicted probabilities among individuals whose true outcome is positive, across groups.

  • Balance for Negative Class: Same as above, but among individuals whose true outcome is negative.

  • Positive Predictive Parity: Difference in positive predictive values (precision) across groups.

  • Negative Predictive Parity: Difference in negative predictive values across groups.

  • Brier Score Parity: Difference in Brier scores across groups.

  • Overall Accuracy Parity: Difference in overall accuracy across groups.

  • Treatment Equality: Difference in the ratio of false negatives to false positives across groups.
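
Each fairness metric is reported both as a difference and as a ratio of the corresponding group-level quantities, as shown in the example output below. The following minimal sketch, run on simulated data and not the package's internal implementation, illustrates the idea for statistical parity:

# Simulated data: two groups with predicted probabilities
set.seed(1)
df <- data.frame(
  group = rep(c("A", "B"), each = 500),
  prob  = c(runif(500), runif(500, 0, 0.8))
)
df$pred <- as.integer(df$prob >= 0.5)   # apply the classification cutoff
ppr <- tapply(df$pred, df$group, mean)  # positive prediction rate per group
unname(ppr["A"] - ppr["B"])             # difference-style disparity
unname(ppr["A"] / ppr["B"])             # ratio-style disparity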

Examples

# \donttest{
library(fairmetrics)
library(dplyr)
library(randomForest)
library(magrittr)
data("mimic_preprocessed")
set.seed(123)
train_data <- mimic_preprocessed %>%
  dplyr::filter(dplyr::row_number() <= 700)
# Fit a random forest model
rf_model <- randomForest::randomForest(factor(day_28_flg) ~ ., data = train_data, ntree = 1000)
# Test the model on the remaining data
test_data <- mimic_preprocessed %>%
  dplyr::mutate(gender = ifelse(gender_num == 1, "Male", "Female")) %>%
  dplyr::filter(dplyr::row_number() > 700)

test_data$pred <- predict(rf_model, newdata = test_data, type = "prob")[, 2]

# Fairness evaluation
# We will use gender as the sensitive attribute and day_28_flg as the outcome.
# We choose cutoff = 0.41 so that the overall FPR is around 5%.

# Get Fairness Metrics
get_fairness_metrics(
 data = test_data,
 outcome = "day_28_flg",
 group = "gender",
 group2 = "age",
 condition = ">=60",
 probs = "pred",
 confint = TRUE,
 cutoff = 0.41,
 alpha = 0.05
)
#> $performance
#>                                     Metric GroupFemale GroupMale
#> 1                 Positive Prediction Rate        0.17      0.08
#> 2                 Positive Prediction Rate        0.34      0.21
#> 3                      False Negative Rate        0.38      0.62
#> 4                      False Positive Rate        0.08      0.03
#> 5                     Avg. Predicted Prob.        0.46      0.37
#> 6                     Avg. Predicted Prob.        0.15      0.10
#> 7                Positive Predictive Value        0.62      0.66
#> 8                Negative Predictive Value        0.92      0.90
#> 9                              Brier Score        0.09      0.08
#> 10                                Accuracy        0.87      0.88
#> 11 (False Negative)/(False Positive) Ratio        1.03      3.24
#> 
#> $fairness
#>                            Metric Difference    95% Diff CI Ratio 95% Ratio CI
#> 1              Statistical Parity       0.09   [0.05, 0.13]  2.12 [1.48, 3.05]
#> 2  Conditional Statistical Parity       0.13   [0.05, 0.21]  1.62 [1.18, 2.22]
#> 3               Equal Opportunity      -0.24 [-0.39, -0.09]  0.61 [0.44, 0.86]
#> 4             Predictive Equality       0.05   [0.02, 0.08]  2.67  [1.39, 5.1]
#> 5      Balance for Positive Class       0.09   [0.04, 0.14]  1.24 [1.09, 1.41]
#> 6      Balance for Negative Class       0.05   [0.03, 0.07]  1.50 [1.29, 1.74]
#> 7      Positive Predictive Parity      -0.04  [-0.21, 0.13]  0.94 [0.72, 1.23]
#> 8      Negative Predictive Parity       0.02  [-0.15, 0.19]  1.02 [0.79, 1.33]
#> 9              Brier Score Parity       0.01  [-0.01, 0.03]  1.12 [0.88, 1.43]
#> 10        Overall Accuracy Parity      -0.01  [-0.05, 0.03]  0.99 [0.94, 1.04]
#> 11             Treatment Equality      -2.21   [-4.5, 0.08]  0.32 [0.15, 0.69]
#> 
# }
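
The example above prints the returned list directly; the result can also be stored so that the two data frames described under Value are accessed by name, as in this usage sketch with the same arguments:

res <- get_fairness_metrics(
  data = test_data,
  outcome = "day_28_flg",
  group = "gender",
  group2 = "age",
  condition = ">=60",
  probs = "pred",
  confint = TRUE,
  cutoff = 0.41,
  alpha = 0.05
)
res$performance  # performance metrics by group
res$fairness     # fairness metrics with confidence intervals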