
This function evaluates conditional statistical parity, which measures fairness by comparing positive prediction rates across sensitive groups within a defined subgroup of the population. This is useful when fairness should be evaluated in a more context-specific way, for example within a particular hospital unit or age bracket. Conditional statistical parity is a refinement of standard statistical parity: instead of comparing prediction rates across groups in the entire dataset, it restricts the comparison to a specified subset of the population, defined by a conditioning variable.
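Concretely, the quantity being compared can be sketched in a few lines of base R. This is an illustration only; the data frame and column names (df, pred, sex, unit) are placeholders, not objects from the package:

# Toy data standing in for model output (all names are placeholders)
set.seed(42)
df <- data.frame(
  pred = runif(200),
  sex  = sample(c("Female", "Male"), 200, replace = TRUE),
  unit = sample(c("MICU", "SICU"), 200, replace = TRUE)
)
sub <- df[df$unit == "MICU", ]                 # restrict to the conditioning subgroup
ppr <- tapply(sub$pred >= 0.5, sub$sex, mean)  # positive prediction rate per group
ppr["Female"] - ppr["Male"]                    # difference in PPR
ppr["Female"] / ppr["Male"]                    # ratio of PPR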

Usage

eval_cond_stats_parity(
  data,
  outcome,
  group,
  group2,
  condition,
  probs,
  cutoff = 0.5,
  bootstraps = 2500,
  alpha = 0.05,
  message = TRUE,
  digits = 2
)

Arguments

data

Data frame containing the outcome, predicted outcome, and sensitive attribute

outcome

Name of the outcome variable; it must be binary

group

Name of the sensitive attribute

group2

Name of the variable to condition on

condition

If the conditioning variable is categorical, the condition supplied must be a character string giving the level(s) to condition on. If the conditioning variable is continuous, the condition supplied must be a character string containing a comparison operator and the value at which to threshold the variable (e.g. "<50", ">50", "<=50", ">=50").

probs

Name of the variable containing the predicted probabilities

cutoff

Threshold applied to the predicted probabilities, default is 0.5

bootstraps

Number of bootstrap samples, default is 2500

alpha

The significance level used for the confidence intervals (the confidence level is 1 - alpha); default is 0.05

message

Whether to print the results, default is TRUE

digits

Number of digits to round the results to, default is 2

Value

A list containing the following elements:

  • Conditions: The conditions used to calculate the conditional PPR

  • PPR_Group1: Positive Prediction Rate for the first group

  • PPR_Group2: Positive Prediction Rate for the second group

  • PPR_Diff: Difference in Positive Prediction Rate

  • PPR_Ratio: Ratio in Positive Prediction Rate

If confidence intervals are computed:

  • PPR_Diff_CI: A vector of length 2 containing the lower and upper bounds of the (1 - alpha) confidence interval (95% by default) for the difference in Positive Prediction Rate

  • PPR_Ratio_CI: A vector of length 2 containing the lower and upper bounds of the (1 - alpha) confidence interval (95% by default) for the ratio in Positive Prediction Rate
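For intuition, an interval of this kind could be produced by a percentile bootstrap along the following lines. This is a minimal sketch, not necessarily the package's exact implementation; sub is the placeholder subgroup data frame from the sketch above:

# Resample rows, recompute the PPR difference, take empirical quantiles
boot_diff <- replicate(2500, {                 # bootstraps = 2500
  s <- sub[sample(nrow(sub), replace = TRUE), ]
  p <- tapply(s$pred >= 0.5, s$sex, mean)
  p["Female"] - p["Male"]
})
quantile(boot_diff, c(0.025, 0.975))           # alpha = 0.05 gives a 95% interval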

Details

The function supports both categorical and continuous conditioning variables. For continuous variables, you can supply a threshold expression like "<50" or ">=75" to the condition parameter.
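To illustrate continuous conditioning on synthetic data (toy values only, separate from the mimic_preprocessed example below), a threshold expression restricts the comparison to one side of the cut point:

library(fairmetrics)
set.seed(1)
toy <- data.frame(
  y     = rbinom(300, 1, 0.3),
  p_hat = runif(300),
  sex   = sample(c("Female", "Male"), 300, replace = TRUE),
  age   = sample(20:90, 300, replace = TRUE)
)
# Compare PPRs between sexes among (toy) patients aged 50 or older
eval_cond_stats_parity(
  data = toy, outcome = "y", group = "sex",
  group2 = "age", condition = ">=50", probs = "p_hat"
)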

Examples

# \donttest{
library(fairmetrics)
library(dplyr)
library(magrittr)
library(randomForest)
data("mimic_preprocessed")
set.seed(123)
train_data <- mimic_preprocessed %>%
  dplyr::filter(dplyr::row_number() <= 700)
# Fit a random forest model
rf_model <- randomForest::randomForest(factor(day_28_flg) ~ ., data = train_data, ntree = 1000)
# Test the model on the remaining data
test_data <- mimic_preprocessed %>%
  dplyr::mutate(gender = ifelse(gender_num == 1, "Male", "Female")) %>%
  dplyr::filter(dplyr::row_number() > 700)

test_data$pred <- predict(rf_model, newdata = test_data, type = "prob")[, 2]

# Fairness evaluation
# We will use gender as the sensitive attribute and day_28_flg as the outcome.
# We choose cutoff = 0.41 so that the overall FPR is around 5%.

# Evaluate Conditional Statistical Parity

eval_cond_stats_parity(
  data = test_data,
  outcome = "day_28_flg",
  group = "gender",
  group2 = "service_unit",
  condition = "MICU",
  probs = "pred",
  cutoff = 0.41
)
#> There is not enough evidence that the model does not satisfy
#>             statistical parity.
#>   Metric GroupFemale GroupMale Difference   95% Diff CI Ratio 95% Ratio CI
#> 1    PPR        0.15       0.1       0.05 [-0.01, 0.11]   1.5  [0.87, 2.6]
# }
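Because the function returns a list, the reported quantities can be captured for downstream use; continuing the example above, with message = FALSE to suppress the printed summary:

res <- eval_cond_stats_parity(
  data = test_data,
  outcome = "day_28_flg",
  group = "gender",
  group2 = "service_unit",
  condition = "MICU",
  probs = "pred",
  cutoff = 0.41,
  message = FALSE
)
res$PPR_Diff     # difference in positive prediction rates
res$PPR_Diff_CI  # bootstrap confidence interval for the difference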