Abstract:
This cumulative PhD thesis explores innovative approaches to enhancing fairness in machine learning through advanced reweighting techniques. It addresses critical fairness issues in predictive models by proposing methods that mitigate bias and promote the equitable treatment of minority groups in datasets.
To tackle these bias issues, the research first introduces a novel adversarial reweighting method designed to mitigate the disparate impact that minority groups often face in biased datasets. Fairness-aware models typically optimize jointly for predictive utility and fairness metrics, but the under-representation of minority groups makes it difficult to address these biases effectively. The proposed approach uses the Wasserstein distance to identify and preferentially sample majority-group data points that are more similar to the minority group, thereby rebalancing the data distribution and enhancing fairness. Theoretical analyses confirm the method’s effectiveness, and empirical results on both image and tabular benchmark datasets demonstrate significant mitigation of disparate impact without sacrificing classification accuracy. The approach outperforms related state-of-the-art methods, underscoring its practical utility in real-world applications.
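To give a concrete flavour of similarity-guided reweighting, the following minimal Python sketch preferentially resamples majority-group points that lie close to the minority group. It is not the adversarial Wasserstein formulation developed in the thesis: the toy data, the nearest-neighbour distance used as a similarity proxy, and the exponential kernel are all illustrative assumptions.

```python
# Illustrative sketch only: reweight majority samples by proximity to the minority
# group. The thesis uses an adversarial Wasserstein formulation; a nearest-neighbour
# distance stands in for that similarity signal here.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Toy data: a large majority group and a small, shifted minority group.
X_major = rng.normal(loc=0.0, scale=1.0, size=(1000, 5))
X_minor = rng.normal(loc=1.5, scale=1.0, size=(100, 5))

# Distance from each majority point to its nearest minority neighbour.
nn = NearestNeighbors(n_neighbors=1).fit(X_minor)
dist, _ = nn.kneighbors(X_major)
dist = dist.ravel()

# Majority points closer to the minority group receive larger sampling weights.
weights = np.exp(-dist)            # kernel/temperature choice is an assumption
weights /= weights.sum()

# Preferentially resample the majority group, then combine with the minority
# group for downstream training.
idx = rng.choice(len(X_major), size=len(X_major), replace=True, p=weights)
X_balanced = np.vstack([X_major[idx], X_minor])
print(X_balanced.shape)
```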
Turning to fairness from a causal perspective, the thesis then leverages Pearl’s causal framework to propose a reweighting scheme that incorporates the causal relationships among variables into the computation of sample weights. This method employs two neural networks that mirror the structures of a causal graph and an interventional graph, respectively. These networks approximate the causal model of the data and the effects of interventions, guiding a discriminator-based reweighting process to achieve various fairness notions. Experiments conducted on real-world datasets demonstrate that this approach effectively achieves causal fairness while preserving the integrity of the data for downstream tasks. This method represents a significant step forward in addressing biases that stem from underlying causal relationships in the data.
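The sketch below illustrates only the final discriminator-based reweighting step, under simplifying assumptions: it presumes that interventional samples are already available (rather than produced by the two causal-graph networks of the thesis) and uses a logistic-regression discriminator with the standard density-ratio trick w(x) = D(x) / (1 − D(x)).

```python
# Illustrative sketch only: discriminator-based reweighting toward an interventional
# distribution. The thesis pairs two neural networks mirroring a causal and an
# interventional graph; here interventional samples are assumed to be given.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins: observational data and data simulated under a hypothetical intervention.
X_obs = rng.normal(loc=0.0, scale=1.0, size=(2000, 4))
X_int = rng.normal(loc=0.5, scale=1.2, size=(2000, 4))

# Label 1 = interventional, 0 = observational; fit the discriminator.
X = np.vstack([X_obs, X_int])
y = np.concatenate([np.zeros(len(X_obs)), np.ones(len(X_int))])
disc = LogisticRegression(max_iter=1000).fit(X, y)

# Density-ratio weights w(x) = D(x) / (1 - D(x)) push the observational data toward
# the interventional distribution; downstream training uses them as sample weights.
p = disc.predict_proba(X_obs)[:, 1]
w = p / np.clip(1.0 - p, 1e-6, None)
w *= len(w) / w.sum()              # normalise to mean 1 for stability
print(w.mean(), w.min(), w.max())
```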
Furthermore, the research enhances the empirical risk minimization (ERM) process in model training through a refined reweighting scheme aimed at improving fairness. This approach follows the sufficiency criterion of fairness by ensuring that the optimal predictor is consistent across sub-groups. It introduces a bilevel formulation for exploring sample reweighting strategies, shifting the focus from the size of the model to the space of sample weights, and it discretizes these weights to improve training efficiency. Empirical validation shows that this approach consistently improves the trade-off between prediction performance and fairness metrics, demonstrating its effectiveness and robustness across various experimental settings. This method offers a practical way to integrate fairness considerations into the core of the model training process.
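A minimal sketch of a discretized sample-weight search around weighted ERM is given below. The random-search outer loop, the logistic-regression inner solver, the toy data, and the precision-gap proxy for sufficiency are all simplifying assumptions, not the thesis’s actual bilevel algorithm.

```python
# Illustrative sketch only: outer level searches over a small discrete grid of
# per-group sample weights; inner level fits a weighted logistic regression.
# The precision gap between groups serves as a rough proxy for sufficiency.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)

# Toy data with a binary sensitive attribute `a`.
n = 3000
a = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5)) + a[:, None] * 0.5
y = (X[:, 0] + 0.5 * a + rng.normal(scale=1.0, size=n) > 0.5).astype(int)

train, val = np.arange(0, 2000), np.arange(2000, n)
weight_grid = np.array([0.5, 1.0, 2.0, 4.0])    # discretized weight values

best = None
for _ in range(50):                             # outer level: search over weights
    w_groups = rng.choice(weight_grid, size=2)  # one weight per sensitive group
    w = w_groups[a[train]]

    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train], sample_weight=w)
    pred = clf.predict(X[val])

    err = np.mean(pred != y[val])
    prec = [precision_score(y[val][a[val] == g], pred[a[val] == g], zero_division=0)
            for g in (0, 1)]
    gap = abs(prec[0] - prec[1])                # sufficiency proxy: precision gap
    score = err + gap                           # trade-off weighting is an assumption

    if best is None or score < best[0]:
        best = (score, w_groups, err, gap)

print("best per-group weights:", best[1], "error:", best[2], "gap:", best[3])
```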
Collectively, the studies in this thesis contribute significant advancements in the field of fair machine learning by introducing novel reweighting techniques that address multiple facets of bias and fairness. The methods developed not only improve fairness without compromising predictive utility but also provide robust frameworks for incorporating causal considerations and optimizing fairness during the ERM process. The findings have broad implications for the development of fair and equitable machine learning models across diverse applications, paving the way for more inclusive and just AI systems.