(Fairness is a key research area at CeRAI and Dr. Krishnan heads various projects in this area. This essay is a review of the key concepts and problems along with some pointers to the relevant work being done at CeRAI.)
Bias can creep into AI models at various stages of their development, raising critical concerns about the fairness of the solution or service the models provide. It can begin with the very datasets used to train the models. Training data can be imbalanced in how it represents different categories of users across demographics, and such imbalances cause issues in the resulting models. For example, a hiring model trained primarily on male resumes can become biased against women candidates[1]; a clinical decision support model trained on patient data drawn largely from one gender or racial community can underperform for other patients[2]; and a re-offence prediction model trained on arrest or crime records that are themselves biased with respect to identity attributes such as race can reproduce that bias[3]. The design and training phase of the AI lifecycle can also introduce bias. Algorithmic choices, such as the choice of model, cost/loss/reward function, preprocessing technique, and the selection of attributes and features during model design, can inadvertently favor certain communities or user groups[4]. Additionally, during training the model learns patterns from the data, and those patterns may include inherent biases if they are not carefully addressed. Finally, bias can manifest during deployment, when the model is used in real-world situations. If the deployment context differs significantly from the training data (out-of-distribution or unseen scenarios), the model can suffer robustness failures that lead to critical fairness concerns.
Therefore, the impressive capabilities of AI models are accompanied by a critical imperative: ensuring fairness in the real world after deployment. Bias in these models has real-world consequences, impacting areas as diverse as finance, healthcare and justice. Mitigating this bias requires a comprehensive understanding of the various fairness concepts, the sectors they apply to, and the societal and policy dimensions of the fairness problems that can arise in a particular sector. This article explores the intricacies of group, individual, and counterfactual fairness, examining their application in detecting and quantifying fairness within deployment domains.
Types of Fairness:
Group Fairness[5,6]: This principle ensures that the model does not exhibit systematic favoritism towards, or disfavor of, specific groups defined by sensitive attributes such as race, gender, or age. Consider a loan approval model that consistently rejects applications from a particular demographic group even when their financial profiles qualify them for a loan; this scenario violates group fairness. Several metrics, framed as conditions on the model's prediction probabilities across groups and centered on the question "Does the model treat different groups equally?", have been proposed to measure group fairness[5]. These include demographic parity, conditional statistical parity, equalized odds, and equal opportunity, each indicating how fair or biased a model is with respect to a given identity or protected attribute[5].
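To make these metrics concrete, the short Python sketch below computes the demographic parity and equal opportunity differences for two groups using synthetic predictions; the arrays, group labels and two-group setup are purely illustrative, and toolkits such as AI Fairness 360[10] and Fairlearn[11] provide production-ready implementations of these and many other group fairness metrics.

```python
import numpy as np

# Illustrative sketch: two common group-fairness metrics computed from model
# predictions. The arrays below are synthetic placeholders.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])                  # actual outcomes
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])                  # model decisions
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])  # protected attribute

def demographic_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between the two groups."""
    rate_a = y_pred[group == "A"].mean()
    rate_b = y_pred[group == "B"].mean()
    return abs(rate_a - rate_b)

def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rates (recall) between the two groups."""
    tpr = {}
    for g in ("A", "B"):
        mask = (group == g) & (y_true == 1)
        tpr[g] = y_pred[mask].mean()
    return abs(tpr["A"] - tpr["B"])

print(demographic_parity_difference(y_pred, group))         # 0.0 would mean parity
print(equal_opportunity_difference(y_true, y_pred, group))  # 0.0 would mean equal opportunity
```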
Individual Fairness[5,6]: Here, the focus shifts to treating similar individuals similarly, irrespective of their group affiliation. For example, two loan applicants with identical financial profiles and credit scores should not receive different approval outcomes solely because of their race. Individual fairness requires that the model make decisions based on relevant factors, not on irrelevant group memberships. Metrics such as counterfactual fairness help quantify the individual fairness exhibited by a model with respect to given protected attributes.
The concept of counterfactual fairness generally revolves around the question: "Would the model arrive at the same decision for an individual if a specific characteristic (e.g., race) were altered?" Higher counterfactual fairness indicates more equitable (individually fair) outcomes, meaning the model's decisions are not swayed by irrelevant biases present in the training data[6].
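A rough way to probe this in code is to flip the protected attribute for every individual and check how often the model's decision changes. The sketch below does exactly that on a synthetic dataset with hypothetical feature names; note that it is a simplification of the full causal definition of counterfactual fairness, which also requires modelling how changing the protected attribute would affect the other features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Illustrative sketch of a simple counterfactual-fairness probe: flip the
# protected attribute for every individual and check whether the model's
# decision changes. Features and data are hypothetical placeholders.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "income": rng.normal(50, 15, 500),
    "credit": rng.normal(650, 60, 500),
    "race":   rng.integers(0, 2, 500),   # protected attribute (0/1)
})
y = (X["income"] + 0.1 * X["credit"] + rng.normal(0, 10, 500) > 115).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

X_flipped = X.copy()
X_flipped["race"] = 1 - X_flipped["race"]   # the counterfactual world

original = model.predict(X)
counterfactual = model.predict(X_flipped)

# Fraction of individuals whose decision changes when only race is altered;
# 0.0 would indicate this simple form of counterfactual fairness.
print((original != counterfactual).mean())
```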
Nuances of AI Fairness in a Domain:
Causes of Bias: Bias can infiltrate AI models at various stages of development. Training data that is imbalanced in its representation of user groups can lead to skewed results[7]; for example, a hiring model trained on many more male candidate resumes than female candidate resumes can become biased against women candidates. Historical bias, where societal biases are reflected in the data itself, is also problematic, as it results in AI models replicating the very societal biases we are trying to eradicate; for instance, biased arrest records used to train a recidivism prediction model may introduce racial bias. Algorithmic bias can cause similar harms: design choices regarding the learning model, optimization/reward functions, parameters, and so on can result in a trained AI model that favors or disfavors a certain user group or category[8]. Identifying potential sources of bias is crucial for effective mitigation, and understanding the plausible causes of bias in a particular sector significantly helps in detecting and quantifying the bias exhibited by an AI model to be deployed in that sector.
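A simple first step in practice is to audit the training data itself before any model is trained. The sketch below, using a tiny synthetic table with hypothetical column names for a hiring dataset, checks both the representation of each group and the historical positive rate per group.

```python
import pandas as pd

# Illustrative sketch: auditing a training set for representation and
# label-rate imbalances before training. The tiny DataFrame and its column
# names are hypothetical placeholders for a hiring dataset.
df = pd.DataFrame({
    "gender": ["m", "m", "m", "m", "m", "m", "f", "f"],
    "hired":  [ 1,   1,   0,   1,   0,   1,   0,   0 ],
})

print(df["gender"].value_counts(normalize=True))   # representation per group
print(df.groupby("gender")["hired"].mean())        # historical positive rate per group
```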
Policy Perspective: Understanding a domain's regulations and goals is paramount to defining fairness metrics. For instance, it is prudent to understand what exactly qualifies as bias in a particular sector before delving into model development and deployment. In a loan approval model, for example, ensuring equal opportunity (fairness for qualified individuals regardless of background) might be a higher priority than achieving statistical parity (equal approval rates across all groups); in other words, individual fairness might be preferred over group fairness in this particular use case. Regulatory frameworks and fair lending practices in the financial domain can inform how fairness should be measured and addressed in such AI models. All of this motivates the need for a participatory framework[9] that can be followed during the design, development and deployment stages of the AI lifecycle and that is inclusive of all the domain stakeholders, to ensure that AI is safe for use in that domain and solution task.
Fairness Through the Lens of Domain Task Performance: It is important to link the concepts of fairness (group, individual, etc.) and bias (data representation, historical, algorithmic, etc.) to domain tasks and their performance. For instance, suppose the overall accuracy, in terms of F1 score, of a heart disease prediction model is decent, but the same metric computed only for women is far lower than for men, indicating that the model is biased against female patients. In such situations it is important not only to detect and quantify the bias, but also to understand its source, so that a de-biasing operation can be performed to make the deployed model safer for all patients alike.
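As an illustration, the sketch below uses synthetic predictions for a hypothetical heart disease model to show how a reasonable overall F1 score can coexist with a large gap between patient groups.

```python
import numpy as np
from sklearn.metrics import f1_score

# Illustrative sketch: a decent overall F1 score can hide a large gap between
# patient groups. The arrays are synthetic placeholders for predictions of a
# hypothetical heart disease model on a held-out test set.
y_true = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 1])
gender = np.array(["m", "m", "m", "m", "m", "f", "f", "f", "f", "f"])

print("overall F1:", f1_score(y_true, y_pred))
for g in ("m", "f"):
    mask = gender == g
    print(f"F1 for group {g}:", f1_score(y_true[mask], y_pred[mask]))
```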
De-biasing Techniques: Several approaches have been proposed to help mitigate bias, and they are mostly categorized into pre-processing (transforming datasets), in-processing (intervening during training) and post-processing (adjusting model outputs) techniques[10,11]. The specific technique chosen for de-biasing depends on the domain and task at hand, the type of fairness being addressed, and the acceptable level of impact on model accuracy. Furthermore, advanced techniques such as adversarial de-biasing can be employed to train models that are inherently less susceptible to bias.
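As one concrete pre-processing example, the sketch below computes per-example weights in the spirit of the classic reweighing technique, so that the protected attribute and the label become statistically independent in the weighted training data; the arrays are synthetic, and toolkits such as AI Fairness 360[10] provide full implementations of this and many other de-biasing methods.

```python
import numpy as np

# Illustrative sketch of a pre-processing de-biasing step (reweighing-style):
# assign each training example a weight so that the protected attribute becomes
# statistically independent of the label in the weighted data. Synthetic arrays.
group = np.array(["A", "A", "A", "B", "B", "B", "B", "B"])
label = np.array([ 1,   1,   0,   1,   0,   0,   0,   0 ])

n = len(label)
weights = np.empty(n)
for g in np.unique(group):
    for y in np.unique(label):
        mask = (group == g) & (label == y)
        # expected count if group and label were independent, over observed count
        expected = (group == g).sum() * (label == y).sum() / n
        weights[mask] = expected / mask.sum()

# 'weights' can then be passed as sample_weight to most scikit-learn estimators,
# e.g. LogisticRegression().fit(X, label, sample_weight=weights)
print(weights)
```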
The Fairness-Accuracy Trade-off: It is often challenging to achieve perfect fairness while maintaining high model accuracy; this tension is termed the fairness-accuracy trade-off[12]. It can arise from bias present in the dataset or from algorithmic bias learned during training. For instance, de-biasing a loan approval model might lead to a slight increase in defaults across all or certain groups (a decrease in performance). However, if this change in performance is minimal and the model becomes fairer overall, it might be an acceptable trade-off. The acceptable level of this trade-off depends on the specific domain and its regulations. In healthcare, where misdiagnosis can have severe consequences, higher accuracy might be prioritized even if it results in slightly less fairness.
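The sketch below illustrates this trade-off on synthetic loan-approval data: sweeping the decision threshold of a deliberately biased scoring model and recording overall accuracy alongside the demographic parity gap at each threshold makes the tension visible. The data, score function and thresholds are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of the fairness-accuracy trade-off: vary the decision
# threshold of a hypothetical loan-approval scorer and record overall accuracy
# alongside the demographic parity gap at each threshold. Data is synthetic.
rng = np.random.default_rng(42)
n = 1000
group  = rng.integers(0, 2, n)                                  # protected attribute
y_true = rng.binomial(1, 0.4 + 0.1 * group)                     # outcomes, correlated with group
scores = 0.5 * y_true + 0.1 * group + rng.normal(0, 0.25, n)    # biased model scores

for threshold in (0.3, 0.4, 0.5, 0.6):
    y_pred = (scores > threshold).astype(int)
    accuracy = (y_pred == y_true).mean()
    gap = abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())
    print(f"threshold={threshold}: accuracy={accuracy:.3f}, parity gap={gap:.3f}")
```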
Fairness-Accuracy Balance: Fairness metrics need to be chosen carefully so that they account for both group fairness and the model's ability to perform its core task effectively. For example, a loan approval model that achieves perfect group fairness by rejecting all applications avoids any accusation of bias but is useless in its primary function. Striking a balance between fairness and model effectiveness is therefore essential. This balance is also a crucial factor in determining a model's deployability in real-world applications in sensitive sectors such as the legal sector, where individual and group fairness are just as important as performance on domain tasks such as legal judgment prediction and statute identification[13].
By understanding these nuances of fairness, developers and policymakers can work towards building and deploying AI models that are both effective and just. This approach ensures that the benefits of AI reach everyone equitably, promoting a more just and ethical future.
Additional Considerations:
While fairness remains one of the key aspects to consider when deploying an AI model, a few additional factors also determine whether a model is fit for real-world use. These factors can further help in ensuring fairness in AI models, and a few of them are described briefly below:
Explainability and Interpretability: Even with fair AI models, it is crucial to understand the reasons behind the decisions or predictions the models make. Explainable AI (XAI) techniques can help, to an extent, in comprehending the model's reasoning and in identifying potential fairness or bias issues lurking within the model's logic. While providing explanations is important, it is equally important to ensure that the explanations are interpretable by domain experts and users[14]. For instance, a loan approval model should be able to explain why a certain loan was approved or rejected in a manner that is understandable by the organization as well as the customer.
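One simple, model-agnostic probe in this direction is permutation importance: if a protected attribute turns out to carry substantial importance for a model's predictions, that is a red flag worth explaining to stakeholders. The sketch below, using synthetic data and hypothetical feature names, shows the idea; dedicated XAI methods provide far richer explanations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

# Illustrative sketch: permutation importance as a simple, model-agnostic
# explanation technique. A large importance for the protected attribute
# ('race' here) would warrant investigation. Data and features are synthetic.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.normal(50, 15, n),     # income
    rng.normal(650, 60, n),    # credit score
    rng.integers(0, 2, n),     # protected attribute
])
y = (X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 10, n) > 115).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in zip(["income", "credit", "race"], result.importances_mean):
    print(f"{name}: {score:.4f}")
```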
Human Supervision: Though AI models are effective at solving many problems, they should not operate with complete autonomy. Human supervision remains essential to ensure that models are used responsibly and ethically, and that any unintended consequences of the models' predictions are noticed and addressed. Human-in-the-loop approaches are an active avenue of exploration for designing workflows that ensure both efficacy and safety in AI models[15].
Monitoring of AI Models: Ensuring fairness in AI models is not a one-time fix. As AI models evolve and encounter new data, especially out-of-distribution data, it is vital to continuously monitor and evaluate both their task performance and their fairness. Feeding this monitoring back into fixes such as data processing, model retraining or model re-calibration needs to be an iterative process, and it should be treated as an important phase of the entire AI lifecycle.
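In practice, such monitoring can be as simple as recomputing an agreed fairness metric on every new batch of predictions and alerting when it drifts past a tolerance set with domain stakeholders. The sketch below simulates this on synthetic weekly batches; the metric, threshold and drift pattern are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of post-deployment fairness monitoring: recompute a
# group-fairness metric on each new batch of predictions and raise an alert
# when it drifts past an agreed threshold. Batches here are synthetic.
PARITY_THRESHOLD = 0.10   # assumed, domain-specific tolerance

def parity_gap(y_pred, group):
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

rng = np.random.default_rng(7)
for week in range(1, 5):
    group  = rng.integers(0, 2, 200)
    drift  = 0.05 * week                         # simulated gradual drift
    y_pred = rng.binomial(1, 0.5 + drift * group)
    gap = parity_gap(y_pred, group)
    status = "ALERT: investigate / retrain" if gap > PARITY_THRESHOLD else "ok"
    print(f"week {week}: parity gap = {gap:.3f} -> {status}")
```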
While new legal regulations and frameworks are being crafted around developing and deploying AI models and solutions, it remains our responsibility to come together across disciplines to ensure that AI is developed, deployed and used responsibly and safely, in a manner that is useful to all and helps them thrive rather than causing harm.
References
[1] Insight - Amazon scraps secret AI recruiting tool that showed bias against women | Reuters, https://www.reuters.com/article/idUSKCN1MK0AG/
[2] Chen, I. Y., Szolovits, P., & Ghassemi, M. (2019). Can AI help reduce disparities in general medical and mental health care?. AMA Journal of Ethics, 21(2), 167-179.
[3] Fairness in Machine Learning — Labelia (ex Substra Foundation), https://www.labelia.org/en/blog/fairness-in-machine-learning
[4] Kheya, T. A., Bouadjenek, M. R., & Aryal, S. (2024). The Pursuit of Fairness in Artificial Intelligence Models: A Survey. arXiv preprint arXiv:2403.17333.
[5] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), 1-35.
[6] Kheya, T. A., Bouadjenek, M. R., & Aryal, S. (2024). The Pursuit of Fairness in Artificial Intelligence Models: A Survey. arXiv preprint arXiv:2403.17333.
[7] Kheya, T. A., Bouadjenek, M. R., & Aryal, S. (2024). The Pursuit of Fairness in Artificial Intelligence Models: A Survey. arXiv preprint arXiv:2403.17333.
[8] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), 1-35.
[9] Participatory AI Approaches in AI Development and Governance
[10] Bellamy, R. K., Dey, K., Hind, M., Hoffman, S. C., Houde, S., Kannan, K., ... & Zhang, Y. (2019). AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM Journal of Research and Development, 63(4/5), 4-1.
[11] Bird, S., Dudík, M., Edgar, R., Horn, B., Lutz, R., Milan, V., ... & Walker, K. (2020). Fairlearn: A toolkit for assessing and improving fairness in AI. Microsoft, Tech. Rep. MSR-TR-2020-32.
[12] Cooper, A. F., Abrams, E., & Na, N. (2021, July). Emergent unfairness in algorithmic fairness-accuracy trade-off research. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 46-54).
[13] Tripathi, Y., Donakanti, R., Girhepuje, S., Kavathekar, I., Vedula, B. H., Krishnan, G. S., ... & Kumaraguru, P. (2024). InSaAF: Incorporating Safety through Accuracy and Fairness | Are LLMs ready for the Indian Legal Domain?. arXiv preprint arXiv:2402.10567.
[14] Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., & Kagal, L. (2018, October). Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) (pp. 80-89). IEEE.
[15] Zanzotto, F. M. (2019). Human-in-the-loop artificial intelligence. Journal of Artificial Intelligence Research, 64, 243-252.
(You can contact the author at gokul@cerai.in)