MSiA Introduces Curricular Effort Focused on Bias, Privacy, and Fairness
Data science as a field is responding to broader societal calls for accountability in data usage and transparency in collection methods. Algorithms and models, too easily assumed to be inherently amoral, are subject to the biases of their creators. As private industry places greater emphasis on bias, fairness, and privacy (b/f/p), the modern data science student must understand these concepts and be able to apply them to practical problems. To this end, in the fall of 2021 MSiA introduced a new extracurricular series of lectures and practical exercises dedicated to b/f/p, alongside specially designed projects for curricular coursework.
Analytics instructors Veena Mendiratta and Ed Malthouse took the lead on introducing students to three emerging topics: differential privacy, homomorphic encryption, and data anonymization. Mendiratta provides the following definitions:
Differential Privacy (DP) provides a robust concept of privacy through a mathematical framework for quantifying and managing privacy risks. It is studied in the context of the collection, analysis, and release of aggregate statistics, ranging from simple statistical estimation to machine learning, and interest in it is growing as a technical means of satisfying legal and policy requirements for disclosure limitation while analyzing and sharing personal data. Using examples and some mathematical formalism, this lecture introduced core concepts: the definition of differential privacy, how differentially private analyses are constructed, and how they can be used in practice.
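To make the idea concrete, the minimal sketch below applies the Laplace mechanism, a standard DP building block, to a counting query: noise scaled to the query's sensitivity divided by the privacy parameter epsilon is added to the true answer before release. The data and the privacy budget here are hypothetical.

```python
import numpy as np

def dp_count(data, predicate, epsilon):
    """Differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one
    individual changes the count by at most 1), so noise is drawn
    from Laplace(0, 1/epsilon).
    """
    true_count = sum(1 for record in data if predicate(record))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical data and budget: count people over 40 with epsilon = 0.5.
ages = [23, 45, 31, 62, 58, 29, 41]
print(dp_count(ages, lambda age: age > 40, epsilon=0.5))
```

Smaller values of epsilon mean more noise and stronger privacy; repeated queries consume the privacy budget, which is why DP analyses are designed around a total budget rather than a single release.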
Homomorphic Encryption (HE) is a type of encryption system that allows operations to be performed directly on encrypted data without first requiring access to a secret key. Standard encryption schemes randomize data so that basic operations of interest, such as finding the mean or mode, cannot be computed over the ciphertext. HE systems, by contrast, allow certain kinds of computation to be performed securely on encrypted data, and the result of the computation, when decrypted, is identical to the result of the same computation on the unencrypted input.
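As a rough illustration, the sketch below uses the open-source python-paillier package (phe), which implements the Paillier scheme. Paillier is only additively homomorphic, supporting addition of ciphertexts and multiplication by a plaintext scalar rather than arbitrary computation, and the salary values are hypothetical.

```python
# Sketch using the open-source python-paillier package: pip install phe
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# A client encrypts two hypothetical salaries and sends the ciphertexts
# to an untrusted server, which never sees the plaintexts.
enc_a = public_key.encrypt(52_000)
enc_b = public_key.encrypt(61_000)

# Paillier is additively homomorphic: ciphertexts support addition
# and multiplication by a plaintext scalar.
enc_mean = (enc_a + enc_b) * 0.5

# Only the secret-key holder can decrypt; the result equals the same
# computation on the unencrypted inputs.
print(private_key.decrypt(enc_mean))  # 56500.0
```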
Data Anonymization refers to removing personal identifiers from data, for example by converting personally identifiable information into aggregated data. How do you publicly release a database without compromising individual privacy? Data anonymization minimizes the risk of identity disclosure through techniques such as attribute suppression, record suppression, character masking, generalization, swapping, data perturbation, and data aggregation.
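The short sketch below demonstrates three of these techniques, attribute suppression, character masking, and generalization, on a small hypothetical table using pandas.

```python
import pandas as pd

# Hypothetical records containing direct identifiers.
df = pd.DataFrame({
    "name":  ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "phone": ["312-555-0143", "847-555-0199", "773-555-0112"],
    "age":   [36, 41, 85],
    "zip":   ["60201", "60208", "60611"],
})

anon = df.copy()
anon = anon.drop(columns=["name"])                   # attribute suppression
anon["phone"] = "XXX-XXX-" + anon["phone"].str[-4:]  # character masking
anon["age"] = pd.cut(anon["age"],                    # generalization into bands
                     bins=[0, 40, 65, 120],
                     labels=["<=40", "41-65", ">65"])
anon["zip"] = anon["zip"].str[:3] + "XX"             # generalization of ZIP codes
print(anon)
```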
Beyond this lecture series, several course project assignments now include tasks for analyzing bias and fairness in data and models. Sensitive features such as age, gender, and race are identified from a bias/fairness perspective, and the response variable is analyzed against the sensitive features and against other variables that can serve as proxies for them. Students also assess the fairness of models by applying the fairness metrics available in the Python Fairlearn package, the R Fairness package, or other similar tools.
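The sketch below shows the kind of check such an assignment might involve, using Fairlearn's MetricFrame to compare selection rates across groups and its demographic parity difference metric; the labels, predictions, and sensitive feature are hypothetical.

```python
import numpy as np
from fairlearn.metrics import (MetricFrame, selection_rate,
                               demographic_parity_difference)

# Hypothetical labels, model predictions, and a sensitive feature.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
gender = np.array(["F", "F", "M", "F", "M", "M", "M", "F"])

# Selection rate (share of positive predictions) broken out by group.
frame = MetricFrame(metrics=selection_rate,
                    y_true=y_true, y_pred=y_pred,
                    sensitive_features=gender)
print(frame.by_group)

# Demographic parity difference: the gap in selection rates between
# groups; 0 would indicate parity on this particular metric.
print(demographic_parity_difference(y_true, y_pred,
                                    sensitive_features=gender))
```

A single metric rarely settles the question; part of the exercise is choosing which fairness criteria are appropriate for a given problem and examining where they conflict.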
While b/f/p work addresses the needs of the corporate world and the wave of environmental, social, and governance-focused initiatives, it also has a broader impact, as society writ large becomes increasingly aware of bias and fairness. Machine learning and AI have large roles to play in all of these arenas. Beyond providing value to employers, MSiA students will make important societal contributions by sharing and applying their knowledge in all walks of life.