Advancing Trustworthy and Reliable Data Science
The massive datasets that power machine learning algorithms and systems are complex, noisy, and vulnerable to various kinds of errors, contamination, and adversarial corruptions. As data science and machine learning are increasingly deployed across the decision-making pipeline, designing provably reliable and trustworthy methods and systems is imperative.
Through workshops, graduate courses, and a weekly reading group, the fall 2023 special program of the Institute for Data, Econometrics, Algorithms, and Learning (IDEAL) aims to bring together multidisciplinary researchers to explore data science methods and algorithms that remain reliable and trustworthy across a range of settings. These settings include failures of model assumptions due to contamination or modeling errors, adversarial behavior in the system, and distribution shift arising from natural variations in data.
Kick-off event
On September 15, investigators in computer science, economics, electrical engineering, law, mathematics, operations research, and statistics from the five participating universities of IDEAL — Northwestern University, Illinois Institute of Technology, Toyota Technological Institute at Chicago (TTIC), University of Chicago, and University of Illinois Chicago (UIC) — attended the special program kick-off meeting hosted at Northwestern University’s Pritzker School of Law.
IDEAL site director Aravindan Vijayaraghavan, associate professor of computer science and (by courtesy) industrial engineering and management sciences at Northwestern Engineering, provided the welcome remarks and introduced the goals of the special program.
“The IDEAL Fall 2023 special program aims to bring together researchers from the broader Chicago area to tackle the important challenge of understanding and ensuring the reliability and trustworthiness of methods commonly used in data science and machine learning,” said Vijayaraghavan. “At the kickoff event, it was wonderful seeing researchers interested in this topic from across multiple disciplines like computer science, electrical engineering, mathematics, statistics, and even law and policy. We are excited about the workshops, reading groups, and all the research activity during the rest of the fall special program.”
During the kick-off event, Daniel W. Linna Jr., senior lecturer and director of law and technology initiatives at Northwestern, discussed trustworthy and reliable systems through the lens of computer science and law. Linna is a co-organizer of the upcoming IDEAL workshop “Trust Perspectives in Machine Learning, Law, and Public Policy” on October 26–27.
Avrim Blum, professor and chief academic officer at TTIC, gave an overview of robustness from the perspectives of machine learning theory and theoretical computer science. Robustness describes how well a system or algorithm tolerates misspecification, contamination, and data errors during modeling, training, and testing.
“This special program will be a great opportunity to bring together perspectives on robustness coming from different fields, and hopefully develop new cross-disciplinary research collaborations,” Blum said.
In addition, Cong Ma, assistant professor of statistics at the University of Chicago, outlined learning under distribution shift, the scenario in which a model's training and test distributions are mismatched.
“One of the amazing things about IDEAL is that it has connected researchers across the Chicago area, as well as within our own campuses,” said Samir Khuller, Peter and Adrienne Barris Chair of Computer Science at the McCormick School of Engineering and co-principal investigator of IDEAL. “PhD students have access to such amazing expertise in so many areas. When I was in graduate school, I could not even dream of such opportunities. As the saying goes, ‘It takes a village’ — and here, we have our village ready now.”
Graduate courses
As part of the special program, the IDEAL member institutions are offering cross-campus enrollment or auditing of select graduate courses related to reliable and trustworthy data science and machine learning systems. Northwestern Engineering faculty members are instructing three such courses this term.
As part of the core curriculum of the Master of Science in Artificial Intelligence (MSAI) program, Linna is teaching MSAI 448: Law and the Governance of Artificial Intelligence. The course introduces engineers to the legal, regulatory, ethical, and policy questions raised by advances in artificial intelligence and its increasingly widespread use.
Miklos Racz, assistant professor of computer science at Northwestern Engineering and assistant professor of statistics at Northwestern University’s Weinberg College of Arts and Sciences, is teaching COMP_SCI 496: Learning in Networks. The course focuses on the fundamental statistical and computational limits of learning in networks, highlighting recent research progress along with the ideas and techniques behind it.
Vijayaraghavan’s COMP_SCI 496: Foundations of Reliable Machine Learning course examines theoretical frameworks for modeling and reasoning about various kinds of noise, corruptions, and errors in machine learning and statistical tasks. Students also learn algorithmic techniques that are both efficient and robust, in supervised as well as unsupervised learning.
Upcoming workshops and activities
Additional upcoming workshops include “Trustworthiness in the Presence of Adversaries and Strategic Agents in ML” on October 12 and “New Perspectives on Data Science with Imperfect Data” on November 16.
Liren Shan (CS PhD ’23), a research assistant professor at TTIC, is also organizing a weekly virtual reading group focused on learning with untrusted data.
This special program on trustworthy and reliable data science is part of IDEAL Phase II, which aims to accelerate transformative advances in the theoretical foundations of data science through research and education programs on machine learning and optimization; high-dimensional data analysis and inference; and emerging topics including reliability, interpretability, privacy, and fairness.