Encouraging Undergraduate Students to Pursue Data Science Research
A cohort of 30 first- and second-year undergraduate students from colleges and universities across the country attended the three-day “Get Ready for Research Workshop” this month, hosted by the Institute for Data, Econometrics, Algorithms, and Learning (IDEAL) at the University of Illinois Chicago (UIC).
The workshop introduced students to the process and practicalities of research; presented examples of research focus areas in the fields of computer science, data science, electrical engineering, mathematics, and statistics; and provided guidance to help students apply to and thrive in the National Science Foundation (NSF) Research Experiences for Undergraduates (REU) program.
Event co-organizers included Julia Gaudio, assistant professor of industrial engineering and management sciences at Northwestern Engineering; Samir Khuller, Peter and Adrienne Barris Chair of Computer Science at the McCormick School of Engineering; and Will Perkins, associate professor of computer science at the Georgia Institute of Technology.
"Meeting all the undergraduate students from different universities was really exciting," Khuller said. "Not only did the workshop expose them to different topics of research related to IDEAL but also to the actual research process, as they got a chance to brainstorm in small groups mentored by graduate students. I hope that over the next 5-7 years I meet them again as they pursue research projects passionately as researchers and join PhD programs."
Through a mix of topic overviews, research highlights, panel discussions, and hands-on group research activities guided by graduate student mentors, the program organizers aimed to provide students with both a broad view of cutting-edge research topics aligned with IDEAL's themes and an opportunity to build a network of peers and connect with researchers in these fields.
"For students who haven't had research mentors yet, the landscape can seem out of reach and intimidating,” Perkins said. “This workshop gave these students the information, experience, and confidence to make their first forays into research. We're looking forward to running it again next year!”
IDEAL Phase II is accelerating transformative advances in the theoretical foundations of data science through research and education programs on machine learning and optimization, high-dimensional data analysis and inference, and emerging topics including reliability, interpretability, privacy, and fairness. Supported by an NSF Harnessing the Data Revolution: Transdisciplinary Research in Principles of Data Science award, IDEAL is a multidisciplinary collaboration across Northwestern, Google Research, the Illinois Institute of Technology (IIT), the Toyota Technological Institute at Chicago (TTIC), UIC, and the University of Chicago.
Sharing practical research experience
A panel of undergraduate and graduate students shared their perspectives on conducting research and the lessons they learned along the way. Panelists included Amil Dravid (BS ’23), an incoming PhD student in artificial intelligence at the University of California, Berkeley; Xiaochun Niu, a PhD candidate in industrial engineering and management sciences at Northwestern Engineering; Shishir Adhikari, a PhD student in computer science at UIC; and Duan Tu, a PhD student in applied mathematics at UIC.
"I learned about different pathways in data science, computer science, and math research, and got a lot more insight than I had previously had as to what graduate school is like," said a workshop participant.
A second panel, made up of past participants in the NSF REU program, described their experiences. Panelists included Yifan Wu, a third-year PhD student in computer science at Northwestern Engineering advised by professor of computer science Jason Hartline; Frederic Koehler, a postdoctoral fellow at Stanford University; Jingling Li, a PhD student in computer science at the University of Maryland, College Park; Riley Murray, a project scientist at Lawrence Berkeley National Laboratory and principal investigator at the International Computer Science Institute; and Pascal Sturmfels, a PhD student at the University of Washington.
"I learned about the broad scope of research, and it was eye-opening to hear talks from people that have gone so far in their education because I got much more motivated to keep pursuing an advanced degree after my undergraduate degree completion," said a workshop participant.
Exploring research topics
Speakers presented technical talks on topics including auctions, collective behaviors, data analysis, graph algorithms, and networks.
Quanquan Liu, a postdoctoral scholar in the Northwestern CS Theory Group advised by Khuller, presented “Scalable Graph Algorithms: From Theory to Practice,” an overview of her research in parallel dynamic and static graph algorithms as well as differentially private graph algorithms. Liu described how modern graph algorithms run on massive datasets that may contain sensitive information, and therefore must meet several goals at once, including efficiency, scalability, privacy, and robustness against adversaries.
Miklos Z. Racz, an assistant professor of computer science at Northwestern Engineering with a joint appointment in the Department of Statistics and Data Science at Northwestern’s Weinberg College of Arts and Sciences, discussed learning in networks. He highlighted research areas centered on understanding the structure and dynamics of networks, such as recovering communities in networks, the navigability of small-world networks, and the dynamics of viral information cascades.
Hartline, who is also director of Northwestern’s Online Markets Lab and a founding codirector of IDEAL Phase I, outlined the game-theoretic analysis of auctions and the central question of whether auctions achieve good outcomes when bidders behave strategically. He introduced an approach for analyzing the welfare of auctions in equilibrium without solving for the equilibrium itself, which is especially important in complex auction environments such as internet advertising.
Additional speakers and topics included:
- Adhikari — “Causal Inference for Policy Evaluation in Network Data” (with additional mentoring from Ahmed Sayeed Farukh, PhD student at UIC)
- Saba Ahmadi, postdoctoral researcher at TTIC — “Strategic Classification”
- Aditya Bhaskara, associate professor of computing at the University of Utah — “Online Algorithms for Data Analysis”
- Nick Christo, PhD student at UIC — “How to Find and Apply for Research Opportunities”
- Niu — “Community Detection on Geometric Random Graphs”
- Tian Wang, PhD student at UIC — “Approximating the Prime-Counting Function”
- Ming Zhong, assistant professor in applied mathematics at IIT — “Learning Collective Behavior in Networks”