Faculty Projects
GPU Servers for CS449 Deep Learning

Joe Hummel

Project Manager

Zach Wood-Doughty, Assistant Professor of Instruction, Computer Science

Amount Requested

$42,000

Summary

As neural networks and large language models have demonstrated transformational potential in a wide array of real-world applications, McCormick undergraduates have sought out opportunities to stay at the cutting edge of these technologies. Enrollments in CS449 Deep Learning have spiked accordingly — where the computer science department used to offer this course once a year, we now offer it every quarter (serving 197 students in 2023-2024) and still struggle to meet demand.

One particular challenge of teaching deep learning is the need for specialized hardware (GPUs) that can train large neural network models. Right now, students use whatever they can find: the small number of computer lab GPUs, free cloud credits, or their personal computers. Of particular concern, some students are able to achieve more than others simply because they are willing to spend more of their own money. While paying for GPUs is certainly not required to succeed in the class, students have reported in anonymous surveys that they find paid GPU access much more convenient than the free options available. We would like to provide a comprehensive solution that both teaches students how to work with high-quality GPUs as part of a server array and gives all students equal access to compute resources.

In the past, students have decided to abandon certain final project topics because of the lack of computational resources. Students, in anonymous surveys, have said the following:

  1. I think [better compute resources are] very much needed from a student's perspective. Not everyone has a PC with strong GPU power at home. After I finished up the 10 dollars computational units from Google Colab, I did not want to spend more money, I had to use the free version.
  2. I primarily used my own Nvidia RTX 4090 GPU for most of the training. While It definitely would be really expensive for a normal student, It has a lot of memory and a fast training speed that would benefit the students greatly for this class.
  3. I think [better compute resources] would even the playing field more for homework and just overall improve the experience of having to train models (which is the most tedious part of the course).
  4. I didn't spend any of my own money because I didn't want to, but if I had been able to it would've been hugely helpful. I honestly think it's ridiculous that Northwestern doesn't have resources set up for this. I tried the GPUs in the Wilkinson Lab, but [I couldn’t install necessary packages]. There should really be resources more easily available to students.
  5. I'm also not sure if cloud compute credits are worth it as they seem to run out really fast.

This project seeks to buy GPUs that students will have access to for all assignments in the class. This will ensure that all students have experience with the benefits (and challenges) of managing how their code makes use of GPUs, rather than only using compute resources through a web interface. We will introduce a homework assignment in which students will explore how decisions they make regarding model training (e.g., size of model, batch size) influence the GPU utilization and overall training time.
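As a rough illustration of what such an assignment might measure, the sketch below times a training step at several batch sizes and reports examples processed per second. It is not the actual assignment: `sweep_batch_sizes` and `fake_train_step` are hypothetical names, and the stand-in workload runs on the CPU with only the Python standard library so the timing logic can be read on its own. In the course, students would presumably substitute a real training step for their model.

```python
import time
from typing import Callable, Dict, Iterable


def sweep_batch_sizes(
    train_step: Callable[[int], None],
    batch_sizes: Iterable[int],
    steps: int = 20,
) -> Dict[int, float]:
    """Return examples processed per second for each batch size.

    `train_step(batch_size)` is a hypothetical callable that runs one
    training step on a batch of the given size.
    """
    throughput = {}
    for batch_size in batch_sizes:
        train_step(batch_size)  # warm-up step, excluded from timing
        start = time.perf_counter()
        for _ in range(steps):
            train_step(batch_size)
        elapsed = time.perf_counter() - start
        throughput[batch_size] = batch_size * steps / elapsed
    return throughput


def fake_train_step(batch_size: int) -> None:
    """Stand-in workload (CPU only): fixed overhead plus per-example cost."""
    total = 0.0
    for i in range(2000 + 200 * batch_size):
        total += i * 0.5


if __name__ == "__main__":
    results = sweep_batch_sizes(fake_train_step, [8, 32, 128])
    for size, rate in results.items():
        print(f"batch_size={size:4d}  examples/sec={rate:,.0f}")
```

When the step runs on a GPU, kernel launches are typically asynchronous, so a real training step would likely need to wait for the device to finish (in PyTorch, for example, by calling `torch.cuda.synchronize()`) before the timer is read. Otherwise the measurement may reflect only the time to queue the work.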

Planned Activities/Investments

Funding and vendor availability will influence our specific purchasing decisions, but we will buy at least six GPUs. Our goal is to maximize throughput: we would prefer 12 good GPUs to four great GPUs so that more students can use the resources simultaneously.

We are considering standalone tower servers from Dell or Lambda Labs, or rack-mounted servers purchased through the Quest HPC cluster. We will use the CS department’s matching funds to purchase GPUs over the summer for use in my fall 2024 offering of 449. If provided, Murphy Society funding will allow us to dramatically expand the resources we can offer to students in the winter and spring quarters of next year and for many years to come.

Impact

This project will impact all students in the CS449 deep learning class by allowing them to more rigorously explore the models they work with in their homework assignments and pursue more ambitious final projects with higher computational needs.

While this is a 400-level course, it is very much open to undergraduates. It was previously numbered 396 (special topics in CS) but was given the permanent course number of 449 to indicate that it follows 349 Machine Learning. The only prerequisite for 449 is 349, and we have had McCormick undergraduates take both courses in their first year. Based on statistics from CAESAR rosters across four quarters serving 300 students, roughly 56 percent of students in 449 are undergraduates.

We will evaluate the impact of this project through surveys of students’ experience and analysis of the GPU usage statistics. I taught 449 in Winter, Spring, and Fall 2023; each quarter I collected feedback on students’ usage and opinions of available computational resources. We will make the new GPUs purchased through this project as accessible as we can and follow up on that usage with additional surveys. Because final projects in this class can sometimes use computational resources in unusual ways, we may conduct additional follow-up discussions with individual groups.

Sustainability

While GPU servers require maintenance, the options we are considering for purchase come with warranties that last at least three years. We will use departmental funds for future maintenance. We are interested in buying additional GPUs in the future to expand our computing resources (especially as we expand our course offerings), but those purchases would be in addition to the current project rather than required maintenance of it.

Deliverables

We will purchase and install at least six GPUs, write a detailed instructional guide for students in 449, and help students use these GPUs for their homework assignments and final projects.

Budget Overview

$84,000: GPU servers with included installation and warranty.

Note: we are requesting $42,000 because that is the amount for which we have matching departmental funds. If that amount of funding is not available, a smaller amount would still be helpful.

Total Budget Amount: $84,000

Matching Funds

$42,000 — Computer science department funds