Connecting Deep Learning Developers with Sound Artists
TEAMuP received a $1.8 million Future of Work at the Human-Technology Frontier award from NSF
Digital Audio Workstations (DAWs) such as Audacity, Pro Tools, and GarageBand are the primary platforms used to record, edit, mix, and produce sound art. While these tools enable musicians and hobbyists to self-produce without requiring expensive sessions at professional recording studios, it can be incredibly time-consuming or impractical for artists working within DAWs to manually accomplish certain tasks, such as separating individual sounds out of recordings.
Deep learning — a subset of machine learning based on artificial neural networks — has greatly expanded audio generation and manipulation tools, including some that would be infeasible with traditional digital signal processing, such as melody co-creation, automated upmixing, and replacing the timbre of one sound (e.g., a human voice) in an existing audio recording with another (e.g., a violin).
Making deep learning models for audio processing easily available on DAWs has the potential to dramatically improve an artist’s abilities, musical expression, and productivity.
Northwestern Engineering’s Bryan Pardo and collaborators at the University of Rochester received a combined $1.8 million National Science Foundation Future of Work at the Human-Technology Frontier award to accelerate research in artificial intelligence (AI) for music production by enabling musicians to better leverage deep learning audio tools in the creation, performance, and dissemination of their music.
The project, titled “Toward an Ecosystem of Artificial-intelligence-powered Music Production (TEAMuP),” unites a multidisciplinary team of investigators who specialize in music, audio engineering, AI, learning sciences, business and entrepreneurship, ethics, and inclusion. TEAMuP aims to enable musicians to use deep learning tools to produce lower-cost, higher-quality music products to meet the growing demand for digital content.
Pardo is head of the Interactive Audio Lab (IAL), a codirector of the Center for Human-Computer Interaction + Design, and professor of computer science in the McCormick School of Engineering and of radio, television, and film in Northwestern’s School of Communication. The Northwestern arm of the TEAMuP project also includes IAL members Hugo Flores Garcia and Patrick O'Reilly, PhD students in computer science, and Aldo Aguilar, a research assistant and undergraduate student in computer science.
Pardo explained that deep learning technology rarely makes its way into DAW-hosted plugins, due to barriers related both to the skill required to develop a novel deep learning model and to the inability of many end users to run open-source code produced by deep learning developers.
“Simplifying the deployment pipeline so that deep model developers can easily make models available to be used directly in the DAW could have a transformative impact on the range of creative tools available to sound artists, producers, and musicians,” Pardo said.
Building on their Deep Learning Tools for Audacity prototype as well as prior work in music unmixing and transcription, the IAL team is creating an open-access software framework for Audacity — a free, accessible, open-source DAW — that connects deep learning researchers in the audio domain directly with end users of sound design tools, including speech detection, instrument labeling, and audio upscaling.
The framework allows researchers to easily share their models with users, both enriching the audio processing toolkit and streamlining otherwise time-intensive tasks. Musicians who apply the AI tools can then share their music productions with developers for further refinement of the models.
“Researchers building the next generation of machine learning models for media creation will be able to put these models directly into the hands of the artists,” Pardo said. “We hope this work will foster a virtuous feedback loop between model builders and the artists that apply these models to create sound art. The ability to quickly share models between model builders and artists should encourage a conversation that will deepen the work of both groups.”
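The article does not describe the framework’s packaging format, but the kind of model sharing Pardo describes can be illustrated with a minimal sketch. The example below assumes the pipeline accepts serialized PyTorch (TorchScript) models, which lets a host application such as a DAW plugin load and run a trained model without the researcher’s Python training code or environment; the model and file names are hypothetical.

```python
# Minimal sketch: packaging a trained audio model so a host application can run it.
# Assumes a TorchScript-based sharing pipeline; ToyDenoiser is a hypothetical model.
import torch
import torch.nn as nn


class ToyDenoiser(nn.Module):
    """Hypothetical one-layer model mapping a mono waveform to a mono waveform."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.encode = nn.Conv1d(1, channels, kernel_size=15, padding=7)
        self.decode = nn.Conv1d(channels, 1, kernel_size=15, padding=7)

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, 1, samples) float32 waveform in [-1, 1]
        return self.decode(torch.relu(self.encode(audio)))


if __name__ == "__main__":
    model = ToyDenoiser().eval()

    # Serialize with TorchScript so an end user never needs the training code.
    scripted = torch.jit.script(model)
    scripted.save("toy_denoiser.pt")

    # Sanity check: reload the artifact and process one second of audio at 44.1 kHz.
    reloaded = torch.jit.load("toy_denoiser.pt")
    out = reloaded(torch.zeros(1, 1, 44100))
    print(out.shape)  # torch.Size([1, 1, 44100])
```

In a workflow like the one described above, the researcher would publish the serialized file, and the artist’s DAW-side tooling would only need to load it and pass audio through it, which is what keeps the exchange between model builders and artists lightweight.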
The University of Rochester team, which includes principal investigators Raffaella Borasi, Zhiyao Duan (Northwestern PhD ’13), Jonathan Herington, and Rachel Roberts, will collaborate with the Northwestern group in developing new deep learning models for music transcription and the interactive composition of music.
After conducting and analyzing interviews and surveys with a diverse group of musicians, the University of Rochester team will also develop an education and outreach program to help practicing musicians incorporate deep learning tools into their workflow. The education component of TEAMuP includes a two-semester music and technology course, a summer camp designed for underrepresented students, and online instructional materials.
TEAMuP ultimately aims to better understand the factors and barriers that may affect musicians’ adoption of AI in their work and how this insight could be generalized to other occupations at the human-technology frontier.