Andrew Crotty Receives Google Research Scholar Program Award
The award will support research advancing deep learning algorithms for compressing large datasets
In the big data era, efficient data compression algorithms have become increasingly crucial, both for transmitting data over networks and for long-term archival storage. Most traditional compression techniques, however, treat data merely as sequences of bytes, ignoring the relationships that often exist among the fields of real-world datasets.
Northwestern Engineering’s Andrew Crotty, assistant professor of computer science, has received a 2024 Google Research Scholar Program Award for his work developing novel deep learning compression algorithms designed to leverage semantic relationships inherent in datasets, such as functional dependencies or correlations between attributes.
Consider, for example, a dataset containing customer shipping addresses. Because the city and state can be inferred from the ZIP code, those fields can be dropped and later reconstructed from the ZIP code without losing any information. Crotty’s work aims to identify these types of relationships automatically, even when they are subtle or complex, to compress datasets in ways that were previously impossible.
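To make the ZIP code example concrete, here is a minimal Python sketch, assuming a table held as a list of dictionaries and an exact dependency that can be checked directly. The helper names (`holds_fd`, `drop_dependent_columns`, `restore`) and the sample rows are hypothetical. This illustrates the general idea of exploiting a functional dependency; it is not the method used in Crotty's work, which aims to discover such relationships automatically.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if column `lhs` functionally determines the columns in `rhs`."""
    seen = {}
    for row in rows:
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(row[lhs], value) != value:
            return False
    return True


def drop_dependent_columns(rows, lhs, rhs):
    """Replace the `rhs` columns with a single lookup table keyed by `lhs`."""
    if not holds_fd(rows, lhs, rhs):
        raise ValueError(f"{lhs!r} does not determine {rhs!r}")
    lookup = {row[lhs]: tuple(row[c] for c in rhs) for row in rows}
    slim = [{k: v for k, v in row.items() if k not in rhs} for row in rows]
    return lookup, slim


def restore(lookup, slim, lhs, rhs):
    """Rebuild the original rows by re-deriving the `rhs` columns from `lhs`."""
    return [{**row, **dict(zip(rhs, lookup[row[lhs]]))} for row in slim]


# Hypothetical sample data: city and state follow from the ZIP code.
rows = [
    {"name": "Ada", "zip": "60208", "city": "Evanston", "state": "IL"},
    {"name": "Bo", "zip": "02912", "city": "Providence", "state": "RI"},
    {"name": "Cy", "zip": "60208", "city": "Evanston", "state": "IL"},
]

lookup, slim = drop_dependent_columns(rows, "zip", ["city", "state"])
print(len(lookup))  # 2 distinct ZIP codes, each stored once
print(restore(lookup, slim, "zip", ["city", "state"]) == rows)  # True
```

The savings come from storing each city and state once per ZIP code rather than once per row. The `holds_fd` check matters: dropping columns when the dependency does not hold exactly would lose information.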
The Google Research Scholar Program supports the advancement of world-class research by early-career faculty members in fields including algorithms and optimization, human-computer interaction, machine learning and data mining, natural language processing, privacy, quantum computing, security, and systems. Through the program, Google aims to facilitate connections among junior faculty and encourage the formation of long-term collaborative relationships.
Crotty will use the award during the 2024 academic year for his project titled “Deep Semantic Modeling for Compressing and Querying Big Data,” which builds on the DeepSqueeze deep semantic compression framework that he began working on as a postdoctoral researcher at Brown University.
DeepSqueeze uses a type of deep neural network called an autoencoder to efficiently capture complex relationships among categorical and numerical attributes in large datasets. The tool also supports guaranteed error bounds for lossy compression of numerical data and integrates seamlessly with common columnar compression formats.
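The article does not describe how DeepSqueeze enforces its error bounds. As a hedged illustration, one standard way to guarantee a maximum absolute error ε for numerical values is uniform quantization with buckets of width 2ε: each value is replaced by the index of its nearest bucket center, so the reconstructed value is never more than ε from the original. The function names and sample readings below are hypothetical, and this is not presented as DeepSqueeze's actual scheme.

```python
def quantize(values, eps):
    """Map each value to the integer index of the nearest bucket center (width 2*eps)."""
    width = 2 * eps
    return [round(v / width) for v in values]


def dequantize(codes, eps):
    """Reconstruct approximate values from bucket indices."""
    width = 2 * eps
    return [c * width for c in codes]


readings = [72.31, 72.35, 71.98, 80.02]  # hypothetical sensor values
eps = 0.5                                 # maximum tolerated absolute error

codes = quantize(readings, eps)
approx = dequantize(codes, eps)
print(codes)   # [72, 72, 72, 80]
print(approx)  # [72.0, 72.0, 72.0, 80.0]

# Each reconstruction error is at most eps (up to floating-point rounding).
assert all(abs(v - a) <= eps + 1e-9 for v, a in zip(readings, approx))
```

Because nearby readings collapse to the same integer code, the quantized column has far fewer distinct values than the raw floats, which makes it easier to model and encode.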
“Our experimental evaluation used real-world datasets to demonstrate that DeepSqueeze can achieve more than a 4X reduction in size compared to state-of-the-art alternatives,” Crotty said.
Crotty aims to extend DeepSqueeze so that users can query compressed data directly, avoiding the potentially expensive step of decompressing it first.
“Our work could have tangible benefits for widely used object storage and data warehousing systems, including Google Cloud Storage and BigQuery,” Crotty said.
Prior to joining Northwestern in fall 2022, Crotty was a postdoctoral researcher jointly appointed in the computer science departments at Carnegie Mellon University and Brown University. Previously, he served as a postdoctoral researcher in the Data Science Initiative at Brown University, where he also earned a PhD in computer science in 2019.