
Why Creativity is Essential in Deep Learning

Christopher McBride
5 min read · Dec 10, 2021

Deep learning is as much about creativity as it is about technical know-how. This has been an ongoing theme throughout my career, and it was reaffirmed by a recent Kaggle competition that demanded a creative approach to data processing. After completing Jeremy Howard’s FastAI course, I wanted to test the library on a challenging competition data set from Kaggle, which was running the G2Net Gravitational Wave Detection competition¹, sponsored by the European Gravitational Observatory. Given my background in physics, I was immediately interested and committed to developing a solution with FastAI.

Overview

The premise of the competition was fairly simple: given a data set of simulated gravitational wave detector data from three different detectors, along with corresponding labels, build a model that classifies an observation (2 seconds of wave data recorded simultaneously across the three detectors) as either noise or a valid gravitational wave observation. A single 2-second instance from one of the detectors, sampled at 2048 Hz, is shown in Figure 1.
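To make the data layout concrete, here is a minimal sketch of what a single training example looks like under the setup described above (the detector names and the random noise stand-in are my assumptions, not the competition's actual files):

```python
import numpy as np

SAMPLE_RATE = 2048   # Hz, per the competition data description
DURATION = 2         # seconds per observation
N_DETECTORS = 3      # assumed: LIGO Hanford, LIGO Livingston, Virgo

# One training example: simultaneous 2-second traces from the three
# detectors (random noise here as a stand-in for the real data).
example = np.random.default_rng(42).normal(
    size=(N_DETECTORS, SAMPLE_RATE * DURATION))
print(example.shape)  # (3, 4096)
```

Each labeled observation is therefore a (3, 4096) array: three detectors, each contributing 4096 samples.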

Figure 1. G2Net Simulated Gravitational Detector Wave. Image by author.

The Approach

I already had some experience building neural nets on waveform data from collaborating on a research paper that used bird song to pre-train audio models (Ryan et al., Using Self-Supervised Learning of Birdsong for Downstream Industrial Audio Classification (2020), ICML 2020). The approach in that paper was to convert the waveforms into “images” using the Constant-Q Transform (CQT) and then apply a convolutional neural network (CNN) architecture, which is well suited to classifying images. That was going to be my approach for this task as well, so I began by pre-processing my waveforms into CQT images. The Librosa package in Python has a function for this, but I found the implementation in the nnAudio package to be much faster. Figure 2 shows what the waveform from Figure 1 looks like in CQT format.
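The waveform-to-image step can be sketched as follows. This is a simplified illustration using a plain short-time Fourier transform in NumPy so it runs with no extra dependencies; in the actual pipeline the Constant-Q Transform (via `librosa.cqt`, or the faster GPU implementation in nnAudio) plays this role, and all parameter values here are illustrative assumptions:

```python
import numpy as np

SAMPLE_RATE = 2048   # Hz, as in the competition data
DURATION = 2         # seconds per observation

def waveform_to_image(wave, n_fft=256, hop=64):
    """Convert a 1-D waveform into a 2-D time-frequency 'image'.

    Sketch only: a windowed STFT magnitude stands in for the CQT
    used in the real pipeline (librosa.cqt / nnAudio).
    """
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(wave) - n_fft + 1, hop):
        segment = wave[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(segment)))
    img = np.array(frames).T              # (freq_bins, time_steps)
    img = np.log1p(img)                   # compress dynamic range
    return (img - img.min()) / (img.max() - img.min() + 1e-9)  # [0, 1]

# One simulated 2-second detector trace (noise as a stand-in)
wave = np.random.default_rng(0).normal(size=SAMPLE_RATE * DURATION)
image = waveform_to_image(wave)
print(image.shape)  # (129, 61)
```

The resulting 2-D array is the grayscale “image” that a CNN can consume; the CQT differs from this STFT sketch in using logarithmically spaced frequency bins, which suit signals whose frequency sweeps over time.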

Figure 2. G2Net Gravitational Wave CQT Image. Image by author.

There was one aspect of this approach that required a novel solution: the information necessary to correctly classify the data existed across all three detectors. A neural net seeing only one of these images at a time would perform poorly, if it learned anything at all. This is where a creative solution was required. I could have stacked the images on top of each other, performed some mathematical transformation to combine the three detector streams, or tried some other scheme. There are probably dozens of reasonable approaches, and it isn’t immediately clear which is “best”, if that terminology even applies here, because any solution also depends on other factors, such as the neural network architecture and the data augmentation transforms you apply, which change how the data is processed and interpreted. Even if you were to attempt multiple approaches, the number of permutations to test can quickly get out of hand and delay the development of a useful model; you end up stuck in the weeds.

I wanted my solution to have a degree of elegance, symmetry, and simplicity, not out of technical necessity but for more subjective and philosophical reasons. I decided it made the most sense to interpret the three wave detectors’ CQT images as different channels of the same image, assigning each to a separate RGB color channel. Each CQT output was a single-channel grayscale image (Figure 2 displays one as a heatmap), and the FastAI network architecture I had planned to use already expected three channels, so having the data interpreted as a three-channel RGB image made a lot of sense and, of course, satisfied my desire for an elegant solution. Not only did this work, it also produced some beautiful images (shown in Figure 3).
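The channel-stacking idea can be sketched in a few lines of NumPy. The function name and the per-channel normalization are my illustrative assumptions; the essential move is simply placing the three detectors’ grayscale CQT images on the last axis so a standard 3-channel CNN reads them as one RGB image:

```python
import numpy as np

def detectors_to_rgb(cqt_a, cqt_b, cqt_c):
    """Stack three detectors' grayscale CQT images as RGB channels.

    Each input is a 2-D array of identical shape (freq x time);
    the output is an (H, W, 3) image. Per-channel min-max scaling
    to [0, 1] is an assumed normalization choice.
    """
    channels = []
    for gray in (cqt_a, cqt_b, cqt_c):
        lo, hi = gray.min(), gray.max()
        channels.append((gray - lo) / (hi - lo + 1e-9))
    return np.stack(channels, axis=-1)

# Three toy 64x64 grayscale "CQT images", one per detector
rng = np.random.default_rng(1)
rgb = detectors_to_rgb(*(rng.random((64, 64)) for _ in range(3)))
print(rgb.shape)  # (64, 64, 3)
```

Because each detector maps to its own color channel, regions where a signal appears in all three detectors show up as bright, roughly white patches, which is part of what makes the resulting images visually striking.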

Figure 3. Example images of G2Net Wave Forms Converted into RGB Representations with Image Augmentation. Image by author.

As a side note, these images are at a low resolution, but are attractive enough that I’d consider attempting to develop other images using mathematical transforms as an artistic medium. Perhaps in some combination with fractals.

Conclusion

It is undeniable that a large part of the success of my result is due to the power of FastAI. At the time of this writing, my competition score, judged by an ROC AUC metric, sits at 0.8434, while the current leader’s sits at 0.8863. The model I developed was built with around 90 lines of code, most of which was data exploration not used to build the model itself. That I was able to get so close to the leading score with so few changes to the FastAI defaults shows just how powerful the library is, and just how essential creative problem solving is relative to deep knowledge of training neural networks. Currently published Kaggle notebooks on this problem run to several hundred, if not thousands, of lines of code, with custom models, loss functions, and training loops. To me, this is evidence that the road to a state-of-the-art model is not necessarily in the technical details, but in creative solutions to data processing and augmentation.

Updates may be made to the code which can be found on my GitHub here.

[1] G2Net Gravitational Wave Detection [dataset]. (June 30, 2021). European Gravitational Observatory — EGO. https://www.kaggle.com/c/g2net-gravitational-wave-detection/data. Data Access & Use: Competition Use and Non-Commercial & Academic Research
