Auditory Shapes

Daniel Levitin, the author of This Is Your Brain on Music, mentions that he once worked for a company developing audio recognition software that determined the content of differently labeled mp3 files. Back in the days of Napster especially, the titles of the same audio file would vary widely by user. For example, “One” by Metallica might have dozens of variations of its title: one.mp3, metallica-one.mp3, one_metalika.mp3, etc. The software was good at matching identical audio files with different names, but there is still no software that does what the human brain can do, which is to identify different versions of a song. Humans can recognize melodies irrespective of arrangement, timbre, key, or tempo. Computers have a really hard time with that.

It’s a complex task of course, especially given the vast range of interpretations that jazz musicians[1] offer us. But even when Ella Fitzgerald forgets the words to “Mack the Knife” we still consider it a legitimate version of the song. As yet, no computer can. There’s too much information to weed out, and the underlying question the book’s author presents is “what makes melody so special?”

As I drove back from the laundromat yesterday, I wondered if the answer is “shape.” A melody is a kind of shape. It’s a pattern of intervals, not of particular notes, instruments or anything else. I think the brain might most easily process and store basic shapes before anything else. Visually, a triangle is still a triangle regardless of its color, location, shading or background. The same is true of “Joy to the World”: you can change the instrument, key, arrangement, whatever, but as soon as you alter even one interval, it really ceases to be “Joy to the World.”
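To make the “shape” idea concrete, here is a minimal sketch that treats a melody as its sequence of intervals. It assumes pitches are written as MIDI note numbers (60 is middle C), and the helper names `intervals` and `same_shape` are made up for illustration. It only covers a change of key, and it says nothing about how you would pull the notes out of an actual recording, which is the genuinely hard part.

```python
def intervals(pitches):
    """Return the semitone steps between consecutive pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def same_shape(melody_a, melody_b):
    """Two melodies share a shape if their interval sequences match."""
    return intervals(melody_a) == intervals(melody_b)


# Opening phrase of "Joy to the World": a descending major scale from C.
joy_in_c = [72, 71, 69, 67, 65, 64, 62, 60]

# The same phrase transposed up a whole step (every note changes).
joy_in_d = [p + 2 for p in joy_in_c]

# One note flattened by a semitone, which changes two intervals.
altered = list(joy_in_c)
altered[2] = 68

print(intervals(joy_in_c))            # [-1, -2, -2, -2, -1, -2, -2]
print(same_shape(joy_in_c, joy_in_d)) # True
print(same_shape(joy_in_c, altered))  # False
```

Transposing changes every note but none of the intervals, so the two versions compare as equal, while a single altered note breaks the match.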

A similar visual parallel is the CAPTCHA images used to prevent spam. You can recognize the letters regardless of the colors, warped shapes and other visual distortions. But thus far computers have a hard time doing the same.

There’s still no solid consensus on how the brain does this. It actually ventures into the realm of philosophy and Wittgenstein’s famous problem with definitions and rules. We define things based on a loose set of characteristics, and computers just aren’t that loose yet. A great example came yesterday when Heath corrected my application of the term “tank” to this picture. It’s actually a self-propelled howitzer, although it carries all the conventional traits one would associate with a tank (treads, turret, armor, cannon). The key difference is how they are used. A howitzer is a long-range piece of artillery and it doesn’t perform the tasks that tanks perform. Still, there’s always a point at which strict definitions fail us. Nothing can ever be fully, explicitly defined.

So, how does the brain define anything? I wish I knew for sure, but I suspect from my own experience that the brain makes a vague constellation of features and works from that. The esteemed Dr. Odegard pointed me in the direction of what he referred to as “prototypes that represent the central tendencies of a category or stimulus set.” Not quite Platonic forms (one ideal against which all are judged), these are items that more or less resemble each other, and which may fit into multiple categories[2].
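A rough way to picture that kind of graded, overlapping membership is to score an item against several prototypes instead of testing it against one strict definition. The sketch below is only an illustration under my own assumptions: the feature names, the yes/no values and the helper functions are all made up, and this is not a model from Dr. Odegard or from the research literature.

```python
# Illustrative only: these features and values are invented for the sketch.
PROTOTYPES = {
    "tank": {
        "treads": 1, "turret": 1, "armor": 1, "cannon": 1,
        "direct_combat_role": 1,
    },
    "self_propelled_howitzer": {
        "treads": 1, "turret": 1, "armor": 1, "cannon": 1,
        "direct_combat_role": 0,
    },
    "truck": {
        "treads": 0, "turret": 0, "armor": 0, "cannon": 0,
        "direct_combat_role": 0,
    },
}


def similarity(item, prototype):
    """Fraction of the prototype's features on which the item agrees."""
    matches = sum(1 for key, value in prototype.items()
                  if item.get(key) == value)
    return matches / len(prototype)


def rank_categories(item):
    """Return (score, category) pairs, best match first."""
    scored = [(similarity(item, proto), name)
              for name, proto in PROTOTYPES.items()]
    return sorted(scored, reverse=True)


# The vehicle in the picture: looks like a tank, used as artillery.
vehicle = {
    "treads": 1, "turret": 1, "armor": 1, "cannon": 1,
    "direct_combat_role": 0,
}

for score, name in rank_categories(vehicle):
    print(f"{name}: {score}")
```

The vehicle matches the howitzer prototype on every feature and the tank prototype on four out of five, which is about how it felt to me: mostly a tank, until someone points out the one feature that matters.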

Complicating all this is the fact that our brains are great at filling in missing information and making assumptions based on previous experience. The famous email forward that points out that the brain can still read words whose interior letters have been scrambled is a great example. You can sitll udnersatnd tihs sentnece, for exmalpe. So, too, you might recognize “Mack the Knife” when the pianist has created an improvisational intro around the melody.

Simplistically put, I’m guessing our brains recognize general shapes first and add attributes later, factoring in variations from experience. Whether that shape is a triangle or a G# triad, maybe it’s still just a shape to the brain.

1.) Indeed, it seems as though the job of a jazz musician is to see just how much they can get away with in terms of playing around a melody or chord progression and still have people recognize the tune.
2.) This then reminded me of the shopping cart software that we used at Epoch Online. It allowed for the assigning of multiple categories to individual products, as well as various options assigned to each product that the user could select (color, size, version). The actual product exists in one place in the database, but has these variables attached to it.

2 thoughts on “Auditory Shapes”

  1. So, to add another layer to your very cool discussion, how about this:

    I have a better memory than many for music (and really sounds in general). I joke that my brain is like a tape recorder… an old one with really crappy frequency response, but a tape recorder nonetheless. 🙂

    Given that, you’d think I would be very good at recognizing songs, but I’m actually slower than many at recognizing familiar songs performed in unfamiliar formats/instrumentations. For example, there have been several times in the past when I’ve taken half a song to recognize a VERY familiar song being performed by a band at a marching competition. Is my tape-recorder-like memory for music actually working _against_ me here? I’ve pondered that one on many occasions.

  2. Interesting. You apparently have better timbral memory than structural/pitch memory. Something that I’m afflicted with, despite my sensitive hearing and apparent talent at reproducing accents, is an inability to instantly discern familiar voices on the telephone. If I call my aunt’s house, I can’t tell her voice from her daughter’s. Or if I call a client, I’m not always certain if it’s them answering the phone or someone else. That’s probably just something that would improve with repetition. BUT, I also have a really hard time hearing people on my cell phone in a loud room where others apparently don’t have as hard a time. I wonder if my brain is too sensitive sometimes.
