Saturday, March 22, 2014

Singing as the optimal educational medium - Part I Engineering

An audio signal can be represented as a spectrogram, which shows how the energy at different audio frequencies is distributed over time. The x-axis is time, the y-axis is frequency, and color encodes intensity. Think of it as the equalizer bars you see while playing your favorite mp3, smeared across a page. This representation is very common in speech recognition and other audio analysis.
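A spectrogram is usually computed with a short-time Fourier transform (STFT). You slice the signal into overlapping, windowed frames and take the magnitude of each frame's FFT. Each column of the result is one snapshot of the "equalizer bars". Below is a minimal NumPy sketch. The frame length, hop size, Hann window, and 8 kHz test tone are illustrative assumptions, not values from this post.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform (STFT).

    Returns (times, freqs, S), where S[f, t] is the magnitude at frequency
    bin f in frame t: rows are the y-axis (frequency), columns the x-axis (time).
    """
    window = np.hanning(frame_len)                      # taper each frame
    n_frames = 1 + (len(signal) - frame_len) // hop     # only full frames
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    S = np.abs(np.fft.rfft(frames, axis=1)).T           # (freq bins, frames)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate  # frame centers
    return times, freqs, S

sr = 8000                              # sample rate in Hz (assumed)
t = np.arange(sr) / sr                 # one second of samples
tone = np.sin(2 * np.pi * 440 * t)     # a 440 Hz pure tone
times, freqs, S = spectrogram(tone, sr)
print(S.shape)                         # (frequency bins, time frames)
print(freqs[S[:, 10].argmax()])        # loudest frequency in frame 10, near 440 Hz
```

In practice a library routine such as `scipy.signal.spectrogram` does the same job with more options. The hand-rolled version just makes the "frames, then FFT" structure visible.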
[Figure: example spectrograms of music and speech]

Now think of the different types of audio signals that humans use to convey messages. The first is speech. Speech is composed of short utterances, called words (duh!). In a spectrogram, words look like vertical stripes, i.e. short in time but complex in frequency. Music without words is the complete opposite: it is long in time but narrow in frequency, since it is closer to "pure tones". Singing, i.e. speech with melody, lies in between.
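The contrast between "vertical stripes" and "horizontal lines" can be checked on synthetic signals. This sketch builds a single click as a crude stand-in for a short, broadband speech sound, and a sustained 440 Hz pure tone as a music-like signal. It also builds, as a loudly labeled assumption, a sustained 220 Hz tone with four overtones as a rough proxy for a sung vowel. It then measures what share of frequency bins and of time frames carry at least 10% of the peak magnitude. The `extent` helper and its threshold are hypothetical choices for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT; rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def extent(S, rel_threshold=0.1):
    """Hypothetical measure: share of frequency bins and of time frames
    holding at least rel_threshold of the spectrogram's peak magnitude."""
    active = S >= rel_threshold * S.max()
    freq_extent = active.any(axis=1).mean()   # how "tall" the pattern is
    time_extent = active.any(axis=0).mean()   # how "wide" the pattern is
    return freq_extent, time_extent

sr = 8000                          # sample rate in Hz (assumed)
t = np.arange(sr) / sr             # one second of samples

click = np.zeros(sr)               # speech-like: one very short burst
click[sr // 2] = 1.0
tone = np.sin(2 * np.pi * 440 * t)  # music-like: one sustained pure tone
# Rough proxy for a sung vowel (assumption): a sustained 220 Hz tone
# plus its first four overtones with decaying amplitudes.
sung = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))

for name, sig in [("click", click), ("tone", tone), ("sung", sung)]:
    f_ext, t_ext = extent(spectrogram(sig))
    print(f"{name:5s}  frequency extent {f_ext:.2f}  time extent {t_ext:.2f}")
```

Run as written, the click should occupy every frequency bin but only a couple of frames, and the pure tone should do the reverse. The harmonic proxy should fall between them in frequency while staying long in time. Real speech and singing are of course far messier than these toy signals.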

What does all of this have to do with "optimal" and "education"? In signal processing one talks about "compact" representations. Here I'll use the term "compact set" for the minimal set of features you need in order to convey information. In other words, these features let you code as much information as possible and send it to other people. It is "compact" because just a few features can generate a lot of information. It is "optimal" in the sense that no other set with the same number of features can convey more information; any change to the features results in a loss of information. What are you talking about???

The project I'm suggesting is to analyze singing as a compact set of human auditory information. If information is coded in the spectrogram, i.e. the information content is frequency over time, then I believe that singing can serve as the optimal compact set to convey that information, because it is both long in time and rich in frequency content. My reasoning is that complex frequency content conveys meaning (e.g. words), whereas long temporal structure conveys sentiment (e.g. emotion). In human communication both are important, and I believe their contributions can be quantified.
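One way to start quantifying the two dimensions is to score a spectrogram on each axis separately. The sketch below uses two hypothetical measures chosen for illustration, not established in this post. `frequency_complexity` is the average normalized spectral entropy of the non-silent frames. `temporal_persistence` is the average cosine similarity between consecutive frames. The frame settings and the two test signals are likewise assumptions.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT; rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def frequency_complexity(S):
    """Hypothetical measure: mean normalized spectral entropy over non-silent
    frames. Near 0 for a pure tone, 1 for a perfectly flat (broadband) frame."""
    power = S ** 2
    energy = power.sum(axis=0)
    active = energy > 1e-12 * energy.max()          # skip silent frames
    p = power[:, active] / energy[active]           # each column sums to 1
    logp = np.log2(p, where=p > 0, out=np.zeros_like(p))  # treat 0*log(0) as 0
    entropy = -(p * logp).sum(axis=0)
    return float(entropy.mean() / np.log2(S.shape[0]))

def temporal_persistence(S):
    """Hypothetical measure: mean cosine similarity between consecutive frames.
    Near 1 when the spectrum is sustained, near 0 when it appears only briefly.
    Pairs involving a silent frame count as 0."""
    norms = np.linalg.norm(S, axis=0)
    dots = (S[:, :-1] * S[:, 1:]).sum(axis=0)
    denom = norms[:-1] * norms[1:]
    sims = np.divide(dots, denom, where=denom > 0, out=np.zeros_like(dots))
    return float(sims.mean())

sr = 8000                               # sample rate in Hz (assumed)
t = np.arange(sr) / sr
click = np.zeros(sr)                    # short, broadband (speech-like)
click[sr // 2] = 1.0
tone = np.sin(2 * np.pi * 440 * t)      # long, narrowband (music-like)

for name, sig in [("click", click), ("tone", tone)]:
    S = spectrogram(sig)
    print(f"{name:5s}  complexity {frequency_complexity(S):.2f}  "
          f"persistence {temporal_persistence(S):.2f}")
```

On these toy signals the click scores high on complexity and low on persistence, and the tone does the opposite. The hypothesis above would predict that recorded singing scores relatively high on both, which is something to test rather than assume.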

One such experiment would be a questionnaire about the information conveyed by several types of communication, with questions such as: "What did this person think?", "What did this person feel?", "What did this person try to convey to you?" There would be three conditions: (i) speaking, (ii) a musical instrument, and (iii) singing, each of equal duration. My hypothesis is that the answers will be most accurate in the singing condition.

Experiments of this type could show that the optimal way to communicate is not talking, but singing. Can this be used in education? Wait for the next blog post…
