views: 376
answers: 10

If we consider computer graphics to be the art of image synthesis, where the basic unit is a pixel, then what is the basic unit of sound synthesis?

[This relates to programming as I want to generate this via a computer program.]

Thanks!

A: 

The byte, or word, depending on the bit-depth of the sound.

dhdean
WTF? "...But typically 16 or 24 bits are used" is worth 14 and mine is -1?
dhdean
+7  A: 

Computer graphics can also have vector shapes as basic units, not just pixels. Generally, vector graphics are generated via computer tools while captured data tends to appear as a grid of pixels (corresponding to an array of sensors in a camera or other capture device). Obviously there is considerable crossover between those classifications.

Similarly, there are sampled (such as .WAV) and generative (such as .MIDI) forms of computer audio. In the sampled case, the smallest unit is a single sample. Just like an array of pixels in the brightness, x-, and y-dimensions comes together to form an image, an array of samples in the loudness and time dimensions comes together to form a sound. In the generative case, it will be something more like a single tone rendered in a particular voice, just like vector graphics have paths drawn with particular textures.
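
To make the sampled case concrete, here is a rough Python sketch that builds such an array of samples for a single sine tone (the 440 Hz pitch and 44.1 kHz sample rate are just illustrative choices):

    import math

    SAMPLE_RATE = 44100   # samples per second
    FREQUENCY = 440.0     # pitch of the tone in Hz (A4)
    DURATION = 1.0        # seconds

    # One sample per time step: the amplitude of the waveform at that instant.
    samples = [
        math.sin(2 * math.pi * FREQUENCY * (n / SAMPLE_RATE))
        for n in range(int(SAMPLE_RATE * DURATION))
    ]

    print(len(samples))   # 44100 samples = 1 second of sound
    print(samples[:5])    # the first few amplitude values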

David Winslow
The MIDI/vector analogy is an interesting one; I've never thought of it quite like that. With respect to the original post, though, the sample is a more direct representation of a 'unit of sound' (insofar as it's describing the actual waveform). The 3-byte MIDI word is beholden to some tone generator to create the actual sound, as you pointed out. This is turning out to be a good question.
LesterDove
Tell me more about the vector case. I'm more interested in this and poorly phrased my original question.
anon
As usual, Wikipedia has lots of relevant facts and links. Here are a few pages to get you started: http://en.wikipedia.org/wiki/Vector_graphics http://en.wikipedia.org/wiki/Audio_synthesis http://en.wikipedia.org/wiki/SVG http://en.wikipedia.org/wiki/MIDI
David Winslow
A: 

Probably the envelope. A tone/note has a shape described by: attack, decay, sustain, release.
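
Roughly speaking, an ADSR envelope is just a gain curve applied over the life of a note. A minimal Python sketch, with arbitrary segment lengths and levels:

    def adsr(t, note_length, attack=0.05, decay=0.1, sustain=0.7, release=0.2):
        """Return the envelope gain (0..1) at time t (seconds) for a note held for note_length seconds."""
        if t < attack:                     # attack: ramp up from silence
            return t / attack
        if t < attack + decay:             # decay: fall to the sustain level
            return 1.0 - (1.0 - sustain) * (t - attack) / decay
        if t < note_length:                # sustain: hold while the note is held
            return sustain
        if t < note_length + release:      # release: fade back to silence
            return sustain * (1.0 - (t - note_length) / release)
        return 0.0

    # Multiply each raw sample of a tone by adsr(sample_time, note_length) to shape it.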

Martin Beckett
Can you elaborate on this? I don't know any of the terms attack/decay/sustain/release in the sound world.
anon
This is a way of describing the life of a musical tone in synthesis: its amplitude shape over time.
Justin Smith
+15  A: 

The basic unit is a sample.

In a WAVE file, a sample is just an integer specifying where to move the speaker cone to.

The sample rate determines how often a new sample is fed to the speakers (I'm not entirely sure how this part works, but it does get converted to an analog signal first). The samples are typically laid out in the file one right after another.

When you plot all the samples with x-axis being time and y-axis being sample_value, you can see the waveform.

In a wave file, samples can (theoretically) be any bit-size from 0-65535, which remains constant throughout the wave file. But typically 16 or 24 bits are used.
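
As a rough illustration, here is a sketch using Python's standard wave and struct modules that packs 16-bit samples of a sine tone into a WAVE file (the tone parameters are arbitrary):

    import math
    import struct
    import wave

    SAMPLE_RATE = 44100
    FREQ = 440.0
    SECONDS = 2

    frames = bytearray()
    for n in range(SAMPLE_RATE * SECONDS):
        value = math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE)
        # Scale the -1..1 float to a signed 16-bit integer, packed little-endian.
        frames += struct.pack("<h", int(value * 32767))

    with wave.open("tone.wav", "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16 bits
        wav.setframerate(SAMPLE_RATE)  # how often samples are fed to the DAC
        wav.writeframes(bytes(frames))

The resulting tone.wav should play in any media player as a steady 440 Hz tone.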

Wallacoloo
Sample is not a UOM, it's a unit of time relative to the sampling rate.
Aaronaught
I think 16 or 24 bits are typically used. I've never seen a 32-bit audio sample.
Gabe
@Gabe: 32-bit (integer and floating-point) samples are usually used in recording to retain as much quality as possible from the source audio before the final mixdown.
Jon Purdy
Jon: It's my understanding that the standard in recording is 24-bit/192kHz (the last couple bits are noise anyway, so there's no point in making a 32-bit ADC). It's only in processing that you would need 32-bit samples (so you can mix a thunderclap and a gentle breeze). Obviously they exist, but most users will never see them.
Gabe
A: 

Sound can be expressed in several different units, but the most common in synthesis/computer music is the decibel (dB), which is a relative logarithmic measure of amplitude. Specifically, decibels are normally measured relative to the maximum amplitude of the audio system.

When measuring sound in "real life", the units are normally A-weighted Decibels or dB(A).

A sound's waveform is its amplitude over time, or in the digital world, its amplitude over samples; its frequency (i.e. its pitch) is how quickly that waveform repeats, measured in cycles per second (Hz). The number of samples per unit of real time is called the sampling rate; CD-quality audio uses 44.1 kHz (44,100 samples per second) and synthesis/recording software usually supports up to 96 kHz.

Every sound in the digital domain can be represented as a waveform, with the X-axis representing time (or sample number) and the Y-axis representing amplitude.
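
For reference, a common way to convert a linear sample value into decibels relative to full scale (dBFS) looks roughly like this, assuming samples normalized to the -1..1 range:

    import math

    def to_dbfs(sample):
        """Convert a linear sample value (-1..1) to decibels relative to full scale."""
        if sample == 0:
            return float("-inf")    # silence has no finite dB value
        return 20 * math.log10(abs(sample))

    print(to_dbfs(1.0))    # 0.0 dBFS (maximum amplitude)
    print(to_dbfs(0.5))    # about -6 dBFS
    print(to_dbfs(0.1))    # -20.0 dBFS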

Aaronaught
A: 

The frequency and amplitude of a wave are what make up a sound; that is for a single tone. Music, or for that matter most noise, is a composite of multiple simultaneous sound waves superimposed on one another.

That being said, synthesis of music is a large field.
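
A quick sketch of that superposition idea: several simultaneous tones are just the sample-by-sample sum of their individual waveforms (the chord frequencies here are arbitrary):

    import math

    SAMPLE_RATE = 44100

    def tone(freq, n):
        """Amplitude of a sine wave of the given frequency at sample index n."""
        return math.sin(2 * math.pi * freq * n / SAMPLE_RATE)

    # An A major chord: the superposition (sum) of three sine waves,
    # scaled so the mixed result stays within the -1..1 range.
    mixed = [
        (tone(440.0, n) + tone(554.37, n) + tone(659.25, n)) / 3
        for n in range(SAMPLE_RATE)
    ]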

Romain Hippeau
+2  A: 

A pixel can have a value and be encoded in digital bitmap samples. The same properties apply to sound and digital audio samples.

A pixel is a physical device that can only render the amplitudes of 3 frequencies of light (Red, Green, Blue) at a time. A speaker is a physical device that can render the amplitudes of a wide range of frequencies (~40,000) at a time. The bit resolution of a sample (the number of bits used to store the value of a sample) mainly determines how many colors/tones can be rendered - the fidelity of the physical playback device.
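
As a rough illustration of bit resolution, this sketch quantizes a normalized sample to a chosen bit depth (the function and values are only for illustration):

    def quantize(sample, bits=16):
        """Map a -1.0..1.0 sample onto the signed integer range of the given bit depth."""
        max_level = 2 ** (bits - 1) - 1    # e.g. 32767 for 16-bit audio
        return round(sample * max_level)

    print(quantize(0.25, bits=8))    # 32   -- only 256 possible levels
    print(quantize(0.25, bits=16))   # 8192 -- 65,536 possible levels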

Also, as patterns of pixels can be encoded or compressed, most patterns of sound samples are also encoded or compressed (or both).

Jeff Meatball Yang
A: 

Bitmapped graphics are based on sampling the amplitude of light in a 2D space, where each sample is digitized to a given bit depth and often converted to a logarithmic representation at a different bit depth. The samples are always positive, since you can't be darker than pure black. Each of these samples is called a pixel.

Sound recording is most often based on sampling the magnitude of sound pressure at a microphone, where the samples are taken at constant time intervals. These samples can be positive or negative with respect to perfect silence. Most often these samples are not converted to a logarithm, even though sound is perceived in a logarithmic fashion just as light is. There is no special term to refer to these samples as there is with pixels.

The Bels and Decibels mentioned by others are useful in the context of measuring peak or average sound levels. They are not used to describe the individual sound samples.

You might also find it useful to know how sound file formats compare to image file formats. WAVE is an uncompressed format specific to Windows and is analogous to BMP. MP3 is a lossy compression analogous to JPEG. FLAC is a lossless compression analogous to 24-bit PNG.

Mark Ransom
+1  A: 

The fundamental unit of signal processing (of which audio is a special case) would be the sample.

The frequency at which you need to sample a signal depends on the maximum frequency present in the waveform. The sampling theorem states that it is sufficient to sample at more than twice the maximum frequency present in the signal.
http://en.wikipedia.org/wiki/Sampling_theorem
The human ear is sensitive to sounds up to around 20kHz (the upper limit drops with age). This is why music on CD is sampled at 44.1kHz.

It is often more useful to think of music as being comprised of individual frequencies.
http://www.phys.unsw.edu.au/jw/sound.spectrum.html
Most sound analysis and creation is based on this idea.
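
To see that frequency content directly, you can take a Fourier transform of a block of samples. A rough sketch using numpy (assuming it is available; the test signal is arbitrary):

    import numpy as np

    SAMPLE_RATE = 44100
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE   # one second of time points

    # A test signal: 440 Hz plus a quieter 880 Hz overtone.
    signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

    spectrum = np.abs(np.fft.rfft(signal))                   # magnitude of each frequency bin
    freqs = np.fft.rfftfreq(len(signal), d=1 / SAMPLE_RATE)  # frequency (Hz) of each bin

    # The two strongest bins sit at the frequencies we put in.
    print(np.sort(freqs[np.argsort(spectrum)[-2:]]))         # [440. 880.]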

Related concepts:
Psychoacoustics: Human perception of sound. Relates to modern sound compression techniques such as mp3.
Fourier series: How complex waveforms are composed of individual frequencies.

Matthew S
A: 

If computer graphics are colored dots in 2-dimensional space representing a 3-dimensional space, then sound synthesis is amplitude values regularly partitioned in time representing musical events.

If you want your result to sound like music (the kind of music most people like at least), then you are either going to use some standard synthesis techniques, or literally waste decades of your life reinventing them from scratch.

The most basic techniques are additive synthesis, in which the individual elements are the frequencies, amplitudes, and phases of sine oscillators; subtractive synthesis, where you work with filter coefficients and a complex input waveform; frequency modulation synthesis, where you work with modulation depths and rates of stages of modulation; and granular synthesis, where short (hundredths to tenths of a second long) enveloped pieces of a recorded sound or an artificial waveform are combined in immense numbers. Each of these in practice uses parameters that evolve over the course of a note, and often you will mix elements of various techniques into a larger instrument.
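
As one concrete example from that list, here is a rough sketch of simple two-operator frequency modulation (the carrier/modulator frequencies and modulation depth are arbitrary):

    import math

    SAMPLE_RATE = 44100
    CARRIER = 220.0      # the pitch you hear
    MODULATOR = 110.0    # the frequency that wobbles the carrier's phase
    DEPTH = 5.0          # modulation index: higher values give a brighter, more complex timbre

    samples = []
    for n in range(SAMPLE_RATE):             # one second of sound
        t = n / SAMPLE_RATE
        # The modulator's output is added to the carrier's phase, creating sidebands.
        phase = 2 * math.pi * CARRIER * t + DEPTH * math.sin(2 * math.pi * MODULATOR * t)
        samples.append(math.sin(phase))

Vary DEPTH over the course of a note and the timbre evolves, which is a big part of the classic FM sound.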

I recommend this book; though it doesn't have the math for many of the concepts, it at least lays the groundwork for the concepts used and gives a nice overview of the techniques.

You wouldn't waste your time going sample by sample to make music in practice any more than you would waste your time going pixel by pixel to render 3D (in other words, go sample by sample if you're making a tool for other people to make music with, but that is way too low a level if you are interested in the task of making music).

Justin Smith