How can you determine the best audio quality in a list of audio files of the same audio clip, without looking at the audio file's header? The tricky part is that all of the files came from different formats and bit rates, and they were all transcoded to the same format and bit rate. How can this be done efficiently?

+4  A: 

If I understand correctly, you have a bunch of audio files that started in different formats with varying quality. They've all been converted to the same format, so you can't use the header to figure out which ones were originally high quality and which ones weren't.

This is a hard problem. There are potentially a few tricks that could catch some quality problems, but detecting, say, something that was converted from a low-bitrate compression algorithm like MP3 would be very hard.

Some easy tricks (see the sketch after this list):

  • Check the maximum amplitude - if it's low, the quality won't be good.
  • Measure the highest frequency - if it's low, the original might have had a lower sample rate.
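
A rough sketch of both checks, assuming the files have already been decoded to 16-bit mono WAV and that numpy/scipy are available; the 99% energy threshold for the rolloff is an arbitrary choice, not a standard:

    import numpy as np
    from scipy.io import wavfile

    def quick_quality_hints(path):
        rate, samples = wavfile.read(path)
        x = samples.astype(np.float64) / 32768.0  # normalize 16-bit PCM to [-1, 1]

        peak = np.max(np.abs(x))                  # low peak -> quiet, poor transfer

        # Spectral rolloff: the frequency below which 99% of the energy lies.
        spectrum = np.abs(np.fft.rfft(x)) ** 2
        freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
        cumulative = np.cumsum(spectrum) / np.sum(spectrum)
        rolloff = freqs[np.searchsorted(cumulative, 0.99)]

        return peak, rolloff                      # higher is (loosely) better for both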
dmazzoni
+1 you do understand the problem, and I agree it is difficult.
Rook
+1 for fairly simple tests for finding likely indicators of low quality
Stephen Denne
+4  A: 

If you have the original, you can estimate how it was altered by estimating a transfer function. You will need to assume some model; maybe start with a low-pass filter, add some smudging (convolution), and then run an estimator to produce a measure of quality. The Wikipedia article on Estimation_theory is a good place to start looking.
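
As a minimal sketch of what that estimation could look like, assuming the original reference is available and time-aligned with the transcoded copy (this uses scipy's Welch and cross-spectral density estimators; nothing here is specific to any particular model):

    import numpy as np
    from scipy import signal

    def estimate_transfer_function(original, transcoded, rate):
        # H(f) = Pxy(f) / Pxx(f), the standard cross-spectral estimator.
        f, Pxx = signal.welch(original, fs=rate, nperseg=4096)
        _, Pxy = signal.csd(original, transcoded, fs=rate, nperseg=4096)
        H = Pxy / Pxx
        # Flatness of |H| is a crude quality hint: a heavy low-pass shows
        # up as strong attenuation at high frequencies.
        return f, np.abs(H)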

disown
+1 perhaps a Hamming distance (http://en.wikipedia.org/wiki/Hamming_distance).
Rook
+1  A: 

I think that disown's answer is good, assuming that you are just trying to estimate a set of parameters. Unfortunately, you also have to define a comparison function for the parameters you have estimated.

What happens if two compressions have each applied a band-pass filter with equally large frequency ranges, but one of them admits higher frequencies than the other? Is one of them better? Which one?

The answer probably depends on which frequencies are being used more in the files you are working with.

An objective measure would be to see which file has lost less entropy. Unfortunately, this is not easy to do correctly.
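
For illustration, a crude first-order entropy estimate over the histogram of PCM sample values might look like the sketch below; as the comment underneath points out, added noise raises this number, so treat it with care:

    import numpy as np

    def sample_entropy_bits(samples, bins=65536):
        # First-order entropy of the sample-value histogram, in bits.
        hist, _ = np.histogram(samples, bins=bins)
        p = hist[hist > 0] / hist.sum()
        return -np.sum(p * np.log2(p))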

Jørgen Fogh
Adding pure noise would even increase the entropy - that measure does not seem appropriate for perceptual audio quality.
Eamon Nerbonne
+9  A: 

I'm not a software developer (I'm an audio engineer), and what you hear when you compress with MP3 algorithms is:

  • Less high frequencies: so you can check for a loss of energy in the higher range.
  • Distorted stereo: so you can make a Mid/Side matrix and check the THD in the Side channel.
  • Less phase coherency: maybe you can check that with a correlation meter.

(The stereo checks are sketched in code below.)

Hope it helps, it's a difficult task for a computer!
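
A minimal sketch of the Mid/Side split and a simple correlation check, assuming left and right are float numpy arrays of equal length (the THD measurement itself is left out):

    import numpy as np

    def stereo_checks(left, right):
        mid = (left + right) / 2.0
        side = (left - right) / 2.0
        # Side-channel energy relative to mid: lossy joint-stereo coding
        # often distorts or collapses the side signal.
        side_ratio = np.sum(side ** 2) / (np.sum(mid ** 2) + 1e-12)
        # Pearson correlation between channels, a crude correlation meter.
        corr = np.corrcoef(left, right)[0, 1]
        return side_ratio, corr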

Luis Herranz
+1  A: 

I'm not too sure about this, but here's a good place to start:

http://en.wikipedia.org/wiki/Signal-to-noise_ratio

I don't think you can calculate SNR from one signal, but if you have a collection of signals then you might be able to work out the SNR by comparing all of them.

There are some interesting links at the bottom of the page which could provide some routes of interest as well if that isn't possible.

Also, I'm not an audio engineer, but I know a little about signal processing: is there any way you can measure quantisation levels in audio signals? Perhaps something to look into.
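
One hedged sketch of the "compare the whole collection" idea: treat the mean of all time-aligned, equal-length versions as a stand-in for the clean signal, then compute each file's SNR against that mean:

    import numpy as np

    def relative_snr_db(signals):
        # signals: list of equally long, time-aligned 1-D arrays.
        stack = np.vstack(signals)
        reference = stack.mean(axis=0)  # crude stand-in for the original
        snrs = []
        for x in stack:
            noise = x - reference
            snrs.append(10 * np.log10(np.sum(reference ** 2) /
                                      (np.sum(noise ** 2) + 1e-12)))
        return snrs  # higher = closer to the consensus of all versions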

Tom Gullen
+11  A: 

Many of the answers outlined here refer to common audio measurements such as THD+N, SNR, etc. However, these do not always correlate well with human hearing of audio artifacts. Lossy audio compression techniques typically function by increasing THD+N and reducing SNR, but aim to do so in ways that are difficult for the human ear to detect. A more traditional audio measurement technique may find decreased SNR in a certain frequency band, but does that matter if there's so much energy in adjacent bands that no one would ever notice the difference?

The research paper titled "A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation" outlines an algorithm for quantifying the ability of the human ear to detect audible differences, based on a model of how the ear hears. It takes into account factors that do correlate with audio quality as perceived by humans. The paper includes a study comparing the algorithm's results to subjective double-blind testing, to give you an idea of how well the model works.

I could not find a free copy of this paper but a decent university library should have it on file.

Implementing the algorithm would require some knowledge of audio signal processing in the frequency domain. An undergraduate with DSP experience should be able to implement it. If you don't have the reference waveform, you could use information in this paper to quantify how objectionable artifacts may be.

The algorithm would work on PCM audio, preferably time-aligned, and certainly does not require knowledge of the file type or header.

jbarlow
This only helps if you have the original file - right?
Eamon Nerbonne
The algorithm assumes a reference waveform is available, but many of the measurements don't necessarily require a reference waveform and could be applied generally. Another option would be to compute the quality measurement using every waveform as a trial reference. The waveform that yields the largest quality differences is the best one.
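
A sketch of that trial-reference loop; degradation here is a hypothetical placeholder for whatever quality measure is used (for example the paper's perceptual model), not a function defined in the paper:

    def best_by_trial_reference(waveforms, degradation):
        # degradation(ref, test) is assumed to grow as 'test' sounds
        # worse relative to 'ref'.
        totals = [sum(degradation(ref, test)
                      for j, test in enumerate(waveforms) if j != i)
                  for i, ref in enumerate(waveforms)]
        # Per the comment above: the best original should expose the
        # largest quality differences when used as the trial reference.
        return max(range(len(waveforms)), key=totals.__getitem__)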
jbarlow
+8  A: 

First, I'm not an audio engineer, but I've been trying to keep up with audio compression in general because I have a big mp3 collection, and I have some thoughts to share on the subject.

Is the best audio quality you're looking for from a human perspective? If so, you can't measure it by "objective means" like comparing spectrograms and such.

If a spectrogram is ugly, it doesn't necessarily mean the quality is terrible. What matters is whether someone can distinguish an encoded file from the original source in a blind test. Period. If you want to check the quality of an encoded audio track, you have to conduct a blind ABX test.

LAME (and all other kinds of lossy MP3, AAC, AC3, DTS, ATRAC... compressors) is a so-called perceptual coder. It exploits certain facts about the nature of human audio perception. So, you cannot rely simply on spectrograms to evaluate its quality.

Source
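
For what the bookkeeping of an ABX session might look like (playback is deliberately omitted; scipy's binomial test checks the guesses against chance, and 16 trials is just a common choice, not a rule):

    import random
    from scipy.stats import binomtest

    def abx_session(trials=16):
        correct = 0
        for _ in range(trials):
            x_is_a = random.random() < 0.5
            # ... play A, B, then X (secretly A or B) to the listener ...
            guess_is_a = input("Is X the same as A? (y/n) ").strip().lower() == "y"
            correct += (guess_is_a == x_is_a)
        # One-sided binomial test against 50% guessing.
        p = binomtest(correct, trials, 0.5, alternative="greater").pvalue
        return p  # small p => the listener can reliably hear a difference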

Now, if your aims are of an objective nature, you could use EAQUAL, which stands for Evaluation of Audio Quality:

It's an objective measurement technique used to measure the quality of encoded/decoded audio files (very similar to PEAQ)

(...)

The results, however, when using objective testing methodologies, are still inconclusive and mostly only used by codec developers and researchers.

...or the Friedman statistical analysis tool.

(...) performs several statistical analyses on data sets, which is particularly suited for listening test data.

I'm not saying that spectrum analyzers are useless; that's why I posted some utilities. I'm just saying to be careful with all these statistical methods: as someone in the Hydrogenaudio community once said, you don't listen with your eyes. (Check this thread I posted as well; it's a great resource.) To really prove audio quality from a human perspective, you should test ears and not graphs.

This is a complicated subject, and IMHO you should look for a specialized audio community like Hydrogenaudio.

GmonC
+1 interesting, this looks like a viable solution.
Rook
+1  A: 

If the transcoded file is in a lossless format, then you can revert the audio back to its original sample rate and bit rate.

If the files have been transcoded from lossless to lossy, then it's probably impossible to determine their previous sample and bit rates. I think the algorithms and/or observation techniques needed to identify the original sample and bit rates of lossy-compressed audio are asking too much for a task that, in the end, isn't that useful anyway.

Unfortunately, lossless compression techniques are also not widely used in cases like this, because the amount of compression they produce is relatively small.

Check out this really good article on the subject!

He says that: "A lossless compression technique is one that yields a compressed signal from which the original signal can be reconstructed perfectly."

Assuming that you are probably working with lossy-compressed audio data, I'd say what you're asking is very nearly impossible without much research and the development of a complex algorithm for detecting sample rate and bit rate correlations.

Plus I have to say: why do you want to do this? Is it to convert lossy mp3s back into high quality audio?

edit: That article also says "it is necessary to know, or be able to figure out, the sampling rate, resolution, signedness, and endianness of the data" in regard to interpreting audio data.

If the compression you're using is MP3, then it's virtually impossible unless a lossless compression was used and no bit data were removed from the sample frames.

Good luck, though I could be wrong!! :)

AlexW
+1  A: 

If you do not have the original audio, this is probably a lot of work; it's almost certainly fundamentally impossible in an absolute sense, since you can't tell which of a track's peculiarities are intentional and which are bogus. You may even have encodings from different recordings or mixes, in which case a plain comparison is fairly meaningless.

Thus, assuming you do not have the original, the best you can probably do is a heuristic approach - which will probably work quite well, but be a lot of effort to implement.

  • Invest in some audio-processing software and skill; use this to build software that identifies common encoder defects heuristically, based solely on the output. Such defects might be poor temporal locality of sound hits (suggesting overlarge windows in the compression), high correlation between left and right signals, limited frequency range, etc. (a person with the right experience can probably list dozens).
  • Rate the quality of the audio on each heuristic on some sliding scale.
  • Use common sense, and as much time and as many people for testing as you have, to weigh the various factors for relevance. For example, while it might be nice to have frequency reproduction up to 24 kHz, it's not very important; on the other hand, a lack of sharpness may be more annoying. (A sketch of the weighting step follows this list.)
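
A sketch of that weighting step, with hypothetical detector callables and weights standing in for the heuristics above; nothing here is a validated model:

    def combined_quality_score(samples, rate, detectors, weights):
        # detectors: callables f(samples, rate) -> "badness" in [0, 1]
        # weights:   relevance weights tuned via listening tests
        penalty = sum(w * d(samples, rate) for d, w in zip(detectors, weights))
        return -penalty  # higher score = fewer suspected encoder defects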

If you're lucky, someone's done the job before you, because this sounds like an expensive proposition.

Eamon Nerbonne
+1  A: 

A New Perceptual Quality Measure for Bit Rate Reduced Audio http://citeseer.ist.psu.edu/cache/papers/cs/15888/http:zSzzSzwww-ft.ee.tu-berlin.dezSzPublikationenzSzpaperszSzAES1996Copenhagen.pdf/a-new-perceptual-quality.pdf

Perceptual audio coding algorithms perform a drastic irrelevancy reduction in order to achieve a high coding gain. Signal components that are assumed to be unperceivable are not transmitted, and the coding noise is spectrally shaped according to the masking threshold of the audio signal. Simple quality measures (e.g. signal to noise ratio, harmonic distortions), which cannot separate these inaudible artefacts from audible errors, cannot be used to assess the performance of such coders.

For the quality evaluation of perceptual audio codecs, appropriate measurement algorithms are needed, which detect and assess audible artefacts by comparing the output of the codec with the uncoded reference. A filter bank based perceptual model is presented, which yields better temporal resolution than FFT-based approaches and thus allows a more precise modelling of pre- and post-masking and a refined analysis of the envelopes within each filter channel.
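
As a loose illustration of the filter-bank-plus-envelope analysis the abstract describes (the band edges here are arbitrary placeholders; the paper's auditory filter bank is far more sophisticated):

    import numpy as np
    from scipy import signal

    def band_envelopes(x, rate, edges=(100, 400, 1000, 2500, 6000, 15000)):
        envelopes = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = signal.butter(4, [lo, hi], btype="bandpass",
                                fs=rate, output="sos")
            band = signal.sosfilt(sos, x)
            envelopes.append(np.abs(signal.hilbert(band)))  # analytic envelope
        return envelopes  # compare codec output vs. reference, band by band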

See Also
http://academic.research.microsoft.com/Paper/201987.aspx?viewType=1

Robert Harvey
Wow, it's good to see some solid research come from Microsoft.
Rook