A program to display, replay and convert waveforms using popular file formats; written by Mark Huckvale, Department of Phonetics and Linguistics, University College London, Gower Street, London WC1E 6BT, U.K.
The executable file and help files may be freely distributed providing they remain unmodified.
In the waveform display, time runs left to right and amplitude up and down. The display is a 'cartoon' of the actual waveform at screen resolution. Use the left and right mouse buttons to position cursors.
Use the scroll bar to move earlier and later in a zoomed waveform. The arrows move by about one quarter of the current screen time, the scroll bar region by about one screen.
Files may be opened using the Open dialogue or by dragging and dropping files from the File Manager. Files are always opened read-only and are not changed by zooming, format selection, replay or exiting the program.
Files may be saved under new names and in new formats using the Save dialogue, where you specify the required output file format along with the file name.
Waveforms or parts of waveforms may be replayed if you have a Windows-compatible sound card and an installed multimedia wave driver. Time cursors may be positioned with the left and right mouse buttons to delimit the region to replay. Waveforms are replayed directly from disk, so there is no limit to the amount of data that can be replayed at one time. On very slow disks (e.g. CD-ROM or floppy disk) or at very high replay rates, gaps in the replay may occur.
This topic discusses the representation of analogue waveforms as a sequence of digital samples.
Waveforms are stored as quantised instantaneous amplitude values at regularly spaced intervals of time. Each quantised amplitude value is called a sample. The time duration between samples is called the sampling interval. The number of sampling intervals per second is called the sampling rate. Waveforms may be monophonic with only one amplitude value per sampling interval, or stereophonic with two amplitude values per sampling interval (one for the left and one for the right channel).
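As a concrete illustration (the names and figures below are examples, not taken from the program), the sampling interval and the duration of a recording follow directly from the sampling rate:

    #include <stdio.h>

    int main(void)
    {
        long   nsamples = 220500L;    /* e.g. five seconds of samples   */
        double rate     = 44100.0;    /* sampling rate, samples/second  */
        double interval = 1.0 / rate; /* sampling interval, seconds     */
        double duration = nsamples / rate;

        printf("interval = %g sec, duration = %g sec\n", interval, duration);

        /* stereophonic data is conventionally stored interleaved:
           left[0], right[0], left[1], right[1], ... - two amplitude
           values per sampling interval */
        return 0;
    }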
Amplitudes are quantised (stored as integers) with a given dynamic range. Typically a range of 256 values is used for telephone-quality signals and computer games: this is 8-bit quantisation, with a numerical range of -128 to +127, giving about a 48dB signal-to-noise ratio at best. For FM-radio-quality signals, 12-bit quantisation is preferred, with a numerical range of -2048 to +2047, giving a 72dB signal-to-noise ratio at best. For CD-quality signals, 16-bit quantisation is preferred, with a numerical range of -32768 to +32767, giving a 96dB signal-to-noise ratio at best.
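The signal-to-noise figures above follow the rule of thumb of roughly 6dB per bit of quantisation. A minimal sketch of quantising a normalised amplitude, written for illustration only:

    #include <math.h>
    #include <stdio.h>

    /* quantise an amplitude in the range -1.0..+1.0 to 'bits' bits */
    long quantise(double x, int bits)
    {
        long maxval = (1L << (bits - 1)) - 1;  /* e.g. +127 for 8 bits */
        long minval = -(1L << (bits - 1));     /* e.g. -128 for 8 bits */
        long q = (long)floor(x * (1L << (bits - 1)) + 0.5);
        if (q > maxval) q = maxval;
        if (q < minval) q = minval;
        return q;
    }

    int main(void)
    {
        int bits;
        printf("quantise(0.5, 8)  = %ld\n", quantise(0.5, 8));
        printf("quantise(0.5, 16) = %ld\n", quantise(0.5, 16));
        for (bits = 8; bits <= 16; bits += 4)
            printf("%2d bits: best-case SNR about %.0f dB\n",
                   bits, 6.02 * bits);
        return 0;
    }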
8-bit quantisation provides compact storage at 1 byte per sample per channel. 12-bit and 16-bit quantisation require 2 bytes per sample per channel.
An approximation to 12-bit quantisation quality may be obtained by converting 12-bit samples to an 8-bit logarithmic representation of the amplitude. This allows larger amplitudes to be quantised more coarsely than small amplitudes, which fits well with our perception of sound. The 8-bit logarithmic form of 12-bit samples is called A-law or mu-law quantisation (two minor variations on the same theme).
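For illustration, the continuous curve underlying mu-law compression can be sketched as follows (real codecs use a segmented approximation of this curve, and this is not the program's own code):

    #include <math.h>
    #include <stdio.h>

    /* continuous mu-law compression (mu = 255) of a normalised
       amplitude x in the range -1.0..+1.0; the result, also in
       -1.0..+1.0, may then be quantised to 8 bits */
    double mulaw(double x)
    {
        const double mu = 255.0;
        double sign = (x < 0.0) ? -1.0 : 1.0;
        return sign * log(1.0 + mu * fabs(x)) / log(1.0 + mu);
    }

    int main(void)
    {
        /* small amplitudes use a larger share of the output range */
        printf("mulaw(0.01) = %.3f\n", mulaw(0.01));   /* about 0.23 */
        printf("mulaw(0.50) = %.3f\n", mulaw(0.50));   /* about 0.88 */
        return 0;
    }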
When analogue signals are converted to digital form by taking samples at each sampling interval, information about how the signal changes within the interval is discarded. As a consequence, there are infinitely many possible waveforms that can be drawn through a given set of sample values: all of them pass through the sampled amplitudes but vary elsewhere (see figure below). This folding of different signals onto a common digital representation is known as aliasing - it must be avoided when signals are converted.
To avoid aliasing, the sampling rate must be high enough to capture the highest frequency components in the signal. It can be shown that if the highest frequency component is at frequency F, then we can ensure that all aliases have frequency components greater than F (and hence can be filtered out), provided we sample at a rate of at least twice F.
So to avoid aliasing, the signal is low-pass filtered to remove frequencies above some cut-off value F, determined by the sound quality required by the application. The signal is then sampled at a rate of at least twice F. On playback, the signal is low-pass filtered at F again; since the signal is known to contain only components with frequencies below F, the digital signal is a unique representation, and the exact analogue signal can be reconstructed from the digital samples. By this means a digital stream can provide a perfectly faithful representation of an analogue waveform.
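Aliasing is easy to demonstrate numerically. In the sketch below (the frequencies are chosen purely for illustration), a 7,000 Hz sine wave sampled at 8,000 samples per second yields exactly the same sample values as an inverted 1,000 Hz sine wave, so the two signals are indistinguishable once sampled:

    #include <math.h>
    #include <stdio.h>

    #define PI 3.14159265358979

    int main(void)
    {
        double rate = 8000.0;    /* sampling rate, samples/second */
        int    n;
        for (n = 0; n < 8; n++) {
            double t  = n / rate;
            double s7 = sin(2 * PI * 7000.0 * t);    /* above rate/2 */
            double s1 = -sin(2 * PI * 1000.0 * t);   /* its alias    */
            printf("%d: %8.5f %8.5f\n", n, s7, s1);  /* identical    */
        }
        return 0;
    }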
For the highest quality signals, we clearly require frequencies up to the limit of human hearing to be represented, so F is chosen to be about 20,000 Hz (cycles per second). As a consequence, CD audio uses a sampling rate of 44,100 samples per second. DAT tape uses the higher rate of 48,000 samples per second.
A sampling rate of 8,000 samples per second is commonly found in telephone applications, where the highest frequency represented is only 3,500 Hz. For computer games, the rates of 11,025 (one quarter CD) and 22,050 (one half CD) are becoming the norm.
Many sound cards support a wide range of sampling rates from 8,000 up to 44,100 samples per second, although some have only a handful of rates pre-programmed into the card. The Browse program cannot know which rates are supported by your card other than 11,025, 22,050 and 44,100. Thus selecting a rate in the program may not result in that exact rate being used. To test your card, generate a waveform of, say, 1,000,000 samples and time its replay at various sampling rates, as in the sketch below.
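The timing test works because the expected replay duration is simply the number of samples divided by the sampling rate. A small sketch of the expected figures (illustrative values only):

    #include <stdio.h>

    int main(void)
    {
        long nsamples = 1000000L;   /* test waveform length */
        long rates[]  = { 8000L, 11025L, 22050L, 44100L };
        int  i;

        for (i = 0; i < 4; i++)
            printf("at %5ld samples/sec expect %.1f seconds of replay\n",
                   rates[i], (double)nsamples / rates[i]);
        return 0;
    }

For example, 1,000,000 samples should take about 45.4 seconds at 22,050 samples per second; a markedly shorter or longer replay suggests the card has substituted a different rate.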
There is no single agreed format for the storage of waveform data in files, much as there is no single format for bitmap images.
This topic discusses waveform file formats, and lists which formats are supported by the program.
Apart from the waveform samples themselves, a waveform file needs to contain information about the sampling rate at the very least. If a program can determine the number of samples from the size of the file, and providing the sample format is known, then there is a good chance that the signal can be replayed.
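As an example of that calculation (the header size, sample format and rate here are assumptions for illustration, not a description of any particular format):

    #include <stdio.h>

    int main(void)
    {
        long filesize   = 352844L;  /* total file size in bytes (example) */
        long headersize = 44L;      /* assumed fixed-size header          */
        int  sampsize   = 2;        /* e.g. 2 bytes for 16-bit mono       */
        long rate       = 22050L;   /* sampling rate read from the header */

        long nsamples = (filesize - headersize) / sampsize;
        printf("%ld samples, %.2f seconds\n",
               nsamples, (double)nsamples / rate);
        return 0;
    }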
On the other hand, information that could potentially be stored includes:
Such information is typically stored in a file header along with some pointer to the start of the waveform data.
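A hypothetical header might look like the following sketch; real formats differ in detail, but most carry fields of this kind:

    /* a hypothetical waveform file header - real formats differ in
       detail, but most carry fields like these */
    struct wave_header {
        char  magic[4];    /* identifies the file format        */
        long  rate;        /* sampling rate, samples per second */
        short channels;    /* 1 = monophonic, 2 = stereophonic  */
        short bits;        /* bits per sample: 8, 12 or 16      */
        long  dataoffset;  /* byte offset of the first sample   */
    };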
The browse program can automatically identify the following formats:
The browse program can currently save waveforms in the following formats:
The ESPS format is the storage scheme used by Entropic Research for its X-Windows signal processing package.
The HTK format is the storage scheme used by the Hidden Markov Modelling Toolkit developed for speech recognition research by Cambridge University Engineering Department, England.
The SFS format is the storage scheme developed by the Department of Phonetics and Linguistics, UCL, for its SFS speech analysis tools.
The TIMIT format is the storage scheme developed by Texas Instruments and MIT for a database of spoken recordings for speech research.
The Department of Phonetics and Linguistics at UCL has been at the forefront of teaching and research in Speech and Hearing Science for over 40 years. If the subject interests you, you may like to consider taking a taught course, such as:
Information about these can be found on our web site at http://www.phon.ucl.ac.uk/.