A program to display, replay and convert waveforms using popular file formats; written by Mark Huckvale, Department of Phonetics and Linguistics, University College London, Gower Street, London WC1E 6BT, U.K.
The executable file and help files may be freely distributed providing they remain unmodified.
In the waveform display, time runs left to right and amplitude up and down. The display is a 'cartoon' of the actual waveform at screen resolution. Use the left and right mouse buttons to position cursors.
Use the scroll bar to move earlier and later in a zoomed waveform. The arrows move by about one quarter of the current screen time, the scroll bar region by about one screen.
Files may be opened using the Open dialogue or by dragging and dropping files from the File Manager. Files are always opened read-only and are not changed by zooming, format selection, replay or exiting the program.
Files may be saved under new names and in new formats using the Save dialogue, where you specify the required output file format along with the file name.
Waveforms or parts of waveforms may be replayed if you have a Windows-compatible sound card and an installed multimedia wave driver. Time cursors may be positioned with the left and right mouse buttons to delimit the region to replay. Waveforms are replayed directly from disk, so there is no limit to the amount of data that can be replayed at one time. On very slow disks (e.g. CD-ROM or floppy disk) or at very high replay rates, gaps in the replay may occur.
This topic discusses the representation of analogue waveforms as a sequence of digital samples.
Waveforms are stored as quantised instantaneous amplitude values at regularly spaced intervals of time. Each quantised amplitude value is called a sample. The time duration between samples is called the sampling interval. The number of sampling intervals per second is called the sampling rate. Waveforms may be monophonic with only one amplitude value per sampling interval, or stereophonic with two amplitude values per sampling interval (one for the left and one for the right channel).
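As a concrete illustration (the names and figures below are examples, not taken from the program), the sampling interval and the duration of a recording follow directly from the sampling rate:

    #include <stdio.h>

    int main(void)
    {
        long   nsamples = 220500L;    /* e.g. five seconds of samples   */
        double rate     = 44100.0;    /* sampling rate, samples/second  */
        double interval = 1.0 / rate; /* sampling interval, seconds     */
        double duration = nsamples / rate;

        printf("interval = %g sec, duration = %g sec\n", interval, duration);

        /* stereophonic data is conventionally stored interleaved:
           left[0], right[0], left[1], right[1], ... - two amplitude
           values per sampling interval */
        return 0;
    }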
Amplitudes are quantised (stored as integers) with a given dynamic range. Typically a range of 256 values is used for telephone-quality signals and computer games: this is 8-bit quantisation, with a numerical range of -128 to +127, giving about a 48dB signal-to-noise ratio at best. For FM-radio-quality signals, 12-bit quantisation is preferred, with a numerical range of -2048 to +2047, giving a 72dB signal-to-noise ratio at best. For CD-quality signals, 16-bit quantisation is preferred, with a numerical range of -32768 to +32767, giving a 96dB signal-to-noise ratio at best.
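The signal-to-noise figures above follow the rule of thumb of roughly 6dB per bit of quantisation. A minimal sketch of quantising a normalised amplitude, written for illustration only:

    #include <math.h>
    #include <stdio.h>

    /* quantise an amplitude in the range -1.0..+1.0 to 'bits' bits */
    long quantise(double x, int bits)
    {
        long maxval = (1L << (bits - 1)) - 1;  /* e.g. +127 for 8 bits */
        long minval = -(1L << (bits - 1));     /* e.g. -128 for 8 bits */
        long q = (long)floor(x * (1L << (bits - 1)) + 0.5);
        if (q > maxval) q = maxval;
        if (q < minval) q = minval;
        return q;
    }

    int main(void)
    {
        int bits;
        printf("quantise(0.5, 8)  = %ld\n", quantise(0.5, 8));
        printf("quantise(0.5, 16) = %ld\n", quantise(0.5, 16));
        for (bits = 8; bits <= 16; bits += 4)
            printf("%2d bits: best-case SNR about %.0f dB\n",
                   bits, 6.02 * bits);
        return 0;
    }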
8-bit quantisation provides compact storage at 1 byte per sample per channel. 12-bit and 16-bit quantisation require 2 bytes per sample per channel.
An approximation to 12-bit quantisation quality may be obtained by converting 12-bit samples to an 8-bit logarithmic representation of the amplitude. This allows larger amplitudes to be quantised more coarsely than small amplitudes, which fits well with our perception of sound. The 8-bit logarithmic form of 12-bit samples is called A-law or mu-law quantisation (two minor variations on the same theme).
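For illustration, the continuous curve underlying mu-law compression can be sketched as follows (real codecs use a segmented approximation of this curve, and this is not the program's own code):

    #include <math.h>
    #include <stdio.h>

    /* continuous mu-law compression (mu = 255) of a normalised
       amplitude x in the range -1.0..+1.0; the result, also in
       -1.0..+1.0, may then be quantised to 8 bits */
    double mulaw(double x)
    {
        const double mu = 255.0;
        double sign = (x < 0.0) ? -1.0 : 1.0;
        return sign * log(1.0 + mu * fabs(x)) / log(1.0 + mu);
    }

    int main(void)
    {
        /* small amplitudes use a larger share of the output range */
        printf("mulaw(0.01) = %.3f\n", mulaw(0.01));   /* about 0.23 */
        printf("mulaw(0.50) = %.3f\n", mulaw(0.50));   /* about 0.88 */
        return 0;
    }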
When analogue signals are converted to digital form by taking samples at each sampling interval, information about how the signal changes within the interval is discarded. As a consequence, there are infinitely many possible waveforms that can be drawn through a given set of sample values: all of them pass through the sampled amplitudes but vary elsewhere (see figure below). This folding of different signals onto a common digital representation is known as aliasing - it must be avoided when signals are converted.
To avoid aliasing, the sampling rate must be high enough to capture the highest frequency components in the signal. It can be shown that if the highest frequency component is at frequency F, then we can ensure that all aliases have frequency components greater than F (and hence can be filtered out), provided we sample at a rate of at least twice F.
So to avoid aliasing, the signal is low-pass filtered to remove frequencies above some cut-off value F, determined by the sound quality required by the application. The signal is then sampled at a rate of at least twice F. On playback, the signal is low-pass filtered at F again; since the signal is known to contain only components with frequencies below F, the digital signal is a unique representation, and the exact analogue signal can be reconstructed from the digital samples. By this means a digital stream can provide a perfectly faithful representation of an analogue waveform.
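Aliasing is easy to demonstrate numerically. In the sketch below (the frequencies are chosen purely for illustration), a 7,000 Hz sine wave sampled at 8,000 samples per second yields exactly the same sample values as an inverted 1,000 Hz sine wave, so the two signals are indistinguishable once sampled:

    #include <math.h>
    #include <stdio.h>

    #define PI 3.14159265358979

    int main(void)
    {
        double rate = 8000.0;    /* sampling rate, samples/second */
        int    n;
        for (n = 0; n < 8; n++) {
            double t  = n / rate;
            double s7 = sin(2 * PI * 7000.0 * t);    /* above rate/2 */
            double s1 = -sin(2 * PI * 1000.0 * t);   /* its alias    */
            printf("%d: %8.5f %8.5f\n", n, s7, s1);  /* identical    */
        }
        return 0;
    }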
For the highest quality signals, we clearly require frequencies up to the limit of human hearing to be represented, so F is chosen to be about 20,000 Hz (cycles per second). As a consequence, CD audio uses a sampling rate of 44,100 samples per second. DAT tape uses the higher rate of 48,000 samples per second.
A sampling rate of 8,000 samples per second is commonly found in telephone applications, where the highest frequency represented is only 3,500 Hz. For computer games, the rates of 11,025 (one quarter CD) and 22,050 (one half CD) are becoming the norm.
Many sound cards support a wide range of sampling rates from 8,000 up to 44,100 samples per second, although some have only a handful of rates pre-programmed into the card. The Browse program cannot know which rates are supported by your card other than 11,025, 22,050 and 44,100. Thus selecting a rate in the program may not result in that exact rate being used. To test your card, generate a waveform of, say, 1,000,000 samples and time its replay at various sampling rates, as in the sketch below.
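The timing test works because the expected replay duration is simply the number of samples divided by the sampling rate. A small sketch of the expected figures (illustrative values only):

    #include <stdio.h>

    int main(void)
    {
        long nsamples = 1000000L;   /* test waveform length */
        long rates[]  = { 8000L, 11025L, 22050L, 44100L };
        int  i;

        for (i = 0; i < 4; i++)
            printf("at %5ld samples/sec expect %.1f seconds of replay\n",
                   rates[i], (double)nsamples / rates[i]);
        return 0;
    }

For example, 1,000,000 samples should take about 45.4 seconds at 22,050 samples per second; a markedly shorter or longer replay suggests the card has substituted a different rate.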
There is no single agreed format for the storage of waveform data in files, much as there is no single format for bitmap images.
This topic discusses waveform file formats, and lists which formats are supported by the program.
Apart from the waveform samples themselves, a waveform file needs to contain information about the sampling rate at the very least. If a program can determine the number of samples from the size of the file, and providing the sample format is known, then there is a good chance that the signal can be replayed.
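As an example of that calculation (the header size, sample format and rate here are assumptions for illustration, not a description of any particular format):

    #include <stdio.h>

    int main(void)
    {
        long filesize   = 352844L;  /* total file size in bytes (example) */
        long headersize = 44L;      /* assumed fixed-size header          */
        int  sampsize   = 2;        /* e.g. 2 bytes for 16-bit mono       */
        long rate       = 22050L;   /* sampling rate read from the header */

        long nsamples = (filesize - headersize) / sampsize;
        printf("%ld samples, %.2f seconds\n",
               nsamples, (double)nsamples / rate);
        return 0;
    }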
On the other hand, information that could potentially be stored includes:
Such information is typically stored in a file header along with some pointer to the start of the waveform data.
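A hypothetical header might look like the following sketch; real formats differ in detail, but most carry fields of this kind:

    /* a hypothetical waveform file header - real formats differ in
       detail, but most carry fields like these */
    struct wave_header {
        char  magic[4];    /* identifies the file format        */
        long  rate;        /* sampling rate, samples per second */
        short channels;    /* 1 = monophonic, 2 = stereophonic  */
        short bits;        /* bits per sample: 8, 12 or 16      */
        long  dataoffset;  /* byte offset of the first sample   */
    };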
The browse program can automatically identify the following formats:
The browse program can currently save waveforms in the following formats:
The ESPS format is the storage scheme used by Entropic Research for its X-Windows signal processing package.
The HTK format is the storage scheme used by the Hidden Markov Modelling Toolkit developed for speech recognition research by Cambridge University Engineering Department, England.
The SFS format is the storage scheme developed by the Department of Phonetics and Linguistics, UCL, for its SFS speech analysis tools.
The TIMIT format is the storage scheme developed by Texas Instruments and MIT for a database of spoken recordings for speech research.
The Department of Phonetics and Linguistics at UCL has been at the forefront of teaching and research in Speech and Hearing Science for over 40 years. If the subject interests you, you may like to consider taking a taught course, such as:
Information about these can be found on our web site at http://www.phon.ucl.ac.uk/.