Speech Filing System
Frequently Asked Questions
Contents
1. Capabilities
- What audio file formats are supported?
- What environment variables are available?
- What does 'labels file out of date' mean?
- What do all these '-i' switches mean?
2. How To ...
- How can I display/print spectrograms?
- How can I view part of a large file?
- How can I annotate a signal file?
- How can I get a fundamental frequency trace?
- How can I get an energy trace?
- How can I get a set of formant estimates?
- How can I filter the signal?
- How can I change the sampling rate?
- How can I change the speed of the signal?
- How can I import data from some other source?
- How can I export a signal or annotations?
- How can I perform semi-automatic annotation?
1. Capabilities
What audio file formats are supported?
SFS maintains its own file format for data. It needs this because it maintains a processing history of each data set; this allows a user to keep track of the origin and processing of any piece of data. SFS also tries to keep data sets together in a single file, to try and make the user interface simpler. This means that the SFS file format must allow multiple copies of multiple types of data in a single file; and this precludes the use of other file formats.
To deal with other data file formats, SFS provides utilities for importing and exporting data. For importing signals, it is often unnecessary to make a new physical copy of the signal; instead, a command 'slink' simply records in an SFS file the instructions for how and where to access the data in its original format. This makes access to large read-only databases of data very convenient.
SFS can link to or read speech signals in the following file formats:
- binary files
- WAV format (RIFF format)
- VOC format
- AU format
- ILS format
- AIFF format
- HTK format
The easiest way to import data in SFSWin is to start a new document then select Item/Import/Speech (Copy). Locate the file and SFS will try to determine the type of file automatically. If this fails, use Item/Import/Speech (Link) and enter the appropriate file type options.
SFS can write speech signals to files in the following file formats (using *list programs for different data types):
- binary files
- WAV format (RIFF format)
- VOC format
- AU format
- AIFF format
- ILS format
- ESPS format
- HTK files (waveform, coefficient and annotations)
- common label file formats
In SFSWin look under Tools/Speech/Export. SFS can also read and write many other data sets from/to a textual representation.
What environment variables do I need to set?
SFSWin does not require any environment variables to be set. However the following may be useful when running individual programs from the command line.
Variable | Example Settings | Meaning |
SFSBASE | /progra~1/sfs | Installation directory for SFS |
GTERM | winprint | Send display to printer |
GTERM | bitmap | Send display to bitmap (DIG.GIF) |
GPRINT | LPT1 | Send PostScript to hardware port LPT1 |
GPRINT | bitmap | Send print output to bitmap file (DIG.GIF) |
GPRINT | eps | Send print output to EPS file (DIG.EPS) |
GSIZE | widthxheight | Set bitmap output to width pixels by height pixels (e.g. GSIZE=1024x768) |
What does 'labels file out of date' mean?
SFS uses a text file to convert processing histories into English descriptions. By default this is $(SFSBASE)/data/labels. This file is indexed to make it fast to access. When it is installed on a new machine, its date may be updated and SFS thinks that the file has been changed but not re-indexed.
Solution: run the prolab program on the labels file:
-
prolab $(SFSBASE)/data/labels
The labels file is described in the User Manual. Check that this text file is in Unix format for Unix machines, and in DOS format for DOS machines - formats have been confused in the past.
What do all these -i switches mean?
An SFS file can contain many different data sets; it can contain multiple speech signals, annotations, formant or fundamental frequency data, etc. SFS uses this grouping of data to maintain a 'processing history', a record of the antecedents of each data set (or 'item'). To refer to a particular piece of data within an SFS file, every SFS program understands a common 'item numbering', and the '-i' switches specify the item number to the program.
Item numbers are made up of two components: a major data type code and a simple count code. The most common major types are listed below:
Major type | Mnemonic | Description |
1 | SP | Speech pressure waveform |
2 | LX | Laryngograph waveform |
3 | TX | Larynx period data |
4 | FX | Fundamental frequency data |
5 | AN | Annotations |
7 | SY | Synthesizer control data |
9 | DI | Grey-level display data |
10 | CO | Spectral coefficients |
12 | FM | Formant estimates |
16 | TR | Parameter tracks |
The count code simply records the index number of the data type in the file. If there are two speech items then they will have count codes of 1 and 2.
An item number, then, consists of a major type, a period and a count code; e.g. 1.01 or 10.05, corresponding to the first speech item in the file and the fifth coefficient item respectively. Since numbers are hard to remember, the major type numbers may also be replaced by the two-letter mnemonics in the table above; e.g. sp.01 or co.05. Note that the use of a leading zero for the count code is optional.
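The numbering scheme can be made concrete with a short sketch. The Python below is not part of SFS: the function name parse_item_number is hypothetical, and the type table is simply copied from the table above. It splits a full item number into its major type and count code.

```python
# Major type codes and mnemonics, copied from the table in this FAQ.
MAJOR_TYPES = {
    "SP": 1, "LX": 2, "TX": 3, "FX": 4, "AN": 5,
    "SY": 7, "DI": 9, "CO": 10, "FM": 12, "TR": 16,
}


def parse_item_number(text):
    """Split a full item number such as '1.01' or 'co.05' into (major, count).

    Hypothetical helper for illustration; SFS programs do this internally.
    """
    major_text, sep, count_text = text.partition(".")
    if not sep or not count_text.isdigit():
        raise ValueError(f"not a full item number: {text!r}")
    if major_text.isdigit():
        major = int(major_text)
    else:
        # Mnemonics are looked up case-insensitively; unknown ones raise KeyError.
        major = MAJOR_TYPES[major_text.upper()]
    return major, int(count_text)


print(parse_item_number("1.01"))   # (1, 1)
print(parse_item_number("co.05"))  # (10, 5)
print(parse_item_number("sp.1"))   # (1, 1): the leading zero is optional
```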
A given SFS program, then, that processes a single data set needs to be able to identify which data set from a given file to use as input. If there is only one data set in the file of the appropriate type for the program, then the program uses that automatically. If there is more than one data set of the input type, the program will usually select the last item of the appropriate type. However if this is not what you want, you need to tell the program which item to process using the -i <item number> switch.
Take as an example that you want to compare a piece of speech low-pass filtered at 2000Hz with the same speech high-pass filtered at 2000Hz. The file starts with a single speech item, numbered 1.01. This is first processed by genfilt:
-
genfilt -l 2000 file.sfs
This generates an item 1.02 in the file. However, the command
-
genfilt -h 2000 file.sfs
will not generate the second filtered signal as you might have wished. In this instance genfilt will take as its input item 1.02, which (we know) has already been filtered. Instead we need the command:
-
genfilt -i1.01 -h 2000 file.sfs
This processes the original signal, as we required.
There is a shorthand for the first and last items of a given type. The first item of a given type in the file may be selected with an item number made up of the major type followed by a period only; the last item may be selected with the major type only. Thus 'sp.' refers to the first speech item and 'sp' refers to the last. The last command could equally have been written:
-
genfilt -isp. -h 2000 file.sfs
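As an illustration only, the selection rules described above can be modelled in a few lines of Python. This is a sketch of the rules as stated in this FAQ, not SFS's own code: resolve_item and the list-of-pairs representation of a file's contents are assumptions made for the example.

```python
# Major type codes and mnemonics, copied from the table in this FAQ.
MAJOR_TYPES = {
    "SP": 1, "LX": 2, "TX": 3, "FX": 4, "AN": 5,
    "SY": 7, "DI": 9, "CO": 10, "FM": 12, "TR": 16,
}


def resolve_item(selector, items):
    """Return the (major, count) pair that a -i selector would pick.

    `items` lists the (major, count) pairs held in a file, in file order.
    Hypothetical helper: it models the rules in the text, nothing more.
    """
    major_text, sep, count_text = selector.partition(".")
    major = int(major_text) if major_text.isdigit() else MAJOR_TYPES[major_text.upper()]
    candidates = [item for item in items if item[0] == major]
    if not candidates:
        raise LookupError(f"no items of type {major_text!r} in file")
    if not sep:
        return candidates[-1]        # 'sp'  : last item of that type
    if not count_text:
        return candidates[0]         # 'sp.' : first item of that type
    wanted = (major, int(count_text))
    if wanted not in candidates:
        raise LookupError(f"item {selector!r} not found")
    return wanted                    # 'sp.02': one specific item


# The genfilt example: the original signal plus its low-pass filtered copy.
file_items = [(1, 1), (1, 2)]
print(resolve_item("sp", file_items))    # (1, 2)
print(resolve_item("sp.", file_items))   # (1, 1)
print(resolve_item("1.01", file_items))  # (1, 1)
```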
2. How To ...
How do I display/print spectrograms?
The main display program Eswin has the capability of calculating, displaying and printing spectrograms as you work. To start up Eswin from the command line with display of a speech waveform and a spectrogram, use:
-
Eswin -isp -gsp file.sfs
Eswin has menu options to produce a hard-copy of the signal displayed on the screen.
The program sprint will also print spectrograms directly to the printer. The programs esection, espect and esform will display and print spectrograms with spectral cross-sections.
How do I view part of a large file?
Use the '-s' and '-e' switches on Eswin to specify the initial starting and ending times displayed. Since Eswin attempts to read an entire data set into memory before displaying it, it is necessary to specify the initial times for very large files. It is still possible to scroll forwards and backwards in time, but impossible to zoom out to longer times than initially specified.
For example:
-
Eswin -s100 -e130 bigfile.sfs
How can I annotate a signal file?
Use the annotation capabilities of Eswin. Display the items you wish to annotate, then select Annotation/New and enter a suitably descriptive name for the set of annotations. Each set of annotations in a file should be given a different name.
Then position the left cursor at the time at which you want the annotation to be set. Press the 'A' key, type in the annotation text, and then press RETURN.
When the annotation box is closed, the annotations are saved to the SFS file.
To edit an existing set of annotations, start up Eswin with the -l labelname option; this is available under Tools/Speech/Edit in SFSWin. Alternatively, re-enter the current name for the annotation set under Annotation/New in Eswin itself.
How can I get a fundamental frequency trace?
The program fxanal is currently the best SFS program for this. It can be found under Tools/Speech/Analysis in SFSWin. The programs fxac and fxcep provide autocorrelation and cepstral methods for fundamental frequency estimation from speech signals. They have a default set of parameters that work pretty well on a clean signal, but they do not have the post-processing built into fxanal.
How can I get an energy trace?
The program envelope provides a method for generating a TRACK item from a speech signal. Look under Tools/Speech/Analysis/Energy Envelope.
How can I get a set of formant estimates?
The program fmanal provides a set of formant estimates from a speech signal. Look under Tools/Speech/Analysis/Formants. This has a default set of parameters that work pretty well for clean 10kHz sampled speech signals.
How can I filter the signal?
The program genfilt provides general-purpose low-pass, high-pass, band-pass and band-stop filters using recursive digital filter designs. Look under Tools/Speech/Process/Filtering.
Low-pass at 100Hz:
- genfilt -l 100 file.sfs
High-pass at 2000Hz:
- genfilt -h 2000 file.sfs
Band-pass between 300 and 3500Hz:
- genfilt -h 300 -l 3500 file.sfs
Band-stop between 3000 and 4000Hz:
- genfilt -l 3000 -h 4000 file.sfs
How can I change the sampling rate?
The program resamp provides a general purpose interpolation/decimation facility for changing sampling rates by small integer ratios. Look under Tools/Speech/Process/Resample.
For example:
- resamp -f 44100 file.sfs
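To see what an "integer ratio" between two sampling rates looks like, the arithmetic can be sketched in Python. This only illustrates the ratio itself; it says nothing about how resamp is implemented or which ratios it accepts, and the function name and the input rates are assumptions for the example.

```python
from fractions import Fraction


def resample_factors(old_rate, new_rate):
    """Return (up, down): interpolate by `up`, then decimate by `down`.

    Hypothetical helper showing the integer ratio between two rates.
    """
    ratio = Fraction(new_rate, old_rate)  # reduced to lowest terms automatically
    return ratio.numerator, ratio.denominator


print(resample_factors(22050, 44100))  # (2, 1)
print(resample_factors(48000, 44100))  # (147, 160)
```

The first case is a small integer ratio; the second shows how quickly the factors grow when the two rates share few common divisors.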
How can I change the speed/pitch of the signal?
The program respeed provides a general purpose retiming facility for speeding-up or slowing down speech without changing the pitch. Look under Tools/Speech/Process/Speed Change.
The program repitch provides a special purpose method for changing speed AND pitch, but requires a set of pitch epoch annotations (try pp and txan if you don't have a Laryngograph). Look under Tools/Speech/Process/Pitch Change.
How can I import data from some other source?
There is a general purpose signal import program called cnv2sfs which knows about a wide range of file formats. You can access this in SFSWin under Item/Import/Speech (Copy). A disadvantage of this approach is that it copies the original data which may take up unnecessary disk space.
The program slink knows about a smaller range of file formats, but it does not copy the data. Instead it records a 'link' in the SFS file pointing to the original data file. Access this in SFSWin under Item/Import/Speech (Link).
To import into an SFS file from the command line, the empty SFS file must be created first - this is your opportunity to identify the speaker, source and utterance to the system. Use the 'hed' program to create an empty SFS file:
-
hed newfile.sfs
and answer the questions, or
-
hed -n newfile.sfs
for the truly lazy.
The cnv2sfs program has no options:
-
cnv2sfs myfile.wav newfile.sfs
To use slink to link into a binary file with 16-bit samples in natural byte order at 20000 samples/sec:
-
slink -isp -f20000 ipfile.dat opfile.sfs
To link to a WAV file instead:
-
slink -isp -tWAV ipfile.wav opfile.sfs
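When many files need importing, the two-step sequence above (create an empty SFS file with hed, then copy the signal in with cnv2sfs) can be generated by a script. The following Python is a minimal sketch that only builds and prints the command lines; the function name import_commands and the choice to name each SFS file after its WAV file are assumptions, not SFS conventions.

```python
from pathlib import Path
import shlex


def import_commands(wav_path):
    """Return the two commands that import one WAV file into a new SFS file.

    Hypothetical helper: the .sfs file is named after the .wav file.
    """
    sfs_path = str(Path(wav_path).with_suffix(".sfs"))
    return [
        ["hed", "-n", sfs_path],                # create the empty SFS file
        ["cnv2sfs", str(wav_path), sfs_path],   # copy the signal into it
    ]


for command in import_commands("myfile.wav"):
    print(shlex.join(command))
# hed -n myfile.sfs
# cnv2sfs myfile.wav myfile.sfs
```

Each command is kept as a list of arguments so that it could be passed to subprocess.run without further quoting.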
How can I export a signal or annotations, etc?
The program splist allows the export of signals in binary files and other formats.
The program sfs2wav creates Windows compatible .WAV files and supports multiple channels.
The program anlist creates text representations of annotations.
The programs sylist, colist, fmlist, trlist, etc export other data sets.
Refer to the manual pages for these programs for details of export formats supported.
Look under Tools/Item/Export in SFSWin for guidance with each item type.
How can I perform semi-automatic annotation?
The SFS programs andict and annotate allow you to align a known transcription to a signal. These programs provide a rather simple service, but one that is surprisingly effective for automatic annotation of controlled material.
The program andict builds a dictionary of annotated segments from a portion of training material. The program annotate takes the dictionary and a known transcription and finds the best alignment to the recording, saving the resulting annotations. Here is the procedure, step-by-step:
- Collect all the speech files and write down transcriptions of all the recordings. Agree a suitable ASCII notation for all segment types.
- Process all the speech files through a suitable spectral analysis.
Typical SFS programs might be voc19, voc26 or mfcc. For example:
-
apply "mfcc -n12 -d1 -e -r200" *.sfs
- Annotate by hand a suitable representative sample of the material.
You need at least one annotation per segment type used in your transcriptions.
The more the better. Use Es or Eswin for the annotating. For example:
-
Eswin -l segments file.sfs
- Load the hand annotated samples into the dictionary. First create an empty SFS file to act as the dictionary, then use andict on all the hand-annotated files:
-
hed -n dict.sfs
apply "andict -3 dict.sfs" *.sfs
- Create a batch file/shell script that calls the annotate program for each of the files to be annotated, specifying the transcription to be used:
-
annotate -d dict.sfs -t "D I s I z @ t e s t" test.sfs
Alternatively you can save the transcriptions in the SFS file header or in a separate text file.
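The batch file in the last step can also be generated rather than typed by hand. The sketch below, in Python, builds one annotate command per file from a table of transcriptions, using only the -d and -t switches shown above. The function name, the file names and the second transcription are hypothetical, and the quoting produced by shlex.join follows Unix shell rules, so a DOS batch file may need different quoting.

```python
import shlex


def annotate_command(dictionary, transcription, sfs_file):
    """Return the annotate command line for one file as an argument list."""
    return ["annotate", "-d", dictionary, "-t", transcription, sfs_file]


# Hypothetical files and transcriptions, for illustration only.
transcriptions = {
    "test.sfs": "D I s I z @ t e s t",
    "test2.sfs": "t e s t",
}

for sfs_file, text in transcriptions.items():
    print(shlex.join(annotate_command("dict.sfs", text, sfs_file)))
```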
Automatic annotation works best for separate sentences. If you manually correct the automatic annotations, why not add those corrected annotations to the dictionary?