CSR-II (WSJ1) Sennheiser

Title
CSR-II (WSJ1) Sennheiser / Linguistic Data Consortium.
ISBN
1585630314
9781585630318
Publication
[Philadelphia, Pennsylvania] : [Linguistic Data Consortium], [1994]
Physical Description
1 online resource
Local Notes
Access is available to the Yale community.
Notes
Title from publisher website.
Data source: Microphone speech.
Applications: Speech recognition.
LDC number: LDC94S13B.
Digital audio in a compressed archive file (TAR).
Content and documentation in English.
Access and use
Access restricted by licensing agreement.
Summary
Speech data mining.
LDC94S13A - Complete CSR-II corpus
LDC94S13B - CSR-II Sennheiser speech
LDC94S13C - CSR-II Other speech
*Data*
The complete WSJ1 corpus contains approximately 78,000 training utterances (73 hours of speech), 4,000 of which are the result of spontaneous dictation by journalists with varying degrees of experience in dictation. The corpus contains approximately 8,200 conventional development test utterances (eight hours of speech), 6,800 of which are from spontaneous dictation. As with the pilot corpus, the entire corpus was collected using two microphones, so the amount of speech in the entire corpus is about 162 hours.
In early 1993, a Hub and Spoke test paradigm was designed, calling for eleven test sets, each a specific variation of the basic or hub condition. The eleven Hub and Spoke Development and Evaluation Test sets each contain approximately 7,500 waveforms (eleven hours of speech).
WSJ1 waveforms have been compressed by about 2:1 using the SPHERE-embedded Shorten compression algorithm developed at Cambridge University.
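The 162-hour total quoted in the summary appears to follow from the per-microphone figures: 73 hours of training speech plus eight hours of development test speech, recorded on two microphones. A minimal sketch of that arithmetic is below. The function and constant names are illustrative only and are not part of the corpus documentation.

```python
# Figures quoted in the summary, in hours of speech per microphone channel.
TRAINING_HOURS = 73
DEV_TEST_HOURS = 8
MICROPHONES = 2  # the corpus was collected using two microphones


def total_hours(training: int, dev_test: int, microphones: int) -> int:
    """Return total recorded hours across all microphone channels."""
    return (training + dev_test) * microphones


corpus_hours = total_hours(TRAINING_HOURS, DEV_TEST_HOURS, MICROPHONES)
print(corpus_hours)
```

This assumes the Hub and Spoke test sets are not counted in the quoted total, which the summary does not state explicitly.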
Format
Audio / Data Sets / Online
Language
English
Added to Catalog
May 13, 2020
Genre/Form
Speech corpora.
Sound recordings.
Databases.
Data sets.
Also listed under
Linguistic Data Consortium, issuing body.