Wikipedia Spanish speech and transcripts

Title
Wikipedia Spanish speech and transcripts / Linguistic Data Consortium.
ISBN
1585639729
Publication
[Philadelphia, PA] : [Linguistic Data Consortium], [2021]
Physical Description
1 online resource
Local Notes
Access is available to the Yale community.
Notes
Authors: Carlos Daniel Hernández Mena, Iván Vladimir Meza Ruiz.
Data source: microphone speech.
Data type: software, text.
Applications: speech recognition.
LDC number: LDC2021S07.
Audio in Spanish with corresponding transcripts in Spanish.
Title from resource home page (LDC website, viewed September 14, 2021).
Access and use
Access restricted by licensing agreement.
Summary
"The audio is comprised of short recordings from Wikipedia articles read by 193 speakers (150 male, 43 female). Audio and transcripts were segmented and transcribed by native Spanish speakers. Audio is presented as 16kHz, 16-bit, single channel flac files. When uncompressed, they produce PCM wav files. Transcripts are contained in a single plain text file encoded as UTF-8. Speaker metadata is also provided."--LDC online catalog.
Format
Audio / Data Sets / Online
Language
Spanish
Added to Catalog
September 14, 2021
Contents
data file (contains the speech, transcripts and speaker information).
docs file (contains additional documentation and a file table).
Genre/Form
Data sets.
Speech corpora.
Text corpora.
Sound recordings.
Also listed under
Hernández Mena, Carlos Daniel, creator.
Linguistic Data Consortium, issuing body.