LEADER 03941cim a2200649 i 4500
001    15573935
005    20211223184630.0
006    m     o  h        
007    cr||na||||||||
007    sr||||||||||||
008    200924p2018    paunnn  o      nn   cze d
020    1585638501
024 8  0230685087970 |qISLRN
035    15573935
040    CtY |beng |erda |cCtY
041 0  czeslo
050  4 PG4074.5
090    yuldset
090    yuldsetsnd
245 00 Multi-Language conversational telephone speech 2011. |pCentral European / |cLinguistic Data Consortium.
264  1 [Philadelphia, PA] : |b[Linguistic Data Consortium], |c[2018]
300    1 online resource
336    computer dataset |bcod |2rdacontent
336    spoken word |bspw |2rdacontent
337    computer |bc |2rdamedia
338    online resource |bcr |2rdacarrier
347    audio file |2rdaft
347     |bFLAC
588    Title from resource home page (LDC website, viewed September 24, 2020).
506    Access restricted by licensing agreement.
590    Access is available to the Yale community.
500    Authors: Karen Jones, David Graff, Kevin Walker, Stephanie Strassel.
500    Data source: telephone conversations.
500    Applications: language identification.
500    LDC number: LDC2018S08.
546    In Czech and Slovak.
520    "Multi-Language Conversational Telephone Speech 2011 -- Central European was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 44 hours of telephone speech in two distinct language varieties of Central Europe: Czech and Slovak. The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related. Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data was collected using LDC's telephone collection infrastructure, comprised of three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected. All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file is 2 channels, recorded at 8000 samples/second with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network. The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data." --LDC online catalog.
650  0 Czech language |xSpoken Czech |xData processing.
650  0 Slovak language |xSpoken Slovak |xData processing.
650  0 Czech language |xDiscourse analysis.
650  0 Slovak language |xDiscourse analysis.
650  0 Automatic speech recognition.
650  0 Computational linguistics.
650  0 Corpora (Linguistics)
655  7 Data sets. |2lcgft
655  7 Speech corpora. |2lcgft
655  7 Sound recordings. |2lcgft
700 1  Jones, Karen, |ecreator.
710 2  Linguistic Data Consortium, |eissuing body.
852 80  |zOnline resource
856 40  |yOnline dataset |uhttps://ssrs.yale.edu/data/SSDA/ldc/LDC2018S08/
856 42  |3Documentation |uhttps://catalog.ldc.upenn.edu/docs/LDC2018S08/
901    PG4074.5
902    Yale Internet Resource |bYale Internet Resource >>  None|DELIM|15554187
905    online resource
907    2020-09-24T11:26:14.000Z
946    DO NOT EXPORT.