LEADER 03941cim a2200649 i 4500
001 15573935
005 20211223184630.0
006 m o h
007 cr||na||||||||
007 sr||||||||||||
008 200924p2018 paunnn o nn cze d
020 1585638501
024 8 0230685087970 |qISLRN
035 15573935
040 CtY |beng |erda |cCtY
041 0 czeslo
050 4 PG4074.5
090 yuldset
090 yuldsetsnd
245 00 Multi-Language conversational telephone speech 2011. |pCentral European / |cLinguistic Data Consortium.
264 1 [Philadelphia, PA] : |b[Linguistic Data Consortium], |c[2018]
300 1 online resource
336 computer dataset |bcod |2rdacontent
336 spoken word |bspw |2rdacontent
337 computer |bc |2rdamedia
338 online resource |bcr |2rdacarrier
347 audio file |2rdaft
347 |bFLAC
588 Title from resource home page (LDC website, viewed September 24, 2020).
506 Access restricted by licensing agreement.
590 Access is available to the Yale community.
500 Authors: Karen Jones, David Graff, Kevin Walker, Stephanie Strassel.
500 Data source: telephone conversations.
500 Applications: language identification.
500 LDC number: LDC2018S08.
546 In Czech and Slovak.
520 "Multi-Language Conversational Telephone Speech 2011 -- Central European was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 44 hours of telephone speech in two distinct language varieties of Central Europe: Czech and Slovak. The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related. Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data was collected using LDC's telephone collection infrastructure, comprised of three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected. All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file is 2 channels, recorded at 8000 samples/second with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network. The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data." --LDC online catalog.
650 0 Czech language |xSpoken Czech |xData processing.
650 0 Slovak language |xSpoken Slovak |xData processing.
650 0 Czech language |xDiscourse analysis.
650 0 Slovak language |xDiscourse analysis.
650 0 Automatic speech recognition.
650 0 Computational linguistics.
650 0 Corpora (Linguistics)
655 7 Data sets. |2lcgft
655 7 Speech corpora. |2lcgft
655 7 Sound recordings. |2lcgft
700 1 Jones, Karen, |ecreator.
710 2 Linguistic Data Consortium, |eissuing body.
852 80 |zOnline resource
856 40 |yOnline dataset |uhttps://ssrs.yale.edu/data/SSDA/ldc/LDC2018S08/
856 42 |3Documentation |uhttps://catalog.ldc.upenn.edu/docs/LDC2018S08/
901 PG4074.5
902 Yale Internet Resource |bYale Internet Resource >> None
905 online resource
907 2020-09-24T11:26:14.000Z
946 DO NOT EXPORT.