BOLT Egyptian Arabic-English word alignment : SMS/Chat training

Title
BOLT Egyptian Arabic-English word alignment : SMS/Chat training / Linguistic Data Consortium.
ISBN
1585639125
Publication
[Philadelphia, PA] : [Linguistic Data Consortium], [2019]
Physical Description
1 online resource
Local Notes
Access is available to the Yale community.
Notes
Applications: automatic content extraction, machine translation.
Authors: Xuansong Li, Stephen Grimes, Stephanie Strassel.
Data source: text chat conversations.
Data type: text.
LDC number: LDC2019T18.
Egyptian Arabic and English.
Title from resource home page (LDC website, viewed October 6, 2020).
Access and use
Access restricted by licensing agreement.
Summary
"BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training was developed by the Linguistic Data Consortium (LDC) and consists of 349,414 words of Egyptian Arabic and English parallel text enhanced with linguistic tags to indicate word relations. The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference. This release consists of Egyptian Arabic source text message and chat conversations collected using two methods: new collection via LDC's collection platform, and donation of SMS or chat archives from BOLT collection participants. The source data is released as BOLT Egyptian Arabic SMS/Chat and Transliteration (LDC2017T07). The BOLT word alignment task was built on treebank annotation. Specifically, Egyptian Arabic source tree tokens were automatically extracted from tree files in LDC's BOLT Egyptian Arabic Treebank. Those tree files had been tagged for part-of-speech and syntactically annotated. That data was then aligned and annotated for the word alignment task." --LDC online catalog.
Variant and related titles
Broad Operational Language Translation Egyptian Arabic-English word alignment : SMS/Chat training
Format
Books / Data Sets / Online
Language
Arabic; English
Added to Catalog
October 06, 2020
Genre/Form
Data sets.
Text corpora.
Also listed under
Li, Xuansong, creator.
Linguistic Data Consortium, issuing body.