GALE Phase 4 Arabic broadcast conversation : parallel sentences
Title
GALE Phase 4 Arabic broadcast conversation : parallel sentences / Linguistic Data Consortium.
ISBN
1585637505
Publication
[Philadelphia, PA]: Linguistic Data Consortium, 2016.
Physical Description
1 CD-ROM ; 4 3/4 in
Local Notes
Access is available to the Yale community.
Notes
"LDC2016T11."
Applications: Machine translation.
Authors: Song Chen, Gary Krug, Stephanie Strassel.
Data source: Broadcast conversation.
Data type: Text.
Title from disc label.
In Standard Arabic and English.
Access and use
Access restricted by licensing agreement.
Summary
"GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences was developed by the Linguistic Data Consortium (LDC). Along with other corpora, the parallel text in this release comprised training data for Phase 4 of the DARPA GALE (Global Autonomous Language Exploitation) Program. This corpus contains Modern Standard Arabic source sentences and corresponding English translations selected from broadcast conversation data collected by LDC in 2007 and 2008 and transcribed and translated by LDC or under its direction. GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences includes 170 source-translation document pairs, comprising 44,064 words (Arabic source) of translated data. Data is drawn from 45 distinct Arabic broadcast conversation (BC) sources. Source data and translations are distributed in TDF format. TDF files are tab-delimited files containing one segment of text along with meta information about that segment. Each field in the TDF file is described in TDF_format.txt. All data are encoded in UTF-8."--LDC online catalog.
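The summary describes TDF files as tab-delimited, UTF-8 encoded, with one text segment per line plus metadata fields. A minimal Python sketch of reading such a file follows. It is not taken from the corpus documentation: the function name `read_tdf_rows`, the `;;` comment prefix, and the sample rows are assumptions for illustration only, and the actual field names and order are defined in TDF_format.txt, which is not reproduced here.

```python
import csv
import io


def read_tdf_rows(stream, comment_prefix=";;"):
    """Yield each data row of a tab-delimited text stream as a list of fields.

    Assumption: lines whose first field starts with comment_prefix are
    comments. Check TDF_format.txt for the real conventions.
    """
    # QUOTE_NONE keeps quotation marks in the transcript text as literal characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if comment_prefix and row[0].startswith(comment_prefix):
            continue  # skip assumed comment lines
        yield row


# Hypothetical sample data; the column layout is invented for this sketch.
sample = io.StringIO(
    ";; hypothetical comment line\n"
    "doc_001\t0.00\t2.35\tمرحبا بكم\n"
    "doc_001\t2.35\t4.10\tشكرا\n"
)
rows = list(read_tdf_rows(sample))
```

For a file on disc, the same function could be given `open(path, encoding="utf-8", newline="")` in place of the in-memory sample.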
Variant and related titles
GALE Phase four Arabic broadcast conversation : parallel sentences