Penn Korean universal dependency treebank

Title
Penn Korean universal dependency treebank / Linguistic Data Consortium.
Publication
[Philadelphia, PA] : [Linguistic Data Consortium], [2023]
Physical Description
1 online resource
Local Notes
Access is available to the Yale community.
Notes
Authors: Jinho D. Choi, Na-Rae Han, Jena D. Hwang, Hansaem Kim.
Data source: newswire.
Data type: Text.
Applications: automatic content extraction, discourse analysis, information detection, information extraction, morphology learning, parsing, part of speech tagging, syntactic parsing.
LDC number: LDC2023T05.
In Korean.
Title from resource home page (LDC website, viewed May 22, 2023).
Access and use
Access restricted by licensing agreement.
Summary
"Penn Korean Universal Dependency Treebank (LDC2023T05) contains 5,010 sentences and 132,041 tokens annotated in dependency format under the Universal Dependencies framework. It is a conversion of Korean Treebank Annotations Version 2.0 (LDC2006T09), which was produced in constituency format. In general, dependency grammar is based on the idea that the verb is the center of the clause structure and that other units in the sentence are connected to the verb as directed links or dependencies. This is a one-to-one correspondence: for every element in the sentence there is one node in the sentence structure that corresponds to that element. In constituency or phrase structure grammars, on the other hand, clauses are divided into noun phrases and verb phrases and in each sentence, one or more nodes may correspond to one element. Data. The source text is newswire stories from the Linguistic Data Consortium's Korean Press Agency collection contained in Korean Newswire (LDC2000T45). Sentences were automatically converted for dependency annotation; the output was manually checked. The corpus contains 112 files in CoNLL-U format, the Universal Dependencies standard, with a mapping to their counterpart in LDC2006T09." --LDC online catalog.
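Illustrative sketch (not part of the LDC release): the summary notes that the data is distributed as CoNLL-U files, in which each word occupies one line with ten tab-separated columns and points to exactly one head word (HEAD 0 marks the root). The Python below is a minimal reader for that layout and a check that a sentence forms a single-rooted tree. The function names read_conllu and is_single_rooted_tree and the three-word sentence are hypothetical; the lemma and tag values are simplified for illustration and are not taken from the corpus.

```python
# Column order defined by the CoNLL-U specification.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(text):
    """Split CoNLL-U text into sentences; each sentence is a list of token dicts.

    Comment lines ('#') are skipped, blank lines end a sentence, and
    multiword-token ranges ("1-2") and empty nodes ("1.1") are ignored so
    that only the words of the basic dependency tree remain.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():               # blank line: sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):           # sentence-level comment / metadata
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue                       # not a basic-tree word
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        current.append(token)
    if current:                            # file may not end with a blank line
        sentences.append(current)
    return sentences


def is_single_rooted_tree(sentence):
    """True if exactly one word has HEAD 0 and every word reaches it acyclically."""
    ids = {tok["id"] for tok in sentence}
    heads = {tok["id"]: tok["head"] for tok in sentence}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    if any(h != 0 and h not in ids for h in heads.values()):
        return False                       # head points outside the sentence
    for start in ids:
        seen, node = set(), start
        while node != 0:                   # follow head links up to the root
            if node in seen:
                return False               # cycle detected
            seen.add(node)
            node = heads[node]
    return True


# Hypothetical example sentence ("I went to school"), simplified annotation.
SAMPLE = "\n".join([
    "# text = 나는 학교에 갔다",
    "\t".join(["1", "나는", "나+는", "PRON", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "학교에", "학교+에", "NOUN", "_", "_", "3", "obl", "_", "_"]),
    "\t".join(["3", "갔다", "가+았+다", "VERB", "_", "_", "0", "root", "_", "_"]),
    "",
])

if __name__ == "__main__":
    sents = read_conllu(SAMPLE)
    print(len(sents), len(sents[0]))        # 1 3
    print(is_single_rooted_tree(sents[0]))  # True
```

Because each CoNLL-U word line has a single HEAD field, the "one node per element" property described above holds by construction; the tree check only has to confirm a single root and the absence of cycles.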
Format
Books / Data Sets / Online
Language
Korean
Added to Catalog
May 22, 2023
Contents
data file (contains annotation files)
docs file (contains additional documentation and a file table).
Genre/Form
Data sets.
Text corpora.
Also listed under
Choi, Jinho D., creator.
Linguistic Data Consortium, issuing body.