NXT Switchboard annotations

Title
NXT Switchboard annotations / Linguistic Data Consortium.
ISBN
158563526X
Publication
[Philadelphia, PA] : [Linguistic Data Consortium], [2009]
Physical Description
1 online resource
Local Notes
Access is available to the Yale community.
Notes
Applications: natural language processing.
Authors: Sasha Calhoun, Jean Carletta, Daniel Jurafsky, Malvina Nissim, Mari Ostendorf, Annie Zaenen.
Data source: telephone conversations.
Data type: text.
LDC number: LDC2009T26.
In English.
Title from resource home page (LDC website, viewed September 17, 2020).
Access and use
Access restricted by licensing agreement.
Summary
"NXT Switchboard Annotations brings together in NITE XML, a single XML format, the multiple layers of annotation performed on a transcript subset from Switchboard-1 Release 2, LDC97S62. The original Switchboard corpus is a collection of spontaneous telephone conversations between previously unacquainted speakers of American English on a variety of topics chosen from a pre-determined list. The data in NXT Switchboard Annotations was converted from the Penn Treebank bracketed format in which the Switchboard corpus was originally distributed using an XML-based tool for syntactic query that comes with a ready-made Switchboard converter. Conversion was performed using a set of XSL stylesheets to extract each of the multiple XML files associated with one dialogue. The data was divided into separate XML files representing the orthographic transcription, syntax, turn structure, disfluencies and movement, or the relationship between traces and their sources. Transcription consists of a flat list of terminals: words, punctuation, traces, and so on. Syntax starts with a flat list of parses and works down through nonterminals, grounding in terminals (which are in the transcription file, but are referenced by pointers that indicate they are to be treated as if they were part of the tree itself)." --LDC online catalog.
Format
Books / Data Sets / Online
Language
English
Added to Catalog
September 17, 2020
Genre/Form
Data sets.
Text corpora.
Also listed under
Calhoun, Sasha, creator.
Linguistic Data Consortium, issuing body.