Notes
Applications: natural language processing.
Authors: Sasha Calhoun, Jean Carletta, Daniel Jurafsky, Malvina Nissim, Mari Ostendorf, Annie Zaenen.
Data source: telephone conversations.
Data type: text.
LDC number: LDC2009T26.
In English.
Title from resource home page (LDC website, viewed September 17, 2020).
Summary
"NXT Switchboard Annotations, brings together in NITE XML, a single XML format, the multiple layers of annotation performed on a transcript subset from Switchboard 1- Release 2, LDC97S62. The original Switchboard corpus is a collection of spontaneous telephone conversations between previously unacquainted speakers of American English on a variety of topics chosen from a pre-determined list. The data in NXT Switchboard Annotations was converted from the Penn Treebank bracketed format in which the Switchboard corpus was originally distributed using an XML-based tool for syntactic query that comes with a ready-made Switchboard converter. Conversion was performed using a set of XSL stylesheets to extract each of the multiple XML files associated with one dialogue. The data was divided into separate XML files representing the orthographic transcription, syntax, turn structure, disfluencies and movement, or the relationship between traces and their sources. Transcription consists of a flat list of terminals: words, punctuation, traces, and so on. Syntax starts with a flat list of parses and works down through nonterminals, grounding in terminals (which are in the transcription file, but are referenced by pointers that indicate they are to be treated as if they were part of the tree itself)." --LDC online catalog.