conllu_dev_q11_2_nltk {finnsurveytext} | R Documentation |
Young People's Views on Development Cooperation 2012 q11_2 response data in CoNLL-U format with NLTK stopwords removed
Description
This data contains the responses from the Development Cooperation q11_2 dataset in CoNLL-U format, with NLTK stopwords and punctuation removed.
Usage
conllu_dev_q11_2_nltk
Format
'conllu_dev_q11_2_nltk' is a data frame with 4407 rows and 14 columns:
- doc_id
the identifier of the document
- paragraph_id
the identifier of the paragraph
- sentence_id
the identifier of the sentence
- sentence
the text of the sentence that this token is part of
- token_id
Word index, integer starting at 1 for each new sentence; may be a range for multi-word tokens; may be a decimal number for empty nodes.
- token
Word form or punctuation symbol.
- lemma
Lemma or stem of word form.
- upos
Universal part-of-speech tag.
- xpos
Language-specific part-of-speech tag; underscore if not available.
- feats
List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
- head_token_id
Head of the current word, which is either a value of token_id or zero (0).
- dep_rel
Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
- deps
Enhanced dependency graph in the form of a list of head-deprel pairs.
- misc
Any other annotation.
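As a rough illustration of how the columns above might be used, the following sketch builds a tiny stand-in data frame with a subset of the documented columns and summarises it. The rows in `toy` are invented for illustration and are not taken from the survey data; the helper names `lemma_counts` and `sentence_roots` are hypothetical and not part of finnsurveytext. The sketch also assumes `head_token_id` may be stored as character, so it is converted before comparison.

```r
# Toy stand-in with a subset of the documented columns.
# All values are invented for illustration only.
toy <- data.frame(
  doc_id        = c("doc1", "doc2", "doc2"),
  sentence_id   = c(1L, 1L, 1L),
  token_id      = c("1", "1", "2"),
  token         = c("koulutus", "koulutusta", "tukea"),
  lemma         = c("koulutus", "koulutus", "tukea"),
  upos          = c("NOUN", "NOUN", "VERB"),
  head_token_id = c("0", "2", "0"),
  dep_rel       = c("root", "obj", "root"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lemma frequencies, most frequent first.
lemma_counts <- function(conllu) {
  sort(table(conllu$lemma), decreasing = TRUE)
}

# Hypothetical helper: keep only the root token of each sentence,
# i.e. rows whose head_token_id is zero.
sentence_roots <- function(conllu) {
  is_root <- as.character(conllu$head_token_id) == "0"
  conllu[is_root, c("doc_id", "sentence_id", "token", "lemma")]
}

lemma_counts(toy)
sentence_roots(toy)
```

Assuming the finnsurveytext package is installed, the same helpers could presumably be applied to the real data, for example `lemma_counts(finnsurveytext::conllu_dev_q11_2_nltk)`.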
Source
<https://urn.fi/urn:nbn:fi:fsd:T-FSD2821>