think {ndl}R Documentation

Finnish ‘think’ verbs.

Description

3404 occurrences of four synonymous Finnish ‘think’ verbs (‘ajatella’: 1492; ‘miettiä’: 812; ‘pohtia’: 713; ‘harkita’: 387) in newspaper and Internet newsgroup discussion texts.

Usage

data(think)

Format

A data frame with 3404 observations on the following 27 variables:

Lexeme

A factor specifying one of the four ‘think’ verb synonyms

Polarity

A factor specifying whether the ‘think’ verb has negative polarity (Negation) or not (Other)

Voice

A factor specifying whether the ‘think’ verb is in the Passive voice or not (Other)

Mood

A factor specifying whether the ‘think’ verb is in the Indicative or Conditional mood or not (Other)

Person

A factor specifying whether the ‘think’ verb is in the First, Second, Third person or not (None)

Number

A factor specifying whether the ‘think’ verb is in the Plural number or not (Other)

Covert

A factor specifying whether the agent/subject of the ‘think’ verb is explicitly expressed as a syntactic argument (Overt), or only as a morphological feature of the ‘think’ verb (Covert)

ClauseEquivalent

A factor specifying whether the ‘think’ verb is used as a non-finite clause equivalent (ClauseEquivalent) or as a finite verb (FiniteVerbChain)

Agent

A factor specifying the occurrence of the Agent/Subject of the ‘think’ verb as either a Human Individual, a Human Group, or as absent (None)

Patient

A factor specifying the occurrence of the Patient/Object argument among the semantic or structural subclasses as either a Human Individual or Group (IndividualGroup), Abstraction, Activity, Communication, Event, an ‘että’ (‘that’) clause (etta_CLAUSE), DirectQuote, IndirectQuestion, Infinitive, Participle, or as absent (None)

Manner

A factor specifying the occurrence of the Manner argument as any of its subclasses Generic, Negative (sufficiency), Positive (sufficiency), Frame, Agreement (Concur or Disagree), Joint (Alone or Together), or as absent (None)

Time

A factor specifying the occurrence of a Time argument (as a moment) as either of its subclasses Definite or Indefinite, or as absent (None)

Modality1

A factor specifying the main semantic subclasses of the entire Verb chain as either indicating Possibility, Necessity, or their absence (None)

Modality2

A factor specifying minor semantic subclasses of the entire Verb chain as indicating either a Temporal element (begin, end, continuation, etc.), External (cause), Volition, Accidental nature of the thinking process, or their absence (None)

Source

A factor specifying the occurrence of a Source argument or its absence (None)

Goal

A factor specifying the occurrence of a Goal argument or its absence (None)

Quantity

A factor specifying the occurrence of a Quantity argument, or its absence (None)

Location

A factor specifying the occurrence of a Location argument, or its absence (None)

Duration

A factor specifying the occurrence of a Duration argument, or its absence (None)

Frequency

A factor specifying the occurrence of a Frequency argument, or its absence (None)

MetaComment

A factor specifying the occurrence of a MetaComment, or its absence (None)

ReasonPurpose

A factor specifying the occurrence of a Reason or Purpose argument (ReasonPurpose), or their absence (None)

Condition

A factor specifying the occurrence of a Condition argument, or its absence (None)

CoordinatedVerb

A factor specifying the occurrence of a Coordinated Verb (in relation to the ‘think’ verb: CoordinatedVerb), or its absence (None)

Register

A factor specifying whether the ‘think’ verb occurs in the newspaper subcorpus (hs95) or the Internet newsgroup discussion corpus (sfnet)

Section

A factor specifying the subsection in which the ‘think’ verb occurs in either of the two subcorpora

Author

A factor specifying the author of the text in which the ‘think’ verb occurs, if that author is identifiable. Authors in the Internet newsgroup discussion subcorpus are anonymized, and an unidentifiable/unknown author is designated as (None)
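
The variables listed above can be checked against the loaded data frame. The following is a minimal sketch, assuming the ndl package is installed and that data(think) loads the data frame as described in this page; it only uses base R functions for inspection.

```r
# Assumes the ndl package is installed; the dataset ships with it.
library(ndl)
data(think)

# Dimensions should match the Format section: 3404 rows, 27 columns.
dim(think)

# Compact overview of the first few factors and their levels.
str(think, list.len = 6)

# Frequency of each of the four 'think' verbs.
table(think$Lexeme)

# Verb frequencies split by subcorpus (hs95 = newspaper, sfnet = newsgroups).
xtabs(~ Lexeme + Register, data = think)
```

The cross-tabulation is a quick way to see whether a verb is preferred in one register before fitting any model.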

Details

The four most frequent synonyms meaning ‘think, reflect, ponder, consider’, i.e. ‘ajatella, miettiä, pohtia, harkita’, were extracted from two months of newspaper text from the 1990s (Helsingin Sanomat 1995) and six months of Internet newsgroup discussion from the early 2000s (SFNET 2002-2003), namely regarding (personal) relationships (sfnet.keskustelu.ihmissuhteet) and politics (sfnet.keskustelu.politiikka). The newspaper corpus consisted of 3,304,512 words of body text (i.e. excluding headers and captions as well as punctuation tokens), and included 1,750 examples of the studied ‘think’ verbs. The Internet corpus comprised 1,174,693 words of body text, yielding 1,654 instances of the selected ‘think’ verbs. In terms of distinct identifiable authors, the newspaper sub-corpus was the product of just over 500 journalists and other contributors, while the Internet sub-corpus involved well over 1,000 discussants. The think dataset contains a selection of 26 contextual features judged as most informative.
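
Because the two subcorpora differ considerably in size, the raw counts above are easier to compare as relative frequencies. The sketch below is plain arithmetic on the figures quoted in this section and needs no package; the variable names (words, hits, per_million, ratio) are illustrative only.

```r
# Corpus sizes and verb counts as quoted in the Details text.
words <- c(hs95 = 3304512, sfnet = 1174693)   # words of body text
hits  <- c(hs95 = 1750,    sfnet = 1654)      # 'think' verb occurrences

# Occurrences per million words of body text in each subcorpus.
per_million <- hits / words * 1e6
print(round(per_million))

# How many times more frequent the verbs are in the newsgroup text.
ratio <- per_million[["sfnet"]] / per_million[["hs95"]]
print(round(ratio, 1))
```

On these figures the verbs occur roughly 530 times per million words in the newspaper text and roughly 1,408 times per million in the newsgroup text, so the two subcorpora contribute similar raw counts despite very different densities.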

For extensive details of the data and its linguistic and statistical analysis, see Arppe (2008). For the full selection of contextual features, see the amph (2008) microcorpus.

Source

amph 2008. A micro-corpus of 3404 occurrences of the four most common Finnish THINK lexemes, ‘ajatella, miettiä, pohtia, and harkita’, in Finnish newspaper and Internet newsgroup discussion texts, containing extracts and linguistic analysis of the relevant context in the original corpus data, scripts for processing this data, R functions for its statistical analysis, as well as a comprehensive set of ensuing results as R data tables. Compiled and analyzed by Antti Arppe. Available on-line at URL: http://www.csc.fi/english/research/software/amph/

Helsingin Sanomat 1995. ~22 million words of Finnish newspaper articles published in Helsingin Sanomat during January–December 1995. Compiled by the Research Institute for the Languages of Finland [KOTUS] and CSC – IT Center for Science, Finland. Available on-line at URL: http://www.csc.fi/kielipankki/

SFNET 2002-2003. ~100 million words of Finnish internet newsgroup discussion posted during October 2002 – April 2003. Compiled by Tuuli Tuominen and Panu Kalliokoski, Computing Centre, University of Helsinki, and Antti Arppe, Department of General Linguistics, University of Helsinki, and CSC – IT Center for Science, Finland. Available on-line at URL: http://www.csc.fi/kielipankki/

References

Arppe, A. 2008. Univariate, bivariate and multivariate methods in corpus-based lexicography – a study of synonymy. Publications of the Department of General Linguistics, University of Helsinki, No. 44. URN: http://urn.fi/URN:ISBN:978-952-10-5175-3.

Arppe, A. 2009. Linguistic choices vs. probabilities – how much and what can linguistic theory explain? In: Featherston, Sam & Winkler, Susanne (eds.) The Fruits of Empirical Linguistics. Volume 1: Process. Berlin: de Gruyter, pp. 1-24.

Examples

## Not run: 
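library(ndl)
data(think)
# Fit a naive discriminative learning classifier that predicts the verb
# from a handful of the contextual features.
think.ndl <- ndlClassify(Lexeme ~ Person + Number + Agent + Patient + Register,
   data=think)
# Inspect the fitted model and plot it.
summary(think.ndl)
plot(think.ndl)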

## End(Not run)

[Package ndl version 0.2.18 Index]