BNCbiber {corpora} | R Documentation |
Biber's (1988) register features for the British National Corpus
Description
This data set contains a table of the relative frequencies (per 1000 words) of 65 linguistic features (Biber 1988, 1995) for each text document in the British National Corpus (Aston & Burnard 1998).
Biber (1988) introduced these features for the purpose of a multidimensional register analysis. Variables in the data set are numbered according to Biber's list (see e.g. Biber 1995, 95f).
Feature frequencies were automatically extracted from the British National Corpus using query patterns based on part-of-speech tags (Gasthaus 2007). Note that features 60 and 65 had to be omitted because they cannot be identified with sufficient accuracy by the automatic methods. For further information on the extraction methodology, see Gasthaus (2007, 20-21). The original data set and the Python scripts used for feature extraction are available from https://portal.ikw.uni-osnabrueck.de/~CL/download/BSc_Gasthaus2007/; the version included here contains some bug fixes.
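The normalisation behind "relative frequencies (per 1000 words)" can be sketched as follows. This is only an illustration of the arithmetic, not code from the feature extractor: per_thousand() is a hypothetical helper and the counts are invented.

```r
# Hypothetical helper: relative frequency per 1000 words
# = 1000 * raw count of the feature / length of the text in words
per_thousand <- function(count, n_words) {
  1000 * count / n_words
}

# Invented example: 57 past-tense forms in a text of 2400 words
rel_freq <- per_thousand(57, 2400)
print(rel_freq)
```

Normalising by text length in this way makes feature counts comparable across documents of different sizes.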
Usage
BNCbiber
Format
A numeric matrix with 4048 rows and 65 columns, specifying the relative frequencies (per 1000 words) of 65 linguistic features. Documents are listed in the same order as the metadata in BNCmeta and rows are labelled with text IDs, so it is straightforward to combine the two data sets.
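Because rows are labelled with text IDs, the feature matrix can be aligned with a metadata table by matching row names. The following is a minimal sketch rather than part of the package: combine_features() is a hypothetical helper, the toy objects merely stand in for BNCbiber and BNCmeta, and the name of the ID column (id) as well as all values are assumptions made for illustration.

```r
# Hypothetical helper: attach feature columns to a metadata table by text ID.
# `features` is a numeric matrix with text IDs as row names;
# `meta` is a data frame whose column `id_col` holds the same IDs.
combine_features <- function(features, meta, id_col = "id") {
  idx <- match(meta[[id_col]], rownames(features))
  if (anyNA(idx)) stop("some metadata IDs have no row in the feature matrix")
  out <- cbind(meta, as.data.frame(features[idx, , drop = FALSE]))
  rownames(out) <- NULL
  out
}

# Toy stand-ins (all values invented for illustration only)
toy_features <- matrix(
  c(41.2, 10.5,
    63.0,  8.1,
    55.7, 12.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A00", "A01", "A02"),
                  c("f_01_past_tense", "f_02_perfect_aspect"))
)
toy_meta <- data.frame(
  id = c("A02", "A00", "A01"),   # deliberately in a different order
  mode = c("written", "written", "spoken"),
  stringsAsFactors = FALSE
)

combined <- combine_features(toy_features, toy_meta)
print(combined)
```

Matching on IDs instead of relying on row order keeps the result correct even if one of the two tables has been subsetted or re-sorted beforehand.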
A. Tense and aspect markers | |
f_01_past_tense | Past tense |
f_02_perfect_aspect | Perfect aspect |
f_03_present_tense | Present tense |
B. Place and time adverbials | |
f_04_place_adverbials | Place adverbials (e.g., above, beside, outdoors) |
f_05_time_adverbials | Time adverbials (e.g., early, instantly, soon) |
C. Pronouns and pro-verbs | |
f_06_first_person_pronouns | First-person pronouns |
f_07_second_person_pronouns | Second-person pronouns |
f_08_third_person_pronouns | Third-person personal pronouns (excluding it) |
f_09_pronoun_it | Pronoun it |
f_10_demonstrative_pronoun | Demonstrative pronouns (that, this, these, those as pronouns) |
f_11_indefinite_pronoun | Indefinite pronouns (e.g., anybody, nothing, someone) |
f_12_proverb_do | Pro-verb do |
D. Questions | |
f_13_wh_question | Direct wh-questions |
E. Nominal forms | |
f_14_nominalization | Nominalizations (ending in -tion, -ment, -ness, -ity) |
f_15_gerunds | Gerunds (participial forms functioning as nouns) |
f_16_other_nouns | Total other nouns |
F. Passives | |
f_17_agentless_passives | Agentless passives |
f_18_by_passives | by-passives |
G. Stative forms | |
f_19_be_main_verb | be as main verb |
f_20_existential_there | Existential there |
H. Subordination features | |
f_21_that_verb_comp | that verb complements (e.g., I said that he went.) |
f_22_that_adj_comp | that adjective complements (e.g., I'm glad that you like it.) |
f_23_wh_clause | wh-clauses (e.g., I believed what he told me.) |
f_24_infinitives | Infinitives |
f_25_present_participle | Present participial adverbial clauses (e.g., Stuffing his mouth with cookies, Joe ran out the door.) |
f_26_past_participle | Past participial adverbial clauses (e.g., Built in a single week, the house would stand for fifty years.) |
f_27_past_participle_whiz | Past participial postnominal (reduced relative) clauses (e.g., the solution produced by this process) |
f_28_present_participle_whiz | Present participial postnominal (reduced relative) clauses (e.g., the event causing this decline) |
f_29_that_subj | that relative clauses on subject position (e.g., the dog that bit me) |
f_30_that_obj | that relative clauses on object position (e.g., the dog that I saw) |
f_31_wh_subj | wh relatives on subject position (e.g., the man who likes popcorn) |
f_32_wh_obj | wh relatives on object position (e.g., the man who Sally likes) |
f_33_pied_piping | Pied-piping relative clauses (e.g., the manner in which he was told) |
f_34_sentence_relatives | Sentence relatives (e.g., Bob likes fried mangoes, which is the most disgusting thing I've ever heard of.) |
f_35_because | Causative adverbial subordinator (because) |
f_36_though | Concessive adverbial subordinators (although, though) |
f_37_if | Conditional adverbial subordinators (if, unless) |
f_38_other_adv_sub | Other adverbial subordinators (e.g., since, while, whereas) |
I. Prepositional phrases, adjectives and adverbs | |
f_39_prepositions | Total prepositional phrases |
f_40_adj_attr | Attributive adjectives (e.g., the big horse) |
f_41_adj_pred | Predicative adjectives (e.g., The horse is big.) |
f_42_adverbs | Total adverbs |
J. Lexical specificity | |
f_43_type_token | Type-token ratio (including punctuation) |
f_44_mean_word_length | Average word length (across tokens, excluding punctuation) |
K. Lexical classes | |
f_45_conjuncts | Conjuncts (e.g., consequently, furthermore, however) |
f_46_downtoners | Downtoners (e.g., barely, nearly, slightly) |
f_47_hedges | Hedges (e.g., at about, something like, almost) |
f_48_amplifiers | Amplifiers (e.g., absolutely, extremely, perfectly) |
f_49_emphatics | Emphatics (e.g., a lot, for sure, really) |
f_50_discourse_particles | Discourse particles (e.g., sentence-initial well, now, anyway) |
f_51_demonstratives | Demonstratives |
L. Modals | |
f_52_modal_possibility | Possibility modals (can, may, might, could) |
f_53_modal_necessity | Necessity modals (ought, should, must) |
f_54_modal_predictive | Predictive modals (will, would, shall) |
M. Specialized verb classes | |
f_55_verb_public | Public verbs (e.g., assert, declare, mention) |
f_56_verb_private | Private verbs (e.g., assume, believe, doubt, know) |
f_57_verb_suasive | Suasive verbs (e.g., command, insist, propose) |
f_58_verb_seem | seem and appear |
N. Reduced forms and dispreferred structures | |
f_59_contractions | Contractions |
n/a | Subordinator that deletion (e.g., I think [that] he went.) |
f_61_stranded_preposition | Stranded prepositions (e.g., the candidate that I was thinking of) |
f_62_split_infinitve | Split infinitives (e.g., He wants to convincingly prove that ...) |
f_63_split_auxiliary | Split auxiliaries (e.g., They were apparently shown to ...) |
O. Co-ordination | |
f_64_phrasal_coordination | Phrasal co-ordination (N and N; Adj and Adj; V and V; Adv and Adv) |
n/a | Independent clause co-ordination (clause-initial and) |
P. Negation | |
f_66_neg_synthetic | Synthetic negation (e.g., No answer is good enough for Jones.) |
f_67_neg_analytic | Analytic negation (e.g., That's not likely.) |
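Since each variable name embeds Biber's feature number, the numbers can be recovered from the column names, for instance to check which features are absent. The sketch below works on a hand-copied subset of the names listed above (groups N to P); feature_number() is a hypothetical helper, and with the data set loaded one would presumably pass colnames(BNCbiber) instead.

```r
# Hypothetical helper: extract the feature number from names like "f_59_contractions"
feature_number <- function(x) {
  as.integer(sub("^f_([0-9]+)_.*$", "\\1", x))
}

# Variable names of groups N to P, copied from the table above
# (spelling kept exactly as in the data set)
vars <- c("f_59_contractions", "f_61_stranded_preposition",
          "f_62_split_infinitve", "f_63_split_auxiliary",
          "f_64_phrasal_coordination", "f_66_neg_synthetic",
          "f_67_neg_analytic")

nums <- feature_number(vars)

# Feature numbers in the range 59..67 that have no column
missing_nums <- setdiff(59:67, nums)
print(missing_nums)
```

For this subset the gaps are the two features marked n/a in the table.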
Author(s)
Stephanie Evert (https://purl.org/stephanie.evert); feature extractor by Jan Gasthaus (2007).
References
Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.
Biber, Douglas (1988). Variation across Speech and Writing. Cambridge University Press, Cambridge.
Biber, Douglas (1995). Dimensions of Register Variation: A cross-linguistic comparison. Cambridge University Press, Cambridge.
Gasthaus, Jan (2007). Prototype-Based Relevance Learning for Genre Classification. B.Sc. thesis, Institute of Cognitive Science, University of Osnabrück. Data sets and software available from https://portal.ikw.uni-osnabrueck.de/~CL/download/BSc_Gasthaus2007/.