compute_name_index {ChineseNames}

R Documentation

Compute multiple features of surnames and given names.

Description

Compute all available name features (indices) based on the familyname and givenname databases. You can pass either a data frame containing a variable of Chinese full names (plus a variable of birth years, if needed) or a plain vector of full names (plus a vector of birth years, if needed).

Usage 1: Pass a single value or a vector to name [and to birth, if necessary].
Usage 2: Pass a data frame to data, together with the variable name var.fullname (or var.surname and/or var.givenname) [and var.birthyear, if necessary].

Caution: Name-character uniqueness (NU) for birth years >= 2010 is estimated by forecasting and therefore may not be accurate.

Usage

compute_name_index(
  data = NULL,
  var.fullname = NULL,
  var.surname = NULL,
  var.givenname = NULL,
  var.birthyear = NULL,
  name = NA,
  birth = NA,
  index = c("NLen", "SNU", "SNI", "NU", "CCU", "NG", "NV", "NW", "NC"),
  NU.approx = TRUE,
  digits = 4,
  return.namechar = TRUE,
  return.all = FALSE
)

Arguments

`data`	Data frame.
`var.fullname`	Variable name of Chinese full names (e.g., `"name"`).
`var.surname`	Variable name of Chinese surnames (e.g., `"surname"`).
`var.givenname`	Variable name of Chinese given names (e.g., `"givenname"`).
`var.birthyear`	Variable name of birth year (e.g., `"birth"`).
`name`	Vector of full name(s), used when `data` is not supplied.
`birth`	Vector of birth year(s), used when `data` is not supplied.
`index`	Which indices to compute? By default, it computes all available name indices: `NLen`: full-name length (2~4). `SNU`: surname uniqueness (1~6). `SNI`: surname initial (1~26). `NU`: name-character uniqueness (1~6). `CCU`: character-corpus uniqueness (1~6). `NG`: name gender (-1~1). `NV`: name valence (1~5). `NW`: name warmth (1~5). `NC`: name competence (1~5). For details, see https://psychbruce.github.io/ChineseNames/
`NU.approx`	Whether to approximately compute name-character uniqueness (NU) using the nearest two birth cohorts with relative weights (which would be more precise than just using a single birth cohort). Default is `TRUE`.
`digits`	Number of decimal places. Default is `4`.
`return.namechar`	Whether to return separate name characters. Default is `TRUE`.
`return.all`	Whether to return all temporary variables in the computation of the final variables. Default is `FALSE`.
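
To make the idea behind `NU.approx` more concrete, the following is a minimal sketch of weighting two neighbouring birth cohorts by their closeness to a birth year. It is not the package's internal code: the cohort midpoints, the NU values, and the helper `interp_nu()` are all hypothetical and chosen only for illustration.

```r
# Hypothetical illustration only; NOT how ChineseNames computes NU internally.
# Assumed midpoints of two adjacent birth cohorts and made-up NU values
# of one name character in those cohorts.
cohort_mid = c(1965, 1975)
cohort_nu  = c(3.0, 4.0)

# Weight each cohort by how close its midpoint is to the birth year
# (linear weights that sum to 1 when birth lies between the two midpoints).
interp_nu = function(birth, mid, nu) {
  w = 1 - abs(birth - mid) / diff(mid)
  sum(w * nu)
}

# A birth year of 1968 gets weights 0.7 and 0.3:
# 0.7 * 3.0 + 0.3 * 4.0 = 3.3
nu_1968 = interp_nu(1968, cohort_mid, cohort_nu)
nu_1968
```

With `NU.approx = FALSE`, the analogous choice would be to take the value of a single cohort instead of a weighted blend.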

Value

A new data frame (of class data.table) with name indices appended. Full names are split into name0 (surnames, with compound surnames automatically detected), name1, name2, and name3 (given-name characters).
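
As a rough sketch of how the split name characters might be inspected, the snippet below computes only `NLen` for two arbitrary example names and then selects whichever of the name0 to name3 columns are present. The example names are assumptions for illustration; if 欧阳 is among the detected compound surnames, name0 would hold both of its characters.

```r
library(ChineseNames)

# Arbitrary example names (assumed for illustration): one with a
# two-character surname and one with a single-character surname.
out = compute_name_index(name = c("欧阳修", "张伟"), index = "NLen")

# Keep only the split-name columns that actually exist in the result.
namecols = intersect(c("name0", "name1", "name2", "name3"), names(out))
as.data.frame(out)[, namecols, drop = FALSE]
```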

Citation

Bao, H.-W.-S. (2023). ChineseNames: Chinese Name Database 1930-2008. R package version 2023.8. https://CRAN.R-project.org/package=ChineseNames

Bao, H.-W.-S., Cai, H., Jing, Y., & Wang, J. (2021). Novel evidence for the increasing prevalence of unique names in China: A reply to Ogihara. Frontiers in Psychology, 12, 731244. doi:10.3389/fpsyg.2021.731244

Note

For details and examples, see https://psychbruce.github.io/ChineseNames/

Examples

## Prepare ##
sn = familyname$surname[1:12]
gn = c(top100name.year$name.all.1960[1:6],
       top100name.year$name.all.2000[1:6],
       top100name.year$name.all.1960[95:100],
       top100name.year$name.all.2000[95:100])
demodata = data.frame(name=paste0(sn, gn),
                      birth=c(1960:1965, 2000:2005,
                              1960:1965, 2000:2005))
demodata

## Compute ##
newdata = compute_name_index(demodata,
                             var.fullname="name",
                             var.birthyear="birth")
newdata
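
The example above follows Usage 2 (a data frame plus variable names). A sketch of Usage 1, which passes vectors directly, could look like the following. It uses only the arguments documented on this page; the names and birth years are arbitrary examples, not taken from the package documentation.

```r
library(ChineseNames)

## Usage 1: vectors of full names and birth years, no data frame ##
res = compute_name_index(name = c("包寒吴霜", "张伟"),
                         birth = c(1995, 1995))
res

## Compute a subset of indices only ##
# Restrict to three indices, round to 2 decimal places,
# and do not return the separate name characters.
res_small = compute_name_index(name = "包寒吴霜",
                               birth = 1995,
                               index = c("NLen", "NU", "NG"),
                               digits = 2,
                               return.namechar = FALSE)
res_small
```

Limiting `index` to the features you need keeps the returned data frame narrower, which can be convenient when merging the result back into a larger dataset.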

[Package ChineseNames version 2023.8 Index]