compute_name_index {ChineseNames} | R Documentation
Compute multiple features of surnames and given names.
Description
Compute all available name features (indices) based on familyname and givenname.
You can either input a data frame with a variable of Chinese full names (and a variable of birth years, if necessary) or just input a vector of full names (and a vector of birth years, if necessary).
Usage 1: Input a single value or a vector of name [and birth, if necessary].
Usage 2: Input a data frame as data and the variable name as var.fullname (or var.surname and/or var.givenname) [and var.birthyear, if necessary].
Caution. Name-character uniqueness (NU) for birth years >= 2010 is estimated by forecasting and therefore may not be accurate.
Usage
compute_name_index(
  data = NULL,
  var.fullname = NULL,
  var.surname = NULL,
  var.givenname = NULL,
  var.birthyear = NULL,
  name = NA,
  birth = NA,
  index = c("NLen", "SNU", "SNI", "NU", "CCU", "NG", "NV", "NW", "NC"),
  NU.approx = TRUE,
  digits = 4,
  return.namechar = TRUE,
  return.all = FALSE
)
Arguments
data
    Data frame.
var.fullname
    Variable name of Chinese full names (e.g., "name").
var.surname
    Variable name of Chinese surnames.
var.givenname
    Variable name of Chinese given names.
var.birthyear
    Variable name of birth year (e.g., "birth").
name
    If no data frame is supplied: a single full name or a vector of full names.
birth
    If no data frame is supplied: a single birth year or a vector of birth years.
index
    Which indices to compute. By default, all available name indices are computed: "NLen", "SNU", "SNI", "NU", "CCU", "NG", "NV", "NW", "NC".
    For details, see https://psychbruce.github.io/ChineseNames/
NU.approx
    Whether to compute name-character uniqueness (NU) approximately, using the nearest two birth cohorts with relative weights (which is more precise than using a single birth cohort). Default is TRUE.
digits
    Number of decimal places. Default is 4.
return.namechar
    Whether to return separate name characters. Default is TRUE.
return.all
    Whether to return all temporary variables used in computing the final variables. Default is FALSE.
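The idea behind NU.approx (weighting the two birth cohorts nearest to a birth year) can be illustrated with a minimal base-R sketch. This is not the package's implementation: the cohort midpoints, the per-cohort values, the helper name approx_nu, and the inverse-distance weighting are all assumptions made for illustration only.

```r
# Hypothetical cohort midpoints and per-cohort uniqueness values.
# These numbers are invented for illustration; they are NOT package data.
cohort_mid = c(1965, 1975, 1985, 1995)
cohort_nu  = c(2.0, 2.5, 3.0, 3.5)

# approx_nu() is a hypothetical helper, not part of ChineseNames.
# It blends the values of the two cohorts nearest to `birth`,
# using inverse-distance weights (an assumed weighting scheme).
approx_nu = function(birth, mid, nu) {
  d = abs(mid - birth)               # distance to every cohort midpoint
  nearest = order(d)[1:2]            # indices of the two nearest cohorts
  if (d[nearest[1]] == 0) {
    return(nu[nearest[1]])           # birth year falls exactly on a midpoint
  }
  w = 1 / d[nearest]                 # closer cohort gets the larger weight
  sum(w * nu[nearest]) / sum(w)      # weighted average of the two cohorts
}

nu_1978 = approx_nu(1978, cohort_mid, cohort_nu)
nu_1985 = approx_nu(1985, cohort_mid, cohort_nu)
print(c(nu_1978, nu_1985))
```

For a birth year lying between two midpoints, inverse-distance weighting is equivalent to linear interpolation, so a single-cohort lookup (NU.approx = FALSE) would correspond to simply taking the nearest cohort's value.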
Value
A new data frame (of class data.table) with name indices appended.
Full names are split into name0 (surnames, with compound surnames automatically detected), name1, name2, and name3 (given-name characters).
Citation
Bao, H.-W.-S. (2023). ChineseNames: Chinese Name Database 1930-2008. R package version 2023.8. https://CRAN.R-project.org/package=ChineseNames
Bao, H.-W.-S., Cai, H., Jing, Y., & Wang, J. (2021). Novel evidence for the increasing prevalence of unique names in China: A reply to Ogihara. Frontiers in Psychology, 12, 731244. doi:10.3389/fpsyg.2021.731244
Note
For details and examples, see https://psychbruce.github.io/ChineseNames/
Examples
## Prepare ##
library(ChineseNames)  # provides familyname, top100name.year, compute_name_index()

sn = familyname$surname[1:12]  # 12 surnames, recycled over the 24 given names below
gn = c(top100name.year$name.all.1960[1:6],
       top100name.year$name.all.2000[1:6],
       top100name.year$name.all.1960[95:100],
       top100name.year$name.all.2000[95:100])
demodata = data.frame(name=paste0(sn, gn),
                      birth=c(1960:1965, 2000:2005,
                              1960:1965, 2000:2005))
demodata

## Compute ##
newdata = compute_name_index(demodata,
                             var.fullname="name",
                             var.birthyear="birth")
newdata
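
The example above uses a data frame (Usage 2). As a complement, here is a minimal sketch of Usage 1, passing names and birth years directly through the name and birth arguments. The names and birth years are illustrative values chosen here, not taken from the package documentation, and the call is guarded so that it only runs when ChineseNames is installed.

```r
## Usage 1: vectors instead of a data frame ##
if (requireNamespace("ChineseNames", quietly = TRUE)) {
  library(ChineseNames)

  # A single full name with its birth year (illustrative values)
  single = compute_name_index(name = "张伟", birth = 1985)

  # Several full names with matching birth years (illustrative values)
  several = compute_name_index(name = c("张伟", "王芳"),
                               birth = c(1985, 1992))
  print(several)
}
```

Because no data frame is supplied, data and the var.* arguments are left at their NULL defaults; the remaining arguments (index, NU.approx, digits, and so on) apply in the same way as in the data-frame form.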