R: Calculate the Isonymy, Isonymy between regions, Lasker...

fIsonymyAll {OnomasticDiversity}

R Documentation

Calculate the Isonymy, Isonymy between regions, Lasker distances, Euclidean distance and Nei's distances

Description

This function computes the isonymy, the isonymy between regions, the Lasker distance, the Euclidean distance, Nei's distance and Hedrick's coefficient.

Usage

fIsonymyAll (x, n, location, union, measure)

Arguments

`x`	data frame with the data.
`n`	number of the locations in the data frame.
`location`	name of a variable which represents the location in the data.
`union`	variable to be used to search for matching surnames in two locations.
`measure`	name of a variable which represents the relative frequency for each surname.
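
The arguments above suggest a long-format table with one row per location and surname. As a rough sketch under that assumption, the base R code below builds such a table and derives a relative frequency column. The data frame `counts`, its values and the helper variable `n_locations` are hypothetical; the column names `muni`, `number` and `pki` only mirror those used in the Examples section. Dividing by the sum of counts within each location is one possible normalisation, not necessarily the one used for the package data sets.

```r
# Hypothetical counts of surname bearers per location (long format).
counts <- data.frame(
  muni    = c("A", "A", "B", "B", "B"),
  surname = c("Lopez", "Garcia", "Lopez", "Garcia", "Perez"),
  number  = c(60, 40, 10, 30, 60),
  stringsAsFactors = FALSE
)

# Relative frequency of each surname within its location
# (assumed normalisation: count divided by the location total).
counts$pki <- counts$number / ave(counts$number, counts$muni, FUN = sum)

# Number of distinct locations, the kind of value passed as `n`.
n_locations <- length(unique(counts$muni))
```

With this layout, `location` would name the `muni` column, `union` the `surname` column and `measure` the `pki` column.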

Details

The function returns the values of the isonymy, the isonymy between regions, the Lasker distance, the Euclidean distance, Nei's distance and Hedrick's coefficient.

Surname (dis)similarity among regions can be quantified by different measures. Let the index i=1,\ldots,n denote a geographical region, and (i,j) a pair of regions. Each region has an associated collection S_i of surnames, and for a pair of regions, the collection of all the surnames in them is denoted by S_{ij} (S_{ij}=S_i\cup S_j). The total number of surnames in region i is denoted by n_i. Surnames are denoted by the indices k and l.

Isonymy is defined as I_i=\sum \limits_{k\in S_i}p_{ki}^2, where p_{ki} denotes the relative frequency of surname k in region i. Isonymy can also be extended to measure the similarity between populations. Under the assumption of a common origin, the isonymy between two regions i and j is defined as I_{ij}=\sum \limits_{k\in S_{ij}}p_{ki}p_{kj}.
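
As a minimal sketch of the two definitions above, the following base R code computes I_i and I_{ij} for a made-up pair of regions. The data frame `toy`, its columns `region`, `surname` and `pki`, and all values are hypothetical. This is an illustration of the formulas, not the internal implementation of fIsonymyAll.

```r
# Hypothetical toy data: relative frequencies p_ki of surnames in regions A and B.
toy <- data.frame(
  region  = c("A", "A", "A", "B", "B"),
  surname = c("Lopez", "Garcia", "Perez", "Lopez", "Garcia"),
  pki     = c(0.5, 0.3, 0.2, 0.4, 0.6),
  stringsAsFactors = FALSE
)

# Isonymy within each region: I_i = sum over k of p_ki^2.
isonymy <- tapply(toy$pki^2, toy$region, sum)

# Isonymy between A and B: match the surnames present in both regions.
# A surname missing from one region has frequency zero there, so it
# contributes nothing to the sum over S_ij and can be left out.
a <- toy[toy$region == "A", ]
b <- toy[toy$region == "B", ]
both <- merge(a, b, by = "surname", suffixes = c(".a", ".b"))
isonymy_btw <- sum(both$pki.a * both$pki.b)
```

Here only Lopez and Garcia appear in both regions, so I_{AB} reduces to the two products 0.5 * 0.4 and 0.3 * 0.6.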

Other measures of the isonymic distance between a pair of locations can be derived from the isonymy between regions. For instance, the Lasker distance is given by L = -\log(I_{ij}).
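
To illustrate the formula, the short R sketch below applies it to two invented values of the isonymy between regions. The vector `isonymy_btw` and its element names are hypothetical.

```r
# Hypothetical isonymy-between values for two pairs of regions.
isonymy_btw <- c(close_pair = 0.38, distant_pair = 0.02)

# Lasker distance: L = -log(I_ij), using the natural logarithm.
lasker <- -log(isonymy_btw)

# Two regions with no surname in common have I_ij = 0,
# for which R returns an infinite distance.
lasker_no_overlap <- -log(0)
```

The pair with the smaller isonymy between regions gets the larger distance.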

The Lasker distance can be interpreted as a measure of dissimilarity between two areas: larger distances indicate less similarity in surname composition. Nevertheless, the Lasker distance is not the only option for quantifying surname similarity. Other common coefficients are the Euclidean distance and Nei's distance, given by E = \sqrt{1-\sum_{k\in S_{ij}}{\sqrt{p_{ki}p_{kj}}}}\quad\mbox{and}\quad N = -\log\left(\frac{I_{ij}}{\sqrt{I_iI_j}}\right), respectively. Finally, Hedrick's coefficient gives a standardized measure of isonymy using a procedure similar to that used to calculate a correlation coefficient. Specifically: H_{ij} = \frac{ 2 \sum \limits_{k \in S_{ij}} p_{ki} p_{kj}}{ \left(\sum \limits_{k \in S_{ij}} p_{ki}^2 + \sum \limits_{k \in S_{ij}} p_{kj}^2 \right) } \mbox{, with } i,j=1,\ldots,n.
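
The three formulas above can be sketched in base R for a single pair of regions. The vectors `p_i` and `p_j` and their values are hypothetical; they hold the relative frequencies over the union of surnames S_{ij}, with zero for a surname absent from a region. This illustrates the formulas only and makes no claim about how fIsonymyAll computes them.

```r
# Hypothetical relative frequencies over the union of surnames S_ij;
# a zero means the surname is absent from that region.
p_i <- c(Lopez = 0.5, Garcia = 0.3, Perez = 0.2)
p_j <- c(Lopez = 0.4, Garcia = 0.6, Perez = 0.0)

I_i  <- sum(p_i^2)      # isonymy in region i
I_j  <- sum(p_j^2)      # isonymy in region j
I_ij <- sum(p_i * p_j)  # isonymy between i and j

# Euclidean distance: E = sqrt(1 - sum_k sqrt(p_ki * p_kj)).
euclidean <- sqrt(1 - sum(sqrt(p_i * p_j)))

# Nei's distance: N = -log(I_ij / sqrt(I_i * I_j)).
nei <- -log(I_ij / sqrt(I_i * I_j))

# Hedrick's coefficient: H_ij = 2 * I_ij / (I_i + I_j).
hedrick <- 2 * I_ij / (I_i + I_j)
```

Because both sums in the denominator of H_{ij} run over S_{ij}, and frequencies of absent surnames are zero, they coincide with I_i and I_j.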

In the diversity context, p_{ki} denotes the relative frequency of species k in community i (\approx region in the onomastic context) and S_i is the collection of all species in community i.

Value

A list containing the following components:

`isonymy`	data frame with two columns and one row per region / community (`n` rows). For each location, it gives the value of the isonymy.
`isonymy.btw`	the value of the isonymy between regions. Matrix, `n \times n`.
`hedrick`	the value of Hedrick's coefficient. Matrix, `n \times n`.
`nei`	the value of Nei's distance. Matrix, `n \times n`.
`lasker`	the value of Lasker distance. Matrix, `n \times n`.
`distE`	the value of Euclidean distance. Matrix, `n \times n`.

Author(s)

Maria Jose Ginzo Villamayor

References

Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E., and Rodriguez–Larralde, A. (1996). Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Annals of Human Biology, 23, 431–455.

Cavalli-Sforza, L. L., and Edwards, A. W. F. (1967). Phylogenetic analysis: models and estimation procedures. American Journal of Human Genetics, 19, 233–257.

Hedrick, P. W. (1971). A new approach to measuring genetic similarity. Evolution, 25, 276–280.

Lasker, G. W. (1977). A coefficient of relationship by isonymy: a method for estimating the genetic relationship between populations. Human Biology, 49, 489–493.

Mikerezi, I., Shina, E., Scapoli, C., Barbujani, G., Mamolini, E., Sandri, M., Carrieri, A., Rodriguez–Larralde, A., and Barrai, I. (2013). Surnames in Albania: a study of the population of Albania through isonymy. Annals of Human Genetics, 77, 232–243.

Nei, M. (1973). The theory and estimation of genetic distance. In Genetic Structure of Populations, edited by N. E. Morton (Honolulu: University Press of Hawaii), 45–54.

Weiss, V. (1980). Inbreeding and genetic distance between hierarchically structured populations measured by surname frequencies. Mankind Quarterly, 21, 135–149.

Examples


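library(OnomasticDiversity)

# Surname data: the relative frequency column 'pki' is used as the measure
data(surnamesgal14)
result <- fIsonymyAll(x = surnamesgal14, n = 314, location = 'muni',
                      union = 'surname', measure = 'pki')
result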

# Men's names data: compute the relative frequency 'pki' before the call
data(namesmengal16)
namesmengal16$pki <- namesmengal16$number / namesmengal16$population
result <- fIsonymyAll(x = namesmengal16, n = 313, location = 'muni',
                      union = 'name', measure = 'pki')
result

# Women's names data: compute the relative frequency 'pki' before the call
data(nameswomengal16)
nameswomengal16$pki <- nameswomengal16$number / nameswomengal16$population
result <- fIsonymyAll(x = nameswomengal16, n = 313, location = 'muni',
                      union = 'name', measure = 'pki')
result

[Package OnomasticDiversity version 0.1 Index]