data.Normalization {clusterSim}    R Documentation

Types of variable (column) and object (row) normalization formulas

Description

Types of variable (column) and object (row) normalization formulas

Usage

data.Normalization (x,type="n0",normalization="column",...)

Arguments

x

vector, matrix or dataset

type

type of normalization:

n0 - without normalization

n1 - standardization ((x-mean)/sd)

n2 - positional standardization ((x-median)/mad)

n3 - unitization ((x-mean)/range)

n3a - positional unitization ((x-median)/range)

n4 - unitization with zero minimum ((x-min)/range)

n5 - normalization in range <-1,1> ((x-mean)/max(abs(x-mean)))

n5a - positional normalization in range <-1,1> ((x-median)/max(abs(x-median)))

n6 - quotient transformation (x/sd)

n6a - positional quotient transformation (x/mad)

n7 - quotient transformation (x/range)

n8 - quotient transformation (x/max)

n9 - quotient transformation (x/mean)

n9a - positional quotient transformation (x/median)

n10 - quotient transformation (x/sum)

n11 - quotient transformation (x/sqrt(SSQ))

n12 - normalization ((x-mean)/sqrt(sum((x-mean)^2)))

n12a - positional normalization ((x-median)/sqrt(sum((x-median)^2)))

n13 - normalization with zero being the central point ((x-midrange)/(range/2))
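
As a rough illustration of how several of the formulas above read for a single numeric vector, here is a minimal base R sketch. It is not the clusterSim implementation. The helper name normalize_sketch is hypothetical, "range" is assumed to mean max minus min, SSQ is assumed to be the sum of squared values, and mad() is called with its base R defaults, which may differ from what the package does.

```r
# Hypothetical helper, not part of clusterSim: a handful of the listed
# formulas written directly in base R for one numeric vector.
normalize_sketch <- function(x, type = "n1") {
  rng <- max(x) - min(x)  # assumption: "range" means max - min
  switch(type,
    n0  = x,
    n1  = (x - mean(x)) / sd(x),
    n2  = (x - median(x)) / mad(x),   # base R mad() defaults (assumption)
    n3  = (x - mean(x)) / rng,
    n4  = (x - min(x)) / rng,
    n5  = (x - mean(x)) / max(abs(x - mean(x))),
    n10 = x / sum(x),
    n11 = x / sqrt(sum(x^2)),         # assumption: SSQ = sum of squares
    n13 = (x - (max(x) + min(x)) / 2) / (rng / 2),
    stop("type not covered by this sketch")
  )
}

x <- c(2, 4, 6, 8, 10)
normalize_sketch(x, "n4")   # values rescaled to start at 0 and end at 1
normalize_sketch(x, "n13")  # values centred on the midrange, ending at -1 and 1
```

The sketch covers only some types and does no input checking; for real work, call data.Normalization itself.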

normalization

"column" - normalization by variable, "row" - normalization by object

...

arguments passed to sum, mean, min, sd, mad and other aggregation functions. In particular: na.rm - a logical value indicating whether NA values should be stripped before the computation
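
The difference between "column" and "row" normalization, and the effect of passing na.rm through to the aggregation functions, can be sketched in base R as follows. This is an illustration only: quot_sum is a hypothetical helper mimicking the x/sum formula, and the way the package forwards its extra arguments internally is assumed, not confirmed.

```r
# 2 objects (rows) x 3 variables (columns), with one missing value
m <- matrix(c(1, 2, 3, 4, NA, 6), nrow = 2)

# Hypothetical helper mimicking the x/sum quotient; extra arguments
# such as na.rm are forwarded to sum()
quot_sum <- function(v, ...) v / sum(v, ...)

# By variable: each column is divided by its own sum
by_column <- apply(m, 2, quot_sum)

# By object: apply() returns one result per column, so transpose back
by_row <- t(apply(m, 1, quot_sum))

# Same, but NA values are dropped before the sum is computed
by_row_narm <- t(apply(m, 1, quot_sum, na.rm = TRUE))

by_row       # first row is all NA because its sum is NA
by_row_narm  # first row is computed from the two observed values
```

With na.rm left at FALSE, a single NA makes the aggregate NA and so every value normalized by it. With na.rm = TRUE only the missing cell stays NA.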

Details

See file ../doc/dataNormalization_details.pdf for further details

Thanks to Wolfgang Lederer (<wolfgang.lederer@gmail.com>) for reporting the n4/vector error.

Value

Normalized data. The numeric shifts and scalings used (if any) are returned as attributes "normalized:shift" and "normalized:scale".
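
One possible use of these attributes is to map normalized values back to the original scale. The sketch below builds a stand-in object in base R instead of calling the package, and it assumes that the shift is subtracted first and the result is then divided by the scale, with one shift and one scale value per column. The helper restore_sketch is hypothetical. Check the attributes on a real result before relying on this layout.

```r
# Stand-in for a column-normalized result (type n1 style), built by hand;
# a real object would come from data.Normalization()
x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
shift <- colMeans(x)
scl <- apply(x, 2, sd)
z <- sweep(sweep(x, 2, shift, "-"), 2, scl, "/")
attr(z, "normalized:shift") <- shift
attr(z, "normalized:scale") <- scl

# Hypothetical helper: undo (x - shift) / scale column by column
restore_sketch <- function(z) {
  shift <- attr(z, "normalized:shift")
  scl <- attr(z, "normalized:scale")
  out <- sweep(sweep(z, 2, scl, "*"), 2, shift, "+")
  attributes(out) <- list(dim = dim(z))  # drop the normalization attributes
  out
}

restore_sketch(z)  # should match x up to floating point error
```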

Author(s)

Marek Walesiak <marek.walesiak@ue.wroc.pl>, Andrzej Dudek <andrzej.dudek@ue.wroc.pl>

Department of Econometrics and Computer Science, University of Economics, Wroclaw, Poland

References

Anderberg, M.R. (1973), Cluster analysis for applications, Academic Press, New York, San Francisco, London. ISBN 9780120576500.

Gatnar, E., Walesiak, M. (Eds.) (2004), Metody statystycznej analizy wielowymiarowej w badaniach marketingowych [Multivariate statistical analysis methods in marketing research], Wydawnictwo AE, Wroclaw, 35-38.

Jajuga, K., Walesiak, M. (2000), Standardisation of data set under different measurement scales, In: R. Decker, W. Gaul (Eds.), Classification and information processing at the turn of the millennium, Springer-Verlag, Berlin, Heidelberg, 105-112. Available at: doi:10.1007/978-3-642-57280-7_11.

Milligan, G.W., Cooper, M.C. (1988), A study of standardization of variables in cluster analysis, "Journal of Classification", vol. 5, 181-204. Available at: doi:10.1007/BF01897163.

Mlodak, A. (2006), Analiza taksonomiczna w statystyce regionalnej, Difin, Warszawa. ISBN 83-7251-605-7.

Walesiak, M. (2014), Przeglad formul normalizacji wartosci zmiennych oraz ich wlasnosci w statystycznej analizie wielowymiarowej [Data normalization in multivariate data analysis. An overview and properties], "Przeglad Statystyczny" ("Statistical Review"), vol. 61, no. 4, 363-372.

See Also

cluster.Sim

Examples

library(clusterSim)
data(data_ratio)
# n1: standardize each variable (column)
z1 <- data.Normalization(data_ratio,type="n1",normalization="column",na.rm=FALSE)
# n10: divide each object (row) by its sum
z2 <- data.Normalization(data_ratio,type="n10",normalization="row",na.rm=FALSE)

[Package clusterSim version 0.51-4 Index]