utils_stats {metan}    R Documentation

Useful functions for computing descriptive statistics

Description

[Stable]

desc_stat() is a wrapper function around the functions documented here and can be used to quickly compute all of these statistics at once.

Usage

av_dev(.data, ..., na.rm = FALSE)

ci_mean_t(.data, ..., na.rm = FALSE, level = 0.95)

ci_mean_z(.data, ..., na.rm = FALSE, level = 0.95)

cv(.data, ..., na.rm = FALSE)

freq_table(.data, var, k = NULL, digits = 3)

freq_hist(
  table,
  xlab = NULL,
  ylab = NULL,
  fill = "gray",
  color = "black",
  ygrid = TRUE
)

hmean(.data, ..., na.rm = FALSE)

gmean(.data, ..., na.rm = FALSE)

kurt(.data, ..., na.rm = FALSE)

n_missing(.data, ..., na.rm = FALSE)

n_unique(.data, ..., na.rm = FALSE)

n_valid(.data, ..., na.rm = FALSE)

pseudo_sigma(.data, ..., na.rm = FALSE)

range_data(.data, ..., na.rm = FALSE)

row_col_mean(.data, na.rm = FALSE)

row_col_sum(.data, na.rm = FALSE)

sd_amo(.data, ..., na.rm = FALSE)

sd_pop(.data, ..., na.rm = FALSE)

sem(.data, ..., na.rm = FALSE)

skew(.data, ..., na.rm = FALSE)

sum_dev(.data, ..., na.rm = FALSE)

ave_dev(.data, ..., na.rm = FALSE)

sum_sq_dev(.data, ..., na.rm = FALSE)

sum_sq(.data, ..., na.rm = FALSE)

var_pop(.data, ..., na.rm = FALSE)

var_amo(.data, ..., na.rm = FALSE)

cv_by(.data, ..., .vars = NULL, na.rm = FALSE)

max_by(.data, ..., .vars = NULL, na.rm = FALSE)

min_by(.data, ..., .vars = NULL, na.rm = FALSE)

means_by(.data, ..., .vars = NULL, na.rm = FALSE)

mean_by(.data, ..., .vars = NULL, na.rm = FALSE)

n_by(.data, ..., .vars = NULL, na.rm = FALSE)

sd_by(.data, ..., .vars = NULL, na.rm = FALSE)

var_by(.data, ..., .vars = NULL, na.rm = FALSE)

sem_by(.data, ..., .vars = NULL, na.rm = FALSE)

sum_by(.data, ..., .vars = NULL, na.rm = FALSE)

Arguments

.data

A data frame or a numeric vector.

...

The argument depends on the function used.

  • For *_by functions, ... is one or more categorical variables used to group the data. The requested statistic is then computed for all numeric variables in the data. If no variables are supplied in ..., the statistic is computed ignoring all non-numeric variables in .data.

  • For the other statistics, ... is a comma-separated list of unquoted variable names for which the statistic is computed. If no variables are supplied in ..., the statistic is computed for all numeric variables in .data.

na.rm

If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.

level

The confidence level for the confidence interval of the mean. Defaults to 0.95.

var

The variable for which the frequency table is computed. See Details.

k

The number of classes to be created. See Details.

digits

The number of significant figures to show. Defaults to 3.

table

A frequency table computed with freq_table().

xlab, ylab

The x and y labels.

fill, color

The color to fill the bars and color the border of the bar, respectively.

ygrid

Should grid lines be shown on the y axis? Defaults to TRUE.

.vars

Used to select variables in the *_by() functions. One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables. Defaults to NULL (all numeric variables are analyzed).
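
The grouping behaviour described for the *_by() functions can be illustrated without metan. The sketch below uses base R's aggregate() on a small made-up data frame (df and its values are hypothetical; the column names only echo those used in the Examples). It mimics "one statistic per group for every numeric column" and is not metan's implementation.

```r
# Hypothetical toy data: two grouping variables and two numeric traits.
df <- data.frame(
  ENV = rep(c("E1", "E2"), each = 4),
  GEN = rep(c("G1", "G2"), times = 4),
  PH  = c(2.0, 2.2, 2.4, 2.6, 3.0, 3.2, 3.4, 3.6),
  EH  = c(1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.7, 1.8)
)

# Keep only the numeric columns, as the *_by() helpers are described to do.
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

# Mean of every numeric column within each level of ENV.
means_env <- aggregate(df[num_cols], by = list(ENV = df$ENV), FUN = mean)
print(means_env)

# Restricting the analysis to one variable (the role played by .vars)
# while grouping by both ENV and GEN.
means_env_gen <- aggregate(
  df["PH"],
  by = list(ENV = df$ENV, GEN = df$GEN),
  FUN = mean
)
print(means_env_gen)
```

Swapping FUN = mean for sd, var, min, max or sum gives base R analogues of the other grouped summaries.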

Details

The function freq_table() computes a frequency table for either numerical or categorical variables. If a variable is categorical or discrete (integer values), the number of classes will be the number of levels that the variable contains.

If a variable (say, data) is continuous, the number of classes (k) is given by the square root of the number of samples (n) if n <= 100, or by 5 * log10(n) if n > 100.

The amplitude (\(A\)) of the data is used to define the size of the class (\(c\)), given by

\[c = \frac{A}{k - 1}\]

The lower limit of the first class (LL1) is given by min(data) - c / 2. The upper limit is given by LL1 + c. The limits of the other classes are given in the same way. After the creation of the classes, the absolute and relative frequencies within each class are computed.
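
The class construction for a continuous variable can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used by freq_table(): the helper name freq_classes() is hypothetical, the number of classes is rounded to the nearest integer, intervals are taken as closed on the left, and the class width is assumed to be the amplitude divided by k - 1, which is what lets k classes starting at min(data) - c / 2 cover the whole range.

```r
# Hypothetical helper: builds class limits and frequencies for a numeric vector.
freq_classes <- function(x, k = NULL) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (is.null(k)) {
    # sqrt(n) classes when n <= 100, otherwise 5 * log10(n).
    # Rounding to an integer is an assumption of this sketch.
    k <- if (n <= 100) round(sqrt(n)) else round(5 * log10(n))
  }
  A   <- max(x) - min(x)            # amplitude of the data
  c_w <- A / (k - 1)                # class width (assumed A / (k - 1))
  ll1 <- min(x) - c_w / 2           # lower limit of the first class
  breaks <- ll1 + c_w * (0:k)       # k + 1 limits define k classes
  classes <- cut(x, breaks = breaks, right = FALSE)
  abs_freq <- as.vector(table(classes))
  data.frame(
    lower    = breaks[1:k],
    upper    = breaks[2:(k + 1)],
    abs_freq = abs_freq,            # absolute frequency per class
    rel_freq = abs_freq / n         # relative frequency per class
  )
}

set.seed(1)
tab <- freq_classes(rnorm(200, 10, 1))
print(tab)
```

With this offset, the minimum falls at the midpoint of the first class and the maximum at the midpoint of the last one, so every observation is counted exactly once.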

Value

Author(s)

Tiago Olivoto tiagoolivoto@gmail.com

References

Ferreira, Daniel Furtado. 2009. Estatistica Basica. 2 ed. Vicosa, MG: UFLA.

Examples


library(metan)
# Means of all numeric variables by GEN and ENV
mean_by(data_ge2, GEN, ENV)

# Coefficient of variation for all numeric variables
# by GEN and ENV
cv_by(data_ge2, GEN, ENV)

# Skewness of a numeric vector
set.seed(1)
nvec <- rnorm(200, 10, 1)
skew(nvec)

# 95% confidence interval for the mean
# All numeric variables
# Grouped by levels of ENV
data_ge2 %>%
  group_by(ENV) %>%
  ci_mean_t()

# Standard error of the mean
# Variables PH and EH
sem(data_ge2, PH, EH)

# Frequency table for variable NR
data_ge2 %>%
  freq_table(NR)



[Package metan version 1.18.0 Index]