fdt {fdth}

Frequency distribution table for numerical data

Description

An S3 set of methods for building frequency distribution tables (‘fdt’) from vector, data.frame and matrix objects.

Usage

## S3 generic
fdt(x, ...)

## S3 methods
## Default S3 method:
fdt(x,
    k,
    start,
    end,
    h,
    breaks=c('Sturges', 'Scott', 'FD'),
    right=FALSE,
    na.rm=FALSE, ...)

## S3 method for class 'data.frame'
fdt(x,
    k,
    by,
    breaks=c('Sturges', 'Scott', 'FD'),
    right=FALSE,
    na.rm=FALSE, ...)

## S3 method for class 'matrix'
fdt(x,
    k,
    breaks=c('Sturges', 'Scott', 'FD'),
    right=FALSE,
    na.rm=FALSE, ...)

Arguments

`x`	a `vector`, `data.frame` or `matrix` object. If ‘x’ is a `data.frame` or `matrix`, it must contain at least one numeric column.
`k`	number of class intervals.
`start`	left endpoint of the first class interval.
`end`	right endpoint of the last class interval.
`h`	class interval width.
`by`	categorical variable used for grouping each numeric variable, useful only on `data.frame`.
`breaks`	method used to determine the number of class intervals: one of “Sturges”, “Scott” or “FD” (default “Sturges”).
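`right`	logical. If `TRUE`, class intervals are closed on the right and open on the left, (a, b]; if `FALSE` (default), they are closed on the left and open on the right, [a, b).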
`na.rm`	logical. Should missing values be removed? (default = `FALSE`).
`...`	potential further arguments (required by the generic).
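The three `breaks` options name classic rules for choosing the number of classes. As a rough illustration, base R's grDevices package implements rules with the same names as `nclass.Sturges()`, `nclass.scott()` and `nclass.FD()`. We assume, without having checked the fdth source, that `fdt()` chooses k in an equivalent way, so treat the numbers below as indicative rather than as exactly what `fdt()` returns.

```r
# Sketch: number of classes suggested by each rule named in `breaks`,
# computed with the base R (grDevices) implementations of those rules.
set.seed(1)
x <- rnorm(1e3, mean = 5, sd = 1)

k_rules <- c(Sturges = grDevices::nclass.Sturges(x),
             Scott   = grDevices::nclass.scott(x),
             FD      = grDevices::nclass.FD(x))
print(k_rules)

# Sturges' rule by hand: ceiling(log2(n) + 1); for n = 1000 this is 11
k_sturges <- ceiling(log2(length(x)) + 1)
print(k_sturges)
```

Sturges' rule depends only on the sample size, whereas the Scott and FD rules also use the spread of the data (standard deviation and interquartile range, respectively).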

Details

The simplest way to run ‘fdt’ is to supply only the ‘x’ object, for example: nm <- fdt(x). In this case the default values of ‘breaks’ and ‘right’ (“Sturges” and FALSE, respectively) are used.

Alternatively, you can supply:

‘x’ and ‘k’ (number of class intervals);
‘x’, ‘start’ (left endpoint of the first class interval) and ‘end’ (right endpoint of the last class interval); or
‘x’, ‘start’, ‘end’ and ‘h’ (class interval width).

These options make ‘fdt’ easy to use and flexible.
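All of the options above reduce to a set of class boundaries running from ‘start’ to ‘end’ in steps of ‘h’ (when ‘k’ is given instead of ‘h’, the width is presumably (end - start)/k). The sketch below uses a hypothetical helper, `fdt_sketch()`, which is not part of fdth, to show how such boundaries and the ‘right’ argument determine the classes via base R `cut()`. Passing `include.lowest = TRUE`, so that a value equal to the outermost boundary is still counted, is our own choice and may not match fdth exactly.

```r
# Hypothetical helper (not part of fdth): a basic frequency distribution
# table built from explicit start, end and h, using base R cut().
fdt_sketch <- function(x, start, end, h, right = FALSE) {
  brks <- seq(start, end, by = h)              # class boundaries
  cls  <- cut(x, breaks = brks, right = right,
              include.lowest = TRUE)           # class of each observation
  f    <- as.vector(table(cls))                # absolute frequencies (empty classes kept)
  data.frame(class = levels(cls),
             f     = f,
             rf    = f / sum(f),               # relative frequencies
             cf    = cumsum(f),                # cumulative frequencies
             stringsAsFactors = FALSE)
}

x <- rep(1:3, 3)
t_left  <- fdt_sketch(x, start = 1, end = 4, h = 1)                # [1,2) [2,3) [3,4]
t_right <- fdt_sketch(x, start = 0, end = 3, h = 1, right = TRUE)  # [0,1] (1,2] (2,3]
print(t_left)
print(t_right)
```

With right = FALSE the value 2 falls in [2,3), while with right = TRUE it falls in (1,2]; this is the same contrast shown by the “Effect of right” example below.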

The ‘fdt’ object stores the information used by the methods summary, print, plot, mean, median and mfv. The result of plot is a histogram. The summary, print and plot methods provide a reasonable set of parameters to format and plot the ‘fdt’ object in a pretty (and publishable) way.
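Because mean and median are computed from a frequency table rather than from raw values, they are grouped-data estimates. The minimal sketch below applies the usual textbook formulas: class midpoints weighted by frequency for the mean, and linear interpolation inside the median class for the median. The boundaries and frequencies are made-up illustration values, and we have not verified that fdth uses exactly these formulas, so treat the details as assumptions.

```r
# Grouped-data estimates from class boundaries and frequencies.
brks <- c(0, 10, 20, 30, 40)   # hypothetical class boundaries (start, ..., end)
f    <- c(2, 5, 8, 5)          # hypothetical absolute frequencies
h    <- diff(brks)[1]          # constant class width
n    <- sum(f)

# Mean: weight each class midpoint by its frequency
mids <- head(brks, -1) + h / 2
grouped_mean <- sum(mids * f) / n

# Median: find the first class whose cumulative frequency reaches n/2,
# then interpolate linearly within that class
cf        <- cumsum(f)
i         <- which(cf >= n / 2)[1]
cf_before <- if (i > 1) cf[i - 1] else 0
grouped_median <- brks[i] + (n / 2 - cf_before) / f[i] * h

print(c(mean = grouped_mean, median = grouped_median))  # mean 23, median 23.75
```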

Value

The method fdt.default returns a list of class fdt.default with the slots:

`table`	A `data.frame` storing the ‘fdt’;
`breaks`	A `vector` of length 4 storing ‘start’, ‘end’, ‘h’ and ‘right’ of the ‘fdt’ generated by this method;
`data`	A vector of the data ‘x’ provided.

The methods fdt.data.frame and fdt.matrix return a list of class fdt.multiple. This list has one slot for each numeric variable of the ‘x’ provided, and each of these slots stores the same slots as fdt.default, described above.
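To picture the fdt.multiple layout, the hypothetical sketch below (base R only, not fdth code) builds one frequency table per numeric column of a data.frame, dropping NA values as na.rm = TRUE would. It then repeats this per level of a factor, which is roughly the extra level of grouping that ‘by’ adds. The function name `tables_by_column()` and the choice of equal-width classes from each column's minimum to its maximum are assumptions of the sketch.

```r
# Hypothetical sketch (not fdth code): one frequency table per numeric
# column, mirroring the "one slot per numeric variable" layout.
tables_by_column <- function(df, k = 5) {
  num <- Filter(is.numeric, df)              # keep only numeric columns
  lapply(num, function(v) {
    v    <- v[!is.na(v)]                     # mimic na.rm = TRUE
    brks <- seq(min(v), max(v), length.out = k + 1)
    table(cut(v, breaks = brks, right = FALSE, include.lowest = TRUE))
  })
}

res <- tables_by_column(iris, k = 5)
print(names(res))          # one element per numeric column of iris
print(sapply(res, sum))    # each table accounts for all 150 rows

# Rough analogue of `by`: split the rows by a factor first
res_by <- lapply(split(iris, iris$Species), tables_by_column, k = 5)
print(names(res_by))       # one group per species
```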

Author(s)

Faria, J. C.
Allaman, I. B.
Jelihovschi, E. G.

Examples

library(fdth)

#========
# Vector
#========
x <- rnorm(n=1e3,
           mean=5,
           sd=1)

str(x)

# x
(ft <- fdt(x))

# x, alternative breaks
(ft <- fdt(x,
           breaks='Scott'))

# x, k
(ft <- fdt(x,
           k=10))

# x, start, end
range(x)

(ft <- fdt(x,
           start=floor(min(x)),
           end=floor(max(x) + 1)))

# x, start, end, h
(ft <- fdt(x,
           start=floor(min(x)),
           end=floor(max(x) + 1),
           h=1))

# Effect of right
sort(x <- rep(1:3, 3))

(ft <- fdt(x,
           start=1,
           end=4,
           h=1))

(ft <- fdt(x,
           start=0,
           end=3,
           h=1,
           right=TRUE))

#========================================================
# Data.frame: multivariate with two categorical variables
#========================================================
mdf <- data.frame(c1=sample(LETTERS[1:3], 1e2, TRUE),
                  c2=as.factor(sample(1:10, 1e2, TRUE)),
                  n1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
                  n2=rnorm(100, 60, 4),
                  n3=rnorm(100, 50, 4),
                  stringsAsFactors=TRUE)

head(mdf)

#(ft <- fdt(mdf))  # Error message due to presence of NA values

(ft <- fdt(mdf,
           na.rm=TRUE))

str(mdf)

# By factor
(ft <- fdt(mdf,
           k=5,
           by='c1',
           na.rm=TRUE))

# Choose the FD criterion
(ft <- fdt(mdf,
           breaks='FD',
           by='c1',
           na.rm=TRUE))

# k
(ft <- fdt(mdf,
           k=5,
           by='c2',
           na.rm=TRUE))

(ft <- fdt(iris,
           k=10))

(ft <- fdt(iris,
           k=5,
           by='Species'))

#========================
# Matrices: multivariate
#========================
(ft <- fdt(state.x77))

summary(ft,
        format=TRUE)

summary(ft,
        format=TRUE,
        pattern='%.2f')

[Package fdth version 1.3-0 Index]