maxsimset {optpart}    R Documentation

Maximally Similar Sets Analysis

Description

Maximally similar sets is an approach to deriving relatively homogeneous subsets of objects as determined by similarity of the composition of the objects. Maximally similar sets are a covering, as opposed to a partition, of objects. The sets so derived can be tested against random sets of the same size to determine whether a vector of independent data exhibits an improbably restricted distribution within the sets.

Usage

maxsimset(dist,size=NULL,alphac=NULL,mean=FALSE)
mss.test(mss, env, panel = 'all', main = deparse(substitute(env)), 
         ...)
## S3 method for class 'mss'
plot(x, ...)
## S3 method for class 'mss'
getsets(mss)

Arguments

dist

a dist object from dist, dsvdis, or vegdist

size

the size of desired sets

alphac

the alpha-cut to specify maximum dissimilarity for inclusion in a set

mean

if mean is FALSE (the default), the algorithm uses a furthest neighbor criterion; if mean is TRUE, it uses a mean similarity criterion

mss

an object of class ‘mss’

env

a quantitative environmental variable for analysis

main

a title for the plot of mss.test

panel

an integer switch to indicate which panel to draw

x

an object of class ‘mss’ from maxsimset

...

ancillary arguments for ‘plot’

Details

maxsimset starts with each sample (plot) as a seed and adds the most similar sample to the set. Samples are then added in turn to the set (up to the size specified, or to the maximum dissimilarity specified) in order of maximum similarity. If mean is FALSE, the sample most similar to the set is the sample with the max-min similarity, that is, the sample whose minimum similarity to the set is highest; this is equivalent to furthest-neighbor or complete-linkage in cluster analysis. If mean is TRUE, the sample most similar to a set is the sample with the highest mean similarity to the set. Once the sets are determined for each seed, the list is examined for duplicate sets, which are deleted, to return the list of unique sets.
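The growth rule described above can be sketched in base R. This is an illustration under stated assumptions, not the optpart source: grow_set() and unique_sets() are hypothetical helper names, sets are grown to a fixed size only (no alpha-cut), and ties are broken by taking the first candidate. Because the input is a dissimilarity, the max-min similarity criterion is expressed as the smallest maximum dissimilarity.

```r
# Illustrative sketch only; not the optpart implementation.
# grow_set() and unique_sets() are hypothetical helper names.
grow_set <- function(dmat, seed, size, mean = FALSE) {
  set <- seed
  while (length(set) < size) {
    candidates <- setdiff(seq_len(nrow(dmat)), set)
    # Dissimilarity of every candidate to every current member
    sub <- dmat[candidates, set, drop = FALSE]
    # mean = TRUE: mean dissimilarity to the set
    # mean = FALSE: largest dissimilarity to the set (furthest neighbor)
    score <- if (mean) rowMeans(sub) else apply(sub, 1, max)
    # The smallest score marks the most similar candidate
    set <- c(set, candidates[which.min(score)])
  }
  set
}

unique_sets <- function(d, size, mean = FALSE) {
  dmat <- as.matrix(d)
  # Grow one set from each sample used as a seed
  sets <- lapply(seq_len(nrow(dmat)),
                 function(s) grow_set(dmat, s, size, mean))
  # Two sets are duplicates if they hold the same members in any order
  keys <- vapply(sets, function(s) paste(sort(s), collapse = ","),
                 character(1))
  sets[!duplicated(keys)]
}

# Toy data: two tight groups of three points on a line
x <- c(0, 1, 2, 10, 11, 12)
sets <- unique_sets(dist(x), size = 3)
sets
```

In this toy case every seed in a group grows the same set, so the six seeds collapse to two unique sets after duplicates are removed.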

If ‘alphac’ is specified, sets are grown to maximum size, or to maximum dissimilarity as specified by alphac, whichever is smaller.

The ‘mss.test’ function analyzes within-set variability in attributes of the objects other than those used to calculate the similarity relation. If maximally similar sets exhibit a narrower range of values than expected at random, it may be that the variable analyzed has an underlying role in determining the attributes on which the similarity is calculated. The function ‘plot’ plots the sorted within-set range of values in red, and the sorted range of values of random sets of the same size in black. This is followed by a boxplot of within-set values for the random replicates versus the observed sets, and a Wilcoxon rank sum test of the difference is calculated.
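The comparison of observed and random within-set ranges can be sketched in base R as follows. This is a rough illustration, not the mss.test source: range_test() is a hypothetical helper, the sets are assumed to be a list of index vectors, and the number of random replicates (nrand) and the use of a two-sided test with a normal approximation are assumptions of this sketch.

```r
# Illustrative sketch only; not the mss.test implementation.
# range_test() is a hypothetical helper; `sets` is a list of index vectors.
range_test <- function(sets, env, nrand = 100) {
  set_range <- function(idx) diff(range(env[idx]))
  # Range of the variable within each observed set
  observed <- vapply(sets, set_range, numeric(1))
  # Ranges within random sets of the same sizes as the observed sets
  random <- unlist(lapply(seq_len(nrand), function(i) {
    vapply(sets,
           function(s) set_range(sample(length(env), length(s))),
           numeric(1))
  }))
  list(observed = observed,
       random = random,
       # exact = FALSE avoids the warning about ties in small toy data
       test = wilcox.test(observed, random, exact = FALSE))
}

set.seed(1)
env <- c(100, 110, 120, 900, 910, 920)  # toy variable, e.g. elevation
sets <- list(1:3, 4:6)                  # toy sets of sample indices
res <- range_test(sets, env)
res$observed
res$test$p.value
```

The sorted values of res$observed and res$random correspond to the two curves described above, and boxplot(list(random = res$random, observed = res$observed)) would give a comparable second panel.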

‘getsets’ expands and pulls out the maximally similar sets as a list of logical membership vectors for use in other analyses.
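To show what a list of logical membership vectors looks like and how it can be used, here is a small base R sketch. It does not reproduce getsets itself: as_membership() is a hypothetical helper, and the member list and variable are toy data.

```r
# Illustrative sketch only; not the getsets implementation.
# as_membership() is a hypothetical helper name.
as_membership <- function(members, n) {
  # One logical vector of length n per set; TRUE marks a member
  lapply(members, function(idx) seq_len(n) %in% idx)
}

members <- list(c(2, 1, 3), c(4, 5, 6))  # members in order of addition
masks <- as_membership(members, n = 6)

# Logical vectors can index any per-sample variable directly
env <- c(100, 110, 120, 900, 910, 920)
set_means <- sapply(masks, function(m) mean(env[m]))
set_means
```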

Value

an object of class ‘mss’, a list with elements:

musubx

a matrix of sample membership in the sets where membership is given by the similarity with which a sample joined the set

member

a list of set members in the order they were added to the set

numset

the number of unique sets derived

size

the number of members in each set

distname

the name of the dissimilarity/distance object employed

Author(s)

David W. Roberts droberts@montana.edu

Examples

data(shoshveg)
data(shoshsite)
dis.bc <- dsvdis(shoshveg,'bray/curtis')
mss.10 <- maxsimset(dis.bc,10)
## Not run: 
mss.test(mss.10,shoshsite$elevation) # plots graph and produces summary
## End(Not run)

[Package optpart version 3.0-3 Index]