R: Empirical parameterization via null distributions

null_fit_optim {netcom}

R Documentation

Empirical parameterization via null distributions

Description

Helper function that finds the best-fitting version of a mechanism by searching across the null distributions associated with a process and parameter combination.

Usage

null_fit_optim(
  parameter,
  process,
  network,
  net_size,
  iters,
  neighborhood,
  directed,
  DD_kind,
  DD_weight,
  net_kind,
  mechanism_kind,
  method,
  size_different,
  power_max,
  connectance_max,
  divergence_max,
  best_fit_sd,
  max_norm,
  cause_orientation,
  cores,
  null_dist_trim,
  ks_dither,
  ks_alternative,
  verbose = FALSE
)

Arguments

`parameter`	The parameter value being tested for its ability to generate networks similar to the input 'network'.
`process`	Name of mechanism. Currently only "ER", "PA", "DD", "DM", "SW", and "NM" are supported. Future versions will accept user-defined network-generating functions and associated parameters. ER = Erdos-Renyi random. PA = Preferential Attachment. DD = Duplication and Divergence. DM = Duplication and Mutation. SW = Small World. NM = Niche Model.
`network`	The network being compared to a hypothesized 'process' with a given 'parameter' value.
`net_size`	Number of nodes in the network.
`iters`	Number of replicates in the null distribution. Note that length(null_dist) = ((iters^2)-iters)/2.
`neighborhood`	The range of nodes that form connected communities. Note: This implementation results in overlap of communities.
`directed`	Whether the target network is directed.
`DD_kind`	A vector of network properties to be used to compare networks.
`DD_weight`	A vector of weights for the relative importance of the network properties in DD_kind being used to compare networks. Should be the same length as DD_kind.
`net_kind`	Whether the network is an adjacency matrix ("matrix") or an edge list ("list").
`mechanism_kind`	Either "canonical" or "grow" can be used to simulate networks. If "grow" is used, note that here it will only simulate pure mixtures made of a single mechanism.
`method`	This determines the method used to compare networks at the heart of the classification. Currently "DD" (Degree Distribution) and "align" (the align function which compares networks by the entropy of diffusion on them) are supported. Future versions will allow user-defined methods.
`size_different`	Whether the networks used in the null distribution differ in size.
`power_max`	The maximum power of attachment in the Preferential Attachment process (PA).
`connectance_max`	The maximum connectance parameter for the Niche Model.
`divergence_max`	The maximum divergence parameter for the Duplication and Divergence/Mutation mechanisms.
`best_fit_sd`	Standard deviation used to simulate networks with a similar but not identical best fit parameter. This is important because simulating networks with the identical parameter artificially inflates the false negative rate by assuming the best fit parameter is the true parameter. With large enough resolution and reps values that assumption becomes accurate, but the computation becomes intractable for realistically large systems.
`max_norm`	Binary variable indicating if each network property should be normalized so its max value (if a node-level property) is one.
`cause_orientation`	The orientation of directed adjacency matrices.
`cores`	The number of cores to run the classification on. When set to 1 parallelization will be ignored.
`null_dist_trim`	Number between zero and one that determines how much of each network comparison distribution (unknown network compared to simulated networks, simulated networks compared to each other) should be used. Prevents p-value convergence with large sample sizes. Defaults to 1, which means all comparisons are used (no trimming).
`ks_dither`	The KS test cannot compute exact p-values when pairwise network distances are tied (not all unique). Adding a small amount of noise makes each distance unique. We are not aware of a study on the impact this has on accuracy, so it is set to zero by default.
`ks_alternative`	Governs the alternative hypothesis of the KS test. Assuming best_fit_sd is not too large, this can be set to "greater" because the target network cannot be more similar to identically simulated networks than they are to each other. In practice we have found that "greater" and "less" produce numerical errors. Only "two.sided", "less", and "greater" are supported through stats::ks.test().
`verbose`	Whether to print all messages. Defaults to FALSE.
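
The note on `iters` says the null distribution holds one distance per unordered pair of replicates. A minimal sketch of that arithmetic, using only base R (the variable names are illustrative and not part of netcom):

```r
# Number of simulated replicates, as passed to `iters`
iters <- 20

# Formula given in the argument description
n_comparisons <- ((iters^2) - iters) / 2

# The same count written as "iters choose 2" unordered pairs
n_pairs <- choose(iters, 2)

n_comparisons  # 190
n_pairs        # 190
```

Because the count grows quadratically, doubling `iters` roughly quadruples the number of pairwise comparisons.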

Details

Note: Currently each process is assumed to have a single governing parameter.
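
The `ks_dither` and `ks_alternative` arguments both concern a two-sample KS test on network distances. As a rough illustration of what ties are and what dithering does to them, the sketch below uses made-up distances and uniform noise of an arbitrary scale. How netcom adds noise internally is not described here, so every name and value in the block is an assumption.

```r
set.seed(1)

# Hypothetical pairwise distances, rounded so that ties are guaranteed
null_dist   <- round(runif(190), 1)  # simulated networks compared to each other
target_dist <- round(runif(20), 1)   # target network compared to simulated ones

# Ties exist when any value in the pooled sample is repeated
has_ties <- anyDuplicated(c(null_dist, target_dist)) > 0

# Dithering: add a small amount of noise (scale chosen arbitrarily here)
dither <- 1e-6
null_dithered   <- null_dist + runif(length(null_dist), -dither, dither)
target_dithered <- target_dist + runif(length(target_dist), -dither, dither)
still_ties <- anyDuplicated(c(null_dithered, target_dithered)) > 0

# Two-sample KS test on the now-unique distances
ks <- stats::ks.test(target_dithered, null_dithered, alternative = "two.sided")
ks$p.value
```

The noise scale is kept far below the spacing of the distances so that it breaks ties without noticeably moving the empirical distributions.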

Value

A number measuring how different the input network is from the parameter + process combination.

References

Langendorf, R. E., & Burgess, M. G. (2020). Empirically Classifying Network Mechanisms. arXiv preprint arXiv:2012.15863.

Examples

# Import netcom
library(netcom)

# Adjacency matrix
size <- 10
network <- matrix(sample(c(0,1), size = size^2, replace = TRUE), nrow = size, ncol = size)

# Measure how different the input network is from Small-World networks with
# a rewiring probability of 0.28.
null_fit_optim(
     parameter = 0.28, 
     process = "SW", 
     network = network, 
     net_size = size,
     iters = 20,
     neighborhood = max(1, round(0.1 * size)),
     net_kind = "matrix", 
     mechanism_kind = "grow", 
     power_max = 5, 
     connectance_max = 0.5, 
     divergence_max = 0.5, 
     cores = 1, 
     directed = TRUE, 
     method = "DD", 
     size_different = FALSE,
     cause_orientation = "row", 
     DD_kind = c(
         "in", "out", "entropy_in", "entropy_out", 
         "clustering_coefficient", "page_rank", "communities"
     ), 
     DD_weight = 1, 
     best_fit_sd = 0,
     max_norm = FALSE,
     null_dist_trim = 0,
     ks_dither = 0,
     ks_alternative = "two.sided",
     verbose = FALSE
)
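
The `net_kind` argument also accepts "list" for edge lists. A minimal base-R sketch of turning an adjacency matrix into a two-column edge list and back; the exact column layout netcom expects is not stated on this page, so the (from, to) layout and the column names are assumptions.

```r
set.seed(42)

# Random adjacency matrix, built the same way as in the example
size <- 10
network <- matrix(sample(c(0, 1), size = size^2, replace = TRUE), nrow = size, ncol = size)

# Row and column indices of every nonzero entry. With rows treated as causes,
# each row of edge_list reads as (from, to).
edge_list <- which(network == 1, arr.ind = TRUE)
colnames(edge_list) <- c("from", "to")

# Round trip: rebuild the adjacency matrix from the edge list
rebuilt <- matrix(0, nrow = size, ncol = size)
rebuilt[edge_list] <- 1
identical(rebuilt, network)
```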

[Package netcom version 2.1.7 Index]