find.tree.popset {poolfstat} | R Documentation |
Find sets of populations that may be used as a scaffold tree
Description
Find sets of populations that may be used as a scaffold tree
Usage
find.tree.popset(
fstats,
f3.zcore.threshold = -1.65,
f4.zscore.absolute.threshold = 1.96,
excluded.pops = NULL,
nthreads = 1,
verbose = TRUE
)
Arguments
fstats |
Object of class fstats containing estimates of fstats (see the function compute.fstats) |
f3.zcore.threshold |
The significance threshold for the Z-score of the formal test of admixture based on the F3-statistics (default=-1.65) |
f4.zscore.absolute.threshold |
The significance threshold for the |Z-score| of the formal test of treeness based on the F4-statistics (default=1.96) |
excluded.pops |
Vector of population names to be excluded from the exploration |
nthreads |
Number of available threads for parallelization of some part of the parsing (default=1, i.e., no parallelization) |
verbose |
If TRUE extra information is printed on the terminal |
Details
The procedure first discards every population P that shows a significant signal of admixture, i.e., for which an F3 statistic of the form F3(P;Q,R) has a Z-score < f3.zcore.threshold. It then identifies all the sets of remaining populations that pass the F4-based treeness test among themselves. More precisely, for a given set E containing n populations, the procedure ensures that all the n(n-1)(n-2)(n-3)/8 possible F4 quadruplets have a |Z-score|<f4.zscore.absolute.threshold. The function aims at maximizing the size of the sets.
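The two steps above can be illustrated with a minimal base R sketch. It is not the package's implementation: the data frame f3, its column names and its Z-score values are made up for illustration, and the enumeration only shows where the n(n-1)(n-2)(n-3)/8 count comes from (three pairings for each group of four populations).

```r
# Step 1 (toy data): hypothetical F3 Z-scores, one row per F3(P;Q,R) configuration.
# The column names and values are assumptions, not poolfstat output.
f3 <- data.frame(
  target = c("P1", "P1", "P2", "P3", "P3"),
  zscore = c(-0.4, 1.2, -2.8, 0.3, -1.1)
)
f3.threshold <- -1.65

# A population is discarded if any of its F3 Z-scores falls below the threshold.
min.z <- tapply(f3$zscore, f3$target, min)
kept.pops <- names(min.z)[min.z >= f3.threshold]

# Step 2: enumerate the F4 configurations of a candidate set of n populations.
pops <- c("A", "B", "C", "D", "E")
groups <- combn(pops, 4)  # one column per group of four populations
pairings <- list(c(1, 2, 3, 4), c(1, 3, 2, 4), c(1, 4, 2, 3))  # three ways to pair them
quadruplets <- unlist(lapply(seq_len(ncol(groups)), function(j) {
  g <- groups[, j]
  vapply(pairings, function(i) {
    sprintf("%s,%s;%s,%s", g[i[1]], g[i[2]], g[i[3]], g[i[4]])
  }, character(1))
}))

# The count matches the closed form given in the text.
n <- length(pops)
n.quadruplets <- length(quadruplets)
closed.form <- n * (n - 1) * (n - 2) * (n - 3) / 8
```

With these toy values only P2 is discarded, and the five-population set yields 15 configurations, the first three being A,B;C,D, A,C;B,D and A,D;B,C.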
Value
A list with the following elements:
"n.sets": The number of sets of (scaffold) unadmixed populations identified
"set.size": The number of populations included in each set
"pop.sets": A character matrix of n.sets rows and set.size columns giving, for each identified set, the names of the included populations.
"Z_f4.range": A matrix of n.sets rows and 2 columns reporting for each set the range of variation (min and max value) of the absolute F4 Z-scores for the quadruplets passing the treeness test. More precisely, for a given set consisting of n=set.size populations, a total of n(n-1)(n-2)(n-3)/8 quadruplets can be formed. Yet, any set of four populations A, B, C and D is represented by three quadruplets: A,B;C,D (or one of its seven other equivalent combinations formed by permuting each pair); A,C;B,D (or one of its seven other equivalent combinations); and A,D;B,C (or one of its seven other equivalent combinations). Among these three, only a single quadruplet is expected to pass the treeness test (i.e., if the correct unrooted tree topology is (A,C;B,D), then the absolute value of the Z-scores associated with F4(A,B;C,D) and F4(A,D;B,C), or their equivalents, will be high).
"passing.quadruplets": A matrix of n.sets rows and set.size columns reporting for each set the n(n-1)(n-2)(n-3)/24 quadruplets that pass the treeness test (see Z_f4.range detail).
See Also
compute.fstats
Examples
make.example.files(writing.dir=tempdir())
pooldata=popsync2pooldata(sync.file=paste0(tempdir(),"/ex.sync.gz"),poolsizes=rep(50,15))
res.fstats=compute.fstats(pooldata,nsnp.per.bjack.block = 50)
#NOTE: toy example (in practice nsnp.per.bjack.block should be higher)
popsets=find.tree.popset(res.fstats,f3.zcore.threshold=-3)
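
As a possible follow-up, the returned list can be summarized using the elements described in the Value section. The sketch below is only illustrative: describe.popsets is a hypothetical helper (not part of poolfstat), and mock is a hand-made list with made-up population names standing in for the actual output of find.tree.popset.

```r
# Hypothetical helper (not a poolfstat function): builds one comma-separated
# string of population names per identified set, using only the "n.sets" and
# "pop.sets" elements described in the Value section.
describe.popsets <- function(popsets) {
  if (popsets$n.sets == 0) return(character(0))
  apply(popsets$pop.sets, 1, paste, collapse = ", ")
}

# Mock object with made-up names, standing in for the output of find.tree.popset()
mock <- list(
  n.sets = 2,
  set.size = 3,
  pop.sets = matrix(c("P1", "P2", "P3",
                      "P1", "P2", "P4"), nrow = 2, byrow = TRUE)
)
set.labels <- describe.popsets(mock)
print(set.labels)
```

Under the same assumptions, describe.popsets(popsets) could be applied to the object created in the example above.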