Title: | Inferring demographic history from genetic data |
---|---|
Description: | admixtools is a set of methods used to infer demographic history from genetic data using f-statistics. It was originally implemented as a command line program in C. This is a new implementation of the programs qpDstat, qpAdm, qpGraph, and contains several new features. It is efficient because it precomputes f2-statistics which can be re-used for many analyses. |
Authors: | Robert Maier [aut, cre], Nick Patterson [aut] |
Maintainer: | Robert Maier <[email protected]> |
License: | GPL-3 |
Version: | 2.0.8 |
Built: | 2024-11-02 00:54:12 UTC |
Source: | https://github.com/uqrmaie1/admixtools |
admixtools is a collection of several programs for calculating f-statistics
and using them to make inferences about demographic history from genetic data
Aside from the core functions qp3pop
, qpdstat
, qpadm
,
and qpgraph
, there are also wrapper functions and parsing functions around the original
ADMIXTOOLS software, as well as functions to read genotype files and precompute the data necessary
to quickly compute f statistics.
Robert Maier <[email protected]>
Nick Patterson
Patterson, N. et al. (2012) Ancient admixture in human history. Genetics
https://uqrmaie1.github.io/admixtools/index.html
This is intended for computing allele count data for a large number of individuals,
too many to do everything in working memory.
It assumes that the allele frequencies have already been computed and are stored in .rds
files,
split into consecutive blocks for a set of individuals.
afs_to_counts( genodir, outdir, chunk1, chunk2, overwrite = FALSE, verbose = TRUE )
afs_to_counts( genodir, outdir, chunk1, chunk2, overwrite = FALSE, verbose = TRUE )
genodir |
Directory with genotype files split in chunks created by |
outdir |
Directory where allele count data will be stored |
chunk1 |
Index of the first chunk of individuals |
chunk2 |
Index of the second chunk of individuals |
overwrite |
Overwrite existing files (default |
verbose |
Print progress updates |
extract_counts
Does the same thing in one step for smaller data.
This is intended for computing f2-statistics for a large number of populations,
too many to do everything in working memory.
It assumes that the allele frequencies have already been computed and are stored in .rds
files,
split into consecutive blocks for a set of populations. This function calls write_f2
,
which takes a (sub-)chunk of pairwise f2-statistics, and writes one pair at a time to disk.
afs_to_f2( afdir, outdir, chunk1, chunk2, blgsize = 0.05, snpwt = NULL, overwrite = FALSE, type = "f2", poly_only = FALSE, snpdat = NULL, apply_corr = TRUE, verbose = TRUE )
afs_to_f2( afdir, outdir, chunk1, chunk2, blgsize = 0.05, snpwt = NULL, overwrite = FALSE, type = "f2", poly_only = FALSE, snpdat = NULL, apply_corr = TRUE, verbose = TRUE )
afdir |
Directory with allele frequency and counts |
outdir |
Directory where data will be stored |
chunk1 |
Index of the first chunk of populations |
chunk2 |
Index of the second chunk of populations |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
snpwt |
A vector of scaling factors applied to the f2-statistics for each SNP. The length has to match the number of SNPs. |
overwrite |
Overwrite existing files (default |
verbose |
Print progress updates |
extract_f2
Does the same thing in one step for smaller data.
## Not run: afdir = 'tmp_af_dir/' f2dir = 'f2_dir' extract_afs('path/to/packedancestrymap_prefix', afdir) numchunks = length(list.files(afdir, 'afs.+rds')) # numchunks should be the number of split allele frequency files for(i in 1:numchunks) { for(j in i:numchunks) { afs_to_f2(afdir, f2dir, chunk1 = i, chunk2 = j) } } ## End(Not run) # Alternatively, the following code will do the same, while submitting each chunk as a separate job. # (if \code{\link[future]{plan}} has been set up appropriately) ## Not run: furrr::future_map(1:numchunks, ~{i=.; map(i:numchunks, ~{ afs_to_f2(afdir, f2dir, chunk1 = i, chunk2 = .) })}) ## End(Not run)
## Not run: afdir = 'tmp_af_dir/' f2dir = 'f2_dir' extract_afs('path/to/packedancestrymap_prefix', afdir) numchunks = length(list.files(afdir, 'afs.+rds')) # numchunks should be the number of split allele frequency files for(i in 1:numchunks) { for(j in i:numchunks) { afs_to_f2(afdir, f2dir, chunk1 = i, chunk2 = j) } } ## End(Not run) # Alternatively, the following code will do the same, while submitting each chunk as a separate job. # (if \code{\link[future]{plan}} has been set up appropriately) ## Not run: furrr::future_map(1:numchunks, ~{i=.; map(i:numchunks, ~{ afs_to_f2(afdir, f2dir, chunk1 = i, chunk2 = .) })}) ## End(Not run)
This function takes a allele frequency data and computes blocked f2 statistics for all population pairs,
which are written to outdir
. for each SNP is computed as
, where
and
are
allele frequencies in populations
and
, and
and
is the number of
non-missing haplotypes in populations
and
. See
details
afs_to_f2_blocks( afdat, maxmem = 8000, blgsize = 0.05, pops1 = NULL, pops2 = NULL, outpop = NULL, outdir = NULL, overwrite = FALSE, afprod = TRUE, fst = TRUE, poly_only = c("f2"), apply_corr = apply_corr, n_cores = 1, verbose = TRUE )
afs_to_f2_blocks( afdat, maxmem = 8000, blgsize = 0.05, pops1 = NULL, pops2 = NULL, outpop = NULL, outdir = NULL, overwrite = FALSE, afprod = TRUE, fst = TRUE, poly_only = c("f2"), apply_corr = apply_corr, n_cores = 1, verbose = TRUE )
afdat |
A list with three items with the same SNP in each row (generated by
|
maxmem |
split up allele frequency data into blocks, if memory requirements exceed |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
outpop |
If specified, f2-statistics will be weighted by heterozygosity in this population |
outdir |
Directory into which to write f2 data (if |
overwrite |
Should existing files be overwritten? Only relevant if |
poly_only |
Exclude sites with identical allele frequencies in all populations. Can be different for f2-statistics, allele frequency products, and fst. Should be a character vector of length three, with some subset of |
verbose |
Print progress updates |
pop1 |
|
pop2 |
|
For each population pair, each of the resutling values
(
is around 700 in practice) is the mean
estimate across all SNPs except the ones in block
.
is a correction term which makes the estimates
unbiased at low sample sizes.
## Not run: afdat = plink_to_afs('/my/geno/prefix') afs_to_f2_blocks(afdat, outdir = '/my/f2/data/') ## End(Not run)
## Not run: afdat = plink_to_afs('/my/geno/prefix') afs_to_f2_blocks(afdat, outdir = '/my/f2/data/') ## End(Not run)
agraph
is the format used by the admixturegraph
packge. igraph
is used by the admixtools
package
agraph_to_igraph(agraph)
agraph_to_igraph(agraph)
agraph |
An admixture graph in |
An admixture graph in igraph
format
## Not run: agraph_to_igraph(agraph) ## End(Not run)
## Not run: agraph_to_igraph(agraph) ## End(Not run)
Generate a list of bootstrap resampled arrays
boo_list(arr, nboot = dim(arr)[3])
boo_list(arr, nboot = dim(arr)[3])
arr |
3d array with blocked estimates, with blocks in the 3rd dimension |
nboot |
Number of bootstrap iterations |
A list of bootstrap resampled arrays, with 3rd dimension equal to nboot
loo_list
est_to_boo
qpgraph_resample_snps2
Takes the bootstrap score distribution of two fits on the same populations and tests whether the scores of one graph are significantly higher or lower than the scores of the other graph.
compare_fits(scores1, scores2)
compare_fits(scores1, scores2)
scores1 |
Scores for the first graph. Each score should be for same model evaluated on different bootstrap samples of the SNP blocks. See |
scores2 |
Scores for the second graph, evaluated on the same bootstrap samples as the first graph |
A list with statistics comparing the two models
p_emp
: The two-sided bootstrap p-value testing whether the scores of one model are higher or lower than the scores of the other model. It is two times the fraction of bootstrap replicates in which model 1 has a lower score than model 2 (or vice-vera, whichever is lower). This is simply a bootstrap test for testing whether some quantity (the the score difference of two models in this case) is significantly different from zero.
p_emp_nocorr
: p_emp
is truncated to be no less than the reciprocal of the number of bootstrap iterations (the length of the score vectors). p_emp_nocorr
is not truncated and can be equal to 0.
ci_low
: The 2.5% quantile of distribution of score differences
ci_high
: The 97.5% quantile of distribution of score differences
The other items in this list are less important and can be ignored. In particular, p
is not as reliable as p_emp
because it assumes that the score differences follow a normal distribution.
## Not run: fits = qpgraph_resample_multi(f2_blocks, list(graph1, graph2), nboot = 100) compare_fits(fits[[1]]$score, fits[[2]]$score) ## End(Not run)
## Not run: fits = qpgraph_resample_multi(f2_blocks, list(graph1, graph2), nboot = 100) compare_fits(fits[[1]]$score, fits[[2]]$score) ## End(Not run)
Takes two data frames with model fits computed on two graphs for on the same populations and tests whether the scores of one graph are significantly better than the scores of the other
compare_fits2(fits1, fits2, boot = FALSE)
compare_fits2(fits1, fits2, boot = FALSE)
fits1 |
The fits of the first graph |
fits2 |
The fits of the second graph |
boot |
should match the |
## Not run: nblocks = dim(example_f2_blocks)[3] train = sample(1:nblocks, round(nblocks/2)) fits1 = qpgraph_resample_snps(example_f2_blocks[,,train], graph = graph1, f2_blocks_test = example_f2_blocks[,,-train]) fits2 = qpgraph_resample_snps(example_f2_blocks[,,train], graph = graph2, f2_blocks_test = example_f2_blocks[,,-train]) compare_fits2(fit1, fit2) ## End(Not run)
## Not run: nblocks = dim(example_f2_blocks)[3] train = sample(1:nblocks, round(nblocks/2)) fits1 = qpgraph_resample_snps(example_f2_blocks[,,train], graph = graph1, f2_blocks_test = example_f2_blocks[,,-train]) fits2 = qpgraph_resample_snps(example_f2_blocks[,,train], graph = graph2, f2_blocks_test = example_f2_blocks[,,-train]) compare_fits2(fit1, fit2) ## End(Not run)
Takes two data frames with model fits computed on two graphs for on the same populations and tests whether the scores of one graph are significantly better than the scores of the other.
compare_fits4(fit1, fit2, f2_blocks, f2_blocks_test, boot = FALSE, seed = NULL)
compare_fits4(fit1, fit2, f2_blocks, f2_blocks_test, boot = FALSE, seed = NULL)
fit1 |
The fit of the first graph |
fit2 |
The fit of the second graph |
f2_blocks |
f2 blocks used for fitting |
f2_blocks_test |
f2 blocks which were not used for fitting |
boot |
If |
seed |
Random seed used if |
## Not run: nblocks = dim(example_f2_blocks)[3] train = sample(1:nblocks, round(nblocks/2)) fit1 = qpgraph(example_f2_blocks[,,train], graph1) fit2 = qpgraph(example_f2_blocks[,,train], graph2) compare_fits4(fit1, fit2, example_f2_blocks[,,train], example_f2_blocks[,,-train]) ## End(Not run)
## Not run: nblocks = dim(example_f2_blocks)[3] train = sample(1:nblocks, round(nblocks/2)) fit1 = qpgraph(example_f2_blocks[,,train], graph1) fit2 = qpgraph(example_f2_blocks[,,train], graph2) compare_fits4(fit1, fit2, example_f2_blocks[,,train], example_f2_blocks[,,-train]) ## End(Not run)
This function adds up all block lengths (number of SNPs in each SNP blocks), which are stored in the names along the third dimension of the array. When the f2-statistics were computed while setting maxmiss
to values greater than 0, it is possible that not all f2-statistics are based on the same number of SNPs. In that case, the value returned by this function is merely an upper bound on the number of SNPs used.
count_snps(f2_blocks)
count_snps(f2_blocks)
f2_blocks |
A 3d array of per block f2-statistics |
The total number of SNPs across all blocks
agraph
is the format used by the admixturegraph
packge. igraph
is used by the admixtools
package
count_zero_edges(edges, epsilon = 1e-06)
count_zero_edges(edges, epsilon = 1e-06)
edges |
Edges data frame from fitted admixture graph |
epsilon |
Every edge with length |
The number of edges with length < epsilon
## Not run: fit = qpgraph(example_f2_blocks, example_igraph) count_zero_edges(fit$edges) ## End(Not run)
## Not run: fit = qpgraph(example_f2_blocks, example_igraph) count_zero_edges(fit$edges) ## End(Not run)
Returns all trees which can be reached through one iteration of subtree-prune-and-regraft on any graph component tree
decomposed_tree_neighbors(graph)
decomposed_tree_neighbors(graph)
graph |
An admixture graph |
A data frame with all trees
Delete an admixture edge
delete_admix(graph, from = NULL, to = NULL)
delete_admix(graph, from = NULL, to = NULL)
graph |
An admixture graph |
from |
Edge source node |
to |
Edge target node |
Admixture graph with one deleted edge
This function deletes data for groups created by group_samples
delete_groups(dir, groups = NULL, verbose = TRUE)
delete_groups(dir, groups = NULL, verbose = TRUE)
dir |
Directory with precomputed individual pair data |
groups |
Groups to delete. Defaults to all groups |
verbose |
Print progress updates |
Invisibly returns sample IDs in deleted groups as character vector
## Not run: dir = 'my/f2/dir/' inds = c('ind1', 'ind2', 'ind3', 'ind4', 'ind5') pops = c('pop_A', 'pop_A', 'pop_A', 'pop_B', 'pop_B') group_samples(dir, inds, pops) ## End(Not run)
## Not run: dir = 'my/f2/dir/' inds = c('ind1', 'ind2', 'ind3', 'ind4', 'ind5') pops = c('pop_A', 'pop_A', 'pop_A', 'pop_B', 'pop_B') group_samples(dir, inds, pops) ## End(Not run)
Remove population from graph
delete_leaf(graph, leaf)
delete_leaf(graph, leaf)
graph |
An admixture graph |
leaf |
Population to be removed |
Admixture graph with removed population
This is used to revert simplify_graph.
desimplify_graph(graph)
desimplify_graph(graph)
graph |
An admixture graph |
simple = simplify_graph(example_igraph) desimple = desimplify_graph(simple) plot_graph(simple) plot_graph(desimple)
simple = simplify_graph(example_igraph) desimple = desimplify_graph(simple) plot_graph(simple) plot_graph(desimple)
Convert data frame graph to igraph
edges_to_igraph(edges)
edges_to_igraph(edges)
edges |
An admixture graph as an edge list data frame |
An igraph
object
Read allele frequencies from EIGENSTRAT files
eigenstrat_to_afs( pref, inds = NULL, pops = NULL, numparts = 100, adjust_pseudohaploid = TRUE, verbose = TRUE )
eigenstrat_to_afs( pref, inds = NULL, pops = NULL, numparts = 100, adjust_pseudohaploid = TRUE, verbose = TRUE )
pref |
Prefix of EIGENSTRAT files (files have to end in |
inds |
Individuals from which to compute allele frequencies |
pops |
Populations from which to compute allele frequencies. If |
numparts |
Number of parts into which the genotype file is split. Lowering this number can speed things up, but will take more memory. |
adjust_pseudohaploid |
Genotypes of pseudohaploid samples are usually coded as |
verbose |
Print progress updates |
A list with three data frames: allele frequency data, allele counts, and SNP metadata
## Not run: afdat = eigenstrat_to_afs(prefix, pops = pops) afs = afdat$afs counts = afdat$counts ## End(Not run)
## Not run: afdat = eigenstrat_to_afs(prefix, pops = pops) afs = afdat$afs counts = afdat$counts ## End(Not run)
This works for any statistics which, when computed across N
blocks, are equal
to the weighted mean of the statistics across the N
blocks.
est_to_boo(arr, nboot = dim(arr)[3], block_lengths = NULL)
est_to_boo(arr, nboot = dim(arr)[3], block_lengths = NULL)
arr |
3d array with blocked estimates, with blocks in the 3rd dimension. |
nboot |
Number of bootstrap iterations |
A 3d array with bootstrap estimates. The first two dimensions are equal to those of arr
.
The 3rd dimension is equal to nboot
.
This works for any statistics which, when computed across N
blocks, are equal
to the weighted mean of the statistics across the N
blocks.
est_to_loo(arr, block_lengths = NULL)
est_to_loo(arr, block_lengths = NULL)
arr |
3d array with blocked estimates, with blocks in the 3rd dimension |
block_lengths |
Optional block lengths. If |
A 3d array with leave-one-out estimates for jackknife. Dimensions are equal to those of arr
.
Data frame with sample annotations
example_anno
example_anno
A data frame with sample annotations
Blocked f2-statistics for 7 populations
example_f2_blocks
example_f2_blocks
A 3d array with populations along the first two dimensions, and SNP blocks along the 3rd dimension
Admixture graph for 7 populations
example_graph
example_graph
A two column matrix where each row represents one edge
Admixture graph for 7 populations
example_igraph
example_igraph
An igraph
object
Data frame with one fitted admixture graph
example_opt
example_opt
A data frame with a fitted admixture graph, generated by find_graphs_old
example_graph fitted using qpGraph
example_qpgraph_ref_results
example_qpgraph_ref_results
A list with parsed qpGraph output
Data frame with population triples
example_triples
example_triples
A data frame with population triples, generated by summarize_triples
Prepare data for various ADMIXTOOLS 2 functions. Reads data from packedancestrymap or PLINK files,
and computes allele frequencies for selected populations and stores it as .rds
files in outdir.
extract_afs( pref, outdir, inds = NULL, pops = NULL, cols_per_chunk = 10, numparts = 100, maxmiss = 0, minmaf = 0, maxmaf = 0.5, minac2 = FALSE, outpop = NULL, auto_only = TRUE, transitions = TRUE, transversions = TRUE, keepsnps = NULL, format = NULL, poly_only = FALSE, adjust_pseudohaploid = TRUE, verbose = TRUE )
extract_afs( pref, outdir, inds = NULL, pops = NULL, cols_per_chunk = 10, numparts = 100, maxmiss = 0, minmaf = 0, maxmaf = 0.5, minac2 = FALSE, outpop = NULL, auto_only = TRUE, transitions = TRUE, transversions = TRUE, keepsnps = NULL, format = NULL, poly_only = FALSE, adjust_pseudohaploid = TRUE, verbose = TRUE )
pref |
Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files.
EIGENSTRAT/PACKEDANCESTRYMAP have to end in |
outdir |
Directory where data will be stored. |
inds |
Individuals for which data should be extracted |
pops |
Populations for which data should be extracted. If both |
cols_per_chunk |
Number of populations per chunk. Lowering this number will lower the memory requirements when running |
numparts |
Number of parts in which genotype data will be read for computing allele frequencies |
maxmiss |
Discard SNPs which are missing in a fraction of populations higher than |
minmaf |
Discard SNPs with minor allele frequency less than |
maxmaf |
Discard SNPs with minor allele frequency greater than than |
minac2 |
Discard SNPs with allele count lower than 2 in any population (default |
outpop |
Keep only SNPs which are heterozygous in this population |
auto_only |
Keep only SNPs on chromosomes 1 to 22 |
transitions |
Set this to |
transversions |
Set this to |
keepsnps |
SNP IDs of SNPs to keep. Overrides other SNP filtering options |
format |
Supply this if the prefix can refer to genotype data in different formats
and you want to choose which one to read. Should be |
poly_only |
Specify whether SNPs with identical allele frequencies in every population should be discarded ( |
adjust_pseudohaploid |
Genotypes of pseudohaploid samples are usually coded as |
verbose |
Print progress updates |
SNP metadata (invisibly)
## Not run: pref = 'my/genofiles/prefix' outdir = 'dir/for/afdata/' extract_afs(pref, outdir) ## End(Not run)
## Not run: pref = 'my/genofiles/prefix' outdir = 'dir/for/afdata/' extract_afs(pref, outdir) ## End(Not run)
Prepare data for various ADMIXTOOLS 2 functions. Reads data from packedancestrymap or PLINK files,
and computes allele frequencies for selected populations and stores it as .rds
files in outdir.
extract_afs_simple( pref, outdir, inds = NULL, pops = NULL, blgsize = 0.05, cols_per_chunk = 10, maxmiss = 0, minmaf = 0, maxmaf = 0.5, minac2 = FALSE, outpop = NULL, transitions = TRUE, transversions = TRUE, keepsnps = NULL, format = NULL, poly_only = FALSE, adjust_pseudohaploid = TRUE, verbose = TRUE )
extract_afs_simple( pref, outdir, inds = NULL, pops = NULL, blgsize = 0.05, cols_per_chunk = 10, maxmiss = 0, minmaf = 0, maxmaf = 0.5, minac2 = FALSE, outpop = NULL, transitions = TRUE, transversions = TRUE, keepsnps = NULL, format = NULL, poly_only = FALSE, adjust_pseudohaploid = TRUE, verbose = TRUE )
pref |
Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files.
EIGENSTRAT/PACKEDANCESTRYMAP have to end in |
outdir |
Directory where data will be stored. |
inds |
Individuals for which data should be extracted |
pops |
Populations for which data should be extracted. If both |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
cols_per_chunk |
Number of populations per chunk. Lowering this number will lower the memory requirements when running |
maxmiss |
Discard SNPs which are missing in a fraction of populations higher than |
minmaf |
Discard SNPs with minor allele frequency less than |
maxmaf |
Discard SNPs with minor allele frequency greater than than |
minac2 |
Discard SNPs with allele count lower than 2 in any population (default |
outpop |
Keep only SNPs which are heterozygous in this population |
transitions |
Set this to |
transversions |
Set this to |
keepsnps |
SNP IDs of SNPs to keep. Overrides other SNP filtering options |
format |
Supply this if the prefix can refer to genotype data in different formats
and you want to choose which one to read. Should be |
poly_only |
Specify whether SNPs with identical allele frequencies in every population should be discarded ( |
adjust_pseudohaploid |
Genotypes of pseudohaploid samples are usually coded as |
verbose |
Print progress updates |
SNP metadata (invisibly)
## Not run: pref = 'my/genofiles/prefix' outdir = 'dir/for/afdata/' extract_afs(pref, outdir) ## End(Not run)
## Not run: pref = 'my/genofiles/prefix' outdir = 'dir/for/afdata/' extract_afs(pref, outdir) ## End(Not run)
Prepare data for various ADMIXTOOLS 2 functions. This function reads data from genotype files,
and extracts data required to compute blocked f-statistics for any sets of samples. The data consists of
.rds
files with total and alternative allele counts for each individual, and products of total
and alternative allele counts for each pair.
The function calls packedancestrymap_to_afs
or plink_to_afs
and afs_to_f2_blocks
.
extract_counts( pref, outdir, inds = NULL, blgsize = 0.05, maxmiss = 0, minmaf = 0, maxmaf = 0.5, transitions = TRUE, transversions = TRUE, auto_only = TRUE, keepsnps = NULL, maxmem = 8000, overwrite = FALSE, format = NULL, cols_per_chunk = NULL, verbose = TRUE )
extract_counts( pref, outdir, inds = NULL, blgsize = 0.05, maxmiss = 0, minmaf = 0, maxmaf = 0.5, transitions = TRUE, transversions = TRUE, auto_only = TRUE, keepsnps = NULL, maxmem = 8000, overwrite = FALSE, format = NULL, cols_per_chunk = NULL, verbose = TRUE )
pref |
Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files.
EIGENSTRAT/PACKEDANCESTRYMAP have to end in |
outdir |
Directory where data will be stored. |
inds |
Individuals for which data should be read. Defaults to all individuals |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
maxmiss |
Discard SNPs which are missing in a fraction of individuals greater than |
minmaf |
Discard SNPs with minor allele frequency less than |
maxmaf |
Discard SNPs with minor allele frequency greater than than |
transitions |
Set this to |
transversions |
Set this to |
auto_only |
Keep only SNPs on chromosomes 1 to 22 |
keepsnps |
SNP IDs of SNPs to keep. Overrides other SNP filtering options |
maxmem |
Maximum amount of memory to be used. If the required amount of memory exceeds |
overwrite |
Overwrite existing files in |
format |
Supply this if the prefix can refer to genotype data in different formats
and you want to choose which one to read. Should be |
cols_per_chunk |
Number of genotype chunks to store on disk. Setting this to a positive integer makes the function slower, but requires less memory. The default value for |
verbose |
Print progress updates |
This function prepares data for various other ADMIXTOOLS 2 functions. It reads data from genotype files,
computes allele frequencies and blocked f2-statistics for selected populations, and writes the results to outdir
.
extract_f2( pref, outdir, inds = NULL, pops = NULL, blgsize = 0.05, maxmem = 8000, maxmiss = 0, minmaf = 0, maxmaf = 0.5, minac2 = FALSE, pops2 = NULL, outpop = NULL, outpop_scale = TRUE, transitions = TRUE, transversions = TRUE, auto_only = TRUE, keepsnps = NULL, overwrite = FALSE, format = NULL, adjust_pseudohaploid = TRUE, cols_per_chunk = NULL, fst = TRUE, afprod = TRUE, poly_only = c("f2"), apply_corr = TRUE, qpfstats = FALSE, n_cores = 1, verbose = TRUE, ... )
extract_f2( pref, outdir, inds = NULL, pops = NULL, blgsize = 0.05, maxmem = 8000, maxmiss = 0, minmaf = 0, maxmaf = 0.5, minac2 = FALSE, pops2 = NULL, outpop = NULL, outpop_scale = TRUE, transitions = TRUE, transversions = TRUE, auto_only = TRUE, keepsnps = NULL, overwrite = FALSE, format = NULL, adjust_pseudohaploid = TRUE, cols_per_chunk = NULL, fst = TRUE, afprod = TRUE, poly_only = c("f2"), apply_corr = TRUE, qpfstats = FALSE, n_cores = 1, verbose = TRUE, ... )
pref |
Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files.
EIGENSTRAT/PACKEDANCESTRYMAP have to end in |
outdir |
Directory where data will be stored. |
inds |
Individuals for which data should be extracted |
pops |
Populations for which data should be extracted. If both |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
maxmem |
Maximum amount of memory to be used. If the required amount of memory exceeds |
maxmiss |
Discard SNPs which are missing in a fraction of populations higher than |
minmaf |
Discard SNPs with minor allele frequency less than |
maxmaf |
Discard SNPs with minor allele frequency greater than than |
minac2 |
Discard SNPs with allele count lower than 2 in any population (default |
pops2 |
If specified, only a pairs between |
outpop |
Keep only SNPs which are heterozygous in this population |
outpop_scale |
Scale f2-statistics by the inverse |
transitions |
Set this to |
transversions |
Set this to |
auto_only |
Keep only SNPs on chromosomes 1 to 22 |
keepsnps |
SNP IDs of SNPs to keep. Overrides other SNP filtering options |
overwrite |
Overwrite existing files in |
format |
Supply this if the prefix can refer to genotype data in different formats
and you want to choose which one to read. Should be |
adjust_pseudohaploid |
Genotypes of pseudohaploid samples are usually coded as |
cols_per_chunk |
Number of allele frequency chunks to store on disk. Setting this to a positive integer makes the function slower, but requires less memory. The default value for |
fst |
Write files with pairwise FST for every population pair. Setting this to FALSE can make |
afprod |
Write files with allele frequency products for every population pair. Setting this to FALSE can make |
poly_only |
Specify whether SNPs with identical allele frequencies in every population should be discarded ( |
apply_corr |
Apply small-sample-size correction when computing f2-statistics (default |
qpfstats |
Compute smoothed f2-statistics (default |
n_cores |
Parallelize computation across |
verbose |
Print progress updates |
... |
Pass arguments to |
SNP metadata (invisibly)
f2_from_precomp
for reading the stored f2-statistics back
into R, f2_from_geno
to skip writting f2-statistics to disk and return them directly
## Not run: pref = 'my/genofiles/prefix' f2dir = 'my/f2dir/' extract_f2(pref, f2dir, pops = c('popA', 'popB', 'popC')) ## End(Not run)
## Not run: pref = 'my/genofiles/prefix' f2dir = 'my/f2dir/' extract_f2(pref, f2dir, pops = c('popA', 'popB', 'popC')) ## End(Not run)
extract_f2_large
does the same as extract_f2
, but it requires less memory and is slower. outdir
has to be set in extract_f2_large
.
extract_f2_large( pref, outdir, inds = NULL, pops = NULL, blgsize = 0.05, cols_per_chunk = 10, maxmiss = 0, minmaf = 0, maxmaf = 0.5, minac2 = FALSE, outpop = NULL, outpop_scale = TRUE, transitions = TRUE, transversions = TRUE, keepsnps = NULL, snpblocks = NULL, overwrite = FALSE, format = NULL, adjust_pseudohaploid = TRUE, afprod = TRUE, fst = TRUE, poly_only = c("f2"), apply_corr = TRUE, verbose = TRUE )
extract_f2_large( pref, outdir, inds = NULL, pops = NULL, blgsize = 0.05, cols_per_chunk = 10, maxmiss = 0, minmaf = 0, maxmaf = 0.5, minac2 = FALSE, outpop = NULL, outpop_scale = TRUE, transitions = TRUE, transversions = TRUE, keepsnps = NULL, snpblocks = NULL, overwrite = FALSE, format = NULL, adjust_pseudohaploid = TRUE, afprod = TRUE, fst = TRUE, poly_only = c("f2"), apply_corr = TRUE, verbose = TRUE )
pref |
Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files.
EIGENSTRAT/PACKEDANCESTRYMAP have to end in |
outdir |
Directory where data will be stored. |
inds |
Individuals for which data should be extracted |
pops |
Populations for which data should be extracted. If both |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
cols_per_chunk |
Number of populations per chunk. Lowering this number will lower the memory requirements when running |
maxmiss |
Discard SNPs which are missing in a fraction of populations higher than |
minmaf |
Discard SNPs with minor allele frequency less than |
maxmaf |
Discard SNPs with minor allele frequency greater than than |
minac2 |
Discard SNPs with allele count lower than 2 in any population (default |
outpop |
Keep only SNPs which are heterozygous in this population |
outpop_scale |
Scale f2-statistics by the inverse |
transitions |
Set this to |
transversions |
Set this to |
keepsnps |
SNP IDs of SNPs to keep. Overrides other SNP filtering options |
overwrite |
Overwrite existing files in |
format |
Supply this if the prefix can refer to genotype data in different formats
and you want to choose which one to read. Should be |
adjust_pseudohaploid |
Genotypes of pseudohaploid samples are usually coded as |
afprod |
Write files with allele frequency products for every population pair. Setting this to FALSE can make |
fst |
Write files with pairwise FST for every population pair. Setting this to FALSE can make |
poly_only |
Specify whether SNPs with identical allele frequencies in every population should be discarded ( |
apply_corr |
Apply small-sample-size correction when computing f2-statistics (default |
verbose |
Print progress updates |
extract_f2_large
requires less memory because it writes allele frequency data to disk, and doesn't store the allele frequency matrix for all populations and SNPs in memory. If you still run out of memory, reduce cols_per_chunk
. This function is a wrapper around extract_afs
and afs_to_f2
, and is slower than extract_f2
. It may be faster to call extract_afs
and afs_to_f2
directly, parallelizing over the different calls to afs_to_f2
.
SNP metadata (invisibly)
## Not run: pref = 'my/genofiles/prefix' f2dir = 'my/f2dir/' extract_f2_large(pref, f2dir, pops = c('popA', 'popB', 'popC')) ## End(Not run)
## Not run: pref = 'my/genofiles/prefix' f2dir = 'my/f2dir/' extract_f2_large(pref, f2dir, pops = c('popA', 'popB', 'popC')) ## End(Not run)
Copy a subset of f2-statistics to a new directory
extract_f2_subset(from, to, pops, verbose = TRUE)
extract_f2_subset(from, to, pops, verbose = TRUE)
from |
Directory with f2-statistics |
to |
Target directory |
pops |
Populations to copy |
verbose |
Print progress updates |
## Not run: pref = 'my/genofiles/prefix' outdir = 'dir/for/afdata/' extract_f2_subset(pref, outdir) ## End(Not run)
## Not run: pref = 'my/genofiles/prefix' outdir = 'dir/for/afdata/' extract_f2_subset(pref, outdir) ## End(Not run)
This function reads PLINK files, extracts a subset of samples, and writes new PLINK files using write_plink
. It's probably slower than running the equivalent command in PLINK directly, but it can be useful to do this from within R.
When inds
or pops
is provided, only a subset of samples will be extracted.
extract_samples( inpref, outpref, inds = NULL, pops = NULL, overwrite = FALSE, verbose = TRUE )
extract_samples( inpref, outpref, inds = NULL, pops = NULL, overwrite = FALSE, verbose = TRUE )
inpref |
Prefix of the PLINK input files |
outpref |
Prefix of the PLINK output files |
inds |
Individuals which should be extracted |
pops |
Populations which should be extracted. Can not be provided together with |
overwrite |
Set this to |
verbose |
Print progress updates |
Computes f2 statistics from f2 blocks of the form
f2( data, pop1 = NULL, pop2 = NULL, boot = FALSE, sure = FALSE, unique_only = TRUE, verbose = FALSE, ... )
f2( data, pop1 = NULL, pop2 = NULL, boot = FALSE, sure = FALSE, unique_only = TRUE, verbose = FALSE, ... )
data |
Input data in one of three forms:
|
pop1 |
One of the following four:
|
pop2 |
A vector of population labels |
boot |
If |
sure |
The number of population combinations can get very large. This is a safety option that stops you from accidently computing all combinations if that number is large. |
unique_only |
If |
verbose |
Print progress updates |
... |
Additional arguments passed to |
f2
returns a data frame with f2 statistics
Patterson, N. et al. (2012) Ancient admixture in human history Genetics
Peter, B. (2016) Admixture, Population Structure, and F-Statistics Genetics
pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') f2(example_f2_blocks, pop1, pop2) ## Not run: f2(f2_dir, pop1, pop2) ## End(Not run)
pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') f2(example_f2_blocks, pop1, pop2) ## Not run: f2(f2_dir, pop1, pop2) ## End(Not run)
This function prepares data for various other ADMIXTOOLS 2 functions. It reads data from genotype files, computes allele frequencies and blocked f2-statistics for selected populations, and returns them as a 3d array.
f2_from_geno( pref, inds = NULL, pops = NULL, blgsize = 0.05, maxmem = 8000, maxmiss = 0, minmaf = 0, maxmaf = 0.5, pops2 = NULL, outpop = NULL, outpop_scale = TRUE, transitions = TRUE, transversions = TRUE, auto_only = TRUE, keepsnps = NULL, afprod = FALSE, fst = FALSE, poly_only = c("f2"), format = NULL, adjust_pseudohaploid = TRUE, remove_na = TRUE, apply_corr = TRUE, qpfstats = FALSE, verbose = TRUE, ... )
f2_from_geno( pref, inds = NULL, pops = NULL, blgsize = 0.05, maxmem = 8000, maxmiss = 0, minmaf = 0, maxmaf = 0.5, pops2 = NULL, outpop = NULL, outpop_scale = TRUE, transitions = TRUE, transversions = TRUE, auto_only = TRUE, keepsnps = NULL, afprod = FALSE, fst = FALSE, poly_only = c("f2"), format = NULL, adjust_pseudohaploid = TRUE, remove_na = TRUE, apply_corr = TRUE, qpfstats = FALSE, verbose = TRUE, ... )
pref |
Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files.
EIGENSTRAT/PACKEDANCESTRYMAP have to end in |
inds |
Individuals for which data should be extracted |
pops |
Populations for which data should be extracted. If both |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
maxmem |
Maximum amount of memory to be used. If the required amount of memory exceeds |
maxmiss |
Discard SNPs which are missing in a fraction of populations higher than |
minmaf |
Discard SNPs with minor allele frequency less than |
maxmaf |
Discard SNPs with minor allele frequency greater than than |
pops2 |
If specified, only a pairs between |
outpop |
Keep only SNPs which are heterozygous in this population |
outpop_scale |
Scale f2-statistics by the inverse |
transitions |
Set this to |
transversions |
Set this to |
auto_only |
Keep only SNPs on chromosomes 1 to 22 |
keepsnps |
SNP IDs of SNPs to keep. Overrides other SNP filtering options |
afprod |
Return negative average allele frequency products instead of f2-statistics.
Setting |
fst |
Write files with pairwise FST for every population pair. Setting this to FALSE can make |
poly_only |
Specify whether SNPs with identical allele frequencies in every population should be discarded ( |
format |
Supply this if the prefix can refer to genotype data in different formats
and you want to choose which one to read. Should be |
adjust_pseudohaploid |
Genotypes of pseudohaploid samples are usually coded as |
apply_corr |
Apply small-sample-size correction when computing f2-statistics (default |
qpfstats |
Compute smoothed f2-statistics (default |
verbose |
Print progress updates |
... |
Pass arguments to |
A 3d array of f2-statistics (or scaled allele frequency products if afprod = TRUE
)
f2_from_precomp
for reading previously stored f2-statistics into R, extract_f2
for storing f2-statistics on disk
This function generates an msprime simulation script, executes it in python, and turns the resulting genotype data into f2-statistics
f2_from_msprime(..., blgsize = 0.05, cleanup = TRUE, verbose = TRUE)
f2_from_msprime(..., blgsize = 0.05, cleanup = TRUE, verbose = TRUE)
Read blocked f2 statistics from disk
f2_from_precomp( dir, inds = NULL, pops = NULL, pops2 = NULL, afprod = FALSE, fst = FALSE, return_array = TRUE, apply_corr = TRUE, remove_na = TRUE, verbose = TRUE )
f2_from_precomp( dir, inds = NULL, pops = NULL, pops2 = NULL, afprod = FALSE, fst = FALSE, return_array = TRUE, apply_corr = TRUE, remove_na = TRUE, verbose = TRUE )
dir |
Directory with precomputed f2 statistics, or precomputed individual pair data |
inds |
Individuals for which data should be read. Defaults to all individuals, which may require a lot of memory. |
pops |
Populations for which data should be read. Defaults to all populations, which may require a lot of memory. |
pops2 |
Optional second vector of populations. Useful if a f4 statistics of a few against many populations should be computed. |
afprod |
Return negative average allele frequency products instead of f2 estimates. This will result in more precise f4-statistics when the original data had large amounts of missingness, and should be used in that case for |
return_array |
Return a 3d array (default). If false, a data frame will be returned. |
apply_corr |
Subtract the f2 correction factor. Setting this to |
remove_na |
Remove blocks with missing values |
verbose |
Print progress updates |
A 3d array of f2 statistics
## Not run: dir = 'my/f2/dir/' f2_blocks = f2_from_precomp(dir, pops = c('pop1', 'pop2', 'pop3')) ## End(Not run)
## Not run: dir = 'my/f2/dir/' f2_blocks = f2_from_precomp(dir, pops = c('pop1', 'pop2', 'pop3')) ## End(Not run)
Turn f2 data to f4 data
f2dat_f4dat(f2dat, popcomb = NULL)
f2dat_f4dat(f2dat, popcomb = NULL)
f2dat |
A data frame of f2-statistics with columns |
A data frame with f4-statistics
Compute per-block f3-statistics directly from genotype data
f3blockdat_from_geno( pref, popcombs, auto_only = TRUE, blgsize = 0.05, block_lengths = NULL, allsnps = FALSE, adjust_pseudohaploid = TRUE, poly_only = FALSE, apply_corr = TRUE, outgroupmode = FALSE, verbose = TRUE )
f3blockdat_from_geno( pref, popcombs, auto_only = TRUE, blgsize = 0.05, block_lengths = NULL, allsnps = FALSE, adjust_pseudohaploid = TRUE, poly_only = FALSE, apply_corr = TRUE, outgroupmode = FALSE, verbose = TRUE )
pref |
Prefix of genotype files |
popcombs |
A data frame with one population combination per row, and columns |
auto_only |
Use only chromosomes 1 to 22. |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
block_lengths |
An optional vector with block lengths. If |
allsnps |
Use all SNPs with allele frequency estimates in every population of any given population quadruple. If |
adjust_pseudohaploid |
Genotypes of pseudohaploid samples are usually coded as |
apply_corr |
With |
outgroupmode |
With |
verbose |
Print progress updates |
A data frame with per-block f4-statistics for each population quadruple.
Compute f4 from allele frequencies
f4_from_afdat(afdat, popcombs)
f4_from_afdat(afdat, popcombs)
afdat |
A data frame with allele frequencies and SNP metadata. Can be grouped. |
popcombs |
A data frame with population combinations. Columns |
## Not run: # Compute f4 for all mutatation classes separately afs = plink_to_afs('/my/geno/prefix', pops = c('p1', 'p2', 'p3', 'p4', 'p5')) afdat = bind_cols(afs$snpfile, afs$afs %>% as_tibble()) %>% mutate(gr = paste0(pmin(A1, A2), pmax(A1, A2))) %>% group_by(gr) popcombs = tibble(pop1 = c('p1', 'p5'), pop2 = 'p2', pop3 = 'p3', pop4 = 'p4') out = f4_from_afdat(afdat, popcombs) out %>% ggplot(aes(gr, est)) + geom_point() + geom_errorbar(aes(ymin = est - se, ymax = est + se)) + facet_wrap(~paste(pop1, pop2, pop3, pop4), scales = 'free') ## End(Not run)
## Not run: # Compute f4 for all mutatation classes separately afs = plink_to_afs('/my/geno/prefix', pops = c('p1', 'p2', 'p3', 'p4', 'p5')) afdat = bind_cols(afs$snpfile, afs$afs %>% as_tibble()) %>% mutate(gr = paste0(pmin(A1, A2), pmax(A1, A2))) %>% group_by(gr) popcombs = tibble(pop1 = c('p1', 'p5'), pop2 = 'p2', pop3 = 'p3', pop4 = 'p4') out = f4_from_afdat(afdat, popcombs) out %>% ggplot(aes(gr, est)) + geom_point() + geom_errorbar(aes(ymin = est - se, ymax = est + se)) + facet_wrap(~paste(pop1, pop2, pop3, pop4), scales = 'free') ## End(Not run)
This function turns per-block f2-statistics into per-block f4-statistics of the form f4(pop1, pop2; pop3, pop4)
f4_from_f2(f2_data, pop1, pop2 = NULL, pop3 = NULL, pop4 = NULL)
f4_from_f2(f2_data, pop1, pop2 = NULL, pop3 = NULL, pop4 = NULL)
f2_data |
A 3d array with blocked f2 statistics, output of |
pop1 |
Either the name(s) of the first population(s), or a four column matrix with the names of all four populations. |
pop2 |
Population 2 (same length as |
pop3 |
Population 3 (same length as |
pop4 |
Population 4 (same length as |
A matrix of per-block f4-statistics (popcomb x block
)
Compute per-block f4-statistics directly from genotype data
f4blockdat_from_geno( pref, popcombs = NULL, left = NULL, right = NULL, auto_only = TRUE, blgsize = 0.05, block_lengths = NULL, f4mode = TRUE, allsnps = FALSE, poly_only = FALSE, snpwt = NULL, keepsnps = NULL, verbose = TRUE )
f4blockdat_from_geno( pref, popcombs = NULL, left = NULL, right = NULL, auto_only = TRUE, blgsize = 0.05, block_lengths = NULL, f4mode = TRUE, allsnps = FALSE, poly_only = FALSE, snpwt = NULL, keepsnps = NULL, verbose = TRUE )
pref |
Prefix of genotype files |
popcombs |
A data frame with one population combination per row, and columns |
left |
Populations on the left side of f4 ( |
right |
Populations on the right side of f4 ( |
auto_only |
Use only chromosomes 1 to 22. |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
block_lengths |
An optional vector with block lengths. If |
f4mode |
If |
allsnps |
Use all SNPs with allele frequency estimates in every population of any given population quadruple. If |
poly_only |
Only keep SNPs with mean allele frequency not equal to 0 or 1 (default |
snpwt |
A vector of SNP weights |
keepsnps |
A vector of SNP IDs to keep |
verbose |
Print progress updates |
A data frame with per-block f4-statistics for each population quadruple.
Turn f4 block data to 3d array
f4blockdat_to_f4blocks(f4blockdat, remove_na = TRUE)
f4blockdat_to_f4blocks(f4blockdat, remove_na = TRUE)
f4blockdat |
f4 block data frame generated by |
remove_na |
Remove blocks with missing values |
Find admixture edges
find_admixedges(graph)
find_admixedges(graph)
graph |
An admixture graph |
A data frame with columns from
and to
with admixture edges
This function generates and evaluates admixture graphs in numgen
iterations
to find well fitting admixturegraphs.
find_graphs( data, numadmix = 0, outpop = NULL, stop_gen = 100, stop_gen2 = 15, stop_score = 0, stop_sec = NULL, initgraph = NULL, numgraphs = 10, mutfuns = namedList(spr_leaves, spr_all, swap_leaves, move_admixedge_once, flipadmix_random, place_root_random, mutate_n), opt_worst_residual = FALSE, plusminus_generations = 5, return_searchtree = FALSE, admix_constraints = NULL, event_constraints = NULL, reject_f4z = 0, max_admix = numadmix, verbose = TRUE, ... )
find_graphs( data, numadmix = 0, outpop = NULL, stop_gen = 100, stop_gen2 = 15, stop_score = 0, stop_sec = NULL, initgraph = NULL, numgraphs = 10, mutfuns = namedList(spr_leaves, spr_all, swap_leaves, move_admixedge_once, flipadmix_random, place_root_random, mutate_n), opt_worst_residual = FALSE, plusminus_generations = 5, return_searchtree = FALSE, admix_constraints = NULL, event_constraints = NULL, reject_f4z = 0, max_admix = numadmix, verbose = TRUE, ... )
data |
Input data in one of three forms:
|
numadmix |
Number of admixture events within each graph. (Only relevant if |
outpop |
Name of the outgroup population |
stop_gen |
Total number of generations after which to stop |
stop_gen2 |
Number of generations without improvement after which to stop |
stop_score |
Stop once this score has been reached |
stop_sec |
Number of seconds after which to stop |
initgraph |
Graph to start with. If it is specified, |
numgraphs |
Number of graphs in each generation |
mutfuns |
Functions used to modify graphs. Defaults to the following:
|
opt_worst_residual |
Optimize for lowest worst residual instead of best score. |
plusminus_generations |
If the best score does not improve after |
return_searchtree |
Return the search tree in addition to the models. Output will be a list with three items: models, search tree, search tree as data frame |
admix_constraints |
A data frame with constraints on the number of admixture events for each population.
See |
event_constraints |
A data frame with constraints on the order of events in an admixture graph.
See |
reject_f4z |
If this is a number greater than zero, all f4-statistics with |
max_admix |
Maximum number of admixture edges. By default, this number is equal to |
verbose |
Print progress updates |
... |
Additional arguments passed to |
A nested data frame with one model per line
## Not run: res = find_graphs(example_f2_blocks, numadmix = 2) res %>% slice_min(score) ## End(Not run) ## Not run: # Start with a graph with 0 admixture events, increase up to 3, and stop after 10 generations of no improvement pops = dimnames(example_f2_blocks)[[1]] initgraph = random_admixturegraph(pops, 0, outpop = 'Chimp.REF') res = find_graphs(example_f2_blocks, initgraph = initgraph, stop_gen2 = 10, max_admix = 3) res %>% slice_min(score) ## End(Not run)
## Not run: res = find_graphs(example_f2_blocks, numadmix = 2) res %>% slice_min(score) ## End(Not run) ## Not run: # Start with a graph with 0 admixture events, increase up to 3, and stop after 10 generations of no improvement pops = dimnames(example_f2_blocks)[[1]] initgraph = random_admixturegraph(pops, 0, outpop = 'Chimp.REF') res = find_graphs(example_f2_blocks, initgraph = initgraph, stop_gen2 = 10, max_admix = 3) res %>% slice_min(score) ## End(Not run)
This function generates and evaluates admixture graphs in numgen
iterations across numrep
independent repeats
to find well fitting admixturegraphs. It uses the function future_map
to parallelize across the independent repeats. The function plan
can be called
to specify the details of the parallelization. This can be used to parallelize across cores or across nodes on
a compute cluster. Setting numadmix
to 0 will search for well fitting trees, which is much faster than searching
for admixture graphs with many admixture nodes.
find_graphs_old( data, pops = NULL, outpop = NULL, numrep = 1, numgraphs = 50, numgen = 5, numsel = 5, numadmix = 0, numstart = 1, keep = c("all", "best", "last"), initgraphs = NULL, mutfuns = namedList(spr_leaves, spr_all, swap_leaves, move_admixedge_once, flipadmix_random, mutate_n), mutprobs = NULL, opt_worst_residual = FALSE, store_intermediate = NULL, parallel = TRUE, stop_after = NULL, verbose = TRUE, ... )
find_graphs_old( data, pops = NULL, outpop = NULL, numrep = 1, numgraphs = 50, numgen = 5, numsel = 5, numadmix = 0, numstart = 1, keep = c("all", "best", "last"), initgraphs = NULL, mutfuns = namedList(spr_leaves, spr_all, swap_leaves, move_admixedge_once, flipadmix_random, mutate_n), mutprobs = NULL, opt_worst_residual = FALSE, store_intermediate = NULL, parallel = TRUE, stop_after = NULL, verbose = TRUE, ... )
data |
Input data in one of three forms:
|
pops |
Populations for which to fit admixture graphs (default all) |
outpop |
An outgroup population which will split at the root from all other populations in all tested graphs. If one of the populations is know to be an outgroup, designating it as |
numrep |
Number of independent repetitions (each repetition can be run in parallel) |
numgraphs |
Number of graphs in each generation |
numgen |
Number of generations |
numsel |
Number of graphs which are selected in each generation. Should be less than |
numadmix |
Number of admixture events within each graph |
numstart |
Number of random initializations in each call to |
keep |
Which models should be returned. One of
|
initgraphs |
Optional graph or list of igraphs to start with. If |
mutfuns |
Functions used to modify graphs. Defaults to the following:
See examples for how to make new mutation functions. |
mutprobs |
Relative frequencies of each mutation function.
|
opt_worst_residual |
Optimize for lowest worst residual instead of best score. |
store_intermediate |
Path and prefix of files for intermediate results to |
parallel |
Parallelize over repeats (if |
stop_after |
Stop optimization after |
verbose |
Print progress updates |
... |
Additional arguments passed to |
A nested data frame with one model per line
## Not run: find_graphs_old(example_f2_blocks, numrep = 200, numgraphs = 100, numgen = 20, numsel = 5, numadmix = 3) ## End(Not run) ## Not run: # Making new mutation functions by modifying or combining existing ones: newfun1 = function(graph, ...) mutate_n(graph, 3, ...) newfun2 = function(graph, ...) flipadmix_random(spr_leaves(graph, ...), ...) find_graphs_old(f2_blocks, mutfuns = namedList(spr_leaves, newfun1, newfun2), mutprobs = c(0.2, 0.3, 0.5)) ## End(Not run)
## Not run: find_graphs_old(example_f2_blocks, numrep = 200, numgraphs = 100, numgen = 20, numsel = 5, numadmix = 3) ## End(Not run) ## Not run: # Making new mutation functions by modifying or combining existing ones: newfun1 = function(graph, ...) mutate_n(graph, 3, ...) newfun2 = function(graph, ...) flipadmix_random(spr_leaves(graph, ...), ...) find_graphs_old(f2_blocks, mutfuns = namedList(spr_leaves, newfun1, newfun2), mutprobs = c(0.2, 0.3, 0.5)) ## End(Not run)
Find possible new edges
find_newedges(graph, fix_outgroup = TRUE, all = TRUE)
find_newedges(graph, fix_outgroup = TRUE, all = TRUE)
graph |
An admixture graph |
A data frame with columns from
and to
. New edges which begin above from
and end above to
could be inserted
find_normedges
find_admixedges
Find drift edges
find_normedges(graph, exclude_first = FALSE)
find_normedges(graph, exclude_first = FALSE)
graph |
An admixture graph |
exclude_first |
Do not return edge from root to outgroup |
A data frame with columns from
and to
with drift edges
Modify a graph flipping the direction of an admixture edge
flipadmix_random(graph, fix_outgroup = TRUE)
flipadmix_random(graph, fix_outgroup = TRUE)
graph |
An admixture graph |
fix_outgroup |
Keep outgroup in place (has no effect here) |
A new admixture graph
This function reads Fst from a directory with precomputed f2-statistics, and turns per-block data
into estimates and standard errors for each population pair. See details
for how Fst is computed.
fst(data, pop1 = NULL, pop2 = NULL, boot = FALSE, verbose = FALSE, ...)
fst(data, pop1 = NULL, pop2 = NULL, boot = FALSE, verbose = FALSE, ...)
data |
Input data in one of three forms:
|
pop1 |
One of the following four:
|
pop2 |
A vector of population labels |
boot |
If |
verbose |
Print progress updates |
... |
Additional arguments passed to |
The Hudson Fst estimator used here is described in the two publications below.
For two populations with estimated allele frequency vectors p1
and p2
,
and allele count vectors n1
and n2
, it is calculated as follows:num = (p1 - p2)^2 - p1*(1-p1)/(n1-1) - p2*(1-p2)/(n2-1)
denom = p1 + p2 - 2*p1*p2
fst = mean(num)/mean(denom)
This is done independently for each SNP block, and is stored on disk for each population pair.
Jackknifing or bootstrapping across these per-block estimates yields the overall estimates and standard errors.
Reich, D. (2009) Reconstructing Indian population history Nature
Bhatia, G. (2013) Estimating and interpreting Fst: the impact of rare variants Genome Research
## Not run: pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') fst(f2_dir, pop1, pop2) ## End(Not run)
## Not run: pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') fst(f2_dir, pop1, pop2) ## End(Not run)
This functions generates all possible admixture graphs with a set number of admixture events for a given set of leaf nodes. It's pretty slow, and may not terminate in reasonable time for more than 5 leaves and 2 admixture events. The function is similar to the all_graphs
function in the admixturegraph
package, but there are a few differences:
The function does not return graphs with fewer than nadmix
admixture events
The function does not return most graphs which are unidentifiable and would have equal fits as simpler identifiable graphs (for example it does not return graphs where a node is expanded to a loop)
The function does not return duplicated graphs, as identified by the graph_hash
function
The function generates unique graphs which are missing in the output of all_graphs
generate_all_graphs(leaves, nadmix = 0, verbose = TRUE)
generate_all_graphs(leaves, nadmix = 0, verbose = TRUE)
leaves |
The leaf nodes |
nadmix |
The number of admixture nodes |
verbose |
Print progress updates |
A list of graphs in igraph
format
all_graphs
, generate_all_trees
, graph_hash
## Not run: graphs = generate_all_graphs(letters[1:4], 1) ## End(Not run)
## Not run: graphs = generate_all_graphs(letters[1:4], 1) ## End(Not run)
This functions generates all possible trees with for a given set of leaf nodes.
generate_all_trees(leaves)
generate_all_trees(leaves)
leaves |
The leaf nodes |
A list of trees in igraph
format
A new block begins at the SNP after the first SNP which is not within blgsize
of the start of the last block.
dat
needs to be ordered first by 'CHR', then by 'POS' or 'cm'
get_block_lengths(dat, blgsize = 0.05, cpp = TRUE, verbose = TRUE)
get_block_lengths(dat, blgsize = 0.05, cpp = TRUE, verbose = TRUE)
dat |
Data frame with columns 'CHR' and either 'POS' or 'cm' |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). If |
cpp |
Should the faster C++ version be used? |
verbose |
Print progress updates |
A numeric vector where the ith element lists the number of SNPs in the ith block.
## Not run: prefix = 'path/to/packedancestrymap_prefix' pops = c('pop1', 'pop2', 'pop3') afdat = packedancestrymap_to_afs(prefix, pops = pops) block_lengths = get_block_lengths(afdat) ## End(Not run)
## Not run: prefix = 'path/to/packedancestrymap_prefix' pops = c('pop1', 'pop2', 'pop3') afdat = packedancestrymap_to_afs(prefix, pops = pops) block_lengths = get_block_lengths(afdat) ## End(Not run)
Turns f2_data into f2_blocks
get_f2(f2_data, pops = NULL, pops2 = NULL, afprod = FALSE, verbose = TRUE, ...)
get_f2(f2_data, pops = NULL, pops2 = NULL, afprod = FALSE, verbose = TRUE, ...)
f2_data |
f2 data as genotype file prefix, f2 directory, or f2 blocks |
pops |
Populations for which to extract f2-stats (defaults to all) |
pops2 |
Optional second vector of populations. Can result in non-square array |
afprod |
Get allele frequency products |
verbose |
Print progress updates |
... |
Additional arguments passed to |
A 3d array with f2-statistics
Get the population names of a graph
get_leafnames(graph)
get_leafnames(graph)
graph |
An admixture graph |
Population names
Get the outgroup from a graph (if it exists)
get_outpop(graph)
get_outpop(graph)
graph |
An admixture graph |
Outgroup name
Get the root name
get_rootname(graph)
get_rootname(graph)
graph |
An admixture graph |
Root name
Add a population to an admixture graph
graph_addleaf(graph, pop)
graph_addleaf(graph, pop)
graph |
An admixture graph |
pop |
Population to add to the graph |
Admixture graph with the added population
insert_leaf
adds pop at a specific position
Computes a distance estimate for each graph pair. Each graph is first summarized as a vector which counts for every leaf pair how many internal nodes reach that pair. The distance between two graphs is the Euclidean distance between the vectors of two graphs, and is scaled to fall between 0 and 1.
graph_distances(graphlist)
graph_distances(graphlist)
graphlist |
List of graphs |
A data frame with graph distances
This function generates and evaluates admixture graphs in numgen
iterations
to find well fitting admixturegraphs.
graph_equations( graph, substitute = TRUE, nam = c("a", "e", "f"), return_everything = FALSE )
graph_equations( graph, substitute = TRUE, nam = c("a", "e", "f"), return_everything = FALSE )
graph |
An admixture graph |
substitute |
Should edge names be represented by shorter symbols? |
nam |
Symbols used to shorten edge names |
A list with two data frames: equations
holds the equtions for all f2-statistics; coding
has the mapping from edge names to edge symbols, which is used when substitute = TRUE
This function takes an igraph object and turns it into a function that takes edge weights as input, and outputs the expected f2-statistics.
graph_f2_function( graph, admix_default = 0.5, drift_default = 0.01, random_defaults = FALSE )
graph_f2_function( graph, admix_default = 0.5, drift_default = 0.01, random_defaults = FALSE )
graph |
An admixture graph |
admix_default |
The default weights for admixture edges |
drift_default |
The default weights for drift edges |
random_defaults |
Set default weights randomly for each edge between 0 and 1 |
A function mapping edge weights to f2-statistics
## Not run: mygraph = graph_f2_function(example_igraph) mygraph(N3N8 = 0.1, `N2N1|Vindija.DG` = 0.4) ## End(Not run)
## Not run: mygraph = graph_f2_function(example_igraph) mygraph(N3N8 = 0.1, `N2N1|Vindija.DG` = 0.4) ## End(Not run)
Find all valid graphs which result from flipping one admixture edge
graph_flipadmix(graph)
graph_flipadmix(graph)
graph |
Admixture graph in |
A data frame with columns from
, to
, and graph
## Not run: newgraphs = graph_flipadmix(example_igraph) # now evaluate the new graphs newgraphs %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
## Not run: newgraphs = graph_flipadmix(example_igraph) # now evaluate the new graphs newgraphs %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
This can be used to check if two graphs are identical. For two graphs to be identical, the topology and leaf nodes have to match, but internal node names do not matter.
graph_hash(graph)
graph_hash(graph)
graph |
Admixture graph in igraph format |
A hash string of the admixture graph
Find all graphs which result from removing one admixture edge
graph_minusone(graph, ntry = Inf)
graph_minusone(graph, ntry = Inf)
graph |
Admixture graph in |
A data frame with columns from
, to
, and graph
## Not run: newgraphs = graph_minusone(example_igraph) # now evaluate the new graphs newgraphs %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
## Not run: newgraphs = graph_minusone(example_igraph) # now evaluate the new graphs newgraphs %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
Find all graphs which result from adding and removing one admixture edge
graph_minusplus(graph)
graph_minusplus(graph)
graph |
Admixture graph in |
A data frame with columns source_from
, source_to
, dest_from
, dest_to
, and graph
## Not run: newgraphs = graph_minusplus(example_igraph) # now evaluate the new graphs newgraphs %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
## Not run: newgraphs = graph_minusplus(example_igraph) # now evaluate the new graphs newgraphs %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
Find all graphs which result from adding one admixture edge
graph_plusone(graph, ntry = Inf)
graph_plusone(graph, ntry = Inf)
graph |
Admixture graph in |
ntry |
Specify this to return only a subset of all possible graphs with one more edge |
A data frame with columns from
, to
, and graph
## Not run: newgraphs = graph_plusone(example_igraph) # now evaluate the new graphs newgraphs %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
## Not run: newgraphs = graph_plusone(example_igraph) # now evaluate the new graphs newgraphs %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
Find all trees which are part of the admixture graph
graph_splittrees(graph, return_admix = FALSE, simplify = TRUE)
graph_splittrees(graph, return_admix = FALSE, simplify = TRUE)
graph |
Admixture graph in |
A data frame with columns name
and graph
## Not run: trees = graph_splittrees(example_igraph) # now evaluate the trees trees %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
## Not run: trees = graph_splittrees(example_igraph) # now evaluate the trees trees %>% rowwise %>% mutate(res = list(qpgraph(example_f2_blocks, graph))) %>% unnest_wider(res) ## End(Not run)
This function performs a very crude simulation of allele frequencies under an admixture graph model
graph_to_afs( graph, nsnps = 10000, drift_default = 0.02, admix_default = 0.5, leaves_only = FALSE )
graph_to_afs( graph, nsnps = 10000, drift_default = 0.02, admix_default = 0.5, leaves_only = FALSE )
graph |
An admixture graph as igraph object, or as edge list data frame
with a column |
nsnps |
Number of SNPs to simulate |
drift_default |
Default branch lengths. Ignored if |
admix_default |
Default admixture weights. Ignored if |
leaves_only |
Return allele frequencies for leaf nodes only |
A data frame with allele frequencies
## Not run: afs = graph_to_afs(example_igraph) ## End(Not run)
## Not run: afs = graph_to_afs(example_igraph) ## End(Not run)
This function simulates PCA of allele frequencies under an admixture graph model
graph_to_pcs( graph, nsnps = 10000, drift_default = 0.02, admix_default = 0.5, leaves_only = TRUE )
graph_to_pcs( graph, nsnps = 10000, drift_default = 0.02, admix_default = 0.5, leaves_only = TRUE )
graph |
An admixture graph as igraph object, or as edge list data frame
with a column |
nsnps |
Number of SNPs to simulate |
drift_default |
Default branch lengths. Ignored if |
admix_default |
Default admixture weights. Ignored if |
leaves_only |
Return PCs for leaf nodes only |
A data frame with PCs for each population
## Not run: pcs = graph_to_pcs(example_igraph) pcs %>% ggplot(aes(PC1, PC2, label = pop)) + geom_text() + geom_point() ## End(Not run)
## Not run: pcs = graph_to_pcs(example_igraph) pcs %>% ggplot(aes(PC1, PC2, label = pop)) + geom_text() + geom_point() ## End(Not run)
This function tests which qpadm models should be valid for an admixture graph and a target population. By default, all models returned by qpadm_models
are tested. For large graphs this will be too slow, and you may want test only some models by providing the models
argument, or only a single model by providing the left
and right
arguments.
graph_to_qpadm( graph, target, left = NULL, right = NULL, models = NULL, weights = TRUE, f4dat = NULL, allpops = TRUE, more_right = TRUE, return_f4 = FALSE, eps = 1e-10 )
graph_to_qpadm( graph, target, left = NULL, right = NULL, models = NULL, weights = TRUE, f4dat = NULL, allpops = TRUE, more_right = TRUE, return_f4 = FALSE, eps = 1e-10 )
graph |
An admixture graph |
target |
Name of the target population. |
left |
Left populations (provide this optionally if you want to test only a single qpadm model) |
right |
Right populations (provide this optionally if you want to test only a single qpadm model) |
models |
A two column nested data frame with models to be evaluated, one model per row. The first column, |
weights |
Set this to |
f4dat |
A data frame of f4-statistics which can be provided to override the default branch lengths. |
allpops |
Evaluate only models which use all populations in the admixture graph. See |
return_f4 |
Include f4 statistic matrices in the results (default |
eps |
Epsilon value close to zero which is used for determining which f4 matrix elements should be considered non-zero, and which weights are strictly between 0 and 1. |
Two validity criteria are tested for each qpadm model: Rank validity and weight validity. Rank validity means that the rank of the f4 statistic matrix for only left and right populations is the same as the rank of the f4 statistic matrix that includes the target population among the left populations. Weight validity means that the estimated admixture weights for all left populations are between 0 and 1.
A data frame with one qpadm model per row and columns valid_rank
and valid_weights
indicating whether a model should be valid under the graph.
An earlier version of this function tried to use the graph topology for identifying valid qpadm models, but this didn't work reliably. Christian Huber had the idea of using the ranks of expected f4 statistic matrices instead.
qpadm_models
, graph_f2_function
## Not run: graph2 = example_igraph %>% simplify_graph() %>% delete_admix('N3N0', 'N3N4') %>% delete_admix('N3N1', 'N3N8') graph_to_qpadm(graph2, 'Mbuti.DG') %>% filter(valid_rank, valid_weights) ## End(Not run)
## Not run: graph2 = example_igraph %>% simplify_graph() %>% delete_admix('N3N0', 'N3N4') %>% delete_admix('N3N1', 'N3N8') graph_to_qpadm(graph2, 'Mbuti.DG') %>% filter(valid_rank, valid_weights) ## End(Not run)
generates new graphs from basegraph as follows:
generates all possible trees using addpops
(which are not in basegraph)
attaches trees to connection_edge, which is defined by two nodes in basegraph
adds edges originating above each edge in source_node
, to each node above addpops
graphmod_pavel(basegraph, addpops, connection_edge, source_nodes)
graphmod_pavel(basegraph, addpops, connection_edge, source_nodes)
basegraph |
an admixture graph as igraph object. (convert from edge list using |
addpops |
a vector of population labels which are not in |
connection_edge |
edge in |
source_nodes |
nodes in |
## Not run: graphlist = graphmod_pavel(example_igraph, addpops = c('pop1', 'pop2', 'pop3'), connection_edge = c('N2N0', 'N1N'), source_nodes = c('Denisova.DG', 'N2N2')) results = tibble(graph = graphlist) %>% mutate(res = map(graph, ~qpgraph(example_f2_blocks, .))) %>% unnest_wider(res) %>% mutate(worstz = map_dbl(f3, ~max(abs(.$z)))) ## End(Not run)
## Not run: graphlist = graphmod_pavel(example_igraph, addpops = c('pop1', 'pop2', 'pop3'), connection_edge = c('N2N0', 'N1N'), source_nodes = c('Denisova.DG', 'N2N2')) results = tibble(graph = graphlist) %>% mutate(res = map(graph, ~qpgraph(example_f2_blocks, .))) %>% unnest_wider(res) %>% mutate(worstz = map_dbl(f3, ~max(abs(.$z)))) ## End(Not run)
Computing f2 statistics for populations consisting of many individuals can be slow and require a lot of memory.
To speed this up, this function groups individuals into populations, computes allele counts and products of
pairwise allele counts with all individuals and groups, and writes the data to disk. f2 statistics for a
combination of grouped and ungrouped precomputed data can then be read using f2_from_precomp
,
replacing individual IDs of the grouped samples with the new group labels.
All groupings are listed in {dir}/groups/{groupname}.rds
group_samples(dir, inds, pops, overwrite = FALSE, verbose = TRUE)
group_samples(dir, inds, pops, overwrite = FALSE, verbose = TRUE)
dir |
Directory with precomputed individual pair data |
inds |
Individuals to group |
pops |
Group names, either length 1, or same length as |
overwrite |
Overwrite existing files in |
verbose |
print progress updates |
## Not run: dir = 'my/f2/dir/' inds = c('ind1', 'ind2', 'ind3', 'ind4', 'ind5') pops = c('pop_A', 'pop_A', 'pop_A', 'pop_B', 'pop_B') group_samples(dir, inds, pops) ## End(Not run)
## Not run: dir = 'my/f2/dir/' inds = c('ind1', 'ind2', 'ind3', 'ind4', 'ind5') pops = c('pop_A', 'pop_A', 'pop_A', 'pop_B', 'pop_B') group_samples(dir, inds, pops) ## End(Not run)
agraph
is the format used by the admixturegraph
packge. igraph
is used by the admixtools
package
igraph_to_agraph(igraph)
igraph_to_agraph(igraph)
agraph |
An admixture graph in |
An admixture graph in agraph
format
## Not run: igraph_to_agraph(example_igraph) ## End(Not run)
## Not run: igraph_to_agraph(example_igraph) ## End(Not run)
Insert a single edge into graph
insert_admix( graph, source_from = NULL, source_to = NULL, dest_from = NULL, dest_to = NULL, substitute = FALSE, fix_outgroup = TRUE )
insert_admix( graph, source_from = NULL, source_to = NULL, dest_from = NULL, dest_to = NULL, substitute = FALSE, fix_outgroup = TRUE )
graph |
An admixture graph |
source_from |
Parent node of the source edge |
source_to |
Child node of the source edge |
dest_from |
Parent node of the destination edge |
dest_to |
Child node of the destination edge |
substitute |
Should another edge be inserted, if the one specified doesn't work? |
Adxmiture graph with inserted edge
Insert admixture edges into graph
insert_admix_n(graph, n = 1, fix_outgroup = TRUE)
insert_admix_n(graph, n = 1, fix_outgroup = TRUE)
graph |
An admixture graph |
from |
List of nodes. New edges will originate above these nodes. |
to |
List of nodes. New edges will end above these nodes. |
substitute |
Should another edge be inserted, if the one specified doesn't work? |
Admixture graph with inserted edges
Insert admixture edges into graph
insert_admix_old( graph, fromnodes, tonodes, substitute_missing = FALSE, allow_below_admix = FALSE )
insert_admix_old( graph, fromnodes, tonodes, substitute_missing = FALSE, allow_below_admix = FALSE )
graph |
An admixture graph |
fromnodes |
List of nodes. New edges will originate above these nodes. |
tonodes |
List of nodes. New edges will end above these nodes. |
substitute_missing |
If |
allow_below_admix |
Allow insertion of edges which begin or end directly underneath an admixture node |
Adxmiture graph with inserted edges
Add population to graph
insert_leaf(graph, leaf, from, to)
insert_leaf(graph, leaf, from, to)
graph |
An admixture graph |
leaf |
Population to be added |
from |
Source node of edge onto which |
to |
Target node of edge onto which |
Admixture graph with added population
delete_leaf
, graph_addleaf
to add leaf
at any position
Test if an admixture graph is valid
is_valid(graph)
is_valid(graph)
graph |
An admixture graph |
TRUE
if graph is valid, otherwise FALSE
Find identical graphs
isomorphism_classes(igraphlist)
isomorphism_classes(igraphlist)
igraphlist |
A list with admixture graphs |
An integer vector with isomorphism classes. Graphs with the same number have identical topology (but may have different labels).
Find identical graphs
isomorphism_classes2(igraphlist)
isomorphism_classes2(igraphlist)
igraphlist |
A list with admixture graphs |
An integer vector with isomorphism classes. Graphs with the same number have identical topology and leaf labels (but may have different internal labels).
This function computes the joint site frequency spectrum from genotype files or allele frequency and count matrices. The joint site frequency spectrum lists how often each combination of haplotypes is observed. The number of combinations is equal to the product of one plus the number of haplotypes in each population. For example, five populations with a single diploid individual each have 3^5 possible combinations.
joint_sfs(afs, pref = NULL)
joint_sfs(afs, pref = NULL)
afs |
A named list of length two where the first element ( |
pref |
Instead of |
A data frame with the number of times each possible combination of allele counts is observed.
## Not run: dat = plink_to_afs('/my/plink/file', pops = c('pop1', 'pop2', 'pop3', 'pop4', 'pop5')) joint_sfs(dat) ## End(Not run)
## Not run: dat = plink_to_afs('/my/plink/file', pops = c('pop1', 'pop2', 'pop3', 'pop4', 'pop5')) joint_sfs(dat) ## End(Not run)
Estimate joint allele frequency spectrum
joint_spectrum(afs)
joint_spectrum(afs)
afs |
A matrix or data frame of allele frequencies |
A data frame with columns pattern
and proportion
## Not run: dat = plink_to_afs('/my/plink/file', pops = c('pop1', 'pop2', 'pop3', 'pop4', 'pop5')) # Spectrum across all SNPs joint_spectrum(dat$afs) # Stratify by allele frequency in one population dat$afs %>% as_tibble %>% select(1:4) %>% group_by(grp = cut(pop1, 10)) %>% group_modify(joint_spectrum) %>% ungroup # Stratify by mutation class dat$afs %>% as_tibble %>% select(1:4) %>% mutate(mut = paste(dat$snpfile$A1, dat$snpfile$A2)) %>% group_by(mut) %>% group_modify(joint_spectrum) %>% ungroup ## End(Not run)
## Not run: dat = plink_to_afs('/my/plink/file', pops = c('pop1', 'pop2', 'pop3', 'pop4', 'pop5')) # Spectrum across all SNPs joint_spectrum(dat$afs) # Stratify by allele frequency in one population dat$afs %>% as_tibble %>% select(1:4) %>% group_by(grp = cut(pop1, 10)) %>% group_modify(joint_spectrum) %>% ungroup # Stratify by mutation class dat$afs %>% as_tibble %>% select(1:4) %>% mutate(mut = paste(dat$snpfile$A1, dat$snpfile$A2)) %>% group_by(mut) %>% group_modify(joint_spectrum) %>% ungroup ## End(Not run)
Models target as a mixture of left populations, and outgroup right populations. Uses Lazaridis method based non-negative least squares of f4 matrix.
lazadm(data, left, right, target, boot = FALSE, constrained = TRUE)
lazadm(data, left, right, target, boot = FALSE, constrained = TRUE)
data |
The input data in the form of:
|
left |
Left populations (sources) |
right |
Right populations (outgroups) |
target |
Target population |
boot |
If |
constrained |
Constrain admixture weights to be non-negative |
lazadm
returns a data frame with weights and standard errors for each left population
Patterson, N. et al. (2012) Ancient admixture in human history. Genetics
Haak, W. et al. (2015) Massive migration from the steppe was a source for Indo-European languages in Europe. Nature (SI 9)
target = 'Denisova.DG' left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') lazadm(example_f2_blocks, left, right, target) lazadm(example_f2_blocks, left, right, target, constrained = FALSE)
target = 'Denisova.DG' left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') lazadm(example_f2_blocks, left, right, target) lazadm(example_f2_blocks, left, right, target, constrained = FALSE)
Generate a list of leave-one-out arrays
loo_list(arr)
loo_list(arr)
arr |
3d array with blocked estimates, with blocks in the 3rd dimension |
A list of leave-one-out arrays, each of which is 1 element shorter than arr
along the 3rd dimension
Inverse of est_to_loo
This works for any statistics which, when computed across N
blocks, are equal
to the weighted mean of the statistics across the N
blocks.
loo_to_est(arr, block_lengths = NULL)
loo_to_est(arr, block_lengths = NULL)
arr |
3d array with blocked estimates, with blocks in the 3rd dimension. |
block_lengths |
Optional block lengths. If |
A 3d array with leave-one-out estimates for jackknife. Dimensions are equal to those of arr
.
Modify a graph by moving an admixture edge
move_admixedge_once(graph, fix_outgroup = TRUE)
move_admixedge_once(graph, fix_outgroup = TRUE)
graph |
An admixture graph |
fix_outgroup |
Keep outgroup in place |
A new admixture graph
This function generates an msprime simulation script, and optionally executes it in python.
Unlike msprime_sim
, this function can simulate continuous sequence (not independent SNPs) and multiple chromosomes.
msprime_genome( graph, outpref = "msprime_sim", neff = 1000, ind_per_pop = 1, mutation_rate = 1.25e-08, time = 1000, fix_leaf = FALSE, nchr = 1, recomb_rate_chr = 2e-08, seq_length = 1000, admix_default = 0.5, run = FALSE, ghost_lineages = FALSE, shorten_admixed_leaves = FALSE )
msprime_genome( graph, outpref = "msprime_sim", neff = 1000, ind_per_pop = 1, mutation_rate = 1.25e-08, time = 1000, fix_leaf = FALSE, nchr = 1, recomb_rate_chr = 2e-08, seq_length = 1000, admix_default = 0.5, run = FALSE, ghost_lineages = FALSE, shorten_admixed_leaves = FALSE )
graph |
A graph as an |
outpref |
A prefix of output files. |
neff |
Effective population size (in diploid individuals). If a scalar value, it will be constant across all populations. Alternatively, it can be a named vector with a different value for each population (e.g., |
ind_per_pop |
The number of diploid individuals to simulate for each population. If a scalar value, it will be constant across all populations.
Alternatively, it can be a named vector with a different value for each population (e.g., |
mutation_rate |
Mutation rate per site per generation. The default is |
time |
Either a scalar value (1000 generations by default) with the dates generated by |
fix_leaf |
A boolean value specifying if the dates of the leaf nodes will be fixed at time 0. If |
nchr |
Number of chromosomes to simulate. |
recomb_rate_chr |
A float value specifying recombination rate along the chromosomes. The default is |
seq_length |
The sequence length of the chromosomes. If it is a scalar value, the sequence length will be constant for all chromosomes.
Alternatively, it can be a vector with a length equal to the number of chromosomes (i.e., |
admix_default |
A float value specifying default admixture proportion for all admixture nodes. The default is |
run |
If |
ghost_lineages |
A boolean value specifying whether ghost lineages will be allowed.
If |
shorten_admixed_leaves |
If |
The file name and path of the simulation script
results = qpgraph(example_f2_blocks, example_graph) # Simulate 3 chromosomes whose lengths are 50, 100 and 100 msprime_genome(results$edges, nchr=3, seq_length=c(50, 100, 100))
results = qpgraph(example_f2_blocks, example_graph) # Simulate 3 chromosomes whose lengths are 50, 100 and 100 msprime_genome(results$edges, nchr=3, seq_length=c(50, 100, 100))
This function generates an msprime simulation script, and optionally executes it in python. Unlike the msprime_genome
function, this function simulates independent SNP sites.
msprime_sim( graph, outpref = "msprime_sim", nsnps = 1000, neff = 1000, ind_per_pop = 1, mutation_rate = 0.001, time = 1000, fix_leaf = FALSE, admix_default = 0.5, run = FALSE, numcores = NULL, ghost_lineages = FALSE, shorten_admixed_leaves = FALSE )
msprime_sim( graph, outpref = "msprime_sim", nsnps = 1000, neff = 1000, ind_per_pop = 1, mutation_rate = 0.001, time = 1000, fix_leaf = FALSE, admix_default = 0.5, run = FALSE, numcores = NULL, ghost_lineages = FALSE, shorten_admixed_leaves = FALSE )
graph |
A graph as an |
outpref |
A prefix of output files. |
nsnps |
The number of SNPs to simulate. All SNPs will be simulated independently of each other. |
neff |
Effective population size (in diploid individuals). If a scalar value, it will be constant across all populations. Alternatively, it can be a named vector with a different value for each population (e.g., |
ind_per_pop |
The number of diploid individuals to simulate for each population. If a scalar value, it will be constant across all populations.
Alternatively, it can be a named vector with a different value for each population (e.g., |
mutation_rate |
Mutation rate per site per generation. The default is set to a high value (0.001 per site per generation) to obtain more polymorphic SNPs in order to speed up the simulation. |
time |
Either a scalar value (1000 generations by default) with the dates generated by |
fix_leaf |
A boolean value specifying if the dates of the leaf nodes will be fixed at time 0. If |
admix_default |
A float value specifying default admixture proportion for all admixture nodes. The default is |
run |
If |
numcores |
The number of cores to use when simulating data. |
ghost_lineages |
A boolean value specifying whether ghost lineages will be allowed.
If |
shorten_admixed_leaves |
If |
The file name and path of the simulation script
## Not run: results = qpgraph(example_f2_blocks, example_graph) msprime_sim(results$edges, nsnps=100) ## End(Not run)
## Not run: results = qpgraph(example_f2_blocks, example_graph) msprime_sim(results$edges, nsnps=100) ## End(Not run)
Modify a graph by applying n mutation functions
mutate_n( graph, n = 2, funs = list(spr_all, spr_leaves, swap_leaves, move_admixedge_once, flipadmix_random, mutate_n), fix_outgroup = TRUE )
mutate_n( graph, n = 2, funs = list(spr_all, spr_leaves, swap_leaves, move_admixedge_once, flipadmix_random, mutate_n), fix_outgroup = TRUE )
graph |
An admixture graph |
n |
Number of functions to apply |
funs |
List of function from which to choose |
fix_outgroup |
Keep outgroup in place |
A new admixture graph
Turn a newick format tree to a matrix of edges
newick_to_edges(newick, node = "R", edgemat = matrix(NA, 0, 2))
newick_to_edges(newick, node = "R", edgemat = matrix(NA, 0, 2))
newick |
Tree in newick format. |
node |
Root label of the tree. |
edgemat |
Used for recursive function calls. |
Tree as two column matrix of edges (adjacency list)
newick = random_newick(c('a', 'b', 'c', 'd')) newick newick_to_edges(newick)
newick = random_newick(c('a', 'b', 'c', 'd')) newick newick_to_edges(newick)
Count how often each node in graph occurs in other graphs
node_counts(graph, graphlist)
node_counts(graph, graphlist)
graph |
An admixture graph |
graphlist |
List of graphs |
Can be used to determine how often internal nodes occur in a list of other well fitting models
node_signature(graph)
node_signature(graph)
graph |
An admixture graph |
A graph signature as character vector
## Not run: sigs = example_winners %>% mutate(sig = map(igraph, node_signature)) %$% sig %>% unlist %>% table %>% c node_signature(example_winners$igraph[[1]]) ## End(Not run)
## Not run: sigs = example_winners %>% mutate(sig = map(igraph, node_signature)) %$% sig %>% unlist %>% table %>% c node_signature(example_winners$igraph[[1]]) ## End(Not run)
Count number of admixture nodes
numadmix(graph)
numadmix(graph)
graph |
An admixture graph |
Number of admixture nodes
Read allele frequencies from packedancestrymap files
packedancestrymap_to_afs( pref, inds = NULL, pops = NULL, adjust_pseudohaploid = TRUE, first = 1, last = NULL, verbose = TRUE )
packedancestrymap_to_afs( pref, inds = NULL, pops = NULL, adjust_pseudohaploid = TRUE, first = 1, last = NULL, verbose = TRUE )
pref |
Prefix of packedancestrymap files (files have to end in |
inds |
Individuals from which to compute allele frequencies |
pops |
Populations from which to compute allele frequencies. If |
adjust_pseudohaploid |
Genotypes of pseudohaploid samples are usually coded as |
verbose |
Print progress updates |
A list with three data frames: allele frequency data, allele counts, and SNP metadata
## Not run: afdat = packedancestrymap_to_afs(prefix, pops = pops) afs = afdat$afs counts = afdat$counts ## End(Not run)
## Not run: afdat = packedancestrymap_to_afs(prefix, pops = pops) afs = afdat$afs counts = afdat$counts ## End(Not run)
This function converts EIGENSTRAT/PACKEDANCESTRYMAP format files to PLINK files, using write_plink
.
When inds
or pops
is provided, only a subset of samples will be extracted.
This function can have a high memory footprint, because data for all SNPs will be read before writing the PLINK files.
packedancestrymap_to_plink( inpref, outpref, inds = NULL, pops = NULL, verbose = TRUE )
packedancestrymap_to_plink( inpref, outpref, inds = NULL, pops = NULL, verbose = TRUE )
inpref |
Prefix of the input files |
outpref |
Prefix of the PLINK output files |
inds |
Individuals which should be extracted |
pops |
Populations which should be extracted. Can not be provided together with |
verbose |
Print progress updates |
eigenstrat_to_plink
Read graph in dot format
parse_dot(dotfile)
parse_dot(dotfile)
dotfile |
Name of a file with a dot formatted admixture graph |
## Not run: graph = parse_dot("/my/graph.dot") ## End(Not run)
## Not run: graph = parse_dot("/my/graph.dot") ## End(Not run)
Read qp3Pop output file
parse_qp3pop_output(outfile)
parse_qp3pop_output(outfile)
outfile |
output file generated by qp3Pop. |
tibble with output data.
Read qpAdm output file
parse_qpadm_output(outfile)
parse_qpadm_output(outfile)
outfile |
Output file generated by qpAdm. |
Data frame with output data.
Read qpDstat output file
parse_qpdstat_output(outfile)
parse_qpdstat_output(outfile)
outfile |
output file generated by qpDstat. |
tibble with output data.
Read qpF4ratio output file
parse_qpf4ratio_output(parfile)
parse_qpf4ratio_output(parfile)
outfile |
Output file generated by qpF4ratio |
Data frame with output data
Read qpGraph graph file
parse_qpgraph_graphfile( graphfile, split_multi = TRUE, igraph = FALSE, uselabels = FALSE )
parse_qpgraph_graphfile( graphfile, split_multi = TRUE, igraph = FALSE, uselabels = FALSE )
graphfile |
File with admixture graph in qpGraph format. |
split_multi |
Split multifurcations |
igraph |
Convert to igraph format |
uselabels |
Should labels or full names be used? Defaults to full names. |
Graph represented as two column edge matrix. Can have four columns if edges are locked
Read qpGraph output file
parse_qpgraph_output(outfile, logfile = outfile)
parse_qpgraph_output(outfile, logfile = outfile)
outfile |
output file generated by qpGraph. |
list of output data.
Modify a graph by permuting leaf nodes
permute_leaves(graph, fix_outgroup = TRUE)
permute_leaves(graph, fix_outgroup = TRUE)
graph |
An admixture graph |
fix_outgroup |
Keep outgroup in place |
A new admixture graph
Modify a graph by changing the position of the root node
place_root_random(graph, fix_outgroup = TRUE)
place_root_random(graph, fix_outgroup = TRUE)
graph |
An admixture graph |
fix_outgroup |
Keep outgroup in place |
A new admixture graph
PLINK
filesRead allele frequencies from PLINK
files
plink_to_afs( pref, inds = NULL, pops = NULL, adjust_pseudohaploid = TRUE, first = 1, last = NULL, numblocks = 1, poly_only = FALSE, verbose = TRUE )
plink_to_afs( pref, inds = NULL, pops = NULL, adjust_pseudohaploid = TRUE, first = 1, last = NULL, numblocks = 1, poly_only = FALSE, verbose = TRUE )
pref |
prefix of |
inds |
Individuals from which to compute allele frequencies |
pops |
Populations from which to compute allele frequencies. If |
adjust_pseudohaploid |
Genotypes of pseudohaploid samples are usually coded as |
numblocks |
Number of blocks in which to read genotype file. Setting this to a number greater than one is more memory efficient, but slower. |
poly_only |
Only keep SNPs with mean allele frequency not equal to 0 or 1 (default |
verbose |
Print progress updates |
A list with three items: Allele frequency matrix, allele count matrix, and SNP meta data.
## Not run: afdat = plink_to_afs(prefix, pops) afs = afdat$afs counts = afdat$counts ## End(Not run)
## Not run: afdat = plink_to_afs(prefix, pops) afs = afdat$afs counts = afdat$counts ## End(Not run)
Plot a comparison of two runs of qpgraph or two runs of qpadm
plot_comparison(out1, out2, name1 = NULL, name2 = NULL)
plot_comparison(out1, out2, name1 = NULL, name2 = NULL)
out1 |
First model |
out2 |
Second model |
name1 |
Optional name for first model |
name2 |
Optional name for second model |
A ggplot object.
fit1 = qpgraph(example_f2_blocks, example_graph, lsqmode = FALSE) fit2 = qpgraph(example_f2_blocks, example_graph, lsqmode = TRUE) plot_comparison(fit1, fit2)
fit1 = qpgraph(example_f2_blocks, example_graph, lsqmode = FALSE) fit2 = qpgraph(example_f2_blocks, example_graph, lsqmode = TRUE) plot_comparison(fit1, fit2)
Plot an admixture graph
plot_graph( graph, fix = NULL, title = "", color = TRUE, textsize = 2.5, highlight_unidentifiable = FALSE, pos = NULL, dates = NULL, neff = NULL, scale_y = FALSE, hide_weights = FALSE )
plot_graph( graph, fix = NULL, title = "", color = TRUE, textsize = 2.5, highlight_unidentifiable = FALSE, pos = NULL, dates = NULL, neff = NULL, scale_y = FALSE, hide_weights = FALSE )
graph |
An admixture graph. If it's an edge list with a |
fix |
If |
title |
A plot title |
color |
Plot it in color or greyscale |
textsize |
Size of edge and node labels |
highlight_unidentifiable |
Highlight unidentifiable edges in red. Can be slow for large graphs. See |
pos |
An optional data frame with node coordinates (columns |
dates |
An optional named vector with dates (in generations) for each node to plot dates on the y-axis (e.g., |
neff |
An optional named vector with effective population sizes for each population (e.g., |
scale_y |
If |
hide_weights |
A boolean value specifying if the drift values on the edges will be hidden. The default is |
A ggplot object
plot_graph(example_graph) # Plot a random simulation output. Show dates and population sizes on the plot out = random_sim(nleaf=5, nadmix=1) plot_graph(out$edges, dates=out$dates, neff=out$neff)
plot_graph(example_graph) # Plot a random simulation output. Show dates and population sizes on the plot out = random_sim(nleaf=5, nadmix=1) plot_graph(out$edges, dates=out$dates, neff=out$neff)
Plot an admixture graph on a map
plot_graph_map(graph, leafcoords, shapedata = NULL)
plot_graph_map(graph, leafcoords, shapedata = NULL)
graph |
a two column matrix specifying the admixture graph. first column is source node, second column is target node. the first edge has to be root -> outgroup. admixture nodes are inferred as those nodes which are the targets of two different sources. |
leafcoords |
data frame with columns |
shapedata |
shapedata |
a plotly object
plot_graph_map(example_igraph, example_anno)
plot_graph_map(example_igraph, example_anno)
Plot samples on a map
plot_map( leafcoords, map_layout = 1, color = "yearsbp", colorscale = "Portland", collog10 = TRUE )
plot_map( leafcoords, map_layout = 1, color = "yearsbp", colorscale = "Portland", collog10 = TRUE )
leafcoords |
data frame with columns |
map_layout |
1 or 2 |
color |
color |
colorscale |
colorscale |
collog10 |
collog10 |
a plotly object.
## Not run: plot_map(example_anno, 1) ## End(Not run)
## Not run: plot_map(example_anno, 1) ## End(Not run)
Plot a comparison of two runs of qpgraph or two runs of qpadm
plotly_comparison(out1, out2, name1 = NULL, name2 = NULL)
plotly_comparison(out1, out2, name1 = NULL, name2 = NULL)
out1 |
First model |
out2 |
Second model |
name1 |
Optional name for first model |
name2 |
Optional name for second model |
A plotly object.
fit1 = qpgraph(example_f2_blocks, example_graph, lsqmode = FALSE) fit2 = qpgraph(example_f2_blocks, example_graph, lsqmode = TRUE) plotly_comparison(fit1, fit2)
fit1 = qpgraph(example_f2_blocks, example_graph, lsqmode = FALSE) fit2 = qpgraph(example_f2_blocks, example_graph, lsqmode = TRUE) plotly_comparison(fit1, fit2)
Plot an admixture graph using plotly
plotly_graph( graph, collapse_threshold = 0, fix = FALSE, print_highlow = FALSE, highlight_unidentifiable = FALSE, pos = NULL, nudge_y = -0.1, annot = "" )
plotly_graph( graph, collapse_threshold = 0, fix = FALSE, print_highlow = FALSE, highlight_unidentifiable = FALSE, pos = NULL, nudge_y = -0.1, annot = "" )
graph |
An admixture graph |
collapse_threshold |
Collapse nodes if they are separated by less than this distance (for fitted graphs) |
fix |
If |
highlight_unidentifiable |
Highlight unidentifiable edges in red. Can be slow for large graphs. See |
pos |
Optional data frame with node coordinates (columns |
A plotly object
plotly_graph(example_graph)
plotly_graph(example_graph)
This function assigns a date (in generations) to each node in an admixture graph and is used in msprime_sim
and msprime_genome
.
The date of the node will correspond to the y-coordinate of that node used for plotting in plotly_graph
,
unless the fix_leaf
option is set to TRUE
, in which case the dates of all leaf nodes returned by get_leafnames
will be set to 0.
pseudo_dates(graph, time = 1000, fix_leaf = FALSE)
pseudo_dates(graph, time = 1000, fix_leaf = FALSE)
graph |
An admixture graph |
time |
A scalar by which y-coordinate values will be multiplied to get dates |
fix_leaf |
A boolean specifying if the dates of the leaf nodes will be fixed at time 0 (i.e., at the most recent time).
If |
A named vector with pseudo dates for each graph node
Computes f3 statistics of the form . This is generally equivalent to
and to
.
However, the exact estimates depend on the type of input data, selection of SNPs,
and on the values of some of the arguments, which are described below.
qp3pop( data, pop1 = NULL, pop2 = NULL, pop3 = NULL, boot = FALSE, sure = FALSE, unique_only = TRUE, blgsize = NULL, block_lengths = NULL, verbose = FALSE, ... )
qp3pop( data, pop1 = NULL, pop2 = NULL, pop3 = NULL, boot = FALSE, sure = FALSE, unique_only = TRUE, blgsize = NULL, block_lengths = NULL, verbose = FALSE, ... )
data |
Input data in one of three forms:
|
pop1 |
One of the following four:
|
pop2 |
A vector of population labels |
pop3 |
A vector of population labels |
boot |
If |
sure |
The number of population combinations can get very large. This is a safety option that stops you from accidently computing all combinations if that number is large. |
unique_only |
If |
verbose |
Print progress updates |
... |
Additional arguments passed to |
apply_corr |
With |
outgroupmode |
With |
poly_only |
Only keep SNPs with mean allele frequency not equal to 0 or 1. Defaults to |
The default values of the parameters apply_corr
, outgroupmode
, poly_only
depend on the first argument (data
), and they can affect the estimated f3-statistics. When the data
is the prefix of genotype data, the default parameters are generally the same as in the original qp3pop program. When data
is an array of pre-computed f2-statistics, the parameters may be set while computing f2-statistics. If the first population is a single pseudodiploid sample, it is not possible to get unbiased estimates of f3. To get biased estimates for a single pseudodiploid sample from genotype data, set outgroupmode = TRUE
and apply_corr = FALSE
. Under this combination of parameters, f3(A; B, C)
should also be identical to f4(A, B; A, C)
, since f4-statistics generally do not require any bias correction.
See examples
for more information.
qp3pop
returns a data frame with f3 statistics, with columns for populations 1 to 3,
f3-estimate (est
), standard error of the estimate (se
), z-score (z
, est
/se
), p-value (2*(1-pnorm(z))
),
and the number of SNPs used (n
; only if first argument is genotype prefix)
f3
Patterson, N. et al. (2012) Ancient admixture in human history Genetics
Peter, B. (2016) Admixture, Population Structure, and F-Statistics Genetics
## Not run: pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') pop3 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') qp3pop(example_f2_blocks, pop1, pop2, pop3) pops = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') qp3pop(example_f2_blocks, pops) qp3pop(example_f2_blocks, pops, unique_only = FALSE) qp3pop(f2_dir, pop1, pop2, pop3) # Below are three scenarios, and in each one `qp3pop()` and `qp3pop_wrapper()` # should give the same or very similar estimates. Note that to compute `f3(A; B, C)`, # `qp3pop_wrapper()`, and the original qp3pop program, expect populations to be in the order `B`, `C`, `A`. prefix = '/path/to/geno/prefix' qp3popbin = '/path/to/AdmixTools/bin/qp3Pop' pops = dimnames(example_f2_blocks)[[1]] # 1. target diploid, outgroupmode NO (this is the default when passing a geno file prefix) qp3pop_wrapper(prefix, pops[2], pops[3], pops[1], bin = qp3popbin, outgroupmode = FALSE) qp3pop(prefix, pops[1], pops[2], pops[3]) # est, se, z match well; n is higher qp3pop(prefix, pops[1], pops[2], pops[3], poly_only = TRUE) # est, se, z match less well; n is identical # 2. target diploid, outgroupmode YES (this is the only option with precomputed f2-stats) qp3pop_wrapper(prefix, pops[2], pops[3], pops[1], bin = qp3popbin, outgroupmode = TRUE) qp3pop(prefix, pops[1], pops[2], pops[3], outgroupmode = TRUE) # est, se, z match (except for factor 1000) f2b = f2_from_geno(prefix, pops = pops[1:3], poly_only = FALSE) qp3pop(f2b, pops[1], pops[2], pops[3]) # 3. target pseudodiploid (no heterozygotes means heterozygosity rate correction is not possible) qp3pop_wrapper(prefix, pops[1], pops[3], pops[2], bin = qp3popbin, outgroupmode = TRUE) qp3pop(prefix, pops[2], pops[1], pops[3], outgroupmode = TRUE, apply_corr = FALSE) # est, se, z match (except for factor 1000) ## End(Not run)
## Not run: pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') pop3 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') qp3pop(example_f2_blocks, pop1, pop2, pop3) pops = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') qp3pop(example_f2_blocks, pops) qp3pop(example_f2_blocks, pops, unique_only = FALSE) qp3pop(f2_dir, pop1, pop2, pop3) # Below are three scenarios, and in each one `qp3pop()` and `qp3pop_wrapper()` # should give the same or very similar estimates. Note that to compute `f3(A; B, C)`, # `qp3pop_wrapper()`, and the original qp3pop program, expect populations to be in the order `B`, `C`, `A`. prefix = '/path/to/geno/prefix' qp3popbin = '/path/to/AdmixTools/bin/qp3Pop' pops = dimnames(example_f2_blocks)[[1]] # 1. target diploid, outgroupmode NO (this is the default when passing a geno file prefix) qp3pop_wrapper(prefix, pops[2], pops[3], pops[1], bin = qp3popbin, outgroupmode = FALSE) qp3pop(prefix, pops[1], pops[2], pops[3]) # est, se, z match well; n is higher qp3pop(prefix, pops[1], pops[2], pops[3], poly_only = TRUE) # est, se, z match less well; n is identical # 2. target diploid, outgroupmode YES (this is the only option with precomputed f2-stats) qp3pop_wrapper(prefix, pops[2], pops[3], pops[1], bin = qp3popbin, outgroupmode = TRUE) qp3pop(prefix, pops[1], pops[2], pops[3], outgroupmode = TRUE) # est, se, z match (except for factor 1000) f2b = f2_from_geno(prefix, pops = pops[1:3], poly_only = FALSE) qp3pop(f2b, pops[1], pops[2], pops[3]) # 3. target pseudodiploid (no heterozygotes means heterozygosity rate correction is not possible) qp3pop_wrapper(prefix, pops[1], pops[3], pops[2], bin = qp3popbin, outgroupmode = TRUE) qp3pop(prefix, pops[2], pops[1], pops[3], outgroupmode = TRUE, apply_corr = FALSE) # est, se, z match (except for factor 1000) ## End(Not run)
Computes f3 statistics of the form . Equivalent to
and to
. Requires a working installation of qp3Pop, which will be called
using
system
qp3pop_wrapper( pref, source1, source2 = NULL, target = NULL, bin = "~np29/o2bin/qp3Pop", outdir = ".", parfile = NULL, inbreed = "NO", outgroupmode = "NO", printonly = FALSE, env = "", verbose = TRUE )
qp3pop_wrapper( pref, source1, source2 = NULL, target = NULL, bin = "~np29/o2bin/qp3Pop", outdir = ".", parfile = NULL, inbreed = "NO", outgroupmode = "NO", printonly = FALSE, env = "", verbose = TRUE )
pref |
Path to and prefix of the packedancestrymap genotype files |
source1 |
One of the following four:
|
source2 |
A vector of population labels |
target |
A vector of population labels |
bin |
Path to the qp3Pop binary file |
outdir |
Output directory. files |
parfile |
qp3Pop parameter file. If this is specified, |
inbreed |
inbreed |
outgroupmode |
outgroupmode |
printonly |
Should the command be printed or executed? |
env |
Export environmental variables. See examples. |
verbose |
Print progress updates |
If printonly
, the qp3Pop
command, otherwise a data frame with parsed qp3Pop
output
## Not run: source1 = c('Altai_Neanderthal.DG', 'Vindija.DG') source2 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') target = 'Denisova.DG' qp3pop_wrapper('genotype_prefix', source1, source2, target, bin = 'path/to/qp3Pop') ## End(Not run)
## Not run: source1 = c('Altai_Neanderthal.DG', 'Vindija.DG') source2 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') target = 'Denisova.DG' qp3pop_wrapper('genotype_prefix', source1, source2, target, bin = 'path/to/qp3Pop') ## End(Not run)
qpadm
models a target population as a mixture of left (source) populations, given a set of right (outgroup) populations.
It can be used to estimate whether the left populations explain all genetic variation in the target population, relative to the right populations, and to estimate admixture proportions of the left populations to the target population.
qpadm( data, left, right, target, f4blocks = NULL, fudge = 1e-04, fudge_twice = FALSE, boot = FALSE, getcov = TRUE, constrained = FALSE, return_f4 = FALSE, cpp = TRUE, verbose = TRUE, ... )
qpadm( data, left, right, target, f4blocks = NULL, fudge = 1e-04, fudge_twice = FALSE, boot = FALSE, getcov = TRUE, constrained = FALSE, return_f4 = FALSE, cpp = TRUE, verbose = TRUE, ... )
data |
The input data in the form of:
|
left |
Left populations (sources) |
right |
Right populations (outgroups) |
target |
Target population |
f4blocks |
Instead of f2 blocks, f4 blocks can be supplied. This is used by |
fudge |
Value added to diagonal matrix elements before inverting |
fudge_twice |
Setting this to |
boot |
If |
getcov |
Compute weights covariance. Setting |
constrained |
Constrain admixture weights to be non-negative |
return_f4 |
Return f4-statistics |
cpp |
Use C++ functions. Setting this to |
verbose |
Print progress updates |
... |
If |
qpadm
returns a list with up to four data frames describing the model fit:
weights
: A data frame with estimated admixture proportions where each row is a left population.
f4
: A data frame with estimated and fitted f4-statistics
rankdrop
: A data frame describing model fits with different ranks, including p-values for the overall fit
and for nested models (comparing two models with rank difference of one). A model with L
left populations and R
right populations has an f4-matrix of dimensions (L-1)*(R-1)
. If no two left population form a clade with respect to all right populations, this model will have rank (L-1)*(R-1)
.
f4rank
: Tested rank
dof
: Degrees of freedom of the chi-squared null distribution: (L-1-f4rank)*(R-1-f4rank)
chisq
: Chi-sqaured statistic, obtained as E'QE
, where E
is the difference between estimated and fitted f4-statistics, and Q
is the f4-statistic covariance matrix.
p
: p-value obtained from chisq
as pchisq(chisq, df = dof, lower.tail = FALSE)
dofdiff
: Difference in degrees of freedom between this model and the model with one less rank
chisqdiff
: Difference in chi-squared statistics
p_nested
: p-value testing whether the difference between two models of rank difference 1 is significant
popdrop
: A data frame describing model fits with different populations. Note that all models with fewer populations use the same set of SNPs as the first model.
pat
: A binary code indicating which populations are present in this model. A 1
represents dropped populations. The full model is all zeros.
wt
: Number of populations dropped
dof
: Degrees of freedom of the chi-squared null distribution: (L-1-f4rank)*(R-1-f4rank)
chisq
: Chi-sqaured statistic, obtained as E'QE
, where E
is the difference between estimated and fitted f4-statistics, and Q
is the f4-statistic covariance matrix.
p
: p-value obtained from chisq
as pchisq(chisq, df = dof, lower.tail = FALSE)
f4rank
: Tested rank
feasible
: A model is feasible if all weights fall between 0 and 1
<population name>
: The weights for each population in this model
Haak, W. et al. (2015) Massive migration from the steppe was a source for Indo-European languages in Europe. Nature (SI 10)
left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') target = 'Denisova.DG' qpadm(example_f2_blocks, left, right, target) ## Not run: # The original ADMIXTOOLS qpAadm program has an option called "allsnps" # that selects different SNPs for each f4-statistic, which is # useful when working with sparse genotype data. # To get the same behavior in ADMIXTOOLS 2, supply the genotype data prefix # and set `allsnps = TRUE` qpadm("/my/geno/prefix", left, right, target, allsnps = TRUE) ## End(Not run)
left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') target = 'Denisova.DG' qpadm(example_f2_blocks, left, right, target) ## Not run: # The original ADMIXTOOLS qpAadm program has an option called "allsnps" # that selects different SNPs for each f4-statistic, which is # useful when working with sparse genotype data. # To get the same behavior in ADMIXTOOLS 2, supply the genotype data prefix # and set `allsnps = TRUE` qpadm("/my/geno/prefix", left, right, target, allsnps = TRUE) ## End(Not run)
Partition a list of populations into left and right populations
qpadm_models(pops, allpops = TRUE, more_right = TRUE)
qpadm_models(pops, allpops = TRUE, more_right = TRUE)
pops |
A vector of populations |
allpops |
Return only models which use all provided populations |
more_right |
Return only models with more right than left populations |
A data frame with one qpadm model per row
qpadm_models
, graph_f2_function
## Not run: graph_to_qpadm(get_leafnames(example_igraph)) ## End(Not run)
## Not run: graph_to_qpadm(get_leafnames(example_igraph)) ## End(Not run)
For large admixturegraph, there may be a large number of valid qpAdm models!
qpadm_models_old(graph, add_outgroup = FALSE, nested = TRUE, abbr = -1)
qpadm_models_old(graph, add_outgroup = FALSE, nested = TRUE, abbr = -1)
graph |
An admixture graph as igraph object |
add_outgroup |
Should the graph outgroup be added to the qpAdm right populations? Technically this shouldn't be an informative outgroup for qpAdm. |
nested |
Should a nested data frame be returned ( |
abbr |
Maximum number of characters to print for each population. The default (-1) doesn't abbreviate the names. |
## Not run: qpadm_models_old(igraph2, add_outgroup = TRUE) qpadm_models_old(igraph2, add_outgroup = TRUE) %>% slice(1) %$% list(target, left, right) ## End(Not run)
## Not run: qpadm_models_old(igraph2, add_outgroup = TRUE) qpadm_models_old(igraph2, add_outgroup = TRUE) %>% slice(1) %$% list(target, left, right) ## End(Not run)
This function runs multiple qpadm models, re-using f4-statistics where possible. Supports parallel evaluation of models, which can be turned on with future::plan('multisession')
or similar, and turned off with future::plan('sequential')
.
qpadm_multi( data, models, allsnps = FALSE, full_results = TRUE, verbose = TRUE, ... )
qpadm_multi( data, models, allsnps = FALSE, full_results = TRUE, verbose = TRUE, ... )
data |
The input data in the form of:
|
models |
A nested list (or data frame) with qpadm models. It should consist of two or three other named lists (or columns) containing |
allsnps |
Use all SNPs with allele frequency estimates in every population of any given population quadruple. If |
verbose |
Print progress updates |
... |
Further arguments passed to |
A list where each element is the output of one qpadm model.
## Not run: # the following specifies two models: one with 2/3/1 and one with 1/2/1 left/right/target populations models = tibble( left = list(c('pop1', 'pop2'), c('pop3')), right = list(c('pop5', 'pop6', 'pop7'), c('pop7', 'pop8')), target = c('pop10', 'pop10')) results = qpadm_multi('/my/geno/prefix', models) ## End(Not run)
## Not run: # the following specifies two models: one with 2/3/1 and one with 1/2/1 left/right/target populations models = tibble( left = list(c('pop1', 'pop2'), c('pop3')), right = list(c('pop5', 'pop6', 'pop7'), c('pop7', 'pop8')), target = c('pop10', 'pop10')) results = qpadm_multi('/my/geno/prefix', models) ## End(Not run)
qpadm
with reduced outputModels target as a mixture of left populations, given a set of outgroup right populations. Can be used to estimate admixture proportions, and to estimate the number of independent admixture events.
qpadm_p( f2_data, left, right, target = NULL, fudge = 1e-04, boot = FALSE, constrained = FALSE, rnk = length(setdiff(left, target)) - 1, cpp = TRUE, weights = FALSE, f4blocks = NULL )
qpadm_p( f2_data, left, right, target = NULL, fudge = 1e-04, boot = FALSE, constrained = FALSE, rnk = length(setdiff(left, target)) - 1, cpp = TRUE, weights = FALSE, f4blocks = NULL )
left |
Left populations (sources) |
right |
Right populations (outgroups) |
target |
Target population |
fudge |
Value added to diagonal matrix elements before inverting |
boot |
If |
constrained |
Constrain admixture weights to be non-negative |
rnk |
Rank of f4-matrix. Defaults to one less than full rank. |
cpp |
Use C++ functions. Setting this to |
weights |
Return weights (default = |
f4blocks |
Instead of f2 blocks, f4 blocks can be supplied. This is used by |
Data frame with f4rank
, dof
, chisq
, p
, feasible
left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') target = 'Denisova.DG' qpadm_p(example_f2_blocks, left, right, target)
left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') target = 'Denisova.DG' qpadm_p(example_f2_blocks, left, right, target)
This functions evaluates many qpadm models simultaneously by keeping the target population
and the rightfix
populations fixed, and distributing the leftright
populations by keeping some
in the set of left population and adding the remaining populations to the right populations.
(See details for an example of how models are generated)
qpadm_rotate( f2_blocks, leftright, target, rightfix = NULL, full_results = FALSE, verbose = TRUE )
qpadm_rotate( f2_blocks, leftright, target, rightfix = NULL, full_results = FALSE, verbose = TRUE )
f2_blocks |
3d array of blocked f2 statistics, output of |
leftright |
Populations which will be distributed between left and right |
target |
Target population |
rightfix |
Populations which will be on the right side in all models |
full_results |
Return all output items which are returned by |
verbose |
Print progress updates |
If leftright
consists of the populations L1, L2, L3, L4; rightfix
is the population R; and target
is T,
the following models will be genrated:
(left), (right), (target)
(L1), (L2, L3, L4, R), (T)
(L2), (L1, L3, L4, R), (T)
(L3), (L1, L2, L4, R), (T)
(L4), (L1, L2, L3, R), (T)
(L1, L2), (L3, L4, R), (T)
(L1, L3), (L2, L4, R), (T)
(L1, L4), (L2, L3, R), (T)
(L2, L3), (L1, L4, R), (T)
(L2, L4), (L1, L3, R), (T)
(L3, L4), (L1, L2, R), (T)
A data frame with Chi-squared statistics and p-values for each population combination
## Not run: pops = dimnames(example_f2_blocks)[[1]] qpadm_rotate(example_f2_blocks, leftright = pops[1:4], target = pops[5], rightfix = pops[6:7]) ## End(Not run)
## Not run: pops = dimnames(example_f2_blocks)[[1]] qpadm_rotate(example_f2_blocks, leftright = pops[1:4], target = pops[5], rightfix = pops[6:7]) ## End(Not run)
This requires a working installation of qpAdm, which will be called using system
qpadm_wrapper( pref, left, right, target = NULL, bin = "~np29/o2bin/qpAdm", outdir = "./", parfile = NULL, allsnps = "NO", blgsize = 0.05, fancyf4 = "NO", f4mode = "YES", inbreed = "NO", printonly = FALSE, env = "", verbose = TRUE )
qpadm_wrapper( pref, left, right, target = NULL, bin = "~np29/o2bin/qpAdm", outdir = "./", parfile = NULL, allsnps = "NO", blgsize = 0.05, fancyf4 = "NO", f4mode = "YES", inbreed = "NO", printonly = FALSE, env = "", verbose = TRUE )
pref |
Path to and prefix of the packedancestrymap genotype files |
left |
Left populations (or leftlist file) |
right |
Right populations (or rightlist file) |
target |
Target population |
bin |
Path to the qpAdm binary file |
outdir |
Output directory. files |
parfile |
qpAdm parameter file |
allsnps |
allsnps |
blgsize |
blgsize |
fancyf4 |
fancyf4 |
f4mode |
f4mode |
inbreed |
inbreed |
printonly |
Should the command be printed or executed? |
env |
Export environmental variables. See examples. |
verbose |
Print progress updates |
If not printonly, a data frame with parsed qpAdm output
## Not run: left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') target = 'Denisova.DG' qpadm_wrapper('genotype_prefix', left, right, target, bin = 'path/to/qpAdm') ## End(Not run)
## Not run: left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') target = 'Denisova.DG' qpadm_wrapper('genotype_prefix', left, right, target, bin = 'path/to/qpAdm') ## End(Not run)
Computes f4-statistics of the form . For allele frequencies
a
, b
, c
, d
, f4(A, B; C, D)
is computed as the average of (a-b)*(c-d)
over all SNPs in each SNP block. This is equivalent to (assuming no missing data). The input of this function can either be a 3d array of f2-statistics generated by
f2_from_precomp
or f2_from_geno
, a directory with f2-statistics, or the prefix of genotype files. Computing f4 from genotype files directly is slower, but provides more flexibility in dealing with missing data (see details).
qpdstat( data, pop1 = NULL, pop2 = NULL, pop3 = NULL, pop4 = NULL, boot = FALSE, sure = FALSE, unique_only = TRUE, comb = TRUE, blgsize = NULL, block_lengths = NULL, f4mode = TRUE, afprod = TRUE, cpp = TRUE, verbose = is.character(data), ... )
qpdstat( data, pop1 = NULL, pop2 = NULL, pop3 = NULL, pop4 = NULL, boot = FALSE, sure = FALSE, unique_only = TRUE, comb = TRUE, blgsize = NULL, block_lengths = NULL, f4mode = TRUE, afprod = TRUE, cpp = TRUE, verbose = is.character(data), ... )
data |
Input data in one of three forms:
|
pop1 |
One of the following four:
|
pop2 |
A vector of population labels |
pop3 |
A vector of population labels |
pop4 |
A vector of population labels |
boot |
If |
sure |
The number of population combinations can get very large. This is a safety option that stops you from accidently computing all combinations if that number is large. |
unique_only |
If |
comb |
Generate all combinations of |
blgsize |
SNP block size in Morgan. Default is 0.05 (5 cM). Only used when |
block_lengths |
Vector with lengths of each jackknife block. |
f4mode |
Set this to |
afprod |
Compute f4 from allele frequency products instead of f2 (default |
cpp |
Use C++ functions. Setting this to |
verbose |
Print progress updates |
... |
Additional arguments passed to |
f4- and D-statistics are informative about how four populations are related to one another. Estimates of f4 are unbiased as long as assumptions about SNP ascertainment and mutation rates are met. Missing data can violate the ascertainment assumptions: an f4-statistic may be significantly different when it is calculated from all non-missing SNPs, compared to what it would be if it were calculated from all SNPs in the genome. However, because this difference is often small, f4 is often calculated using samples or populations with missing data, on a particular subset of all SNPs. There are different strategies for choosing the SNPs in this case, and these strategies differ in how many SNPs they use, how likely they lead to bias, and whether pre-computed f2-statistics can be used.
Use the same SNPs for every f4-statistic
This is the most conservative option, but also the option which will use the smallest number of SNPs, which may result in a lack of power. It is the default option when pre-computing f2-statistics (maxmiss = 0
in extract_f2
or f2_from_geno
).
Use different SNPs for each f4-statistic
This option strikes a balance between avoiding bias and using a larger number of SNPs. For each f4-statistic it selects all SNPs which are present in all four populations. This option only works when the first argument is a genotype prefix (it doesn't work with pre-computed f2-statistics). In that case, this option is used by default. To turn it off and instead use the same SNPs for every f4-statistic, set allsnps = FALSE
. This option is the default option in the original qpDstat program (it is in fact the only option for selecting SNPs in the original qpDstat; however in the original qpAdm and qpGraph programs, this mode of selecting SNPs can be enabled with the allsnps
option, whereas the default mode in qpAdm and qpGraph is equivalent to maxmiss = 0
)
Use different SNPs for each f2-statistic
This is the least conservative option, but also the option which uses most of the available information. It makes it possible to use pre-computed f2-statistics for a large number of populations without losing a large number of SNPs. To use this option, set the maxmiss
parameter to a value greater than 0 (and not larger than 1) in extract_f2
or f2_from_geno
. When using this option, be aware that bias is possible, in particular for f4-statistics where some populations have large amounts of missing data. To reduce the bias that can result from using this option, you may want to combine it with using the option afprod = TRUE
in f2_from_precomp
.
In summary, whenever you work with populations with missing data, there is no guarantee that f4- or D-statistics involving these populations are not skewed in some way. If you choose to analyze these populations anyway, and you decide which SNPs to use, there is a trade-off between maximizing power and minimizing the risk of bias. One strategy might be to first use the least conservative option (setting maxmiss = 1
in extract_f2
) to get an overview, and then spot-check individual results using more conservative options.
qpdstat
returns a data frame with f4 statistics, with columns for populations 1 to 4,
f4-estimate (est
), standard error of the estimate (se
), z-score (z
, est
/se
), p-value (2*(1-pnorm(z))
),
and the number of SNPs used (n
; only if first argument is genotype prefix)
f4
Patterson, N. et al. (2012) Ancient admixture in human history Genetics
Peter, B. (2016) Admixture, Population Structure, and F-Statistics Genetics
pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') pop3 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') pop4 = 'Switzerland_Bichon.SG' qpdstat(example_f2_blocks, pop1, pop2, pop3, pop4) ## Not run: qpdstat(f2_dir, pop1, pop2, pop3, pop4) ## End(Not run) ## Not run: # compute D-statistics instead qpdstat("/geno/prefix", pop1, pop2, pop3, pop4) ## End(Not run) ## Not run: # make a data frame with the population combinations for which f4 should be computed combinations = tibble(pop1 = pop1, pop2 = pop2[1], pop3 = pop3, pop4 = pop4) qpdstat(example_f2_blocks, combinations) ## End(Not run)
pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') pop3 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') pop4 = 'Switzerland_Bichon.SG' qpdstat(example_f2_blocks, pop1, pop2, pop3, pop4) ## Not run: qpdstat(f2_dir, pop1, pop2, pop3, pop4) ## End(Not run) ## Not run: # compute D-statistics instead qpdstat("/geno/prefix", pop1, pop2, pop3, pop4) ## End(Not run) ## Not run: # make a data frame with the population combinations for which f4 should be computed combinations = tibble(pop1 = pop1, pop2 = pop2[1], pop3 = pop3, pop4 = pop4) qpdstat(example_f2_blocks, combinations) ## End(Not run)
This requires a working installation of qpDstat, which will be called using system
qpdstat_wrapper( pref, pop1, pop2 = NULL, pop3 = NULL, pop4 = NULL, bin = "~np29/o2bin/qpDstat", outdir = ".", parfile = NULL, f4mode = "YES", inbreed = "NO", printonly = FALSE, env = "", verbose = TRUE )
qpdstat_wrapper( pref, pop1, pop2 = NULL, pop3 = NULL, pop4 = NULL, bin = "~np29/o2bin/qpDstat", outdir = ".", parfile = NULL, f4mode = "YES", inbreed = "NO", printonly = FALSE, env = "", verbose = TRUE )
pref |
Path to and prefix of the packedancestrymap genotype files |
pop1 |
One of the following four:
|
pop2 |
A vector of population labels |
pop3 |
A vector of population labels |
pop4 |
A vector of population labels |
bin |
Path to the qpDstat binary file |
outdir |
Output directory. files |
parfile |
qpDstat parameter file. If this is specified, |
f4mode |
f4mode |
inbreed |
inbreed |
printonly |
Should the command be printed or executed? |
env |
Export environmental variables. See examples. |
verbose |
Print progress updates |
If printonly
, the qpDstat
command, otherwise a data frame with parsed qpDstat
output
## Not run: pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') pop3 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') pop4 = 'Switzerland_Bichon.SG' qpdstat_wrapper('genotype_prefix', pop1, pop2, pop3, pop4, bin = 'path/to/qpDstat', pref = 'path/to/packedancestrymap_prefix') ## End(Not run)
## Not run: pop1 = 'Denisova.DG' pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG') pop3 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG') pop4 = 'Switzerland_Bichon.SG' qpdstat_wrapper('genotype_prefix', pop1, pop2, pop3, pop4, bin = 'path/to/qpDstat', pref = 'path/to/packedancestrymap_prefix') ## End(Not run)
Estimate f4 differences
qpf4diff(data, pops, boot = FALSE, verbose = FALSE)
qpf4diff(data, pops, boot = FALSE, verbose = FALSE)
data |
Input data in one of three forms:
|
pops |
A vector of 5 populations or a five column population matrix.
The following ratios will be computed: |
boot |
If |
verbose |
Print progress updates |
qpf4diff
returns a data frame with f4 ratios
Estimate admixture proportions via f4 ratios
qpf4ratio(data, pops, boot = FALSE, verbose = FALSE)
qpf4ratio(data, pops, boot = FALSE, verbose = FALSE)
data |
Input data in one of three forms:
|
pops |
A vector of 5 populations or a five column population matrix.
The following ratios will be computed: |
boot |
If |
verbose |
Print progress updates |
qpf4ratio
returns a data frame with f4 ratios
This requires a working installation of qpF4ratio, which will be called using system
qpf4ratio_wrapper( pref, pops, bin = "~np29/o2bin/qpF4ratio", outdir = ".", parfile = NULL, blgsize = 0.05, fancyf4 = "YES", printonly = FALSE, env = "", verbose = TRUE )
qpf4ratio_wrapper( pref, pops, bin = "~np29/o2bin/qpF4ratio", outdir = ".", parfile = NULL, blgsize = 0.05, fancyf4 = "YES", printonly = FALSE, env = "", verbose = TRUE )
pref |
Path to and prefix of the packedancestrymap genotype files |
pops |
A vector of five populations, or a 5 x n matrix with population names. For each line |
bin |
Path to the qpF4ratio binary file |
outdir |
Output directory. files |
parfile |
qpF4ratio parameter file. If this is specified, |
blgsize |
blgsize |
fancyf4 |
fancyf4 |
printonly |
Should the command be printed or executed? |
env |
Export environmental variables. See examples. |
verbose |
Print progress updates |
If printonly
, the qpF4ratio
command, otherwise a data frame with parsed qpF4ratio
output
## Not run: pops = c('Denisova.DG', 'Altai_Neanderthal.DG', 'Vindija.DG', 'Chimp.REF', 'Mbuti.DG') qpf4ratio_wrapper('genotype_prefix', pops, bin = 'path/to/qpDstat') ## End(Not run)
## Not run: pops = c('Denisova.DG', 'Altai_Neanderthal.DG', 'Vindija.DG', 'Chimp.REF', 'Mbuti.DG') qpf4ratio_wrapper('genotype_prefix', pops, bin = 'path/to/qpDstat') ## End(Not run)
This function returns an array of (pseudo-) f2-statistics which are computed by
taking into account other f2-, f3-, and f4-statistics. The advantage of doing that
is that f3- and f4-statistics computed from these smoothed f2-statistics can be
more accurate for populations with large amounts of missing data. The function
uses SNPs which are missing in some populations in a manner which tends to introduce
less bias than setting maxmiss
to values greater than 0.
qpfstats( pref, pops, include_f2 = TRUE, include_f3 = TRUE, include_f4 = TRUE, verbose = TRUE )
qpfstats( pref, pops, include_f2 = TRUE, include_f3 = TRUE, include_f4 = TRUE, verbose = TRUE )
pref |
Prefix of genotype files |
pops |
Populations for which to compute f2-statistics |
include_f2 |
Should f2-statistics be used to get smoothed f2-statistics?
If |
include_f3 |
Should f3-statistics be used to get smoothed f2-statistics?
If |
include_f4 |
Should f4-statistics be used to get smoothed f2-statistics?
If |
A 3d-array of smoothed f2-statistics
## Not run: f2_blocks = qpfstats(geno_prefix, mypops) ## End(Not run)
## Not run: f2_blocks = qpfstats(geno_prefix, mypops) ## End(Not run)
Computes the fit of a given admixturegraph from f2-statistics. Drift edge weights and admixture edges weights are optimized until the (negative) likelihood score is minimized. The likelihood score is based on the squared difference between estimated and fitted f3-statistics.
qpgraph( data, graph, lambdascale = 1, boot = FALSE, diag = 1e-04, diag_f3 = 1e-05, lsqmode = FALSE, numstart = 10, seed = NULL, cpp = TRUE, return_fstats = FALSE, return_pvalue = FALSE, f3precomp = NULL, f3basepop = NULL, constrained = TRUE, allsnps = FALSE, ppinv = NULL, f2_blocks_test = NULL, verbose = FALSE )
qpgraph( data, graph, lambdascale = 1, boot = FALSE, diag = 1e-04, diag_f3 = 1e-05, lsqmode = FALSE, numstart = 10, seed = NULL, cpp = TRUE, return_fstats = FALSE, return_pvalue = FALSE, f3precomp = NULL, f3basepop = NULL, constrained = TRUE, allsnps = FALSE, ppinv = NULL, f2_blocks_test = NULL, verbose = FALSE )
data |
Input data in one of three forms:
|
graph |
An admixture graph represented as a matrix of edges, an |
lambdascale |
Scales f2-statistics. This has no effect on the fit, but is used in the original qpGraph program to display branch weights on a scale that corresponds to FST distances. |
boot |
If |
diag |
Regularization term added to the diagonal elements of the covariance matrix of fitted branch lengths (after scaling by the matrix trace). Default is 0.0001. |
diag_f3 |
Regularization term added to the diagonal elements of the covariance matrix of estimated f3 statistics (after scaling by the matrix trace). In the original qpGraph program, this is fixed at 0.00001. |
lsqmode |
Least-squares mode. If
Both of these options do not take the covariance of f3-statistics into account. This can lead to bias, but is more stable in cases where the inverse f3-statistics covariance matrix can not be estimated precisely (for example because the number of populations is large). An alternative to |
numstart |
Number of random initializations of starting weights. Defaults to 10. Increasing this number will make the optimization slower, but reduce the risk of not finding the optimal weights. Check the |
seed |
Random seed for generating starting weights. |
cpp |
Use C++ functions. Setting this to |
return_fstats |
Return estimated and fitted f2- and f4-statistics, as well as the worst f4-statistic residual Z-score. Defaults to |
f3precomp |
Optional precomputed f3-statistics. This should be the output of |
f3basepop |
Optional f3-statistics base population. Inference will be based on f3-statistics of the form |
constrained |
Constrain estimated drift edge weights to be non-negative, and admixture edge weights to be between zero and one. |
allsnps |
Compute f3 from different SNPs for each population triplet (if data is missing for some SNPs and populations). This only has an effect when |
ppinv |
Optional inverse f3-statistics covariance matrix |
f2_blocks_test |
An optional 3d array of f2-statistics used for computing an out-of-sample score. This should contain only SNP blocks which are not part of |
verbose |
Print progress updates |
qpgraph
returns a list with data describing the model fit:
edges
: A data frame where each row is an edge in the graph. For regular edges,
the column weight
is the estimated edge length, and for admixture edges, it is the estimated admixture weight.
score
: The likelihood score of the fitted graph. Lower values correspond to better fits.
The score is calculated as the inner product of the residuals (difference between estimated and
fitted f3 statistics), weighted by the inverse of the f3 covariance matrix. See qpgraph_score
f2
: Estimated and fitted f2 statistics (if return_fstats = TRUE
). p-values and z-scores test the significance of the difference.
f3
: Estimated and fitted f3 statistics. p-values and z-scores test the significance of the difference.
f4
: Estimated and fitted f4 statistics (if return_fstats = TRUE
). p-values and z-scores test the significance of the difference.
opt
: A data frame with details of the weight-fitting step, including the randomly sampled starting weights. The column value
contains the score for each set of starting weights. Columns starting with x
denote initial weights, and columns starting with y
denote fitted weights.
worst_residual
: The highest absolute z-score of f4-statistics residuals (fitted - estimated f4); (returned if return_fstats = TRUE
)
Patterson, N. et al. (2012) Ancient admixture in human history. Genetics
qpgraph_wrapper
for a wrapper functions which calls the original qpGraph program.
out = qpgraph(example_f2_blocks, example_graph) plot_graph(out$edges)
out = qpgraph(example_f2_blocks, example_graph) plot_graph(out$edges)
Takes a 3d array of f2 block jackknife estimates and computes f3-statistics between the
first population and all population pairs
:
qpgraph_precompute_f3( data, pops, f3basepop = NULL, lambdascale = 1, boot = FALSE, seed = NULL, diag_f3 = 1e-05, lsqmode = FALSE )
qpgraph_precompute_f3( data, pops, f3basepop = NULL, lambdascale = 1, boot = FALSE, seed = NULL, diag_f3 = 1e-05, lsqmode = FALSE )
data |
Input data in one of three forms:
|
pops |
Populations for which to compute f3-statistics |
f3basepop |
f3-statistics base population. If |
lambdascale |
Scales f2-statistics. This has no effect on the fit, but is used in the original qpGraph program to display branch weights on a scale that corresponds to FST distances. |
boot |
If |
seed |
Random seed used if |
diag_f3 |
Regularization term added to the diagonal elements of the covariance matrix of estimated f3 statistics (after scaling by the matrix trace). In the original qpGraph program, this is fixed at 0.00001. |
lsqmode |
Least-squares mode. If |
A list with four items
f3_est
a matrix with f3-statistics for all population pairs with the output
ppinv
a matrix with the inverse of the f3-statistic covariance matrix
f2out
a data frame with f2 estimates
f3out
a data frame with f3 estimates
pops = get_leafnames(example_igraph) qpgraph_precompute_f3(example_f2_blocks, pops)$f3_est ## Not run: qpgraph_precompute_f3(f2_dir, pops) ## End(Not run)
pops = get_leafnames(example_igraph) qpgraph_precompute_f3(example_f2_blocks, pops)$f3_est ## Not run: qpgraph_precompute_f3(f2_dir, pops) ## End(Not run)
This function is used in combination with compare_fits
in order to test whether one graph has a significantly better fit than another. It creates bootstrap resampled SNP block training and test sets, and uses them to evaluate multiple graphs.
qpgraph_resample_multi(f2_blocks, graphlist, nboot, verbose = TRUE, ...)
qpgraph_resample_multi(f2_blocks, graphlist, nboot, verbose = TRUE, ...)
f2_blocks |
3d array of f2-statistics |
graphlist |
A list of admixture graphs |
nboot |
Number of bootstrap iterations |
verbose |
Print progress updates |
... |
Arguments passed to |
A list of same length as graphlist
with data frames with qpgraph
results for each iteration of bootstrap resampled f2-statistics
## Not run: fits = qpgraph_resample_multi(f2_blocks, list(graph1, graph2), nboot = 100) compare_fits(fits[[1]]$score_test, fits[[2]]$score_test) ## End(Not run)
## Not run: fits = qpgraph_resample_multi(f2_blocks, list(graph1, graph2), nboot = 100) compare_fits(fits[[1]]$score_test, fits[[2]]$score_test) ## End(Not run)
This function is used in combination with compare_fits
in order to test whether one graph has a significantly better fit than another. using a list of bootstrap resampled f2-statistics and corresponding test-set f2-statistics
qpgraph_resample_snps2(f2_blocks, graph, f2_blocks_test, verbose = TRUE, ...)
qpgraph_resample_snps2(f2_blocks, graph, f2_blocks_test, verbose = TRUE, ...)
f2_blocks |
A list of bootstrap resampled f2-statistics |
graph |
An admixture graph |
f2_blocks_test |
A list of f2-statistics from all blocks which were not used in the corresponding f2-array in |
verbose |
Print progress updates |
... |
Arguments passed to |
A data frame with qpgraph
results for each iteration of bootstrap resampled f2-statistics
Wrapper function around the original qpGraph program
qpgraph_wrapper( pref, graph, bin = "~np29/o2bin/qpGraph", parfile = NULL, outdir = ".", printonly = FALSE, badsnps = NULL, lambdascale = -1, inbreed = "NO", diag = 1e-04, blgsize = 0.05, outpop = "NULL", loadf3 = NULL, lsqmode = "NO", fstdmode = "NO", hires = "NO", forcezmode = "NO", zthresh = 0, allsnps = "NO", oldallsnps = "NO", doanalysis = "YES", bigiter = 100, initmix = 0, env = "", verbose = TRUE )
qpgraph_wrapper( pref, graph, bin = "~np29/o2bin/qpGraph", parfile = NULL, outdir = ".", printonly = FALSE, badsnps = NULL, lambdascale = -1, inbreed = "NO", diag = 1e-04, blgsize = 0.05, outpop = "NULL", loadf3 = NULL, lsqmode = "NO", fstdmode = "NO", hires = "NO", forcezmode = "NO", zthresh = 0, allsnps = "NO", oldallsnps = "NO", doanalysis = "YES", bigiter = 100, initmix = 0, env = "", verbose = TRUE )
pref |
Prefix of the packedancestrymap format genotype files. |
graph |
An admixture graph or qpGraph graph file |
bin |
Location of the qpGraph binary |
parfile |
qpGraph parameter file |
outdir |
Output directory |
printonly |
Should output be executed or the command just be printed? |
badsnps |
badsnps |
lambdascale |
lambdascale |
inbreed |
inbreed |
diag |
diag |
blgsize |
blgsize |
outpop |
outgroup population |
loadf3 |
loadf3 |
lsqmode |
least-squares mode. sets the offdiagonal elements of the block-jackknife covariance matrix to zero. |
fstdmode |
fstdmode |
hires |
hires |
forcezmode |
forcezmode |
zthresh |
zthresh |
allsnps |
allsnps |
oldallsnps |
oldallsnps |
doanalysis |
doanalysis |
bigiter |
bigiter |
initmix |
initmix |
env |
Export environmental variables. See examples. |
verbose |
Print progress updates |
A list with parsed qpGraph output
edges
: data frame
score
: scalar
f2
: data frame
## Not run: qpgraph_wrapper('genotype_prefix', example_graph, bin = 'path/to/qpGraph') ## End(Not run)
## Not run: qpgraph_wrapper('genotype_prefix', example_graph, bin = 'path/to/qpGraph') ## End(Not run)
qpwave
compares two sets of populations (left
and right
) to each other. It estimates a lower bound on the number of admixtue waves that went from left
into right
, by comparing a matrix of f4-statistics to low-rank approximations. For a rank of 0 this is equivalent to testing whether left
and right
form clades relative to each other.
qpwave( data, left, right, fudge = 1e-04, boot = FALSE, constrained = FALSE, cpp = TRUE, verbose = TRUE )
qpwave( data, left, right, fudge = 1e-04, boot = FALSE, constrained = FALSE, cpp = TRUE, verbose = TRUE )
data |
The input data in the form of:
|
left |
Left populations (sources) |
right |
Right populations (outgroups) |
fudge |
Value added to diagonal matrix elements before inverting |
boot |
If |
constrained |
Constrain admixture weights to be non-negative |
cpp |
Use C++ functions. Setting this to |
verbose |
Print progress updates |
qpwave
returns a list with up to two data frames describing the model fit:
f4
A data frame with estimated f4-statistics
rankdrop
: A data frame describing model fits with different ranks, including p-values for the overall fit
and for nested models (comparing two models with rank difference of one). A model with L
left populations and R
right populations has an f4-matrix of dimensions (L-1)*(R-1)
. If no two left population form a clade with respect to all right populations, this model will have rank (L-1)*(R-1)
.
f4rank
: Tested rank
dof
: Degrees of freedom of the chi-squared null distribution: (L-1-f4rank)*(R-1-f4rank)
chisq
: Chi-sqaured statistic, obtained as E'QE
, where E
is the difference between estimated and fitted f4-statistics, and Q
is the f4-statistic covariance matrix.
p
: p-value obtained from chisq
as pchisq(chisq, df = dof, lower.tail = FALSE)
dofdiff
: Difference in degrees of freedom between this model and the model with one less rank
chisqdiff
: Difference in chi-squared statistics
p_nested
: p-value testing whether the difference between two models of rank difference 1 is significant
Patterson, N. et al. (2012) Ancient admixture in human history. Genetics
Haak, W. et al. (2015) Massive migration from the steppe was a source for Indo-European languages in Europe. Nature (SI 10)
left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') qpwave(example_f2_blocks, left, right)
left = c('Altai_Neanderthal.DG', 'Vindija.DG') right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG') qpwave(example_f2_blocks, left, right)
For all pairs of left populations qpwave rank 0 Chi-squared statistics and p-values will be computed. This tests for each pair of left populations whether it forms a clade with respect to the right populations.
qpwave_pairs(f2_data, left, right)
qpwave_pairs(f2_data, left, right)
f2_data |
A 3d array of blocked f2 statistics, output of |
left |
Left populations |
right |
Right populations |
## Not run: left = pops[5:7] right = pops[1:4] f2_blocks = f2_from_precomp('/my/f2/dir/', pops = left, pops2 = right, afprod = TRUE) qpwave_pairs(f2_blocks, left, right) # If f2-stats are too big to load them into memory, # the following will read the data for each pair from disk: qpwave_pairs('/my/f2/dir/', left, right) ## End(Not run)
## Not run: left = pops[5:7] right = pops[1:4] f2_blocks = f2_from_precomp('/my/f2/dir/', pops = left, pops2 = right, afprod = TRUE) qpwave_pairs(f2_blocks, left, right) # If f2-stats are too big to load them into memory, # the following will read the data for each pair from disk: qpwave_pairs('/my/f2/dir/', left, right) ## End(Not run)
Wrapper function around the original qpWave program
qpwave_wrapper( pref, left, right, target = NULL, bin = "~np29/o2bin/qpAdm", outdir = "./", parfile = NULL, allsnps = "NO", blgsize = 0.05, fancyf4 = "NO", f4mode = "YES", inbreed = "NO", printonly = FALSE, env = "", verbose = TRUE )
qpwave_wrapper( pref, left, right, target = NULL, bin = "~np29/o2bin/qpAdm", outdir = "./", parfile = NULL, allsnps = "NO", blgsize = 0.05, fancyf4 = "NO", f4mode = "YES", inbreed = "NO", printonly = FALSE, env = "", verbose = TRUE )
left |
Left populations (sources) |
right |
Right populations (outgroups) |
target |
Target population |
verbose |
Print progress updates |
This function randomly generates an admixture graph for a given set of leaf nodes
random_admixturegraph( leaves, numadmix = 0, simple = TRUE, outpop = NULL, nonzero_f4 = NULL, admix_constraints = NULL, event_order = NULL, ntry = 100 )
random_admixturegraph( leaves, numadmix = 0, simple = TRUE, outpop = NULL, nonzero_f4 = NULL, admix_constraints = NULL, event_order = NULL, ntry = 100 )
leaves |
Names of the leaf nodes, or a number specifying how many leaf nodes there should be |
numadmix |
Number of admixture events |
simple |
Should edges leading to admixture nodes consist of separate admix edges and normal edges |
outpop |
Outgroup population |
rand_graph = random_admixturegraph(10, numadmix = 5) plot_graph(rand_graph)
rand_graph = random_admixturegraph(10, numadmix = 5) plot_graph(rand_graph)
This function assigns a date (in generations) to each node in an admixture graph and is used in random_sim
.
The dates are drawn from a uniform distribution with given lower and upper bounds,
unless the fix_leaf
option is set to TRUE
, in which case the dates of all leaf nodes returned by get_leafnames
will be set to 0.
random_dates(graph, min = 1000, max = 1000, fix_leaf = FALSE)
random_dates(graph, min = 1000, max = 1000, fix_leaf = FALSE)
graph |
An admixture graph |
min |
A lower limit of uniform distribution |
max |
An upper limit of uniform distribution |
fix_leaf |
A boolean specifying if the dates of the leaf nodes will be fixed at time 0 (i.e., at the most recent time).
If |
A named vector with random dates for each graph node
Generate a random binary graph
random_newick(n, start = "", end = "", outpop = NULL)
random_newick(n, start = "", end = "", outpop = NULL)
n |
The number of terminal nodes, or a vector of population labels. |
start |
Prefix. |
end |
Postfix. |
outpop |
Outgroup (if |
Tree in newick format.
random_newick(5) random_newick(c('a', 'b', 'c', 'd')) # toplogy random, but pop order fixed random_newick(sample(c('a', 'b', 'c', 'd'))) # toplogy and pop order random
random_newick(5) random_newick(c('a', 'b', 'c', 'd')) # toplogy random, but pop order fixed random_newick(sample(c('a', 'b', 'c', 'd'))) # toplogy and pop order random
This is basically a wrapper function around the msprime_genome
that allows user to create a random graph and simulate it in msprime v1.x.
random_sim( nleaf, nadmix, outpref = "random_sim", max_depth = NULL, ind_per_pop = 1, mutation_rate = 1.25e-08, admix_weights = 0.5, neff = 1000, time = 1000, fix_leaf = FALSE, outpop = NULL, nchr = 1, recomb_rate = 2e-08, seq_length = 1000, ghost_lineages = TRUE, run = FALSE )
random_sim( nleaf, nadmix, outpref = "random_sim", max_depth = NULL, ind_per_pop = 1, mutation_rate = 1.25e-08, admix_weights = 0.5, neff = 1000, time = 1000, fix_leaf = FALSE, outpop = NULL, nchr = 1, recomb_rate = 2e-08, seq_length = 1000, ghost_lineages = TRUE, run = FALSE )
nleaf |
The number of leaf nodes |
nadmix |
The number of admixture events |
outpref |
A prefix of output files |
max_depth |
A constraint specifying the maximum time depth of the admixture graph (in generations) |
ind_per_pop |
The number of individuals to simulate for each population. If a scalar value, it will be constant across all populations. Alternatively, it can be a named vector with a different value for each population. |
mutation_rate |
Mutation rate per site per generation. The default is |
admix_weights |
Admixture weights. If a float value (0 < value < 1), admixture weights for each admixture event will be (value, 1-value).
Alternatively, it can be a range, i.e., |
neff |
Effective population size (in diploid individuals). If a scalar value, it will be constant across all populations. Alternatively, it can be a range, i.e., |
time |
Time between nodes. Either a scalar value (1000 by default) with the dates generated by |
fix_leaf |
A boolean specifying if the dates of the leaf nodes will be fixed at time 0. If |
outpop |
A name of the (optional) outgroup population. |
nchr |
The number of chromosomes to simulate |
recomb_rate |
A float value specifying recombination rate along the chromosomes. The default is |
seq_length |
The sequence length of the chromosomes. If it is a scalar value, the sequence length will be constant for all chromosomes.
Alternatively, it can be a vector with a length equal to number of chromosomes (i.e., |
ghost_lineages |
A boolean value specifying whether ghost lineages will be allowed.
If |
run |
If |
A list with the path of simulation script, a data frame of graph edges, dates and population sizes:
out - A file name and path of the simulation script
edges - An edge dataframe with admixture weights
dates - A named vector with a date for each node
neffs - A named vector with an effective population size for each node
# Create simulation script that simulates 2 chromosomes that are 50base long # where maximum depth of the tree is 5000 generations, and plot the output graph ## Not run: out = random_sim(nleaf=4, nadmix=0, max_depth=5000, nchr=2, seq_length=50) plot_graph(out$edges, dates = out$dates, neff = out$neff, hide_weights = TRUE) ## End(Not run)
# Create simulation script that simulates 2 chromosomes that are 50base long # where maximum depth of the tree is 5000 generations, and plot the output graph ## Not run: out = random_sim(nleaf=4, nadmix=0, max_depth=5000, nchr=2, seq_length=50) plot_graph(out$edges, dates = out$dates, neff = out$neff, hide_weights = TRUE) ## End(Not run)
Read genotype data from EIGENSTRAT files
read_eigenstrat( pref, inds = NULL, pops = NULL, first = 1, last = Inf, transpose = FALSE, verbose = TRUE )
read_eigenstrat( pref, inds = NULL, pops = NULL, first = 1, last = Inf, transpose = FALSE, verbose = TRUE )
pref |
Prefix of the packedancestrymap files |
inds |
Individuals for which data should be read. Defaults to all individuals |
pops |
Populations for which data should be read. Cannot be provided together with 'inds' |
first |
Index of first SNP to read |
last |
Index of last SNP to read |
transpose |
Transpose genotype matrix (default is |
verbose |
Print progress updates |
A list with the genotype data matrix, the .ind
file, and the .snp
file
## Not run: samples = c('Ind1', 'Ind2', 'Ind3') geno = read_packedancestrymap(prefix, samples) ## End(Not run)
## Not run: samples = c('Ind1', 'Ind2', 'Ind3') geno = read_packedancestrymap(prefix, samples) ## End(Not run)
This function reads blocked f2 estimates (or allele frequency products) which were writtend to disk by write_f2
and returns them as a 3d array.
read_f2( f2_dir, pops = NULL, pops2 = NULL, type = "f2", counts = FALSE, remove_na = TRUE, verbose = FALSE )
read_f2( f2_dir, pops = NULL, pops2 = NULL, type = "f2", counts = FALSE, remove_na = TRUE, verbose = FALSE )
f2_dir |
Directory from which to read files |
pops |
Populations for which f2 statistics should be read. Defaults to all populations, which may require a lot of memory. |
pops2 |
Specify this if you only want to read a subset of all population pairs. The resulting array will differ on 1st and 2nd dimension and will not work with all functions. |
counts |
Return allele counts instead of f2 estimates |
remove_na |
Remove blocks with missing values |
verbose |
Print progress updates |
afprod |
Return allele frequency products instead of f2 estimates |
A 3d array of block jackknife estimates
## Not run: read_f2(f2_dir, pops = c('pop1', 'pop2', 'pop3')) ## End(Not run)
## Not run: read_f2(f2_dir, pops = c('pop1', 'pop2', 'pop3')) ## End(Not run)
Read genotype data from packedancestrymap files
read_packedancestrymap( pref, inds = NULL, pops = NULL, first = 1, last = Inf, transpose = FALSE, verbose = TRUE )
read_packedancestrymap( pref, inds = NULL, pops = NULL, first = 1, last = Inf, transpose = FALSE, verbose = TRUE )
pref |
Prefix of the packedancestrymap files |
inds |
Individuals for which data should be read. Defaults to all individuals |
pops |
Populations for which data should be read. Cannot be provided together with 'inds' |
first |
Index of first SNP to read |
last |
Index of last SNP to read |
transpose |
Transpose genotype matrix (default is |
verbose |
Print progress updates |
A list with the genotype data matrix, the .ind
file, and the .snp
file
## Not run: samples = c('Ind1', 'Ind2', 'Ind3') geno = read_packedancestrymap(prefix, samples) ## End(Not run)
## Not run: samples = c('Ind1', 'Ind2', 'Ind3') geno = read_packedancestrymap(prefix, samples) ## End(Not run)
PLINK
filesSee genio for a dedicated R
package for
reading and writing PLINK
files. This function is based on a similar function in the plink2R
package.
read_plink(pref, inds = NULL, pops = NULL, verbose = FALSE)
read_plink(pref, inds = NULL, pops = NULL, verbose = FALSE)
pref |
Prefix of packedancestrymap files (files have to end in |
inds |
Individuals for which data should be read. Defaults to all individuals |
pops |
Populations for which data should be read. Cannot be provided together with 'inds' |
verbose |
Print progress updates |
A list with the genotype data matrix, the .ind
file, and the .snp
file
## Not run: samples = c('Ind1', 'Ind2', 'Ind3') geno = read_packedancestrymap(prefix, samples) ## End(Not run)
## Not run: samples = c('Ind1', 'Ind2', 'Ind3') geno = read_packedancestrymap(prefix, samples) ## End(Not run)
These are wrapper functions around various qp
functions, which will evaluate many models at once.
The models are formed by leaving out every individual which belongs to a population with more than one sample,
one at a time.
qp3pop_resample_inds(dir, inds, pops, verbose = TRUE, ...) qpdstat_resample_inds(dir, inds, pops, verbose = TRUE, ...) qpwave_resample_inds(dir, inds, pops, verbose = TRUE, ...) qpadm_resample_inds(dir, inds, pops, verbose = TRUE, ...) qpgraph_resample_inds(dir, inds, pops, verbose = TRUE, ...)
qp3pop_resample_inds(dir, inds, pops, verbose = TRUE, ...) qpdstat_resample_inds(dir, inds, pops, verbose = TRUE, ...) qpwave_resample_inds(dir, inds, pops, verbose = TRUE, ...) qpadm_resample_inds(dir, inds, pops, verbose = TRUE, ...) qpgraph_resample_inds(dir, inds, pops, verbose = TRUE, ...)
dir |
directory with precomputed data |
inds |
vector of individual names |
pops |
vector of population names. Should be the same length as |
verbose |
print progress updates |
... |
named arguments passed to the |
a nested data frame where each model is a row, and the columns are model parameters and model outputs
## Not run: res = qpadm_resample_inds unnest(res, weights) ## End(Not run)
## Not run: res = qpadm_resample_inds unnest(res, weights) ## End(Not run)
These are wrapper functions around various qp
functions, which will evaluate a model on multiple subdivisions of the data.
The subdivisions are formed by leaving out one or more SNP block at a time (see boot
for details).
qp3pop_resample_snps(f2_blocks, boot = FALSE, ...) qpdstat_resample_snps(f2_blocks, boot = FALSE, ...) qpwave_resample_snps(f2_blocks, boot = FALSE, ...) qpadm_resample_snps(f2_blocks, boot = FALSE, ...) qpgraph_resample_snps(f2_blocks, boot = FALSE, ...)
qp3pop_resample_snps(f2_blocks, boot = FALSE, ...) qpdstat_resample_snps(f2_blocks, boot = FALSE, ...) qpwave_resample_snps(f2_blocks, boot = FALSE, ...) qpadm_resample_snps(f2_blocks, boot = FALSE, ...) qpgraph_resample_snps(f2_blocks, boot = FALSE, ...)
f2_blocks |
a 3d array of blocked f2 statistics |
boot |
If |
... |
named arguments which are passed to the |
verbose |
print progress updates |
a nested data frame where each model is a row, and the columns are model parameters and model outputs
## Not run: res = qpadm_resample_snps(example_f2_blocks, target = target, left = left, right = right) unnest(res, weights) ## End(Not run)
## Not run: res = qpadm_resample_snps(example_f2_blocks, target = target, left = left, right = right) unnest(res, weights) ## End(Not run)
This functions creates a data frame with population combinations which can be used as the input for qpadm_multi
rotate_models(leftright, target, rightfix = NULL)
rotate_models(leftright, target, rightfix = NULL)
leftright |
Populations which will be distributed between left and right |
target |
Target population |
rightfix |
Populations which will be on the right side in all models |
A data frame with Chi-squared statistics and p-values for each population combination
## Not run: pops = dimnames(example_f2_blocks)[[1]] rotate_models(leftright = pops[1:4], target = pops[5], rightfix = pops[6:7]) ## End(Not run)
## Not run: pops = dimnames(example_f2_blocks)[[1]] rotate_models(leftright = pops[1:4], target = pops[5], rightfix = pops[6:7]) ## End(Not run)
Launch ADMIXTOOLS 2 GUI
run_shiny_admixtools()
run_shiny_admixtools()
This function tests whether a graph satisfies a number of different types of constraints
satisfies_constraints( graph, nonzero_f4 = NULL, admix_constraints = NULL, event_order = NULL )
satisfies_constraints( graph, nonzero_f4 = NULL, admix_constraints = NULL, event_order = NULL )
graph |
An admixture graph |
nonzero_f4 |
A data frame or matrix with four columns. One row for each f4-statistic which is observed to be significantly non-zero |
admix_constraints |
A data frame with columns |
event_order |
A data frame with columns |
TRUE
if all admixture constraints are satisfied, else FALSE
satisfies_numadmix
, satisfies_zerof4
, satisfies_eventorder
## Not run:
# At least one admixture event for C, and none for D:
constrain_cd = tibble(pop = c('C', 'D'), min = c(1, NA), max = c(NA, 0))
satisfies_numadmix(random_admixturegraph(5, 2), constrain_cd)
## End(Not run)
This function returns TRUE
if and only if the admixture graph is compatible with
all event orders listed in eventorder
satisfies_eventorder(graph, eventorder, strict = TRUE)
satisfies_eventorder(graph, eventorder, strict = TRUE)
graph |
An admixture graph |
eventorder |
A data frame with columns |
strict |
What to do in case some events are not determined by the graph.
If |
Each row in eventorder
represents a constraint that earlier1
and earlier2
split earlier than later1
and later2
. If later2
is NA
, later2
will be set to the parent node of later1
. By default (type = 1
), a constraint will be satisfied as long as there is any lineage in which a split between earlier1
and earlier2
is ancestral to a split between later1
and later2
. type = 2
is stricter and requires that the earlier1
, earlier2
split is ancestral to the later1
, later2
split in all lineages. In graphs with multiple admixture events there can be multiple splits between earlier1
, earlier2
and later1
, later2
, and many ways in which these splits can relate to each other. The current implementation only covers some of the many possible topological relationships.
TRUE
if all constraints are satisfied, else FALSE
## Not run: # Test whether the split between A and B is earlier than the split between C and D, # and whether the split between C and D is earlier than the terminal branch leading to E constrain_events = tribble( ~earlier1, ~earlier2, ~later1, ~later2, 'A', 'B', 'C', 'D', 'C', 'D', 'E', NA) satisfies_eventorder(random_admixturegraph(5, 0), eventorder = constrain_events) ## End(Not run)
## Not run: # Test whether the split between A and B is earlier than the split between C and D, # and whether the split between C and D is earlier than the terminal branch leading to E constrain_events = tribble( ~earlier1, ~earlier2, ~later1, ~later2, 'A', 'B', 'C', 'D', 'C', 'D', 'E', NA) satisfies_eventorder(random_admixturegraph(5, 0), eventorder = constrain_events) ## End(Not run)
This function returns TRUE
if and only if the admixture graph is compatible with
the f4-statistics listed in nonzero_f4
being non-zero
satisfies_nonzerof4(graph, nonzero_f4)
satisfies_nonzerof4(graph, nonzero_f4)
graph |
An admixture graph |
nonzero_f4 |
A data frame or matrix with four columns. One row for each f4-statistic which is expected to be non-zero |
TRUE
if all constraints are satisfied, else FALSE
## Not run: # Test whether f4(A,B; C,D) is expected to be non-zero in this graph: nonzero_f4 = matrix(c('A', 'B', 'C', 'D'), 1) satisfies_nonzerof4(random_admixturegraph(5, 2), nonzero_f4) ## End(Not run)
## Not run: # Test whether f4(A,B; C,D) is expected to be non-zero in this graph: nonzero_f4 = matrix(c('A', 'B', 'C', 'D'), 1) satisfies_nonzerof4(random_admixturegraph(5, 2), nonzero_f4) ## End(Not run)
This function returns TRUE
if and only if the admixture graph satisfies all constraints on
the number of admixture events for the populations in admix_constraints
satisfies_numadmix(graph, admix_constraints)
satisfies_numadmix(graph, admix_constraints)
graph |
An admixture graph |
admix_constraints |
A data frame with columns |
TRUE
if all admixture constraints are satisfied, else FALSE
## Not run: # At least one admixture event for C, and none for D: constrain_cd = tribble( ~pop, ~min, ~max, 'C', 1, NA, 'D', NA, 0) satisfies_numadmix(random_admixturegraph(5, 2), constrain_cd) ## End(Not run)
## Not run: # At least one admixture event for C, and none for D: constrain_cd = tribble( ~pop, ~min, ~max, 'C', 1, NA, 'D', NA, 0) satisfies_numadmix(random_admixturegraph(5, 2), constrain_cd) ## End(Not run)
This function returns TRUE
if and only if the admixture graph is compatible with
the f4-statistics listed in nonzero_f4
being non-zero
satisfies_zerof4(graph, zero_f4)
satisfies_zerof4(graph, zero_f4)
graph |
An admixture graph |
zero_f4 |
A data frame or matrix with four columns. One row for each f4-statistic which is expected to be zero |
TRUE
if all constraints are satisfied, else FALSE
## Not run: # Test whether Chimp and Altai are a clade with respect to all populations X and Y: # (whether f4("Chimp", "Altai"; X, Y) is 0 for all pairs of X and Y) zero_f4 = expand_grid(pop1 = "Chimp", pop2 = "Altai", pop3 = X, pop4 = Y) satisfies_zerof4(random_admixturegraph(5, 2), zero_f4) ## End(Not run)
## Not run: # Test whether Chimp and Altai are a clade with respect to all populations X and Y: # (whether f4("Chimp", "Altai"; X, Y) is 0 for all pairs of X and Y) zero_f4 = expand_grid(pop1 = "Chimp", pop2 = "Altai", pop3 = X, pop4 = Y) satisfies_zerof4(random_admixturegraph(5, 2), zero_f4) ## End(Not run)
Return shortest unique prefixes
shortest_unique_prefixes(strings, min_length = 1)
shortest_unique_prefixes(strings, min_length = 1)
strings |
A character vector |
min_length |
Minimum length of prefixes |
Character vector with shortest unique prefixes
strings = c("Abra", "Abracadabra", "Simsalabim") shortest_unique_prefixes(strings)
strings = c("Abra", "Abracadabra", "Simsalabim") shortest_unique_prefixes(strings)
Nodes with in and out degree of one will be removed.
simplify_graph(graph)
simplify_graph(graph)
graph |
an igraph object |
simple = simplify_graph(example_igraph) plot_graph(example_igraph) plot_graph(simple)
simple = simplify_graph(example_igraph) plot_graph(example_igraph) plot_graph(simple)
This function splits a large matrix into smaller blocks with cols_per_chunk
columns per block,
and saves them as .rds
files with prefix prefix
split_mat(mat, cols_per_chunk, prefix, overwrite = TRUE, verbose = TRUE)
split_mat(mat, cols_per_chunk, prefix, overwrite = TRUE, verbose = TRUE)
mat |
The matrix to be split |
cols_per_chunk |
Number of columns per block |
prefix |
Prefix of output files |
overwrite |
Overwrite existing files (default |
verbose |
Print progress updates |
packedancestrymap_to_afs
, afs_to_f2
## Not run: afdat = packedancestrymap_to_afs('path/to/packedancestrymap_prefix', allpopulations) split_mat(afdat$afs, cols_per_chunk = 20, prefix = 'afdat_split_v42.1/afs') ## End(Not run)
## Not run: afdat = packedancestrymap_to_afs('path/to/packedancestrymap_prefix', allpopulations) split_mat(afdat$afs, cols_per_chunk = 20, prefix = 'afdat_split_v42.1/afs') ## End(Not run)
Split nodes with more than two edges
split_multifurcations(graph)
split_multifurcations(graph)
graph |
An admixture graph |
A new admixture graph in which no node has more than two incoming or outgoing edges
Modify a graph by regrafting a subcomponent
spr_all(graph, fix_outgroup = TRUE)
spr_all(graph, fix_outgroup = TRUE)
graph |
An admixture graph |
A new admixture graph
Modify a graph by regrafting a leaf
spr_leaves(graph, fix_outgroup = TRUE)
spr_leaves(graph, fix_outgroup = TRUE)
graph |
An admixture graph |
A new admixture graph
List leaf nodes for all internal nodes
summarize_descendants(graph)
summarize_descendants(graph)
graph |
An admixture graphs |
A data frame with columns from
, to
, id
## Not run: summarize_descendants(graph) ## End(Not run)
## Not run: summarize_descendants(graph) ## End(Not run)
List leaf nodes for all internal nodes in a list of graphs
summarize_descendants_list(graphlist, rename = FALSE)
summarize_descendants_list(graphlist, rename = FALSE)
graphlist |
A list of admixture graphs |
rename |
If |
A data frame with columns graph
, from
, n
, frac
## Not run: summarize_descendants_list(graphlist) ## End(Not run)
## Not run: summarize_descendants_list(graphlist) ## End(Not run)
List population split events in a graph
summarize_eventorder(graph, unique_only = TRUE)
summarize_eventorder(graph, unique_only = TRUE)
graph |
An admixture graph |
A data frame with columns earlier1
, earlier2
, later1
, later2
## Not run: summarize_eventorder(example_igraph) ## End(Not run)
## Not run: summarize_eventorder(example_igraph) ## End(Not run)
List population split events in a list of graphs
summarize_eventorder_list(graphlist)
summarize_eventorder_list(graphlist)
graphlist |
A list of admixture graphs |
A data frame with columns earlier1
, earlier2
, later1
, later2
, n
, frac
## Not run: summarize_eventorder(graphlist) ## End(Not run)
## Not run: summarize_eventorder(graphlist) ## End(Not run)
This function takes the output of qpgraph_resample_snps
and creates a data frame with summaries of the estimated graph parameters. weight
has the mean of all fits, while low
and high
have lower and upper quantiles.
summarize_fits(fits, q_low = 0.05, q_high = 0.95)
summarize_fits(fits, q_low = 0.05, q_high = 0.95)
fits |
A nested data frame where each line is the output of |
A data frame summarizing the fits
List number of admixture events for each population
summarize_numadmix(graph)
summarize_numadmix(graph)
graph |
An admixture graph |
A data frame with columns pop
and nadmix
## Not run: summarize_numadmix(example_igraph) ## End(Not run)
## Not run: summarize_numadmix(example_igraph) ## End(Not run)
List number of admixture events for each population in a list of graphs
summarize_numadmix_list(graphlist)
summarize_numadmix_list(graphlist)
graphlist |
A list of admixture graphs |
A data frame with columns pop
, mean
, mean_nonzero
, min
, max
## Not run: summarize_numadmix_list(graphlist) ## End(Not run)
## Not run: summarize_numadmix_list(graphlist) ## End(Not run)
Assign proxy populations to admixed populations
summarize_proxies(graph)
summarize_proxies(graph)
graph |
An admixture graph in igraph format, or as edge list with column |
A data frame with columns pop
, nadmix
, proxy
, nproxies
(and weight
)
## Not run: summarize_proxies(example_igraph) ## End(Not run)
## Not run: summarize_proxies(example_igraph) ## End(Not run)
List proxy populations in graphlist
summarize_proxies_list(graphlist)
summarize_proxies_list(graphlist)
graphlist |
A list of admixture graphs |
A data frame with columns pop
, proxy
, nadmix
, nproxies
, n
, frac
## Not run: summarize_proxies_list(graphlist) ## End(Not run)
## Not run: summarize_proxies_list(graphlist) ## End(Not run)
This function summarizes topologies of population triples across graphs.
summarize_triples(graphs)
summarize_triples(graphs)
graphs |
A list of graphs |
A data frame with one line for each population triple and columns summarizing the relationship of each triple across graphs.
clade12
: Fraction of graphs in which pop1
and pop2
form a clade
x12
: Fraction of graphs in which pop1
admixes into pop2
at the exclusion of pop3
toptopo
: A binary string representation of the most common topology. Digits represent x12
, x21
, x13
, x31
, x23
, x32
toptopocnt
: The number of graphs in which toptopo
was observed
topos
: The counts of all topologies
List clades in a graph
summarize_zerof4(graph, eps = 1e-07)
summarize_zerof4(graph, eps = 1e-07)
graph |
An admixture graph |
eps |
Used to determine what counts as zero |
A data frame with all (non-redundant) zero f4-statistics
## Not run: summarize_zerof4(example_igraph) ## End(Not run)
## Not run: summarize_zerof4(example_igraph) ## End(Not run)
List clades in a list of graphs
summarize_zerof4_list(graphlist)
summarize_zerof4_list(graphlist)
graphlist |
A list of admixture graphs |
A data frame with columns pop1
, pop2
, pop3
, pop4
, n
, frac
## Not run: summarize_zerof4_list(graphlist) ## End(Not run)
## Not run: summarize_zerof4_list(graphlist) ## End(Not run)
Modify a graph by swapping two leaf nodes
swap_leaves(graph, fix_outgroup = TRUE)
swap_leaves(graph, fix_outgroup = TRUE)
graph |
An admixture graph |
fix_outgroup |
Keep outgroup in place |
A new admixture graph
A thin wrapper around qpadm_p
with rnk
set to zero
test_cladality(f2_data, left, right, fudge = 1e-04, boot = FALSE, cpp = TRUE)
test_cladality(f2_data, left, right, fudge = 1e-04, boot = FALSE, cpp = TRUE)
left |
Left populations (sources) |
right |
Right populations (outgroups) |
fudge |
Value added to diagonal matrix elements before inverting |
boot |
If |
cpp |
Use C++ functions. Setting this to |
This function tests whether a tree is part of a graph. This is useful for testing whether a Y-chromosome tree is consistent with an autosomal admixture graph. Leaf node names matter, but internal node names are ignored.
tree_in_graph(tree, graph)
tree_in_graph(tree, graph)
tree |
An admixture graph without admixture event |
graph |
An admixture graph |
TRUE
if all admixture constraints are satisfied, else FALSE
## Not run:
tree = graph_splittrees(example_igraph)
tree_in_graph(tree, example_igraph)
## End(Not run)
Returns all trees which can be reached through one iteration of subtree-prune-and-regraft
tree_neighbors(tree)
tree_neighbors(tree)
tree |
A tree in |
A data frame with all trees
This function generates and evaluates admixture graphs in numgen
iterations
to find well fitting admixturegraphs.
unidentifiable_edges(graph)
unidentifiable_edges(graph)
graph |
An admixture graph |
A data frame with all unidentifiable graph parameters
Convert graph to dot format
write_dot( graph, outfile = stdout(), size1 = 7.5, size2 = 10, title = "", dot2pdf = FALSE )
write_dot( graph, outfile = stdout(), size1 = 7.5, size2 = 10, title = "", dot2pdf = FALSE )
graph |
Graph as igraph object or edge list (columns labelled 'from', 'to', 'weight') |
outfile |
Output file name |
## Not run: results = qpgraph(example_f2_blocks, example_graph) write_dot(results$edges) ## End(Not run)
## Not run: results = qpgraph(example_f2_blocks, example_graph) write_dot(results$edges) ## End(Not run)
This function takes 3d arrays of blocked f2, allele frequency products, and counts, splits them by population pair,
and writes each pair to a separate .rds
file under {outdir}/{pop1}/{pop2}.rds
.
write_f2(est_arr, count_arr, outdir, id = "f2", overwrite = FALSE)
write_f2(est_arr, count_arr, outdir, id = "f2", overwrite = FALSE)
est_arr |
A 3d array with blocked f2, allele frequency products, or fst estimates for each population pair. The first two dimensions of each array have to have population names. |
count_arr |
A 3d array of the same dimension with counts |
outdir |
Directory where data will be stored |
id |
Postfix showing the type of statistic ("f2", "ap", or "fst") |
overwrite |
Overwrite existing files in |
## Not run: write_f2(f2_arr, count_arr, outdir = 'path/to/f2stats/') ## End(Not run)
## Not run: write_f2(f2_arr, count_arr, outdir = 'path/to/f2stats/') ## End(Not run)