Parallelization

ADMIXTOOLS 2 parallelizes through the future / furrr framework. The default plan is sequential, so functions behave like single-core code unless you opt in, which takes a single plan() call:

library(future)
plan(multisession, workers = 4)

Switch back with plan(sequential). The same plan() call applies to every ADMIXTOOLS 2 function that supports parallelism, so apart from find_graphs_old's parallel argument (TRUE by default), there are no per-function flags to toggle.
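If you want parallelism for one call rather than the whole session, you can scope the plan and restore the previous one afterwards. This is a minimal sketch: with_temp_plan() is a hypothetical helper, not part of ADMIXTOOLS 2 or future. It relies on plan() returning the previously active plan when a new one is set.

```r
library(future)

# Hypothetical helper (not part of ADMIXTOOLS 2 or future): evaluate `code`
# under a temporary plan, then restore whatever plan was active before.
with_temp_plan <- function(strategy, workers, code) {
  old_plan <- plan(strategy, workers = workers)  # plan() returns the previous plan
  on.exit(plan(old_plan), add = TRUE)            # restored even if `code` errors
  force(code)                                    # `code` is evaluated lazily, under the new plan
}

# Example: how many workers does the temporary plan provide?
n <- with_temp_plan(multisession, workers = 2, nbrOfWorkers())
```

In practice `code` would be an ADMIXTOOLS 2 call such as extract_f2(...); once it returns, the session is back on whatever plan was active before.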

What’s parallelized

Core data-extraction and model-fitting workflows:

  • extract_f2 and f4blockdat_from_geno parallelize across SNP blocks. This is the highest-impact case: a run with 15 populations, 5565 population combinations, and 713 blocks takes about 200 s sequentially and 75 s under plan(multisession, workers = 4), roughly a 2.7× speedup.
  • qpadm, qp3pop, qp4ratio, and qpfstats call f4blockdat_from_geno when the input is a genotype-file prefix, so they benefit transparently.
  • qpadm_multi and qpadm_sweep additionally parallelize across models. Because each model's f4 computation also parallelizes per block, the two layers nest. To avoid oversubscribing the machine, you usually want only one layer parallel: set workers to the number of cores and let either the per-model or the per-block layer use them.
  • read_f2 parallelizes the per-pair .rds reads when loading a precomputed f2 cache. Useful on slow disks or NFS where I/O latency dominates.
  • qpgraph_resample_snps and qpgraph_resample_snps2 parallelize across resamples.
  • find_graphs_old parallelizes its independent repeats (with parallel = TRUE, the default).
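One way to make exactly one layer parallel is future's nested plan topology: plan() accepts a list whose first element governs the outer futures and whose second governs futures created inside them. A minimal sketch, assuming that qpadm_multi's per-model work forms the outer level and the per-block work the inner level (an assumption worth checking against your ADMIXTOOLS 2 version); per_model, per_block, and inner_workers() are illustrative names.

```r
library(future)
library(furrr)

n_cores <- 4

# Outer level parallel (e.g. one future per model), inner level sequential.
per_model <- list(tweak(multisession, workers = n_cores), sequential)

# Outer level sequential, inner level parallel (e.g. per-block work).
per_block <- list(sequential, tweak(multisession, workers = n_cores))

# Diagnostic: set a topology, launch two outer futures, and have each report
# how many workers a future created inside it would get.
inner_workers <- function(topology) {
  old_plan <- plan(topology)
  on.exit(plan(old_plan), add = TRUE)
  unlist(future_map(1:2, function(i) nbrOfWorkers()))
}
```

With per_model, each outer task should see a single (sequential) inner worker, so the cores are spent across models only; plan(per_block) instead keeps models sequential and spends the cores on blocks.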

Not parallelized:

  • find_graphs, the newer fitter. It is fast enough single-threaded that parallel overhead would dominate. If you want N independent runs in parallel, wrap it yourself in furrr::future_map(1:N, ~find_graphs(...)).
  • qpgraph itself (single-graph fit).
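Because each find_graphs run is stochastic, it is worth passing a seed when you wrap it in furrr::future_map yourself. furrr_options(seed = ...) gives every element its own parallel-safe random stream, and furrr documents these streams as independent of how elements are split across workers, so the set of runs is reproducible. A minimal sketch: one_search() is a hypothetical placeholder whose body you would replace with your own find_graphs(...) call, and its score column stands in for whatever fit statistic you compare.

```r
library(future)
library(furrr)

# Hypothetical stand-in for one stochastic graph search. Replace the body with
# your own find_graphs(...) call and extract the score you want to compare.
one_search <- function(run_id) {
  score <- min(rexp(100))  # placeholder for the best score this run found
  data.frame(run = run_id, score = score)
}

# Run n independent searches with reproducible per-run random streams.
run_searches <- function(n, seed = 123L) {
  runs <- future_map(seq_len(n), one_search,
                     .options = furrr_options(seed = seed))
  do.call(rbind, runs)
}

plan(multisession, workers = 4)
results <- run_searches(8)
plan(sequential)

best <- results[which.min(results$score), ]  # keep the best-scoring run
```

Without a seed, furrr warns that the futures generated random numbers without a statistically sound parallel RNG.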

multisession vs multicore

multisession (workers are R subprocesses that communicate over sockets) is the portable default. It works on all platforms, including Windows.

multicore (workers are forked processes that share memory copy-on-write) is faster on Linux: a 15-pop qpadm run gets a 1.9× speedup over sequential under plan(multicore, workers = 4) but only 1.1× under plan(multisession, workers = 4), because forking avoids starting new R sessions and marshalling data to them. On macOS it works at the command line but not inside RStudio. On Windows it is unsupported.

If you’re on Linux and not in RStudio:

plan(multicore, workers = 4)

Otherwise:

plan(multisession, workers = 4)
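The same choice can be made programmatically. A minimal sketch using two functions exported by future: supportsMulticore(), which reports whether forked processing is available in the current session (it is not on Windows and, by default, not inside RStudio), and availableCores(), which caps the worker count at what the machine or scheduler allocation allows. choose_plan() itself is a hypothetical helper.

```r
library(future)

# Hypothetical helper: prefer forked workers when the session supports them,
# otherwise fall back to portable socket-based workers.
choose_plan <- function(workers = 4) {
  workers <- min(workers, availableCores())  # don't request more cores than are available
  if (supportsMulticore()) {
    plan(multicore, workers = workers)
    "multicore"
  } else {
    plan(multisession, workers = workers)
    "multisession"
  }
}

kind <- choose_plan(4)  # which backend was selected in this session
```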

Parallelization on a compute cluster

Sometimes it makes more sense to parallelize across compute nodes rather than across cores. You can do this the traditional way, by writing an R script and submitting it many times as separate jobs, or interactively from within R, again through the future / furrr framework. The interactive route takes more setup than parallelization across cores.

On a cluster using the Slurm job scheduler, the following command will set up parallelization across compute nodes.

library(future.batchtools)
plan(tweak(batchtools_slurm, workers = 50,
           resources = list(ncpus = 1, memory = 1024,
                            walltime = 10 * 60 * 60, partition = 'short')))

It specifies that up to 50 jobs should be run at a time, with each one requesting one CPU, 1024 MB of memory, and 10 hours on the partition called short. This requires the future.batchtools R package and a batchtools template file in the working directory — see this example template.
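Before committing to a long cluster run, it can help to confirm that the plan dispatches work where you expect. A minimal sketch: where_did_it_run() is a hypothetical helper that asks each task for its host name and process ID. Under the batchtools_slurm plan the host names would be expected to vary across compute nodes, while under a local multisession plan only the process IDs differ. Running it locally first, as below, exercises the code path without involving the scheduler.

```r
library(future)
library(furrr)

# Hypothetical helper: each task reports the host and R process it ran in.
where_did_it_run <- function(n_tasks = 4) {
  rows <- future_map(seq_len(n_tasks), function(i) {
    data.frame(task = i, host = Sys.info()[["nodename"]], pid = Sys.getpid())
  })
  do.call(rbind, rows)
}

# Local dry run; on the cluster, call where_did_it_run() after the
# batchtools_slurm plan instead.
plan(multisession, workers = 2)
info <- where_did_it_run(4)
plan(sequential)
```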

When parallelization doesn’t help

  • Tiny workloads: for an extract_f2 run with 5 populations, 10 population combinations, and 100 blocks, the sequential version finishes in under a second, and the worker-spawn overhead of multisession can make the parallel version slower than the serial one.
  • Memory-constrained machines: multisession workers each hold their own copy of the input data after R serializes it and sends it to them. For a 100 GB f2 cache and 8 workers, you would need roughly 800 GB (100 GB × 8) of headroom. Use multicore (copy-on-write) on Linux, use fewer workers, or stay sequential.
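Whether parallelism pays off for a given input is easiest to settle by measuring. The sketch below times the same block-wise job under two plans and includes worker start-up in the timing, since that is the overhead that dominates tiny workloads. fake_block() and time_plan() are hypothetical: fake_block() stands in for per-block work such as the f2 computation, and block_cost sets how heavy each block is.

```r
library(future)
library(furrr)

# Hypothetical per-block workload; `block_cost` controls how much work each block does.
fake_block <- function(i, block_cost) {
  x <- rnorm(block_cost)
  sum(x * x)
}

# Elapsed seconds to set up `strategy` and process all blocks under it.
# plan() is called inside the timed block, so worker start-up is counted.
time_plan <- function(strategy, n_blocks, block_cost) {
  on.exit(plan(sequential), add = TRUE)  # always return to the default plan
  t <- system.time({
    plan(strategy)
    future_map(seq_len(n_blocks), fake_block, block_cost = block_cost,
               .options = furrr_options(seed = TRUE))
  })
  t[["elapsed"]]
}

par2 <- tweak(multisession, workers = 2)
tiny  <- c(sequential = time_plan(sequential, 10, 1e3),
           parallel   = time_plan(par2, 10, 1e3))
heavy <- c(sequential = time_plan(sequential, 20, 2e6),
           parallel   = time_plan(par2, 20, 2e6))
```

On most machines one would expect the parallel entry of tiny to be no faster than the sequential one and heavy to show a gain, but the exact numbers depend on the hardware.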