. Relate Documentation

Overview


This page lists add-on modules that can be applied to the Relate output for inference of coalescnece rates, mutation rates, and selection.

Estimate coalescence rates and effective population sizes


Estimating effective population sizes

You can find a bash script under

PATH_TO_RELATE/scripts/EstimatePopulationSize/EstimatePopulationSize.sh
This script simultaneously
  1. estimates population sizes
  2. re-estimates branch lengths using the estimated population sizes
  3. estimates an average mutation rate.
To run the script, all you need are the inferred .anc/.mut files and a .poplabels file (see here for the file format).

Please note that two versions of coalescence rates/population sizes are outputted:

You can test the code on the toy data set in example/data.
Do not move the script - we are extracting the path to the binaries from the location of the script!

PATH_TO_RELATE/scripts/EstimatePopulationSize/EstimatePopulationSize.sh \
              -i example \
              -m 1.25e-8 \
              --poplabels data/example.poplabels \
              --seed 1 \
              -o example_popsize
Input arguments
-i,--input
Filename of the .anc and .mut files, excluding the file extensions.
-o,--output
Filename of output files without file extensions.
-m,--mutation_rate
Mutation rate per base per generation.
--poplabels
Filename of a .poplabels file. The file format is described here. This file needs to always contain all samples.
In addition, the following optional arguments can be specified:
--noanc
Optional: Flag specifying whether to reinfer branch lengths at the end. 0: reinfer, 1: don't reinfer. Default: 0.
--pop_of_interest
Optional: Specifies populations for which population size is estimated. The second column of the .poplabels file is used for population labels. You can specify multiple populations separated by commas, e.g. pop1,pop2. When using this option, it is recommended (but NOT necessary), to run Relate with --annot. This can also be done after running Relate without --annot, see here.
--first_chr & --last_chr
Optional: Allow for multiple input files, where filenames are assumed to be numbered by chromosome index.
Example: -i example --first_chr 1 --last_chr 2 assumes example_chr1.anc, example_chr1.mut, example_chr2.anc, example_chr2.mut as input.
--chr
Optional: File containing chromosome names (one per line). Overrides --first_chr and --last_chr.
Example: -i example --chr chr.txt assumes example_chr*.anc, example_chr*.mut as input.
--threads
Optional: Maximum number of threads. Default is 1.
--bins
Optional: Specify epoch bins. Format: lower, upper, stepsize for function c(0,10^seq(lower, upper, stepsize)).
--years_per_gen
Optional: Years per generation. Default: 28.0
--num_iter
Optional: Number of iterations of the algorithm. Default is 5.
--threshold
Optional: Float in [0,1] which is the fraction of trees to be dropped, ranked by number of mutations mapping. This option serves two purposes: First, it removes bad/uninformative trees. Second it speeds up the inference. As we increase this value, we remove more trees. Default: 0.5.
--seed
Optional: Seed for the random number generator used in the MCMC algorithm for estimating branch lengths.

Output

example_popsize.pdf A plot showing the estimated population size.
example_popsize.anc.gz, example_popsize.mut.gz, example_popsize.dist Not outputted if --noanc 1. Genealogy with updated branch lengths.
Will contain all trees regardless of threshold value.
If --noanc 1 is specified, you can separately reestimate branch lengths using the ReEstimateBranchLengths.sh script.
example_popsize.coal Contains coalescence rates and cross-coalescence rates, treating all samples as one population (see below for file format).
example_popsize.pairwise.coal and *.bin Coalescence rate file and corresponding binary file containing coalescence rates between pairs of samples. Can be used by RelateCoalescentRates --mode FinalizePopulationSize to extract coalescence rates for any .poplabels file.
example_popsize_avg.rate Contains average mutation rate through time. Please note that any trends in the average mutation rate are confounded by coalescence rate changes - Relate can only infer mutation rate changes in mutation subcategories (see estimate mutation rate section).

File formats

.coal
  • The first line lists populations
  • The second line lists the epochs (in generations).
  • Each subsequent line contains the coalescence rates between two populations. Each line starts with two integers indicating the populations considered.
  • These are "raw" coalescence rates - to obtain diploid effective population size estimates, use 0.5/(coalescence rate).
group1 group2
0 35.7143 49.6248 ...
0 0 8.62951e-05 0.000104302 7.33554e-05 ...
0 1 8.62327e-05 0.000115472 9.85662e-05 ...
1 0 8.62327e-05 0.000115472 9.85662e-05 ...
1 1 0.000114249 8.37037e-05 7.23175e-05 ...
.rate
  • The first column lists the epochs (in generations).
  • The second column lists the mutation rate in this epoch.
0 1.23894e-08
35.7143 1.26274e-08
49.6248 1.2925e-08
68.9535 1.23841e-08

Reestimate branch lengths

Reestimate branch lengths using MCMC given coal file.

For a given coal file, this script will output anc/mut files with posterior mean branch lengths given coalescence rates.

-i,--input
Filename of .anc and .mut files without file extension. Needs to contain all SNPs in input haps/sample files. If not, see --dist option.
-o,--output
File of the output files without file extensions.
-m,--mu
Mutation rate, e.g., 1.25e-8.
--coal
Estimated coalescent rate, treating all haplotypes as one population.
--threads
Optional: Maximum number of threads. Default is 1.
--first_bp
Optional: First bp position.
--last_bp
Optional: Last bp position.
--dist
Optional: Filename containing bp positions of SNPs and distances between adjacent SNPs (file format). E.g., obtained using RelateExtract --mode AncMutForSubregion. Required if anc/mut file only covers a subregion of input haps/sample files.
--seed
Optional: Random seed for branch lengths estimation.

 PATH_TO_RELATE/scripts/SampleBranchLengths/ReEstimateBranchLengths.sh \
                 -i example \
                 -o example_sub \
                 -m 1.25e-8 \
                 --coal popsize.coal \
                 --first_bp 100000 \
                 --last_bp 200000 \
                 --seed 1 
Output: example_sub.anc/mut.

More details

Below, we describe details about the functions used in the script. The general idea is to do the following until convergence:
  1. Estimate average mutation rate
  2. Estimate coalescence rate
  3. Re-estimate branch lengths
If you have multiple chromosomes, you can parallelize step 3.

Calculate coalescence rates through time

Calculates the average pairwise coalescence rate through time.
-i,--input
Filename of the .anc and .mut files, excluding the file extensions.
-o,--output
Filename of output files without file extensions.
-m,--mutation_rate
Mutation rate per base per generation.
--poplabels
Optional: Filename of a .poplabels file. The file format is described here.
--first_chr & --last_chr
Optional: Allow for multiple input files, where filenames are assumed to be numbered by chromosome index.
Example: -i example --first_chr 1 --last_chr 2 assumes example_chr1.anc, example_chr1.mut, example_chr2.anc, example_chr2.mut as input.
--chr
Optional: File containing chromosome names (one per line). Overrides --first_chr and --last_chr.
Example: -i example --chr chr.txt assumes example_chr*.anc, example_chr*.mut as input.
--bins
Optional: Specify epoch bins. Format: lower, upper, stepsize for function c(0,10^seq(lower, upper, stepsize)).
--years_per_gen
Optional: Years per generation. Default: 28.0
 PATH_TO_RELATE/bin/RelateCoalescenceRate \
                 --mode EstimatePopulationSize\
                 -i example \
                 -o example 
Output: example.coal, example.bin.

Sometimes we want to obtain output with and without specifying the option --poplabels, or we want to specify different poplabels. For this, we can extract coalescence rates from example.bin using

 PATH_TO_RELATE/bin/RelateCoalescenceRate \
                 --mode FinalizePopulationSize\
                 --poplabels example.poplabels \
                 -i example \
                 -o example 
Here, --poplabels is optional. The .poplabels file needs to contain all samples.

Reestimate branch lengths

Once the average mutation rate and coalescence rates have been calculated, we can re-estimate the branch lengths.

--anc
Filename of the .anc file
--mut
Filename of the .mut file
-o,--output
Filename of output files without file extensions.
-m,--mutation_rate
Mutation rate per base per generation.
--mrate
Specifies the file containing the average mutation rate. Generated using this function.
--coal
Specifies the file containing coalescent rates. Generated using this function (do not specify option --poplabels for this).
--dist
Optional: Specifies distances between SNPs. If unspecified, we use the distances given in the .mut file. This option is only needed if some SNPs (and trees) have been removed using, e.g., this function. The .dist file can be generated using this function.
--seed
Optional: Specifies a seed for the random number generator used in the MCMC algorithm for estimating branch lengths.

 PATH_TO_RELATE/bin/RelateCoalescenceRate \
                 --mode ReEstimateBranchLengths\
                 --anc example.anc \
                 --mut example.mut \
                 --mrate example_avg.rate \
                 --coal example.coal \
                 -m 1.25e-8 \
                 -o example_updated 
Output: example_updated.anc, example_updated.mut.
With the updated branch lengths, we can then re-estimate the average mutation and coalescence rates. We can iterate these steps until convergence to obtain an algorithm that simulatneously estimates mutation and coalescence rates and branch lengths.

Estimate mutation rates


Calculate the average mutation rate through time

Calculates the average mutation rate.
-i,--input
Filename of the .anc and .mut files, excluding the file extensions.
-o,--output
Filename of output files without file extensions.
--first_chr & --last_chr
Optional: Allow for multiple input files, where filenames are assumed to be numbered by chromosome index.
Example: -i example --first_chr 1 --last_chr 2 assumes example_chr1.anc, example_chr1.mut, example_chr2.anc, example_chr2.mut as input.
--chr
Optional: File containing chromosome names (one per line). Overrides --first_chr and --last_chr.
Example: -i example --chr chr.txt assumes example_chr*.anc, example_chr*.mut as input.
--bins
Optional: Specify epoch bins. Format: lower, upper, stepsize for function c(0,10^seq(lower, upper, stepsize)).
--years_per_gen
Optional: Years per generation. Default: 28.0
--dist
Optional: Specifies .dist file. The .dist file stores distances in bp between SNPs. If unspecified, we use the distances given in the .mut file. This option is only needed if some SNPs (and trees) have been removed using, e.g., this function. The .dist file can be generated using this function. If specified together with --first_chr, --last_chr, please specify the filename without file extension.
 PATH_TO_RELATE/bin/RelateMutationRate \
                 --mode Avg\
                 -i example \
                 -o example 
Output: example_avg.rate.

Calculate the mutation rates grouped by nucleotide in sequence

Calculates the mutation rate for 96 groups of mutations, where each group corresponds to a triplet mutation (e.g. TCC to TTC mutation).
It is necessary to have appended SNP annotation with --annot to use this function (This can also be done after inferring .anc,.mut files, see ).
-i,--input
Filename of the .anc and .mut files, excluding the file extensions.
-o,--output
Filename of output files without file extensions.
--ancestor
Fasta file containing ancestral genome. This is case insensitive.
--mask
Fasta file of same length as the ancestral genome contining a genomic mask. Loci passing the mask are denoted by P, loci not passing the mask are denoted by N. This is case insensitive.
--bins
Optional: Specify epoch bins. Format: lower, upper, stepsize for function c(0,10^seq(lower, upper, stepsize)).
--years_per_gen
Optional: Years per generation. Default: 28.0
--dist
Optional: Specifies .dist file. The .dist file stores distances in bp between SNPs. If unspecified, we use the distances given in the .mut file. This option is only needed if some SNPs (and trees) have been removed using, e.g., this function. The .dist file can be generated using this function. If specified together with --first_chr, --last_chr, please specify the filename without file extension.
--first_chr & --last_chr
Currently not supported for mode WithContextForChromosome (because we need to specify separate mask, ancestor files for each chromosome.)
You can however apply mode WithContextForChromosome to each chromosome individually. You can then use mode Finalize (see below) with options --first_chr and --last_chr to obtain a genome-wide estimate. Filenames are assumed to be indexed by chromosome as before.
Example: -i example --first_chr 1 --last_chr 2 assumes example_chr1.anc, example_chr1.mut, example_chr2.anc, example_chr2.mut as input.
--chr
Optional: File containing chromosome names (one per line). Overrides --first_chr and --last_chr.
Example: -i example --chr chr.txt assumes example_chr*.anc, example_chr*.mut as input.
 PATH_TO_RELATE/bin/RelateMutationRate \
                 --mode WithContextForChromosome \
                 --mask genomic_mask.fa \
                 --ancestor ancestor.fa \
                 -i example_chr1 \
                 -o example_chr1 
Output: example_chr1.bin.

Next, to extract the rates from example.bin, use
 PATH_TO_RELATE/bin/RelateMutationRate \
                 --mode Finalize \
                 -i example_chr1 \
                 -o example_chr1 
Output: example_chr1.rate.

If you applied mode WithContextForChromosome to multiple chromosome, use the following to obtain a .bin file summarizing estimates genome-wide. Input filenames are assumed to be example_chr1.bin, example_chr2.bin, etc.
 PATH_TO_RELATE/bin/RelateMutationRate \
                 --mode Finalize \
                 --first_chr 1 \
                 --last_chr 22 \
                 -i example \
                 -o example 
Output: example.bin, example.rate.

Detect positive selection


Detect selection

Calculates p-values for selection evidence from genealogy.

We recommend that you first estimate the population size history using this module, with setting --threshold 0. This is so that the branch lengths in all trees are updated for the estimated population size history. Alternatively, if the region you are analysing is short, you can specify a .coal file in this module and the script reestimates branch lengths for the trees of interest before calculating selection p-values.

P-values can only be calculated for mutations with derived allele frequency of at least 3. Mutations with a smaller DAF are filtered out if the DAFs are appended to the .mut file using RelateFileFormats --mode GenerateSNPAnnotations, otherwise we set the -log10 p-value column to 1.

-i,--input
Filename of .anc and .mut files without file extension. Needs to contain all SNPs in haps/sample input file.
-o,--output
File of the output files without file extensions.
-m,--mu
Mutation rate, e.g., 1.25e-8.
--first_bp
Optional: First bp position.
--last_bp
Optional: Last bp position.
--years_per_gen
Optional: Years per generation. Default: 28.0
--coal
Optional: Estimate of coalescent rate, treating all haplotypes as one population. Not necessary if .anc/.mut files were obtained using EstimatePopulationSize.sh, otherwise strongly recommended.
--seed
Optional: Random seed for branch lengths estimation. Only used if --coal is specified.

 PATH_TO_RELATE/scripts/DetectSelection/DetectSelection.sh \
                 -i example \
                 -o example_selection \
                 -m 1.25e-8 \
                 --first_bp 1000000 \
                 --last_bp 1000000 \
                 --years_per_gen 28 \
                 --coal popsize.coal 
Output: example_selection.lin, example_selection.freq, example_selection.sele, (example_selection.anc, example_selection.mut, example_selection_dist if coal is specified).

File formats

.freq Records the frequency of the derived allele at generations genN .. gen1 (specified in the header)

Columns: pos rs_id genN genN-1 ... gen1 0
.lin Records the number of lineages in the tree at generations genN .. gen1 (specified in the header), as well as the number of lineages when the mutation had frequency 2

Columns: pos rs_id genN genN-1 ... gen1 0 when_mutation_has_freq2
.sele Records the log10 p-value for selection evidence at generations genN .. gen1 (specified in the header), as well as the log10 p-value when the mutation had frequency 2. Log10 p-value is set to 1 if mutation had frequency <= 1 at a generation.
The "when_mutation_has_freq2" column contains the p-value for selection evidence over the lifetime of this mutation. All other p-values condition on the frequency at the specified time.

Columns: pos rs_id genN genN-1 ... gen1 0 when_mutation_has_freq2

More details

Below, we describe details about the functions used to calculate the selection p-value.

Get selection pvalues

We first calculate, for each SNP, its frequency through time. This information is stored in two files: the .freq file and the .lin file. The former stores the frequency of a SNP through time, the latter stores the number of lineages in the corresponding tree through time.

 PATH_TO_RELATE/bin/RelateSelection \
                 --mode Frequency \
                 -i example \
                 -o example 
Output: example.freq, example.lin.

We then use these two files to calculate a p-value for selection evidence for each SNP.

 PATH_TO_RELATE/bin/RelateSelection \
                 --mode Selection \
                 -i example \
                 -o example 
Output: example.sele.

Sample branch lengths (e.g., for CLUES)


This module is used in CLUES and PALM, two methods for inference of selection strength on single loci and polygenic traits. Please refer to the CLUES documentation or the PALM documentation for details.

Sample branch lengths

Samples branch lengths using MCMC given a coal file.
The output file format is either anc/mut (with multiple branch lengths stored in the anc file), or input file formats of CLUES or PALM.
The output file format for CLUES resembles that of ARGweaver, which is the input file format required by CLUES for inference of selection and inferring allele frequency trajectories.

In Relate, we map mutations to branches with approximately the same descendants as carriers of the derived allele to be robust to errors. Therefore, we output an additional .sites file in which only the descendants of the branch to which the mutation is mapped are marked as derived, the rest are ancestral. Please also note that some SNPs don't map to a unique branch in the tree and such SNPs are left out of this .sites file.

-i,--input
Filename of .anc and .mut files without file extension. Needs to contain all SNPs in input haps/sample files. If not, see --dist option.
-o,--output
File of the output files without file extensions.
-m,--mu
Mutation rate, e.g., 1.25e-8.
--coal
Estimated coalescent rate, treating all haplotypes as one population.
--format
Output file format. 'a': anc/mut, 'n': newick/sites (used by CLUES), 'b': binary (used by CLUES and PALM). Default: 'a'. Note that anc/mut is modified to store multiple branch lengths for any one branch.
--num_samples
Number of times branch lengths are sampled. Integer >= 1.
--num_proposals
Optional: Number of MCMC proposals between samples. Default is max(10*N, 1000), where N is the number of haplotypes. If num_proposals=0, all samples will be identical to the input.
--first_bp
Optional: First bp position.
--last_bp
Optional: Last bp position.
--dist
Optional: Filename containing bp positions of SNPs and distances between adjacent SNPs (file format). E.g., obtained using RelateExtract --mode AncMutForSubregion. Required if anc/mut file only covers a subregion of input haps/sample files.
--seed
Optional: Random seed for branch lengths estimation.

 PATH_TO_RELATE/scripts/SampleBranchLengths/SampleBranchLengths.sh \
                 -i example \
                 -o example_sub \
                 -m 1.25e-8 \
                 --coal popsize.coal \
                 --format a \
                 --num_samples 5 \
                 --first_bp 100000 \
                 --last_bp 200000 \
                 --seed 1 
Output: example_sub.anc/mut or example_sub.newick/.sites or example_sub.timeb.

File formats

.newick A tab-delimited file with columns
chrom chromStart chromEnd MCMC_sample tree
  • chrom: Chromosome id.
  • chromStart: bp position at which tree starts.
  • chromEnd: bp position at which tree ends.
  • MCMC_sample: integer specifying the MCMC sample.
  • tree: Newick string of the tree.
.sites A tab-delimited file with
  • First line: Lists names of haplotypes.
  • Second line: Shows genomic region covered by this file.
  • Remaining lines: bp position followed by haplotype pattern at this site.

Plot trees of interest


TreeView

Plots a tree of interest using R.
Requires libraries ggplot2, cowplot.

--anc
Filename of the .anc file.
--mut
Filename of the .mut file.
--haps
Filename of haps file.
--sample
Filename of sample file.
--poplabels
Filename of a .poplabels file. The file format is described here. This file needs to always contain all samples.
--bp_of_interest
Integer specfifying the BP position of the SNP of interest
--years_per_gen
Years per generation. Default: 28.0
-o,--output
File of the output files without file extensions.

 PATH_TO_RELATE/scripts/TreeView/TreeView.sh \
                 --haps data/example.haps \
                 --sample data/example.sample \
                 --anc example.anc \
                 --mut example.mut \
                 --poplabels data/example.poplabels \
                 --bp_of_interest 3000000 \
                 --years_per_gen 28 \
                 -o example 
Output: example.pdf.

You can change plotting parameters by editing PATH_TO_RELATE/scripts/TreeView/treeview.R. Specifically, lines 9 - 22 directly influence the plotting parameters.

TreeViewMutation

Plots a tree of interest using R and highlights lineages carrying derived allele.
Requires libraries ggplot2, cowplot.
Takes same arguments as the TreeView.sh script, however bp_of_interest has to specify a SNP in the input.

 PATH_TO_RELATE/scripts/TreeView/TreeViewMutation.sh \                   
                   --haps data/example.haps
                   --sample data/example.sample \
                   --anc example.anc \
                   --mut example.mut \
                   --poplabels data/example.poplabels \
                   --bp_of_interest 3000000 \
                   --years_per_gen 28 \
                   -o example 
Output: example.pdf.

TreeViewSamples

Plots a tree of interest using R for an anc/mut file containing multiple sampled branch length (obtained using the SampleBranchLengths module).
Requires libraries ggplot2, cowplot.
Takes same arguments as the TreeView.sh script.

 PATH_TOcowplot/TreeView/TreeViewSamples.sh \                   
                --haps data/example.haps
                --sample data/example.sample \
                --anc example.anc \
                --mut example.mut \
                --poplabels data/example.poplabels \
                --bp_of_interest 3000000 \
                --years_per_gen 28 \
                -o example 
Output: example.pdf.

Extract subtrees, trees in region of interest, and trees in newick format.


Extract subtrees corresponding to a subpopulation

Extracts subtrees such that all tips belong to the populations of interest.

--anc
Filename of the .anc file.
--mut
Filename of the .mut file.
--poplabels
Filename of a .poplabels file. The file format is described here. This file needs to always contain all samples.
--pop_of_interest
Specifies populations for which population size is estimated. The second column of the .poplabels file is used for population labels. You can specify multiple populations separated by commas, e.g. pop1,pop2.
-o,--output
File of the output files without file extensions.
 PATH_TO_RELATE/bin/RelateExtract\
                 --mode SubTreesForSubpopulation\
                 --anc example.anc \
                 --mut example.mut \
                 --poplabels example.poplabels \
                 --pop_of_interest group1 \
                 -o example_subpop 
Output: example_subpop.anc, example_subpop.mut, example_subpop.poplabels.

Extract trees corresponding to a subregion

Extracts trees in region of interest.

Please always apply this on the anc/mut files inferred using Relate (from haps/sample files), i.e., do not apply repeatedly to get "subregions of subregions". If you do, ignore the output .dist file and use the .dist file obtained from the original anc/mut files instead.

--anc
Filename of the .anc file.
--mut
Filename of the .mut file.
--first_bp
BP of start of region.
--last_bp
BP of end of region.
-o,--output
File of the output files without file extensions.
 PATH_TO_RELATE/bin/RelateExtract\
                 --mode AncMutForSubregion\
                 --anc example.anc \
                 --mut example.mut \
                 --first_bp 0 \
                 --last_bp 10000 \
                 -o example_subregion 
Output: example_subregion.anc, example_subregion.mut, example_subregion.dist.

Extract tree in region of interest in newick format

Extracts trees in region of interest in newick format.

--anc
Filename of the .anc file.
--mut
Filename of the .mut file.
-o,--output
File of the output files without file extensions.
--bp_of_interest
Integer specfifying the BP position of interest
--first_bp
Integer specfifying the first BP position of interest
--last_bp
Integer specfifying the last BP position of interest
 PATH_TO_RELATE/bin/RelateExtract\
                 --mode AncToNewick \
                 --anc example.anc \
                 --mut example.mut \
                 --first_bp 3000000 \
                 --last_bp 4000000 \
                 -o example 
Output: example.newick, example.pos. The latter file stores bp start position of each tree.

Remove trees with few mutations

Removes trees and associated SNPs from the .anc/.mut files onto which less than a given number of SNPs are mapped.

--anc
Filename of the .anc file.
--mut
Filename of the .mut file.
--threshold
Float in [0,1] which is the fraction of trees to be dropped, ranked by number of mutations mapping. This option serves two purposes: First, it removes bad/uninformative trees. Second it speeds up the inference. As we increase this value, we remove more trees. Default: 0.5.
-o,--output
File of the output files without file extensions.
 PATH_TO_RELATE/bin/RelateExtract\
                 --mode RemoveTreesWithFewMutations \
                 --anc example.anc \
                 --mut example.mut \
                 --threshold 0.5 \
                 -o example_subset 
Output: example_subset.anc, example_subset.mut, example.dist.

Extract .dist file from .mut file.

Extracts distances between SNPs from the .mut file (3rd and 4th column). The .dist file is needed in some applications, when the .anc/.mut files have been subsetted using, e.g., this function In these cases, the .dist file needs to be generated using the not subsetted .mut file.

--mut
Filename of the .mut file.
-o,--output
File of the output files without file extensions.
 PATH_TO_RELATE/bin/RelateExtract\
                 --mode ExtractDistFromMut \
                 --mut example.mut \
                 -o example 
Output: example.dist.

Divide the .anc/.mut files into smaller files to allow multi-threading.

Dividing the .anc/.mut files into smaller files to allow for multi-threading.

threads
Number of threads that will be used. This slightly varies the number of chunks into which the .anc/.mut files will get divided into.
--anc
Filename of the .anc file.
--mut
Filename of the .mut file.
-o,--output
File of the output files without file extensions.
 PATH_TO_RELATE/bin/RelateExtract\
                 --mode DivideAncMut \
                 --threads 5 \
                 --anc example.anc \
                 --mut example.mut \
                 -o example 
Output: example.param, example_chr1.anc, example_chr1.mut, example_chr2.anc, example_chr2.mut, etc.

Combine the .anc/.mut files into one file.

Combining the .anc/.mut files into one file. This should be used after dividing .anc/.mut files using this function.

-o,--output
File of the output files without file extensions.

In addition, we assume that the corresponding .param file is in the same directory. You can also manually create this file, which has the header "NUM_HAPLOTYPES NUM_SNPS NUM_TREES NUM_CHUNKS", followed by the corresponding entries on the next line.

 PATH_TO_RELATE/bin/RelateExtract\
                 --mode CombineAncMut \
                 -o example 
Output: example.anc.gz, example.mut.gz.

File formats

.dist
  • The first line is a header
  • Each subsequent line contains the base pair position of the SNP and the distance (in bp) to the next SNP.
#pos dist
51479 22
51675 20
51872 42
51914 21