Lobstr-code

lobSTR: a short tandem repeat profiler for next generation sequencing data


Recommendations for setting allelotype parameters

Overview

General usage of the allelotype step is covered on the usage and best practices for WGS/WES pages. This page gives advice on more specific topics related to the allelotyper.

Adjusting alignments

By default, allelotype recalculates the allele supported by each read by adding up all gaps across the entire read, including those in the flanking regions. This gives the results most concordant with STR genotypes obtained by traditional capillary electrophoresis, and the most reliable genotypes for reads of around 100 bp.
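To make the default calculation concrete, here is a minimal bash sketch. It assumes the allele is expressed as the net length difference from the reference and that a read's gaps are taken from its CIGAR string. The function name cigar_net_indel is hypothetical and this is not lobSTR's implementation; it simply adds every insertion and subtracts every deletion, wherever it occurs in the read.

```shell
#!/usr/bin/env bash
# Illustrative sketch only (not lobSTR's source code): net length change
# implied by a read's CIGAR string, counting every insertion (I) and
# deletion (D) anywhere in the read, flanks included.
cigar_net_indel() {
  local cigar=$1 net=0 len op
  # Consume one <length><operation> pair at a time from the front.
  while [[ $cigar =~ ^([0-9]+)([MIDNSHP=X])(.*)$ ]]; do
    len=${BASH_REMATCH[1]}
    op=${BASH_REMATCH[2]}
    cigar=${BASH_REMATCH[3]}
    case $op in
      I) net=$((net + len)) ;;  # inserted bases lengthen the allele
      D) net=$((net - len)) ;;  # deleted bases shorten it
    esac                        # all other operations leave the length unchanged
  done
  echo "$net"
}
```

For a read with CIGAR 30M4I20M2D46M this prints 2: the 4 bp insertion and the 2 bp deletion are both counted, even if the deletion lies in a flank.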

Alternatively, allelotype can calculate the STR allele supported by each read using only the gaps that it determines to lie within the boundaries of the STR. Turn this on with the --dont-include-flank option. Although this is theoretically more accurate than the default approach, determining the STR boundaries is still error-prone, so this option is generally not recommended. However, if you are using very long reads that may span more than one STR, setting this option is recommended.
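The idea behind --dont-include-flank can be illustrated with a variant that only counts gaps falling inside a given STR interval. This is a hypothetical sketch, not lobSTR's code: the function cigar_str_indel, its arguments (read start, STR start and STR end as 1-based reference coordinates) and the simple "gap starts inside the interval" rule are all assumptions. Real boundary detection is harder than this, which is why the option is not recommended in general.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not lobSTR's algorithm): sum insertions and deletions
# only when they occur inside the assumed STR interval [str_start, str_end].
# Arguments: CIGAR, reference start of the read, STR start, STR end.
cigar_str_indel() {
  local cigar=$1 pos=$2 str_start=$3 str_end=$4 net=0 len op
  while [[ $cigar =~ ^([0-9]+)([MIDNSHP=X])(.*)$ ]]; do
    len=${BASH_REMATCH[1]}
    op=${BASH_REMATCH[2]}
    cigar=${BASH_REMATCH[3]}
    case $op in
      M|=|X|N) pos=$((pos + len)) ;;  # these operations consume reference bases
      I) if (( pos >= str_start && pos <= str_end )); then net=$((net + len)); fi ;;
      D) if (( pos >= str_start && pos <= str_end )); then net=$((net - len)); fi
         pos=$((pos + len)) ;;        # deletions also consume reference bases
    esac                              # S, H and P do not move along the reference
  done
  echo "$net"
}
```

For a read starting at position 100 with CIGAR 30M4I20M2D46M and an STR spanning positions 125 to 140, this prints 4: the insertion at position 130 is counted, while the deletion at position 150 falls in the flank and is ignored. Summing gaps over the whole read would give 2 for the same read.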

Improving call accuracy

The following parameters are often helpful in improving call accuracy:

Filtering alignments

The following parameters control which read alignments the allelotyper uses. Setting --min-border 5 --min-bp-before-indel 7 --maximal-end-match 15 --min-read-end-match 5 has given good results.
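One way to apply these settings consistently is to keep them in a bash array and append them to every allelotype call. The array name FILTER_OPTS and the wrapper allelotype_filtered are hypothetical conveniences; only the flags and their values come from the recommendation above.

```shell
#!/usr/bin/env bash
# Recommended read-filtering settings, kept in one place.
FILTER_OPTS=(
  --min-border 5
  --min-bp-before-indel 7
  --maximal-end-match 15
  --min-read-end-match 5
)

# Hypothetical wrapper: passes its own arguments to allelotype and appends
# the filtering options. Defining the function does not run allelotype.
allelotype_filtered() {
  allelotype "$@" "${FILTER_OPTS[@]}"
}
```

The wrapper is then called with the usual arguments (--command, --bam, --strinfo and so on), and the filters are added automatically.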

Speeding up allelotype

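To speed up allelotype, we recommend running it separately on each chromosome with the --chrom option and merging the resulting VCF files afterwards. Each per-chromosome run is independent, so the runs can be spread across processors or cluster nodes. For example: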
for chrom in $(seq 1 22) X Y
do
  # The value given to --chrom must match the chromosome names in the BAM and reference
  allelotype \
    --command classify \
    --bam my_sample_output.sorted.bam \
    --index-prefix hg19_v3.0.1/lobstr_v3.0.1_hg19_ref/lobSTR_ \
    --strinfo hg19_v3.0.1/lobstr_v3.0.1_hg19_strinfo.tab \
    --noise_model models/illumina_v3.pcrfree \
    --out my_sample_output_chr${chrom} \
    --chrom ${chrom}
  # Sort, compress and index the per-chromosome VCF
  cat my_sample_output_chr${chrom}.vcf | vcf-sort | bgzip -c > my_sample_output_chr${chrom}.sorted.vcf.gz
  tabix -p vcf my_sample_output_chr${chrom}.sorted.vcf.gz
done
# Merge in chromosome order (ls would list chr10 before chr2)
vcf-concat $(for chrom in $(seq 1 22) X Y; do echo my_sample_output_chr${chrom}.sorted.vcf.gz; done) | bgzip -c > my_sample_output_merged.vcf.gz
tabix -p vcf my_sample_output_merged.vcf.gz
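Because each chromosome is processed independently, the allelotype runs do not have to be sequential. A minimal sketch, assuming a Unix shell whose xargs supports -P (GNU and BSD xargs both do): the hypothetical function make_allelotype_jobs prints one allelotype command per chromosome instead of running it, so the commands can be piped to xargs or submitted to a cluster scheduler. The reference paths are the illustrative hg19 ones used in the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of lobSTR): print one allelotype command per
# chromosome so the jobs can be dispatched in parallel.
# Arguments: sorted BAM file, output prefix.
# Example: make_allelotype_jobs my_sample_output.sorted.bam my_sample_output | xargs -P 4 -I{} sh -c '{}'
make_allelotype_jobs() {
  local bam=$1 prefix=$2 chrom
  local ref=hg19_v3.0.1/lobstr_v3.0.1_hg19_ref/lobSTR_
  local strinfo=hg19_v3.0.1/lobstr_v3.0.1_hg19_strinfo.tab
  local model=models/illumina_v3.pcrfree
  for chrom in $(seq 1 22) X Y; do
    printf 'allelotype --command classify --bam %s --index-prefix %s --strinfo %s --noise_model %s --out %s_chr%s --chrom %s\n' \
      "$bam" "$ref" "$strinfo" "$model" "$prefix" "$chrom" "$chrom"
  done
}
```

Once all jobs have finished, the per-chromosome VCFs still need to be sorted, compressed, indexed and merged as in the loop above.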