Lobstr-code

lobSTR: a short tandem repeat profiler for next generation sequencing data


Calling STR genotypes from whole genomes and whole exomes

This page has general information on running lobSTR on whole genome and whole exome sequencing data. For more specific use cases or advice on setting specific parameters, see the documentation page.

Step 1: alignment

The first step is to align raw reads to STR-containing regions of the genome. In this step, you will generate a BAM file with reads aligning to STRs. Then you will sort and index that BAM to get ready for the second step, allelotyping.

Run lobSTR

lobSTR accepts a variety of input formats (paired or single end, fastq, fasta, or BAM). Follow the instructions below that match the format of your reads. Note that for any of the file formats shown below, if you have multiple files of reads you would like to align at once, you can pass a comma-separated list of files for any of the input arguments. Before starting the next steps, you will need: In each case the following inputs are required:
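As a rough illustration of the comma-separated input mentioned above, the sketch below joins several files into one argument for -f. The helper join_with_commas and the per-lane file names are hypothetical and only serve the example. The lobSTR call reuses the single-end fastq options shown on this page and only runs if lobSTR and the first input file are actually present.

```shell
# Join all arguments into a single comma-separated string.
# (Hypothetical helper, not part of lobSTR.)
join_with_commas() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Hypothetical per-lane FASTQ files belonging to one sample.
fastq_list=$(join_with_commas my_sample_lane1.fq my_sample_lane2.fq my_sample_lane3.fq)
echo "$fastq_list"

# Align all lanes in one run, but only if lobSTR is on the PATH
# and the first input file exists.
if command -v lobSTR >/dev/null 2>&1 && [ -f my_sample_lane1.fq ]; then
    lobSTR \
        --index-prefix hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_ \
        -f "$fastq_list" -q \
        --rg-sample my_sample --rg-lib my_sample \
        --out my_sample_output
fi
```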

From single-end fastq files

lobSTR \
   --index-prefix hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_ \
   -f my_sample.fq -q \
   --rg-sample my_sample --rg-lib my_sample \
   --out my_sample_output
	

From paired-end fastq files

lobSTR \
   --index-prefix hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_ \
   --p1 my_sample_1.fq --p2 my_sample_2.fq -q \
   --rg-sample my_sample --rg-lib my_sample \
   --out my_sample_output
	

From single-end bam files

lobSTR \
   --index-prefix hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_ \
   -f my_sample.bam --bam \
   --rg-sample my_sample --rg-lib my_sample \
   --out my_sample_output
	

From paired-end bam files

To run lobSTR in paired-end mode with BAM input, read pairs need to be adjacent to each other in the file so lobSTR knows which reads are paired. You can do this either by:
lobSTR \
   --index-prefix hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_ \
   -f my_sample.bam --bampair \
   --rg-sample my_sample --rg-lib my_sample \
   --out my_sample_output
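One common way to make read pairs adjacent in a BAM file is to sort it by read name. The sketch below assumes samtools 1.3 or later, where the output file is given with -o. The file names are hypothetical, and this is not necessarily the procedure the lobSTR authors recommend.

```shell
in_bam=my_sample.bam                       # hypothetical input BAM
name_sorted_bam=my_sample.namesorted.bam   # hypothetical output name

if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    # -n sorts by read name, so both mates of a pair end up next to each other
    samtools sort -n -o "$name_sorted_bam" "$in_bam"
else
    echo "samtools or $in_bam not found; nothing sorted" >&2
fi
```

The name-sorted file would then be the one passed to -f together with --bampair.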
	

Sorting and indexing the resulting BAM

In the examples above, lobSTR will create the output files my_sample_output.aligned.bam and my_sample_output.aligned.stats. Before moving on to the allelotype step, use samtools to sort and index the BAM file:
samtools sort my_sample_output.aligned.bam my_sample_output.sorted
samtools index my_sample_output.sorted.bam
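The sort command above uses the older samtools syntax, in which the second argument is an output prefix. samtools 1.3 and later no longer accept that form and expect the output file to be named with -o. A minimal equivalent for newer versions, assuming the same file names as above, might look like this:

```shell
in_bam=my_sample_output.aligned.bam
sorted_bam=my_sample_output.sorted.bam

if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    # Coordinate-sort the alignments, then build the .bai index
    samtools sort -o "$sorted_bam" "$in_bam"
    samtools index "$sorted_bam"
else
    echo "samtools or $in_bam not found; nothing sorted" >&2
fi
```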
	

Alignment summary statistics

As mentioned above, the alignment step also produces the file my_sample_output.aligned.stats with various statistics about the alignment results. This file reports: If the alignment did not finish and exited with an error, this file will instead contain the error message. To help us learn about issues that users run into with lobSTR and which parameter settings are most used, these stats files are uploaded to Amazon S3 for analysis. To turn off this feature use the --noweb option.
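Before moving on, it can help to confirm that both alignment outputs exist and to look at the stats file, since that is where an error message would appear. The function check_alignment_outputs below is a hypothetical helper written for this example, not part of lobSTR.

```shell
# Return 0 if both alignment outputs for the given prefix exist and are
# non-empty; otherwise report what is missing and return 1.
check_alignment_outputs() {
    prefix=$1
    status=0
    for suffix in aligned.bam aligned.stats; do
        if [ ! -s "$prefix.$suffix" ]; then
            echo "missing or empty: $prefix.$suffix" >&2
            status=1
        fi
    done
    return "$status"
}

# Print the summary statistics (or the error message) for the example run.
if check_alignment_outputs my_sample_output; then
    cat my_sample_output.aligned.stats
fi
```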

Step 2: allelotype

For this step, you will need:

Running allelotype

allelotype \
   --command classify \
   --bam my_sample_output.sorted.bam \
   --index-prefix hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_ \
   --strinfo hg19_v3.0.2/lobstr_v3.0.2_hg19_strinfo.tab \
   --noise_model models/illumina_v3.pcrfree \
   --out my_sample_output
	
This will create the output files my_sample_output.vcf and my_sample_output.allelotype.stats. The format of the VCF output is described on the file formats page.
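As a quick sanity check on the result, you can count the records in the VCF. The sketch below relies only on the general VCF convention that header lines start with "#". The helper count_vcf_records is hypothetical and counts every non-header line.

```shell
# Print the number of non-header lines in a VCF file.
count_vcf_records() {
    # -v inverts the match, -c prints the count of selected lines.
    # grep exits non-zero when the count is 0, so that status is masked.
    grep -vc '^#' "$1" || true
}

vcf=my_sample_output.vcf
if [ -f "$vcf" ]; then
    echo "$vcf: $(count_vcf_records "$vcf") STR records"
fi
```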

Allelotype summary statistics

The allelotype step produces a file my_sample_output.allelotype.stats with various statistics about the allelotype results. It reports: If the allelotype step did not finish and exited with an error, this file will instead contain the error message. To help us learn about issues that users run into with lobSTR and which parameter settings are most used, these stats files are uploaded to Amazon S3 for analysis. To turn off this feature, use the --noweb option.

Next steps

Now that you have a list of STR variant calls, you might be interested in: See the documentation page for more details.