lobSTR comes with a pre-built reference and index for humans (hg19). This reference was built using the process described in Willems et al., by running Tandem Repeats Finder on the hg19 reference genome. It also contains Y-STR and CODIS markers (described here). The resource bundle for hg19 is available on the download page. However, you may want to use a custom set of STR loci in your reference if:
You are working with a different species or different reference genome.
You would like to align to STRs with specific annotations that are not in the current reference.
To use your own reference, you will need to create a bed file with the loci of interest, build a lobSTR index, and build an STR info file. These steps are described below.
Note that all scripts mentioned below are available in the scripts/ directory of the lobSTR download. In addition, you must have bedtools installed and on your PATH. We have tested with version v2.22.1-17-gd6547b3; older versions may not be compatible.
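Before running the scripts, it can help to confirm that bedtools is visible on the PATH. The following is a minimal sketch using only the Python standard library. The helper names find_bedtools and bedtools_version are hypothetical and are not part of lobSTR.

```python
import shutil
import subprocess


def find_bedtools():
    """Return the full path of the bedtools executable on PATH, or None if absent."""
    return shutil.which("bedtools")


def bedtools_version(path):
    """Return the text printed by `bedtools --version`."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_bedtools()
    if path is None:
        print("bedtools not found on PATH; install it before running the lobSTR scripts")
    else:
        # Compare the reported version by eye against the tested version noted above.
        print(path, bedtools_version(path))
```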
Reference bed file
The first step is to create a bed file with your custom set of STR loci. One way to do this is by running Tandem Repeats Finder on your reference genome of interest. You will need to make a bed file with the following columns present:
Column 1: chromosome
Column 2: start coordinate of the STR
Column 3: end coordinate of the STR
Column 4: period of the STR
Column 5: reference copy number
Column 9: STR score. This score measures the purity of the STR sequence and is based on the suggested Tandem Repeats Finder scoring scheme with match=2, mismatch=-7 and indel=-7. Therefore the maximum possible score for a perfectly pure STR sequence (e.g. ATATATATATAT) is 2*(length of STR region).
Column 15: STR repeat unit
Note that some columns are not used. This is because lobSTR was originally designed to take input directly from Tandem Repeats Finder. You can put any value in the unused columns; just make sure each line has at least 15 columns, with the required information in the positions listed above.
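As an illustration of the column layout described above, here is a minimal Python sketch that builds one 15-column, tab-separated row. Several things in it are assumptions: the function names, the filler value "0" for unused columns, and the example coordinates are all hypothetical. The score function is a simplified, substitution-only approximation (match=2, mismatch=-7, no indels) and is not the alignment-based score that Tandem Repeats Finder computes; it only agrees with it in the perfectly pure case, where the score is 2*(length of STR region).

```python
def substitution_only_score(sequence, motif, match=2, mismatch=-7):
    """Score a sequence position by position against a perfect repetition of motif.

    Simplification: indels are not modeled, so this is not TRF's alignment score.
    """
    motif = motif.upper()
    score = 0
    for i, base in enumerate(sequence.upper()):
        expected = motif[i % len(motif)]
        score += match if base == expected else mismatch
    return score


def make_bed_row(chrom, start, end, motif, copy_number, score, filler="0"):
    """Return one tab-separated row with the required fields in columns 1-5, 9 and 15."""
    columns = [filler] * 15            # unused columns keep the filler value
    columns[0] = chrom                 # column 1: chromosome
    columns[1] = str(start)            # column 2: start coordinate of the STR
    columns[2] = str(end)              # column 3: end coordinate of the STR
    columns[3] = str(len(motif))       # column 4: period of the STR
    columns[4] = str(copy_number)      # column 5: reference copy number
    columns[8] = str(score)            # column 9: STR score
    columns[14] = motif                # column 15: STR repeat unit
    return "\t".join(columns)


if __name__ == "__main__":
    sequence = "ATATATATATAT"          # perfectly pure example from the text
    motif = "AT"
    score = substitution_only_score(sequence, motif)   # 2 * 12 = 24
    copies = len(sequence) / len(motif)                # 6.0 reference copies
    # chr1:1000-1011 is a made-up locus used only to show the layout.
    print(make_bed_row("chr1", 1000, 1011, motif, copies, score))
```

Each mismatched base in this simplified score costs 9 points relative to the maximum (it loses the +2 and adds -7), so a 12 bp region with a single substitution scores 15 rather than 24.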
Build the lobSTR index
To build the lobSTR index, create a clean directory where the index will be stored. Then run:
This will create an index with the prefix output_directory/lobSTR_.
Generating an STRInfo file
If using a custom index, you will also need to generate a file with information on each STR locus to input to the --strinfo option of the allelotyper.