Output file formats (as of v4.0.0)
A BAM file containing all aligned
reads. This is in the standard BAM format, with the following added
tags. Note that the BAM output is not sorted.
- XS: start position of matching STR. Tag type: i
- XE: end position of matching STR. Tag type: i
- XR: STR repeat. Tag type: Z
- XC: reference copy number. Tag type: f
- XD: nucleotide length difference compared to reference. Tag type: i
- XX: 0/1 flag to indicate stitching of paired end reads. Tag type: i
- XM: distance between mate pairs. Tag type: i
- XN: (only if given) name of the STR repeat. Tag type: Z
- XG: The extracted STR region. Tag type: Z
- XQ: map quality score of the aligned read. Tag type: i
- RG: read group. Tag type: Z
- XA: alternate alignments for multimappers. Tag type: Z
- XO: other STRs spanned by this read. Tag type: Z
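As a rough illustration of how the tags above could be read, the sketch below parses the optional fields of a single SAM text record (for example, one line printed by `samtools view`). The function name `parse_sam_tags` and the example record are hypothetical and not part of lobSTR. Only the tag names and types come from the list above.

```python
def parse_sam_tags(sam_line):
    """Return a dict of optional tags from one tab-separated SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are the mandatory SAM fields; optional tags follow
    # in NAME:TYPE:VALUE form.
    for field in fields[11:]:
        name, tag_type, value = field.split(":", 2)
        if tag_type == "i":
            tags[name] = int(value)
        elif tag_type == "f":
            tags[name] = float(value)
        else:
            # Z (string) and any other type are kept as text.
            tags[name] = value
    return tags


# Hypothetical record, made up for illustration only.
example = "\t".join([
    "read1", "0", "chr1", "1000", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII",
    "XS:i:1003", "XE:i:1018", "XR:Z:AC", "XC:f:8.0", "XD:i:-2",
])
tags = parse_sam_tags(example)
print(tags["XR"], tags["XD"])
```

Because the BAM output is not sorted, a tool that needs coordinate order would have to sort the file before indexing it.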
- CHROM: chromosome of the STR
- POS: start position of the STR. Note, since the start position is
used, STRs with the same start coordinate but different motifs must
be filtered for compatibility with downstream vcftools.
- ID: this field is set to "." unless the STR locus is annotated, in which case it gives the name of the STR marker.
- REF: gives the base in the reference at CHROM:POS.
- ALT: if any STR variations were reported, contains a comma-separated list of "<STRVAR:allele>" entries, where allele is the number of base pairs different from the reference.
- QUAL: contains -10log10(P), where P is the probability that all samples are homozygous reference.
- FILTER: this field is left blank by
allelotype. lobSTR-specific filters can be set using
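The ALT and QUAL descriptions above can be sketched in a few lines. This is a minimal example under two assumptions that the text does not state: that each allele is written as a signed integer, and that ALT is "." when no variation is reported (the usual VCF convention for a missing value). The function names are hypothetical.

```python
import re

# Matches one "<STRVAR:allele>" entry; the signed-integer form is an assumption.
STRVAR = re.compile(r"^<STRVAR:(-?\d+)>$")


def parse_alt(alt_field):
    """Return the bp differences from reference listed in an ALT field."""
    if alt_field == ".":
        return []
    diffs = []
    for entry in alt_field.split(","):
        match = STRVAR.match(entry)
        if match is None:
            raise ValueError(f"unexpected ALT entry: {entry!r}")
        diffs.append(int(match.group(1)))
    return diffs


def qual_to_probability(qual):
    """Invert QUAL = -10*log10(P) to recover P."""
    return 10 ** (-qual / 10.0)


print(parse_alt("<STRVAR:-4>,<STRVAR:2>"))
print(qual_to_probability(30.0))
```

Here P is the probability that all samples are homozygous reference, so a larger QUAL means stronger evidence that at least one sample carries a non-reference allele.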
- RPA: repeats per allele
- END: the end coordinate of the STR
- MOTIF: the canonicalized STR repeat motif
- NS: Number of samples with data
- REF: the reference copy number
- RL: total length of the STR tract in the reference
- RU: repeat motif as it appears on the forward strand in the reference
- VT: variant type, set to STR
- ALLREADS: string giving all alleles of all reads seen (column 11 from genotypes.tab
file). The format is allele1|readcount;allele2|readcount, etc.
- AML: allele marginal likelihood ratio scores. The score for each allele gives the sum of all likelihoods of genotypes containing that allele divided by the sum of likelihoods of all genotypes considered.
- DISTENDS: Average difference between distance of STR to read ends.
- DP: STR coverage
- DPA: STR coverage, including filtered reads
- GB: reported allelotype given in bp difference from reference. Set to "./." for no call.
- PL: Phred-scaled genotype likelihoods. Given for each possible pair of alleles from the ALT field. If j and k are the indices of the alleles in the ALT field, the (F(j/k) = (k*(k+1)/2)+j)th field gives the likelihood of allelotype (allele j, allele k) as in the standard VCF format. Normalized to give likelihoods relative to that of the maximum likelihood genotype, which will have a phred-scaled likelihood of 0.
- PQ: -1*log10(1-Q)
- Q: Likelihood ratio score of the allelotype call. Gives the likelihood of the reported genotype divided by the sum of likelihoods of all considered genotypes.
- GT: genotype, given as indices into the list in the ALT field as
in the standard VCF format. Set to "./." if no call.
- SB: strand bias (as defined in the GATK documentation)
- STITCH: Number of stitched reads
- FT: call-level filter (filled by lobSTR_filter_vcf.py).
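The PL ordering, the PQ formula, and the ALLREADS layout described above can be checked with a short sketch. The helper names are hypothetical. Alleles in ALLREADS are kept as strings because the text does not specify their type, and swapping j and k so that j <= k is an assumption taken from the standard VCF ordering.

```python
import math


def pl_index(j, k):
    """Position F(j/k) = k*(k+1)/2 + j of the (allele j, allele k) likelihood."""
    if j > k:
        # Assumption: genotypes are unordered, so use the smaller index as j.
        j, k = k, j
    return k * (k + 1) // 2 + j


def pq_from_q(q):
    """PQ = -1*log10(1-Q); infinite when Q is exactly 1."""
    if q >= 1.0:
        return math.inf
    return -math.log10(1.0 - q)


def parse_allreads(value):
    """Turn 'allele1|readcount;allele2|readcount' into {allele: readcount}."""
    counts = {}
    for item in value.split(";"):
        if not item:
            continue
        allele, count = item.split("|")
        counts[allele] = int(count)
    return counts


# Enumerate the PL order for three alleles (indices 0, 1, 2).
order = sorted(
    ((j, k) for k in range(3) for j in range(k + 1)),
    key=lambda pair: pl_index(*pair),
)
print(order)
print(pq_from_q(0.99))
print(parse_allreads("-4|3;0|10"))
```

With three alleles the likelihoods are listed for the pairs 0/0, 0/1, 1/1, 0/2, 1/2, 2/2 in that order, which is what the formula produces.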