This post covers calling germline and somatic variants from healthy and cancer samples. We will begin by following the GATK Best Practices; in particular, we will use HaplotypeCaller and MuTect2.

A tutorial is available here.

HaplotypeCaller

java -jar GenomeAnalysisTK.jar \
     -R reference.fasta \
     -T HaplotypeCaller \
     -I sample.bam \
     -o germline_snv.vcf

The difference between a gVCF and a VCF can be summarized as follows:

The key difference between a regular VCF and a gVCF is that the gVCF has records for all sites, whether there is a variant call there or not. The goal is to have every site represented in the file in order to do joint analysis of a cohort in subsequent steps.

Source.
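
To make the difference concrete, here is a minimal sketch that counts the two kinds of records in a gVCF. It assumes a hypothetical file sample.g.vcf produced by HaplotypeCaller in reference-confidence mode (--emitRefConfidence GVCF in GATK 3.x), where records without a variant call carry <NON_REF> as their only ALT allele. The function name count_gvcf_records is made up for illustration.

```shell
# Count reference-only records versus variant records in a gVCF.
# Assumption: sample.g.vcf is a hypothetical gVCF from HaplotypeCaller
# (GATK 3.x: --emitRefConfidence GVCF), where a reference-only record
# has "<NON_REF>" as its sole ALT allele.
count_gvcf_records() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        $5 == "<NON_REF>" { ref++; next }  # reference-confidence record (may span a block of sites)
        { var++ }                          # record with a real ALT allele
        END { printf "reference_only=%d variant=%d\n", ref, var }
    ' "$1"
}

# Only runs if the (assumed) gVCF is present in the working directory.
if [ -f sample.g.vcf ]; then
    count_gvcf_records sample.g.vcf
fi
```

On a regular VCF the reference_only count should come out as 0, since only variant sites are recorded there.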

We will just be using the VCF format for now.

The HaplotypeCaller run took about 4.5 hours on a single core.
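
One common way to cut the wall-clock time is to split the run by contig with the -L interval argument and run the pieces in parallel. The sketch below only prints the commands; it assumes a samtools-style index named reference.fasta.fai sits next to the reference, and the function name hc_commands, the file hc_commands.txt, and the per-contig output names are made up for illustration.

```shell
# Print one HaplotypeCaller command per contig so they can be run in parallel.
# Assumptions: contig names are read from column 1 of a samtools-style index
# (reference.fasta.fai) and are simple names such as 1 or chr1; the
# germline_snv.<contig>.vcf output names are hypothetical.
hc_commands() {
    cut -f1 "$1" | while read -r contig; do
        echo "java -jar GenomeAnalysisTK.jar -R reference.fasta -T HaplotypeCaller -I sample.bam -L $contig -o germline_snv.$contig.vcf"
    done
}

# Only runs if the (assumed) index is present in the working directory.
if [ -f reference.fasta.fai ]; then
    hc_commands reference.fasta.fai > hc_commands.txt
fi
```

The resulting command list could then be handed to a job scheduler or a tool such as GNU parallel, and the per-contig VCFs would need to be concatenated afterwards.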

MuTect2

  • Unlike HaplotypeCaller, MuTect2 does not support gVCF output.
  • It requires the tumor and the normal samples to be provided as input. It does not call germline SNVs.
  • MuTect2 appears to still be in beta, so another tool may be preferable.

java -jar GenomeAnalysisTK.jar \
     -T MuTect2 \
     -R reference.fasta \
     -I:tumor tumor.bam \
     -I:normal normal.bam \
     -o somatic_snv.vcf
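
Once the run finishes, a natural next step is to keep only the calls that passed the caller's filters. The following is a minimal sketch, assuming the FILTER column (field 7) of somatic_snv.vcf is populated as in a standard VCF, with PASS marking calls that passed; the function name keep_pass and the output name somatic_snv.pass.vcf are made up for illustration.

```shell
# Keep header lines and records whose FILTER column (field 7) is PASS.
# Assumption: somatic_snv.pass.vcf is a hypothetical output name.
keep_pass() {
    awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Only runs if the MuTect2 output is present in the working directory.
if [ -f somatic_snv.vcf ]; then
    keep_pass somatic_snv.vcf > somatic_snv.pass.vcf
fi
```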

Resources