This lesson is in the early stages of development (Alpha version)

SV Exercises

Overview

Teaching: 0 min
Exercises: 120 min
Questions
  • How do the calls from short read data compare to those from the long read data?

  • How do the

Objectives
  • Align long read data with minimap2

  • Call SVs in short and long read data

  • Regenotype LR SVs in a short read data set

Exercises

  1. Run manta on our SR data for chromosomes 1, 6, and 20
  2. Run minimap2 on chromosome 20
  3. Run samtools merge for LR data
  4. Run sniffles on merged LR bam

Before we really start

Super fun exercise

cd
wget https://github.com/arq5x/bedtools2/releases/download/v2.31.0/bedtools.static
chmod a+x bedtools.static
mv bedtools.static miniconda3/envs/siw/bin/bedtools

First

We’ll make our output directory and change our current directory to it.

mkdir /workshop/output/sv
cd /workshop/output/sv

Manta

There are two steps to running manta on our data. First, we tell Manta what the inputs are by running configManta.py.

Run the following code pieces one at a time.

/software/manta-1.6.0.centos6_x86_64/bin/configManta.py \
 --bam /data/alignment/combined/NA12878.dedup.bam \
 --referenceFasta /data/alignment/references/GRCh38_1000genomes/GRCh38_full_analysis_set_plus_decoy_hla.fa \
 --runDir manta_NA12878
./manta_NA12878/runWorkflow.py \
 --mode local \
 --jobs 8 \
 --memGb unlimited 

Challenge

Run manta on NA12878’s parents (NA12891 and NA12892)

ls -lh manta_*/results/variants/diploidSV.vcf.gz

-rw-r--r-- 1 student student 137K Aug 27 22:54 manta_NA12878/results/variants/diploidSV.vcf.gz
-rw-r--r-- 1 student student 141K Aug 27 23:04 manta_NA12891/results/variants/diploidSV.vcf.gz
-rw-rw-r-- 1 student student 141K Aug 27 23:07 manta_NA12892/results/variants/diploidSV.vcf.gz

Minimap2

Challenge

If we take the first 1000 lines of a fastq file how many reads do we get?

zcat /data/SV/long_read/inputs/NA12878_NRHG.chr20.fq.gz \
| head 20000 \
| minimap2 \
    -ayYL \
    --MD \
    --cs \
    -z 600,200 \
    -x map-ont \
    -t 8 \
    -R "@RG\tID:NA12878\tSM:NA12878\tPL:ONT\tPU:PromethION" \
    /data/alignment/references/GRCh38_1000genomes/GRCh38_full_analysis_set_plus_decoy_hla.mmi \
    /dev/stdin \
| samtools \
    sort \
    -M \
    -m 4G \
    -O bam \
> NA12878.minimap2.bam

Flags

Sniffles

sniffles \
 --threads 8 \
 --input /data/SV/long_read/minimap2/NA12878_NRHG.minimap2.chr1-6-20.bam \
 --vcf NA12878_NRHG.sniffles.vcf

Regenotyping

Another super fun exercise

We need to make a copy of a file and change some of its contents.

cd 
cp /data/SV/inputs/NA12878.paragraph_manifest.txt .

Open it in the editor on the side panel and change /data/SV/bams/NA12878.chr1-6-20.bam to /data/alignment/combined/NA12878.dedup.bam.

cd /workshop/output/sv

~/paragraph-v2.4a/bin/multigrmpy.py \
 -i /data/SV/inputs/HGSVC_NA12878.chr1-6-20.vcf.gz \
 -m ./NA12878.paragraph_manifest.txt \
 -r /data/alignment/references/GRCh38_1000genomes/GRCh38_full_analysis_set_plus_decoy_hla.fa \
 -o paragraph_NA12878 \
 --threads 8 \
 -M 400

Secondary challenges

Manta

How many variants do we call using manta?

Solution

zgrep -vc "^#" 

What is the breakdown by type?

Sniffles

How many variants do we call using sniffles?

Solution

zgrep -vc "^#" NA12878_NRHG.sniffles.vcf.gz 
4209

What is the breakdown by type?

Solution

zgrep -v "^#" NA12878_NRHG.sniffles.vcf.gz | cut -f 8 | cut -f 2 -d ";" | sort | uniq -c 
    11 SVTYPE=BND
  1727 SVTYPE=DEL
     1 SVTYPE=DUP
  2464 SVTYPE=INS
     6 SVTYPE=INV

Discussion

Key Points