WhatsHap

WhatsHap is an open-source bioinformatics tool for phasing genomic variants using DNA sequencing reads, supporting PacBio, Nanopore, and Illumina data.

About

WhatsHap is open-source (MIT license) bioinformatics software for read-based phasing, also called haplotype assembly, of genomic variants. It takes DNA sequencing reads from technologies such as PacBio, Oxford Nanopore, and Illumina and produces a phased VCF file. It is particularly well suited to long reads but also works with short reads. It phases SNVs, indels, and complex variants (e.g., TCG → AGAA).

A pedigree phasing mode uses reads from related individuals (e.g., trios) to improve phasing accuracy and reduce coverage requirements. Beyond phasing, WhatsHap provides subcommands for haplotype tagging (haplotag), read splitting by haplotype (split), variant genotyping (genotype), polyploid phasing (polyphase), phasing statistics (stats), comparison of phasings (compare), and more.

WhatsHap installs via Conda or pip and writes standards-compliant VCF output that works with existing genomics pipelines. It is used in genomic research, clinical genetics, population genomics, and other work that needs haplotype-resolved variant data. The source code and documentation are openly available to academic researchers and bioinformatics developers alike.

Key Features

  • Multi-Technology Read Support: Works with Illumina, PacBio, Oxford Nanopore, and other sequencing technologies for broad compatibility.
  • Comprehensive Variant Phasing: Phases SNVs, indels, and complex variants (e.g., TCG → AGAA) for thorough genomic analysis.
  • Pedigree Phasing Mode: Uses reads from related individuals such as trios to improve phasing accuracy and lower coverage requirements.
  • Rich Subcommand Suite: Includes tools for haplotype tagging, read splitting, genotyping, polyploid phasing, statistics, and variant comparison.
  • Standards-Compliant VCF Output: Writes standards-compliant phased VCF by default, with optional ReadBackedPhasing-compatible output.
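
To make the phased VCF output more concrete, here is a minimal Python sketch that reads phased genotypes and groups them by phase set. It assumes the default output style, in which a phased call has a `|`-separated `GT` value and a `PS` (phase set) entry in its sample column. The function name `phase_sets` and the example records are hypothetical and are not produced by WhatsHap.

```python
from collections import defaultdict


def phase_sets(vcf_text, sample_index=0):
    """Group phased calls of one sample by chromosome and PS (phase set) value.

    Assumes plain-text VCF with a FORMAT column; unphased calls are skipped.
    """
    blocks = defaultdict(list)
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        keys = fields[8].split(":")                   # FORMAT column, e.g. GT:PS
        values = fields[9 + sample_index].split(":")  # the chosen sample column
        sample = dict(zip(keys, values))
        genotype = sample.get("GT", "")
        if "|" not in genotype or "PS" not in sample:
            continue  # unphased call
        blocks[(chrom, sample["PS"])].append((pos, ref, alt, genotype))
    return dict(blocks)


# Hypothetical records: two variants phased together, one left unphased.
EXAMPLE = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:PS\t0|1:1000",
    "chr1\t1500\t.\tTCG\tAGAA\t50\tPASS\t.\tGT:PS\t1|0:1000",
    "chr1\t9000\t.\tC\tT\t50\tPASS\t.\tGT\t0/1",
])

if __name__ == "__main__":
    for (chrom, ps), calls in phase_sets(EXAMPLE).items():
        print(chrom, ps, calls)
```

Variants that share a phase set have been phased relative to each other; alleles in different phase sets have no known relationship.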

Use Cases

  • Phasing genomic variants in whole-genome sequencing studies to distinguish maternal and paternal haplotypes.
  • Haplotype-resolved analysis of long-read sequencing data from PacBio or Oxford Nanopore platforms.
  • Pedigree-based phasing in family genomics studies, including trio analysis to improve accuracy and reduce coverage needs.
  • Polyploid genome phasing for plant genomics and other organisms with complex ploidy levels.
  • Tagging and splitting sequencing reads by haplotype for downstream population genetics or structural variant analysis.
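
As a rough illustration of the tagging-and-splitting use case, the Python sketch below groups reads by haplotype. It assumes the reads are available as SAM text lines and that haplotype-tagged reads carry an `HP:i:<n>` optional field. The function name `split_by_haplotype` and the example reads are hypothetical; this is a simplified stand-in for the idea, not the `whatshap split` implementation.

```python
def split_by_haplotype(sam_lines):
    """Group read names by the value of their HP tag.

    Reads without an HP tag go into the "untagged" group.
    """
    groups = {"untagged": []}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        read_name = fields[0]
        haplotype = "untagged"
        # SAM has 11 mandatory columns; optional TAG:TYPE:VALUE fields follow.
        for tag in fields[11:]:
            if tag.startswith("HP:i:"):
                haplotype = tag[len("HP:i:"):]
        groups.setdefault(haplotype, []).append(read_name)
    return groups


# Hypothetical SAM records for illustration only.
EXAMPLE_READS = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t*\tHP:i:1\tPS:i:1000",
    "read2\t0\tchr1\t120\t60\t4M\t*\t0\t0\tACGA\t*\tHP:i:2\tPS:i:1000",
    "read3\t0\tchr1\t900\t60\t4M\t*\t0\t0\tTTGA\t*",
]

if __name__ == "__main__":
    print(split_by_haplotype(EXAMPLE_READS))
```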

Pros

  • Accurate Results: Published research (Martin et al.) reports fast and accurate read-based phasing across read types.
  • Easy Installation and Use: Available via Conda or pip; pass in a VCF and one or more BAM files and get a phased VCF out.
  • Open Source and Free: Released under the MIT license with full source code, extensive documentation, and an active issue tracker.

Cons

  • Command-Line Only: Requires familiarity with the command line and bioinformatics pipelines; no graphical user interface is available.
  • Niche Domain: Designed specifically for genomics research; not applicable outside bioinformatics and sequencing data analysis.
  • Compute-Intensive for Large Datasets: Phasing large whole-genome datasets can be computationally demanding and may require significant hardware resources.

Frequently Asked Questions

What types of sequencing reads does WhatsHap support?

WhatsHap supports reads from Illumina (short reads), PacBio, Oxford Nanopore, and other sequencing technologies. It is especially well suited to long reads but also works with short reads.

What input and output formats does WhatsHap use?

WhatsHap takes a VCF file and one or more BAM files as input and produces a phased VCF file as output. The output is standards-compliant and can optionally be formatted for compatibility with ReadBackedPhasing.
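
The input/output flow can be sketched as a small Python helper that assembles a `whatshap phase` command line. The helper name `build_phase_command` and all file names are hypothetical. The sketch assumes the `-o`, `--reference`, and `--tag` options as described in the WhatsHap documentation; check `whatshap phase --help` for the installed version before relying on them.

```python
import shlex


def build_phase_command(vcf, bams, output, reference, tag="PS"):
    """Return the argument list for a `whatshap phase` run (nothing is executed).

    tag="PS" is the default output style; tag="HP" requests the
    ReadBackedPhasing-compatible style.
    """
    if tag not in ("PS", "HP"):
        raise ValueError("tag must be 'PS' or 'HP'")
    if not bams:
        raise ValueError("at least one BAM file is required")
    command = ["whatshap", "phase", "-o", output, f"--reference={reference}"]
    if tag != "PS":
        command.append(f"--tag={tag}")
    command.append(vcf)    # variants to phase
    command.extend(bams)   # one or more alignment files
    return command


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run() to execute it.
    cmd = build_phase_command("input.vcf", ["input.bam"], "phased.vcf", "reference.fasta")
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when file paths contain spaces.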

What is pedigree phasing in WhatsHap?

Pedigree phasing uses sequencing reads from related individuals (e.g., parent-child trios) to improve phasing accuracy and reduce the coverage needed for reliable results.
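
To show why relatives carry phasing information, the Python sketch below resolves a child's phase at a single site from trio genotypes alone, using Mendelian transmission. This is only the genetic part of the idea and is not WhatsHap's algorithm, which combines pedigree information with sequencing reads. The function name `phase_by_transmission` and the (maternal, paternal) ordering are assumptions made for this illustration.

```python
def phase_by_transmission(child, mother, father):
    """Return the child's alleles as (maternal, paternal), or None.

    Each genotype is an unordered pair of allele indices, e.g. (0, 1).
    None means the site is ambiguous or inconsistent with Mendelian
    inheritance, so the genotypes alone cannot phase it.
    """
    a, b = child
    # Try both ways of assigning the child's two alleles to the parents.
    candidates = {
        (maternal, paternal)
        for maternal, paternal in ((a, b), (b, a))
        if maternal in mother and paternal in father
    }
    return candidates.pop() if len(candidates) == 1 else None


if __name__ == "__main__":
    # Mother is homozygous reference, so allele 1 must come from the father.
    print(phase_by_transmission((0, 1), (0, 0), (0, 1)))
    # All three are heterozygous: genotypes alone cannot decide.
    print(phase_by_transmission((0, 1), (0, 1), (0, 1)))
```

Sites where everyone in the trio is heterozygous stay ambiguous under this rule, which is where read evidence is needed.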

How do I install WhatsHap?

WhatsHap can be installed easily using Conda (`conda install -c bioconda whatshap`) or pip (`pip install whatshap`). Development versions can also be installed directly from the GitHub repository.

Is WhatsHap free to use?

Yes, WhatsHap is completely free and open source, released under the MIT license. The full source code is available on GitHub.
