NanoCaller

NanoCaller is an open-source deep convolutional neural network tool for SNP and indel variant calling from long-read sequencing data, using haplotype-aware pileup analysis.

About

NanoCaller is a bioinformatics tool for detecting single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) in long-read sequencing data such as Oxford Nanopore and PacBio reads.

At its core, NanoCaller uses a deep convolutional neural network that exploits long-range haplotype structure: rather than analyzing each site in isolation, it generates the prediction for each SNP candidate from pileup information at other candidate sites covered by the same reads. For indel calling, NanoCaller phases the reads, then locally realigns each phased read set and the full set of reads at every indel candidate site, and builds consensus sequences from which the indel sequences are predicted. This two-stage approach (SNP calling informed by haplotype context, followed by phasing-guided indel detection) is designed around the error profiles of long-read technologies.

The tool ships with a Dockerfile and a Conda environment for reproducible deployment and is actively maintained with regular updates. It is aimed at computational biologists, genomics researchers, and bioinformaticians working with Nanopore or PacBio long-read sequencing pipelines. Its open-source MIT license makes it freely available for both academic and commercial research applications.

Key Features

  • Deep CNN Variant Calling: Uses a convolutional neural network to score SNP candidates from pileup information gathered across candidate sites covered by the same long reads.
  • Haplotype-Aware SNP Detection: Leverages long-range haplotype structure so that each SNP candidate is evaluated in the context of other nearby variant sites, improving accuracy.
  • Phasing-Guided Indel Calling: Performs read phasing, then local realignment of each phased read set and of all reads at each indel candidate site, followed by consensus sequence generation for indel prediction.
  • Dockerized & Conda-Ready Deployment: Provides a Dockerfile and environment.yml for reproducible installation and deployment across computing environments.
  • Actively Maintained: Regular releases with bug fixes and improvements; v3.6.2 released March 2025 with bcftools temp directory fixes and new options.
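
The phasing-guided indel step described above can be illustrated with a toy sketch. This is not NanoCaller's code: the function name, the input layout, and the use of a simple majority vote in place of a real alignment-based consensus are all assumptions made for illustration. Each read is represented only by its haplotype label and the sequence it shows across a candidate window.

```python
from collections import Counter, defaultdict

def consensus_by_haplotype(read_windows):
    """Return the most common window sequence for each haplotype group.

    read_windows: list of (haplotype_label, sequence) pairs, where the
    sequence is what a read shows across the candidate indel window.
    A majority vote stands in for a true consensus alignment (assumption).
    """
    groups = defaultdict(list)
    for hap, seq in read_windows:
        groups[hap].append(seq)
    return {hap: Counter(seqs).most_common(1)[0][0] for hap, seqs in groups.items()}

# Hypothetical data: reference window and six phased reads.
ref_window = "ACGTTTGA"
read_windows = [
    (1, "ACGTTTGA"), (1, "ACGTTTGA"), (1, "ACGTTGA"),       # haplotype 1: mostly reference
    (2, "ACGTTTTTGA"), (2, "ACGTTTTTGA"), (2, "ACGTTTGA"),  # haplotype 2: mostly a TT insertion
]

consensus = consensus_by_haplotype(read_windows)
# Compare each haplotype consensus with the reference window.
calls = {hap: ("ref" if seq == ref_window else "alt") for hap, seq in consensus.items()}
print(consensus)
print(calls)
```

Grouping reads by haplotype before building a consensus keeps a heterozygous indel from being diluted by reads from the other haplotype, which is the intuition behind phasing first.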

Use Cases

  • Calling germline SNPs and indels from whole-genome Nanopore sequencing of human samples
  • Detecting somatic variants in cancer genomics studies using long-read sequencing data
  • Phasing heterozygous variants to reconstruct haplotypes in population genetics research
  • Benchmarking and comparing long-read variant callers in genomics tool development pipelines
  • Integrating into clinical or research bioinformatics pipelines for rare disease genomic diagnosis

Pros

  • Open Source & Free: Fully open-source under the MIT License, making it accessible for academic, clinical, and commercial genomics research without licensing costs.
  • Haplotype-Informed Accuracy: The use of long-range haplotype context sets NanoCaller apart from simple pileup callers, yielding improved precision on complex genomic regions.
  • Reproducible Environment: Docker and Conda support make it easy to deploy in HPC clusters, cloud environments, or local workstations with consistent results.

Cons

  • Requires Bioinformatics Expertise: Installation and operation assume familiarity with command-line tools, long-read sequencing pipelines, and genomics workflows.
  • Computationally Intensive: Deep learning inference on large sequencing datasets can require significant CPU/GPU resources and processing time.
  • Limited to Long-Read Data: NanoCaller is specifically designed for Nanopore and PacBio long reads and is not intended for short-read Illumina sequencing data.

Frequently Asked Questions

What sequencing platforms does NanoCaller support?

NanoCaller is designed for long-read sequencing platforms, primarily Oxford Nanopore Technologies (ONT) and PacBio. It is not optimized for short-read platforms like Illumina.

What types of variants can NanoCaller detect?

NanoCaller detects both single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) from long-read sequencing data.

How do I install NanoCaller?

NanoCaller can be installed via Docker using the provided Dockerfile or through Conda using the environment.yml file in the repository. Both methods ensure a reproducible environment.
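
As a rough sketch of the two routes, the helper below assembles the standard `docker build` and `conda env create` commands. It assumes they are run from the root of a cloned copy of the repository, where the Dockerfile and environment.yml live; the function name and the image tag `nanocaller:local` are hypothetical, and the repository's own instructions take precedence if they differ.

```python
import shlex

def install_command(method, image_tag="nanocaller:local"):
    """Build the command list for one installation route.

    Assumes the current directory is the repository root (assumption).
    image_tag is a hypothetical local tag for the Docker image.
    """
    if method == "docker":
        # Build an image from the Dockerfile in the current directory.
        return ["docker", "build", "-t", image_tag, "."]
    if method == "conda":
        # Create the environment described by environment.yml.
        return ["conda", "env", "create", "-f", "environment.yml"]
    raise ValueError(f"unknown installation method: {method!r}")

docker_cmd = install_command("docker")
conda_cmd = install_command("conda")
print(shlex.join(docker_cmd))
print(shlex.join(conda_cmd))
```

The helper only prints the commands; passing either list to `subprocess.run(cmd, check=True)` would execute it.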

Is NanoCaller free to use?

Yes, NanoCaller is fully open-source and distributed under the MIT License, allowing free use for both academic and commercial purposes.

How does NanoCaller use haplotype information for SNP calling?

NanoCaller's neural network considers pileup information from multiple candidate SNP sites that share the same long reads, exploiting long-range haplotype linkage to make more informed variant predictions at each site.
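
A toy sketch can make the idea of linked sites concrete. It is not NanoCaller's feature encoding: the function name and the read representation (a dictionary from site position to observed base) are assumptions for illustration. For a target candidate site, it counts the alleles seen at neighboring candidate sites, grouped by the allele each read carries at the target.

```python
from collections import Counter

def linked_allele_counts(reads, target, neighbors):
    """Count neighbor-site alleles grouped by the allele seen at the target site.

    reads: list of dicts mapping candidate site position -> observed base,
    containing only the sites each read covers (hypothetical layout).
    """
    counts = {n: {} for n in neighbors}
    for read in reads:
        if target not in read:
            continue  # read does not cover the target site
        target_allele = read[target]
        for n in neighbors:
            if n in read:
                counts[n].setdefault(target_allele, Counter())[read[n]] += 1
    return counts

# Hypothetical reads spanning candidate sites at positions 100, 250 and 900.
reads = [
    {100: "A", 250: "C", 900: "G"},
    {100: "A", 250: "C"},
    {100: "G", 250: "T", 900: "T"},
    {100: "G", 900: "T"},
    {250: "C", 900: "G"},  # skipped: does not cover the target site
]

result = linked_allele_counts(reads, target=100, neighbors=[250, 900])
print(result)
```

When the alleles at neighboring sites split cleanly according to the allele at the target, as they do here, the reads fall into two consistent haplotypes, which supports a genuine heterozygous variant rather than random sequencing error.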
