pairtools is a simple and fast command-line framework for processing sequencing data from Hi-C experiments. pairtools performs various operations on Hi-C pairs and occupies the middle position in a typical Hi-C data processing pipeline:

[Figure: A typical processing pipeline for Hi-C data]

In a typical Hi-C pipeline, DNA sequences (reads) are aligned to the reference genome, converted into ligation junctions and binned, thus producing a Hi-C contact map.
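The final binning step can be illustrated with a minimal sketch. It counts how many pairs fall into each pair of fixed-size genomic bins. The function name `bin_pairs` and the `(chrom1, pos1, chrom2, pos2)` tuple layout are hypothetical. In practice a dedicated tool (for example, cooler) builds the contact map from .pairs files.

```python
from collections import Counter


def bin_pairs(pairs, binsize):
    """Count contacts per pair of genomic bins (illustrative sketch only).

    pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples (hypothetical layout).
    Returns a Counter keyed by ((chrom1, bin1), (chrom2, bin2)).
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        # Integer division maps a base-pair coordinate to its bin index.
        counts[((chrom1, pos1 // binsize), (chrom2, pos2 // binsize))] += 1
    return counts


pairs = [
    ("chr1", 1_500, "chr1", 9_200),
    ("chr1", 1_900, "chr1", 9_999),
    ("chr1", 2_100, "chr2", 500),
]
contacts = bin_pairs(pairs, binsize=1_000)
print(contacts[(("chr1", 1), ("chr1", 9))])  # 2
```

Each non-zero entry of the resulting counter corresponds to one pixel of the contact map.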

pairtools aims to be an all-in-one tool for processing Hi-C pairs and can perform the following operations:

  • detect ligation junctions (a.k.a. Hi-C pairs) in aligned paired-end sequences of Hi-C DNA molecules
  • sort .pairs files for downstream analyses
  • detect, tag and remove PCR/optical duplicates
  • generate extensive statistics of Hi-C datasets
  • select Hi-C pairs using flexibly defined criteria
  • restore .sam alignments from Hi-C pairs
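Duplicate detection can be sketched in simplified form. The sketch assumes the pairs are already sorted. It treats a pair as a duplicate when an earlier pair has the same chromosomes and strands and both ends lie within `max_mismatch` base pairs. The function name `mark_duplicates`, the tuple layout, and the 3 bp default are illustrative assumptions. This is not the actual pairtools implementation.

```python
def mark_duplicates(sorted_pairs, max_mismatch=3):
    """Flag pairs whose both ends lie within max_mismatch bp of an earlier pair.

    sorted_pairs: list of (chrom1, pos1, chrom2, pos2, strand1, strand2) tuples,
    sorted by chrom1, chrom2, pos1 (hypothetical layout).
    Returns one bool per pair; True marks a duplicate.
    """
    is_dup = [False] * len(sorted_pairs)
    for i, (c1, p1, c2, p2, s1, s2) in enumerate(sorted_pairs):
        # Scan backwards only while pos1 is still close enough to match.
        j = i - 1
        while j >= 0:
            d1, q1, d2, q2, t1, t2 = sorted_pairs[j]
            if (d1, d2) != (c1, c2) or p1 - q1 > max_mismatch:
                break  # sorted order guarantees no earlier match exists
            if (t1, t2) == (s1, s2) and abs(p2 - q2) <= max_mismatch:
                is_dup[i] = True
                break
            j -= 1
    return is_dup


pairs = [
    ("chr1", 100, "chr1", 5_000, "+", "-"),
    ("chr1", 102, "chr1", 5_001, "+", "-"),  # within 3 bp on both sides
    ("chr1", 104, "chr1", 7_000, "+", "-"),  # second side is far away
    ("chr2", 50, "chr2", 900, "+", "+"),
]
print(mark_duplicates(pairs))  # [False, True, False, False]
```

Sorting first is what keeps the backward scan short: only pairs whose first ends are within `max_mismatch` bp of each other need to be compared.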

pairtools produces .pairs files compliant with the 4DN standard.
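A .pairs file is tab-separated text preceded by header lines starting with `#`. One of these is a `#columns:` line that names the fields. The sketch below reads such a file into dictionaries, taking column names from that header rather than hard-coding them. The example columns (`chrom1`, `pair_type`, and so on) are an assumption about typical pairtools output. Check the header of your own files.

```python
import io


def read_pairs(handle):
    """Yield one dict per data line of a .pairs file (minimal sketch).

    Column names come from the '#columns:' header; other '#' lines are skipped.
    Values are returned as strings.
    """
    columns = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#"):
            if line.startswith("#columns:"):
                columns = line[len("#columns:"):].split()
            continue
        if not line:
            continue
        if columns is None:
            raise ValueError("data line found before '#columns:' header")
        yield dict(zip(columns, line.split("\t")))


example = io.StringIO(
    "## pairs format v1.0\n"
    "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type\n"
    "read1\tchr1\t1500\tchr1\t9200\t+\t-\tUU\n"
)
records = list(read_pairs(example))
print(records[0]["pair_type"], int(records[0]["pos2"]))  # UU 9200
```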

pairtools uses a two-character notation to define pair types (see the pair types table).
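The two-character notation can be illustrated with a simplified classifier. Each side of a read pair gets a one-letter code: `U` (unique), `M` (multi-mapping) or `N` (null, unmapped). The pair type is then the two letters concatenated. The MAPQ rule and the `min_mapq=1` threshold are assumptions made for illustration. The real tool recognizes more types than the three letters used here, so consult the pair types table for the full list.

```python
def side_code(is_mapped, mapq, min_mapq=1):
    """Classify one side of a read pair (simplified, hypothetical rule)."""
    if not is_mapped:
        return "N"  # null: no alignment
    if mapq < min_mapq:
        return "M"  # multi: alignment is ambiguous (low MAPQ)
    return "U"  # unique


def pair_type(side1, side2, min_mapq=1):
    """Concatenate the two side codes, e.g. 'UU' for a uniquely mapped pair.

    side1, side2: (is_mapped, mapq) tuples (hypothetical layout).
    """
    return side_code(*side1, min_mapq=min_mapq) + side_code(*side2, min_mapq=min_mapq)


print(pair_type((True, 60), (True, 42)))  # UU
print(pair_type((True, 60), (True, 0)))   # UM
print(pair_type((False, 0), (True, 30)))  # NU
```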

The full list of available pairtools subcommands:

Pairtool      Description
-----------   ---------------------------------------------
dedup         Find and remove PCR/optical duplicates.
filterbycov   Remove pairs from regions of high coverage.
flip          Flip pairs to get an upper-triangular matrix.
markasdup     Tag pairs as duplicates.
merge         Merge sorted .pairs/.pairsam files.
parse         Find ligation junctions in .sam, make .pairs.
phase         Phase pairs mapped to a diploid genome.
restrict      Assign restriction fragments to pairs.
select        Select pairs according to some condition.
sort          Sort a .pairs/.pairsam file.
split         Split a .pairsam file into .pairs and .sam.
stats         Calculate pairs statistics.
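Two of the simpler operations in this list, flip and restrict, reduce to small coordinate manipulations. The sketch below shows each under stated assumptions. Flipping swaps the two sides of a pair, strands included, whenever side 1 comes after side 2 in a given chromosome order. That puts every pair in the upper triangle. Restriction-fragment assignment is a binary search over sorted cut sites. The names `flip_pair` and `restriction_fragment`, the `chrom_order` mapping, and the cut-site layout are all hypothetical. They are not the pairtools implementation.

```python
from bisect import bisect_right


def flip_pair(pair, chrom_order):
    """Swap sides so that (chrom1, pos1) does not come after (chrom2, pos2).

    pair: (chrom1, pos1, chrom2, pos2, strand1, strand2) (hypothetical layout).
    chrom_order: dict mapping chromosome name -> rank.
    """
    c1, p1, c2, p2, s1, s2 = pair
    if (chrom_order[c1], p1) > (chrom_order[c2], p2):
        return (c2, p2, c1, p1, s2, s1)  # strands travel with their side
    return pair


def restriction_fragment(pos, cut_sites):
    """Return (index, start, end) of the fragment that contains pos.

    cut_sites: sorted cut-site coordinates on one chromosome, starting at 0
    and ending at the chromosome length (hypothetical layout).
    """
    # bisect_right finds the first cut site strictly greater than pos;
    # clamp so a position equal to the chromosome end maps to the last fragment.
    i = min(bisect_right(cut_sites, pos) - 1, len(cut_sites) - 2)
    return i, cut_sites[i], cut_sites[i + 1]


order = {"chr1": 0, "chr2": 1}
print(flip_pair(("chr2", 500, "chr1", 9_000, "+", "-"), order))
# ('chr1', 9000, 'chr2', 500, '-', '+')

cuts = [0, 1_000, 4_000, 10_000]
print(restriction_fragment(2_500, cuts))  # (1, 1000, 4000)
```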