The clc_correct_pacbio_reads tool (beta)

The clc_correct_pacbio_reads tool performs error-correction of PacBio reads.

IMPORTANT NOTICE: This tool relies on certain methods that are the intellectual property of Pacific Biosciences. Consequently, the use of this tool with any data other than data generated on a Pacific Biosciences instrument constitutes a violation of the end-user license agreement that users of the CLC Assembly Cell agree to during installation.

A typical PacBio run produces a wide range of different read lengths. All other things being equal, the longer a read is, the more useful it is for de novo assembly. The primary reason for this is the ability of long reads to span longer repeats and connect with more unique sequence surrounding the repeat, clearly delimiting that repeat region in the final assembly.
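To get a feel for how wide the read-length range is in a given data set, the lengths can be summarized before running any correction. The following is a minimal sketch in Python, not part of the CLC Assembly Cell; it assumes the reads are available as a plain FASTA file, and the function names (read_fasta_lengths, n50, summarize) are hypothetical helpers introduced only for this illustration.

```python
import io


def read_fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA stream."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50: reads at least this long hold half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for value in ordered:
        running += value
        if running >= half:
            return value
    return 0  # no reads at all


def summarize(handle):
    """Collect simple read-length statistics for a FASTA stream."""
    lengths = list(read_fasta_lengths(handle))
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "shortest": min(lengths, default=0),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Tiny in-memory example; a real file would be opened with open(path).
    example = io.StringIO(">r1\nACGT\nAC\n>r2\nACGTACGTAC\n>r3\nAC\n")
    print(summarize(example))
```

A large gap between the shortest and longest read, together with a high N50, suggests that a substantial share of the bases sits in the long reads that are most valuable for spanning repeats.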

Raw PacBio reads exhibit a much higher rate of (random) errors than reads from short-read technologies such as Illumina. Left unaddressed, this added noise would confuse the assembler and lead to a poor assembly.

The clc_correct_pacbio_reads tool performs optional, but highly recommended, pre-processing of the raw reads that alleviates this problem. It takes the raw reads as input and produces a new FASTA file [5.3] containing error-corrected versions of those long reads.

The error correction itself is a step-wise process. Somewhat simplified, it consists of three steps:



Footnotes

[5.3] The corrected reads will not have quality scores, as it is unclear how to calculate these in any meaningful way. However, this is not a problem, as quality scores are not used in the subsequent assembly process.

