An organism’s genome stores the genetic information that must be decoded to understand sequence variation among individuals. To get there, raw reads first need to be assessed for quality and then assembled using an approach suited to your research question.

Image credit: https://www.pexels.com/search/jigsaw%20puzzle/

With advances in sequencing technology, the complete DNA sequence of an organism can now be determined. Genome sequencing helps decipher the genetic information encoded in an organism’s DNA. However, the sequencing data must be processed further, from quality checking through assembly to annotation, in order to identify the coding regions in a genome and determine what those genes do. So, if you have raw data in hand and are trying to figure out how to process it, have a look below.

Once genome sequencing is done, what we get are raw reads, which need to be processed and assembled to make sense of them. You can think of it as a jigsaw puzzle. The scattered pieces of the puzzle represent the raw reads obtained after sequencing, and you must now fit all the pieces together to build a meaningful picture, which will be your assembled genome.
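To make the jigsaw analogy concrete, here is a toy sketch in Python that joins reads by their overlapping ends, always merging the best-matching pair first. It assumes error-free reads from a single strand. The function names `overlap` and `greedy_assemble` and the example reads are made up for illustration; real assemblers use far more sophisticated methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining pieces stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads taken from the sequence ATGCGTACCTTG
pieces = ["ATGCGT", "CGTACC", "ACCTTG"]
print(greedy_assemble(pieces))
```

Each piece that cannot be joined to any other stays separate, which is roughly why real assemblies often end up as several contigs rather than one complete sequence.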

Before assembly, reads first need to be checked for quality and trimmed using one of the various tools available. The cleaned reads are then assembled using either a de novo or a reference-based approach. In a de novo approach, you are given only the pieces of the puzzle and have to join them together to build a genome. In a reference-based approach, you have a reference, like the finished picture of the puzzle, to guide the assembly. An assembly can reach contig level, scaffold level or chromosome level, depending on the data available.
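As a rough idea of what quality trimming means, the Python sketch below cuts low-quality bases off the 3' end of a single read. It assumes FASTQ-style quality strings with the common Phred+33 encoding and a cutoff of Q20. The function names, the cutoff and the example read are illustrative assumptions; dedicated trimming tools do much more, such as adapter removal and sliding-window trimming.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def trim_3prime(seq, quality, min_q=20):
    """Remove bases from the 3' end until a base with score >= min_q is found."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# Hypothetical read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33
seq, qual = trim_3prime("ACGTACGT", "IIIIII##")
print(seq, qual)
```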

Long-read sequencing technologies such as Oxford Nanopore have higher error rates than short-read technologies such as Illumina. Therefore, a hybrid assembly of Nanopore reads is generally preferred, in which Illumina reads are used for error correction as part of the assembly process. The contiguity and completeness of the assembly are then assessed using tools like QUAST, which provides an overview of various assembly metrics.
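One of the contiguity metrics reported by QUAST is N50: the length of the contig at which, going from the longest contig downwards, at least half of the total assembly length has been covered. The Python sketch below shows the calculation on made-up contig lengths. The function name `n50` and the numbers are only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical assembly with six contigs, 290 bases in total
lengths = [80, 70, 50, 40, 30, 20]
print(n50(lengths))  # 80 + 70 = 150 covers at least half of 290, so N50 is 70
```

A higher N50 generally points to a more contiguous assembly, but it says nothing about correctness, so it is best read alongside the other metrics.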

Assembled genomes can then be annotated. First, the structure and location of protein-coding genes are determined (a.k.a. gene prediction); then biological information is attached to the predicted genes, which is referred to as functional annotation.
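To give a feel for gene prediction, here is a deliberately simplified Python sketch that scans the forward strand for open reading frames (ORFs), meaning stretches that begin with the start codon ATG and run to the first in-frame stop codon. It is only an assumption-laden illustration: it ignores the reverse strand, introns and everything else that real gene predictors model, and the function name `find_orfs` and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) positions of forward-strand ORFs, stop codon included.

    Positions are 0-based and `end` is exclusive, so seq[start:end] is the ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon seen in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical sequence containing one short ORF: ATG AAA TTT TAG
sequence = "CCATGAAATTTTAGGG"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```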

3 Comments

  1. Anshu June 22, 2024 at 9:17 pm

    A very unique way to understand the sequencing protocol.

  2. Pragya June 22, 2024 at 11:34 pm

    This was so informative and explained in such easy terms. It gives a clear picture of the sequencing steps required and why we perform each step.

  3. Simran Yadav June 23, 2024 at 10:56 am

    Short and simple explanation of genome sequence, very useful.
