Counting strategy
Count the number of reads per annotated gene
To compare the expression of single genes between different conditions (e.g. with or without Pasilla depletion), an essential first step is to quantify the number of reads per gene.
From the image above, we can compute:
Number of reads per exon
Gene | Exon | Number of reads |
---|---|---|
gene1 | exon1 | 3 |
gene1 | exon2 | 2 |
gene2 | exon1 | 3 |
gene2 | exon2 | 4 |
gene2 | exon3 | 3 |
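- gene1 has 4 reads, not 5 (the sum of gene1 exon1 and gene1 exon2), because the last read is spliced across both exons and must be counted only once.
- gene2 has 6 reads, 3 of which are spliced. Summing its exon counts would give 10, because each spliced read is counted once per exon it covers.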
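The difference between exon-level and gene-level counts can be sketched in a few lines of Python. This is only an illustration: the read IDs (`a` to `j`) and their assignment to exons are hypothetical, chosen so that the per-exon totals match the table above and the gene totals match the two bullets. A gene-level count is the number of distinct reads across the gene's exons, not the sum of the exon counts.

```python
# Hypothetical read IDs, chosen only so that the per-exon totals match the table.
exon_reads = {
    ("gene1", "exon1"): {"a", "b", "c"},
    ("gene1", "exon2"): {"c", "d"},            # "c" is spliced across both exons
    ("gene2", "exon1"): {"e", "f", "g"},
    ("gene2", "exon2"): {"f", "g", "h", "i"},  # "f", "g" and "i" are spliced
    ("gene2", "exon3"): {"g", "i", "j"},
}

# Exon-level counts: one count per (gene, exon) pair.
exon_counts = {key: len(reads) for key, reads in exon_reads.items()}

# Gene-level counts: take the union of read IDs so each read is counted once.
gene_reads = {}
for (gene, _exon), reads in exon_reads.items():
    gene_reads.setdefault(gene, set()).update(reads)
gene_counts = {gene: len(reads) for gene, reads in gene_reads.items()}

print(exon_counts)
print(gene_counts)
```

With this assignment, gene1 gets 4 reads and gene2 gets 6, whereas naively summing the exon counts would give 5 and 10.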
Counting tools
Two main tools can be used for this: HTSeq-count (Anders et al., Bioinformatics, 2015) or featureCounts (Liao et al., Bioinformatics, 2014). featureCounts is considerably faster and requires far fewer computational resources, so we will use it here.
In principle, counting the reads that overlap genomic features is a fairly simple task. However, featureCounts needs to be given some details about the library, for example its strandedness.
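To see why strandedness matters, here is a toy Python sketch of strand-aware counting for a single gene. It is not how featureCounts is implemented, and it ignores paired-end handling. The function name `read_is_counted` and the read strands are hypothetical. The three modes mirror the settings featureCounts distinguishes (0 = unstranded, 1 = stranded, 2 = reversely stranded, as in its command-line `-s` option).

```python
def read_is_counted(read_strand, gene_strand, mode):
    """Decide whether a read overlapping a gene is counted for that gene.

    mode 0: unstranded, the read strand is ignored
    mode 1: stranded, the read must be on the same strand as the gene
    mode 2: reversely stranded, the read must be on the opposite strand
    """
    if mode == 0:
        return True
    if mode == 1:
        return read_strand == gene_strand
    if mode == 2:
        return read_strand != gene_strand
    raise ValueError(f"unknown strandedness mode: {mode}")


# Hypothetical strands of five reads that all overlap a gene on the "+" strand.
read_strands = ["+", "+", "-", "+", "-"]
gene_strand = "+"

counts_by_mode = {
    mode: sum(read_is_counted(s, gene_strand, mode) for s in read_strands)
    for mode in (0, 1, 2)
}
print(counts_by_mode)
```

The same five reads give three different counts depending on the mode, so choosing the wrong setting can silently discard most reads or count reads that belong to a gene on the opposite strand.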