Count with STAR

Using RNA STAR for both alignment and read counting

We have already used the STAR aligner. However, for the sake of simplicity, we did not use its integrated function that counts reads after alignment, using the appropriate GTF input file.

This is what we are going to do in this section.

First, navigate to the history STAR Alignments, which we generated previously in the section STAR alignments.

From this history, copy (using the menu copy datasets item in the wheel history menu)

  • The three fastq.gz collections 5: Dc, 10: Mo, and 15: Oc
  • and the GTF file Mus_musculus.GRCm38.102.chr.gtf

into a new history that you will name STAR alignments AND counting.

Navigate to this new history and run RNA STAR with the following settings

RNA STAR settings

  • Single-end or paired-end reads

    → Single-end

  • RNA-Seq FASTQ/FASTA file

    → select the collection icon and then the collection 5: Dc

  • Custom or built-in reference genome

    → Use a built-in index

  • Reference genome with or without an annotation

    → use genome reference without builtin gene-model but provide a gtf

  • Select reference genome

    → GRCm38_w/o_GTF

  • Gene model (gff3,gtf) file for splice junctions

    → Mus_musculus.GRCm38.102.chr.gtf

  • Per gene/transcript output

    → This time, select Per gene read counts (GeneCounts)

  • Output filter criteria, Exclude the following records from the BAM output

    → check Select all
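
For readers curious about what these settings correspond to outside Galaxy, here is a minimal sketch that assembles a comparable STAR command line in Python. The option names are standard STAR options, but the index directory, FASTQ file name and output prefix are hypothetical placeholders, and the command actually built by the Galaxy wrapper may differ (for instance, the BAM record filtering is not represented here).

```python
import shlex


def build_star_command(genome_dir, fastq_gz, gtf, out_prefix, threads=4):
    """Return the argument list for a STAR run that aligns and counts reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,        # index built without annotation
        "--readFilesIn", fastq_gz,        # single-end: one FASTQ file
        "--readFilesCommand", "zcat",     # the input is gzip-compressed
        "--sjdbGTFfile", gtf,             # annotation supplied at mapping time
        "--quantMode", "GeneCounts",      # also write per-gene read counts
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


# Hypothetical paths, for illustration only
cmd = build_star_command(
    genome_dir="GRCm38_index/",
    fastq_gz="Dc_sample.fastq.gz",
    gtf="Mus_musculus.GRCm38.102.chr.gtf",
    out_prefix="Dc_sample_",
)
print(shlex.join(cmd))
```

The command is only printed, not executed, so the sketch can be read without STAR being installed.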

The tool will run for several minutes, generating four new dataset collections whose names are self-explanatory. Take advantage of the run time to rename at least 3 of these collections with more meaningful names:

  • RNA STAR on collection 5: log → Dc STAR log
  • RNA STAR on collection 5: mapped.bam → Dc RNA STAR mapped.bam
  • RNA STAR on collection 5: reads per gene → Dc nbre of reads per gene (STAR)

⚠ Reminder: we understand that renaming datasets is a bit boring, but these renaming operations are essential to the readability of your histories.

Re-run the RNA STAR tool for the collections:

  • 10: Mo
  • 15: Oc

💡 Do not wait for the first RNA STAR run to complete before launching the other two.

This time, each run of RNA STAR generates a 5th dataset collection named RNA STAR on collection X: reads per gene.

Rename these collections Dc STAR counts, Mo STAR counts and Oc STAR counts, respectively. You can do this even if the runs are not finished.

Mapping statistics with MultiQC tool

You can re-run MultiQC on the 3 RNA STAR log collections, but note that we already performed this operation in the history STAR alignments (section 18_star).

MultiQC settings
  • 1: Results
  • Which tool was used generate logs?

    → STAR

  • Click "Insert STAR output"

  • Type of STAR output?

    → Log

  • STAR log output

    → Click first the collection icon

    → Select the 3 collections Dc, Mo and Oc RNA STAR log, holding down the Cmd key

  • Leave the other settings as is

  • Press Execute!

This is a good occasion to use the window manager, which you can activate by clicking its icon (the icon turns yellow when activated).

  • Click first on the eye of the collection MultiQC on ... and others: Webpage in the history STAR alignments AND counting.
  • The web report opens in a floating window in the center of the screen.
  • Switch to the history HISAT Alignments using the history switch menu at the top of the history.

  • Click on the eye of the collection MultiQC on ... and others: Webpage in the history HISAT Alignments.
  • You can now compare the results from both aligners, side by side in the center of the screen.

Adapt the format of STAR counts collections

One issue with the tables of read counts returned by RNA STAR is that they cannot be passed as is to the downstream tools:

The first 4 lines correspond to counts that should not be taken into account in the next step by the statistical tools DESeq2 or edgeR. Namely, N_unmapped, N_multimapping, N_noFeature and N_ambiguous are relevant metrics to evaluate the quality of the counting (and they are indeed taken into account by the MultiQC tool), but not for the statistical analysis of differential expression.

Thus, in this part, we are going to manipulate the RNA STAR count outputs and make them compatible with DESeq2 and edgeR.

First, note that RNA STAR reports counts for all three possible library strandedness types, after a first column that holds the gene identifiers.

Thus the first count column (column 2 of the file) should be used for unstranded libraries, the second count column (column 3) for stranded, forward libraries, and the third count column (column 4) for stranded, reverse libraries.

Since the PRJNA630433 libraries are reverse stranded, we are going to keep only columns 1 (gene identifiers) and 4 (reverse-stranded counts) of the RNA STAR count collections, thereby removing columns 2 and 3, using the Galaxy tool Advanced Cut columns from a table (cut).
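
To make the column layout concrete, here is a small Python sketch that maps each library type to its column in the STAR count table and sums the three count columns over the gene rows. The table content is invented for illustration, and the names `STAR_COUNT_COLUMN` and `column_totals` are hypothetical helpers, not part of STAR or Galaxy.

```python
# 0-based index of the count column to use for each library type
# (index 0 holds the gene identifiers)
STAR_COUNT_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def column_totals(lines):
    """Sum each of the 3 count columns over gene rows, skipping N_* rows."""
    totals = [0, 0, 0]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("N_"):
            continue  # summary rows are not gene counts
        for i in range(3):
            totals[i] += int(fields[i + 1])
    return totals


# Invented example table (not real data)
example = [
    "N_unmapped\t100\t100\t100",
    "N_multimapping\t50\t50\t50",
    "N_noFeature\t30\t400\t35",
    "N_ambiguous\t5\t1\t4",
    "geneA\t200\t10\t190",
    "geneB\t300\t20\t280",
]

totals = column_totals(example)
print(totals)  # → [500, 30, 470]
print(STAR_COUNT_COLUMN["reverse"] + 1)  # 1-based column to keep → 4
```

In this invented example, the third count column carries almost all of the reads counted in the unstranded column, which is the pattern one would expect from a reverse-stranded library. Treat this comparison as a rough sanity check rather than a formal test.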

Advanced Cut columns settings

  • File to cut

    → Click and select Dc STAR counts

  • operation

    → Leave Keep

  • Delimited by

    → Tab (indeed these datasets are tabular files)

  • Cut by

    → fields

  • List of Fields

    → Select columns 1 and 4

  • Press Execute / Run tool
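
The effect of this cut can be sketched in a few lines of Python. This is only an illustration of the operation on invented rows; `cut_fields` is a hypothetical helper and not the code run by the Galaxy tool.

```python
def cut_fields(lines, keep=(1, 4)):
    """Keep the given 1-based tab-separated fields from each line."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        out.append("\t".join(fields[i - 1] for i in keep))
    return out


# Invented rows in the 4-column layout of a STAR count table
rows = [
    "N_unmapped\t100\t100\t100",
    "geneA\t200\t10\t190",
    "geneB\t300\t20\t280",
]

cut_rows = cut_fields(rows)
print(cut_rows)  # → ['N_unmapped\t100', 'geneA\t190', 'geneB\t280']
```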

Repeat the same operation

For collections Mo STAR counts and Oc STAR counts

Remove first 4 lines in cut counts

Next, we remove the first 4 lines, which remain in the cut datasets and are irrelevant for the differential expression analysis, using the tool Remove beginning of a file.

Remove beginning of a file settings

  • Remove first

    → 4

  • from

    → Click and select Advanced Cut on collection 20

  • Press Execute / Run tool
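
As an illustration, the same step could be written in Python as follows. Unlike the Galaxy tool, which removes the first lines whatever they contain, this sketch first checks that the 4 lines are the expected summary rows. The helper name `drop_summary_rows` and the example rows are assumptions made for this example.

```python
EXPECTED_SUMMARY = ("N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous")


def drop_summary_rows(lines):
    """Remove the 4 leading summary rows after checking their labels."""
    labels = tuple(line.split("\t")[0] for line in lines[:4])
    if labels != EXPECTED_SUMMARY:
        raise ValueError(f"unexpected leading rows: {labels}")
    return lines[4:]


# Invented 2-column rows, as obtained after the cut step
cut_rows = [
    "N_unmapped\t100",
    "N_multimapping\t50",
    "N_noFeature\t35",
    "N_ambiguous\t4",
    "geneA\t190",
    "geneB\t280",
]

gene_rows = drop_summary_rows(cut_rows)
print(gene_rows)  # → ['geneA\t190', 'geneB\t280']
```

The check guards against running the step twice on the same dataset, which would silently discard 4 genes.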

Repeat the same operation

For collections Advanced Cut on collection 40 and Advanced Cut on collection 60

Add a proper header

It will be easier to manipulate these datasets if they have a meaningful header.

We are going to do that using the tool Add Header

Add Header settings

  • List of Column headers (comma delimited, e.g. C1,C2,...)

    → genes,counts

  • Data File (tab-delimted)

    → Click and select Remove beginning on collection 82

  • Press Execute / Run tool
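
For illustration, adding the header amounts to prepending one tab-separated line to the table. The sketch below uses a hypothetical helper `add_header` and invented rows; it is not the implementation of the Galaxy tool.

```python
def add_header(lines, columns=("genes", "counts")):
    """Prepend a tab-separated header after checking the column count."""
    if lines and len(lines[0].split("\t")) != len(columns):
        raise ValueError("header and data have different numbers of columns")
    return ["\t".join(columns)] + list(lines)


# Invented rows, as obtained after removing the 4 summary lines
gene_rows = ["geneA\t190", "geneB\t280"]

with_header = add_header(gene_rows)
print("\n".join(with_header))
```

The header is tab-separated even though the Galaxy form takes a comma-delimited list, so that it matches the tabular format of the data rows.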

Repeat the same operation

For collections Remove beginning on collection 87 and Remove beginning on collection 92

👏 We are now ready for the next steps