Skip to content

HISAT2

Now that we clearly know the strandness of our libraries in the PRJNA630433 project, we can perform their read alignments.

Before all, it is important to remember the structure of the data as mentioned in the introduction to the use case PRJNA630433.

For the sake of simplicity, we report here again the table 3 from the section data upload:

Table 3
run_accession   sample_title
SRR11688222 Mo rep2
SRR11688221 Dc rep2
SRR11688228 Dc rep4
SRR11688227 Mo rep4
SRR11688218 Dc rep1
SRR11688219 Mo rep1
SRR11688220 Oc rep1
SRR11688223 Oc rep2
SRR11688224 Dc rep3
SRR11688225 Mo rep3
SRR11688226 Oc rep3
SRR11688229 Oc rep4

Let us reorder the table as follows:

run_accession   sample_title
SRR11688218 Dc rep1
SRR11688221 Dc rep2
SRR11688224 Dc rep3
SRR11688228 Dc rep4
SRR11688219 Mo rep1
SRR11688222 Mo rep2
SRR11688225 Mo rep3
SRR11688227 Mo rep4
SRR11688220 Oc rep1
SRR11688223 Oc rep2
SRR11688226 Oc rep3
SRR11688229 Oc rep4

From this data structure, we see that it will be more convenient to treat the data as three collections of replicates (1, 2, 3 and 4): collections Dc, Mo and Oc, respectively.

Let's do this directly without even creating a new history "HISAT2 mapping" before (we will do it on the fly)

  • Go to the data library Libraries / IOC_bulk_RNAseq / PRJNA630433 / FASTQ files ( You know how to do this from the previous section)
  • Check first the datasets SRR11688218, SRR11688221, SRR11688224 and SRR11688228
  • click the Export to History tab and, this time, select as a Collection
  • on the next pop up panel, type the name of a new history: HISAT alignments and press the Continue button
  • In the next panel Create a collection from a list of datasets, you could reorder the replicates but there is little sense doing this: replicates are replicates, and unless specific design of the experiment, the number associated to a replicate is generally meaningless.

Anyway you can order the samples alphabetically using the corresponding icon in the panel.

Most importantly, give the collection a name ! In this case, we took all Dc replicates, thus it is logical to name it Dc.

Finally, press the button Create Collection - [x] Now, if you click the house/home icon of the main Galaxy menu, you will access to the newly created history HISAT alignments and its first collection that contains the 4 fastqsanger.gz datasets SRR11688218, SRR11688221, SRR11688224 and SRR11688228

Next, let's proceed with the second collection: - [x] Return to the data library Libraries / IOC_bulk_RNAseq / PRJNA630433 / FASTQ files - [x] Check now the datasets SRR11688219, SRR11688222, SRR11688225 and SRR11688227 - [x] click the Export to History tab and select again as a Collection - [x] on the next pop up panel, do not type the name of a new history but instead, select the previously created history HISAT alignments, and press the Continue button - [x] Give the collection the name Mo and press Create Collection

Proceed the same way with the third collection and the remaining datasets - [x] SRR11688220, SRR11688223, SRR11688226 and SRR11688229 - [x] Give the collection the name Oc

You can now navigate to the history HISAT alignments and verify that it is containing 3 collections of 4 fastqsanger.gz datasets each.

  • Go a last time to the data library Libraries / IOC_bulk_RNAseq / Mouse reference files
  • Check the GTF file Mus_musculus.GRCm38.102.chr.gtf and export it to the history HISAT alignments as a dataset. We also will need the GTF annotation file for the next step.

We are ready to perform HISAT2 alignments of this three dataset collections ! Your history should look like this:

HISAT2 alignments of the three collections Dc, Mo and Oc.

HISAT2 settings

  • Source for the reference genome

    → Use a built-in genome

  • Select a reference genome

    → GRCm38

  • Is this a single or paired library

    → single

  • FASTA/Q file

    → Click first the collection icon , and select 5: DC

  • Specify strand information

    → Reverse (R) (we know this from the previous analysis with infer experiment !)

  • Summary Options

    → Output alignment summary in a more machine-friendly style. Yes

    → Print alignment summary to a file. Yes (for MultiQC)

  • Leave Advanced Options as is

  • Press Execute !

The tool will run during several minutes, generating two new dataset collections, whose name is self-explanatory. However, take benefit of this run time, to rename these collections with more meaningful names.

Thus, click first on the running collection (yellow) HISAT2 on collection 5: aligned reads (BAM), click the pencil icon of the collection content, type Dc HISAT2 alignments (BAM) and click the Save button.

Rename in the same way the collection HISAT2 on collection 5: Mapping summary to Dc Mapping summary

⚠ Don't be lazzy, although a bit borring, these renaming operations are essential to the readibility of your histories.

Re-run a tool !

We still have two dataset collections to align with HISAT2. Since we will use the exact same HISAT2 settings, with the exception of the input collection, we are going to use a powerful feature of Galaxy: the possibilité to re-run a tool.

Let's do it first for the alignment of the Mo dataset collection:

  • first, click on the previous output collection which you have renamed Dc HISAT2 alignments (BAM) (note that you could follow the same procedure using the other Output collection Dc Mapping summary).
  • You should now see the content of the collection, ie, the 4 bam datasets with labels SRR11688218, SRR11688221, SRR11688224 and SRR11688228.
  • Click on any of the 4 datasets, which will result in the deployment of the dataset within the collection view.

  • Now you can click on the re-run icon as indicated above. This will bring up the HISAT2 form, with the same settings used to generate the dataset.
  • Here, the only important thing is to change the input dataset. In this specific case, click on the collection icon () and select the collection 10: Mo
  • You can now Execute HISAT2 on this collection and, as we did before, rename the two new output collections Mo HISAT2 alignments (BAM) and Mo Mapping summary, respectively.

As you can expect now, it remains to repeat the exact same operation sequence to align the Remaining input collection 15: Oc.

⚠ Do not forget to rename your output collection appropriately !

Mapping statistics with MultiQC tool

MultiQC settings

  • 1: Results

    → HISAT2

  • Output of HISAT2

    → Click first the collection icon

    → Select the 3 collections Dc, Mo and Oc Mapping summary, holding down the Cmd key

  • Leave the other settings as is

  • Press Execute !

When MultiQC has run, look at the aggregated mapping statistics by clicking the eye icon of the dataset MultiQC on data 46, data 44, and others: Webpage