
Filtering datasets to remove or trim low-quality sequences

This step is optional and should be performed by 50% of attendees.

Cutadapt with single-end reads


  1. Create a new history named Cutadapt (history wheel icon, then Create New)
  2. Copy the fastq files from the RNAseq data library to this new history (history wheel icon, then Copy datasets)
  3. Select the Cutadapt tool
  4. Start by selecting Single-end in the "Single-end or Paired-end reads?" menu
  5. Click the multiple datasets button for the input files
  6. Cmd-click (Ctrl-click on Windows or Linux) to select the 3 single-end fastq.gz datasets, which are not contiguous in the dataset list
  7. Filter Options
    • Minimum length: 20
  8. Read Modification Options
    • Quality cutoff: 20
  9. Output Options
    • Report: Yes
  10. Leave all other parameters at their default values and click Execute
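To make the two settings above less of a black box, here is a simplified Python sketch of what Quality cutoff and Minimum length do to each read. It follows the 3' trimming approach described in the Cutadapt documentation (subtract the cutoff from every quality score, sum from the 3' end, cut where the sum is lowest) and assumes Phred+33 quality encoding. The function names are our own, and this is not Cutadapt's code: the real tool also handles adapters, 5' trimming and other quality encodings.

```python
# Simplified sketch (not Cutadapt's actual code) of the two settings used above:
# 3' quality trimming ("Quality cutoff: 20") followed by a length filter
# ("Minimum length: 20").
# Assumption: qualities are Phred+33 encoded, as in current Illumina FASTQ files.

PHRED_OFFSET = 33


def phred_scores(quality_string):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def quality_trim_length(scores, cutoff):
    """Return how many bases (counted from the 5' end) survive 3' trimming.

    The cutoff is subtracted from every score and the differences are summed
    from the 3' end backwards; the read is cut where this running sum is
    lowest. If the sum never drops below zero, nothing is trimmed.
    """
    keep = len(scores)
    running = 0
    lowest = 0
    for i in range(len(scores) - 1, -1, -1):
        running += scores[i] - cutoff
        if running < lowest:
            lowest = running
            keep = i
    return keep


def trim_and_filter(seq, qual, cutoff=20, min_length=20):
    """Return the trimmed (seq, qual), or None if the read becomes too short."""
    keep = quality_trim_length(phred_scores(qual), cutoff)
    if keep < min_length:
        return None  # read discarded by the minimum-length filter
    return seq[:keep], qual[:keep]
```

For example, `quality_trim_length([42, 40, 26, 27, 8, 7, 11, 4, 2, 3], 10)` keeps the first 4 bases: the low-quality tail drags the running sum down until the first high-quality base is reached. Because the length filter runs after trimming, a read that was long enough before trimming can still be discarded.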

Cutadapt with paired-end reads


Repeat the procedure above, but select Paired-end instead of Single-end in step 4. The easiest way is to click the re-run button on one of the Cutadapt datasets and change only that setting:

  • You now have two input boxes, one for file #1 and one for file #2.

  • In the file #1 box, click the multiple datasets button and carefully select the fastq.gz files with the _1 suffix.

  • In the file #2 box, click the multiple datasets button and carefully select the fastq.gz files with the _2 suffix, keeping the samples in the same order as in the file #1 box.

  • Do not change the other parameters (they keep the values of the previous run because you used the re-run button).

  • Click the Execute button
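Selecting the mates "carefully" means that every _1 file must be processed together with the _2 file of the same sample. The Python sketch below shows that pairing logic on hypothetical file names (SAMPLE_1.fastq.gz / SAMPLE_2.fastq.gz), and builds an approximate Cutadapt command line for one pair using the real `-q` (quality cutoff), `-m` (minimum length), `-o` and `-p` (output files for mate 1 and mate 2) options. The helper names are our own, and the Galaxy wrapper may pass additional options, so treat the command as an illustration rather than the exact job Galaxy runs.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: the _1 / _2 suffix before .fastq.gz marks the mate.
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq\.gz$")


def pair_files(names):
    """Group file names into {sample: (mate1, mate2)}.

    Files without a _1/_2 suffix (e.g. single-end data) are ignored.
    Raises ValueError if a sample is missing one of its mates.
    """
    mates = defaultdict(dict)
    for name in names:
        match = MATE_PATTERN.match(name)
        if match is None:
            continue  # not a paired-end file
        mates[match["sample"]][match["mate"]] = name
    pairs = {}
    for sample, found in sorted(mates.items()):
        if set(found) != {"1", "2"}:
            raise ValueError(f"{sample} is missing a mate: {found}")
        pairs[sample] = (found["1"], found["2"])
    return pairs


def cutadapt_command(mate1, mate2, out1, out2, quality=20, min_length=20):
    """Build an approximate Cutadapt argument list for one read pair."""
    return [
        "cutadapt",
        "-q", str(quality),     # Quality cutoff
        "-m", str(min_length),  # Minimum length
        "-o", out1,             # trimmed mate 1
        "-p", out2,             # trimmed mate 2
        mate1,
        mate2,
    ]
```

Processing both mates in one job matters: Cutadapt keeps the two output files in sync, and by default it discards the whole pair when either mate falls below the minimum length. Pairing a _1 file with the wrong _2 file would silently mix reads from two samples.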


Run MultiQC on Cutadapt jobs


  1. Select the MultiQC tool
  2. Select Cutadapt/Trim Galore! in the "Which tool was used generate logs?" menu
  3. Cmd-click to select all the Report datasets generated by Cutadapt
  4. Press Execute
  5. Now for the tedious but essential part: carefully rename the output datasets generated by Cutadapt. To find out which input files each output came from, use the Info button at the bottom of the expanded green dataset boxes.

    Example: rename "Cutadapt on data 10 and data 9: Read 2 Output" to GSM461181_2_treat_paired.fastq.gz

  6. Delete (trash) the 11 original, untrimmed fastq.gz files. This is important to avoid mixing filtered and unfiltered datasets in the next steps.
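The renaming in step 5 follows a fixed pattern, as the example shows: sample, read number, condition, then "paired". The hypothetical Python helper below spells that pattern out for the paired-output label seen in the example. It assumes read 1 outputs are labelled "Read 1 Output" in the same way, and the `inputs` table (which input datasets belong to which sample and condition) is something you would fill in yourself from the Info pages. Galaxy does not provide this function; the renaming itself is still done by hand in the interface.

```python
import re

# Galaxy's default label for a paired Cutadapt output, as in the example:
# "Cutadapt on data 10 and data 9: Read 2 Output".
# Assumption: read 1 outputs use the same wording with "Read 1".
LABEL = re.compile(r"Cutadapt on data (\d+) and data (\d+): Read ([12]) Output")


def new_name(label, inputs):
    """Return a descriptive file name for a paired Cutadapt output.

    `inputs` is a hypothetical lookup table mapping the pair of input dataset
    numbers (as a frozenset) to a (sample, condition) tuple, filled in by hand
    after checking the Info button of each output.
    """
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    data_a, data_b, read = match.groups()
    sample, condition = inputs[frozenset({int(data_a), int(data_b)})]
    return f"{sample}_{read}_{condition}_paired.fastq.gz"
```

With `inputs = {frozenset({9, 10}): ("GSM461181", "treat")}`, calling `new_name("Cutadapt on data 10 and data 9: Read 2 Output", inputs)` reproduces the name from the example. Keying on the pair of dataset numbers avoids having to know which of the two was file #1.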