Filtering datasets to remove or trim low-quality sequences

This step is optional and should be performed by 50% of attendees.

Cutadapt with single-end reads

Create a new history named Cutadapt (wheel → Create New)
Copy the fastq files from the RNAseq data library to this new history (wheel → Copy datasets)
Select the Cutadapt tool
Start by selecting Single-end in the Single-end or Paired-end reads? menu
Click the multiple datasets button for this menu
Cmd-click (Ctrl-click on Windows or Linux) to select the 3 single-end fastq.gz files, which do not have to be adjacent in the list
Filter Options
- Minimum length: 20
Read Modification Options
- Quality cutoff: 20
Output Options
- Report: Yes
Do not change the other available parameters and click Execute
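
For readers who want to see what these settings amount to outside Galaxy, here is a minimal sketch that builds a roughly equivalent cutadapt command line. It assumes the Galaxy form fields map onto cutadapt's `--quality-cutoff` and `--minimum-length` options; the function name and the file names are hypothetical, and the command is only printed, not executed.

```python
import shlex


def cutadapt_single_end_cmd(fastq_in, fastq_out, quality_cutoff=20, minimum_length=20):
    """Build (but do not run) a cutadapt command for one single-end FASTQ file."""
    return [
        "cutadapt",
        "--quality-cutoff", str(quality_cutoff),  # Read Modification Options: Quality cutoff
        "--minimum-length", str(minimum_length),  # Filter Options: Minimum length
        "--output", fastq_out,
        fastq_in,
    ]


# Hypothetical file names, for illustration only.
cmd = cutadapt_single_end_cmd("sample_single.fastq.gz", "sample_single.trimmed.fastq.gz")
print(shlex.join(cmd))
```

On the command line, cutadapt prints its summary to standard output; the Report: Yes option in Galaxy presumably saves that summary as a dataset.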

Cutadapt with paired-end reads

Repeat the same procedure as above, except that you select Paired-end in the fourth step (the Single-end or Paired-end reads? menu). The quickest way is to re-run the tool using the re-run button on one of the Cutadapt output datasets and select Paired-end instead of Single-end.

You then have two input boxes, one for file #1 and one for file #2.
In the file #1 box, click the multiple datasets button and carefully select the fastq.gz files with the _1 suffix
In the file #2 box, click the multiple datasets button and carefully select the fastq.gz files with the _2 suffix
Do not change the other parameters (they are set to the same value as previously because you used the re-run button).
Click the Execute button
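
Selecting the _1 and _2 files by hand is error-prone, so the pairing logic can be sketched as a small script. This is only an illustration under stated assumptions: the file names are assumed to carry a `_1` or `_2` tag directly after the sample identifier (as in GSM461181_2_treat_paired.fastq.gz), the helper names are hypothetical, and the cutadapt command is built but not run. It uses cutadapt's `--paired-output` option for the second read file.

```python
import re
from collections import defaultdict

# Matches names such as "GSM461181_2_treat_paired.fastq.gz" or "sample_1.fastq.gz".
MATE_PATTERN = re.compile(
    r"^(?P<sample>.+?)_(?P<mate>[12])(?P<rest>(?:_.*)?\.fastq\.gz)$"
)


def pair_fastq_files(names):
    """Return (read1, read2) tuples, sorted by sample; names without a mate tag are skipped."""
    mates = defaultdict(dict)
    for name in names:
        match = MATE_PATTERN.match(name)
        if match is None:
            continue  # no _1/_2 tag: treated as a single-end file
        key = (match["sample"], match["rest"])
        mates[key][match["mate"]] = name
    pairs = []
    for key in sorted(mates):
        found = mates[key]
        if set(found) != {"1", "2"}:
            raise ValueError(f"missing mate for sample {key[0]}: {sorted(found.values())}")
        pairs.append((found["1"], found["2"]))
    return pairs


def cutadapt_paired_end_cmd(read1, read2, out1, out2, quality_cutoff=20, minimum_length=20):
    """Build (but do not run) a cutadapt command for one pair of FASTQ files."""
    return [
        "cutadapt",
        "--quality-cutoff", str(quality_cutoff),
        "--minimum-length", str(minimum_length),
        "--output", out1,          # trimmed reads from file #1
        "--paired-output", out2,   # trimmed reads from file #2
        read1,
        read2,
    ]


# Hypothetical history content, listed out of order on purpose.
names = [
    "GSM461181_2_treat_paired.fastq.gz",
    "GSM461176_untreat_single.fastq.gz",
    "GSM461181_1_treat_paired.fastq.gz",
]
for read1, read2 in pair_fastq_files(names):
    print(cutadapt_paired_end_cmd(read1, read2, "trimmed_" + read1, "trimmed_" + read2))
```

Raising an error when a mate is missing mirrors the advice above: a mismatched selection in the two input boxes should be caught before the tool runs.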

Run MultiQC on Cutadapt jobs

Select MultiQC tool
Select Cutadapt/Trim Galore! in the menu Which tool was used generate logs?
Cmd-click (Ctrl-click on Windows or Linux) to select the Report datasets generated by Cutadapt
Press Execute
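
As a rough command-line counterpart to the MultiQC steps above, the sketch below builds a `multiqc` call restricted to the cutadapt module. It assumes the Cutadapt reports have been saved as text files in a directory; the directory names and the function name are hypothetical, and the command is only printed.

```python
import shlex


def multiqc_cmd(report_dir, out_dir, module="cutadapt"):
    """Build (but do not run) a MultiQC command that parses only one module's logs."""
    return [
        "multiqc",
        "--module", module,   # only look for logs of this tool
        "--outdir", out_dir,  # where the aggregated report is written
        report_dir,           # directory scanned for log files
    ]


# Hypothetical directory names, for illustration only.
print(shlex.join(multiqc_cmd("cutadapt_reports", "multiqc_output")))
```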
Now, the boring but essential job: carefully rename the Output datasets generated by Cutadapt. To work out which input file each output comes from, use the Info button at the bottom of the green dataset boxes.

Example: Rename Cutadapt on data 10 and data 9: Read 2 Output to GSM461181_2_treat_paired.fastq.gz
Trash the 11 original (unfiltered, untrimmed) fastq.gz files. This is important to avoid mixing filtered and unfiltered datasets in the next steps.