Filtering datasets to remove or trim low-quality sequences
This step is optional and should be performed by 50% of attendees.
Cutadapt with single reads
- Create a new history named Cutadapt (wheel --> Create New)
- Copy the fastq files from the RNAseq data library to this new history (wheel --> Copy datasets)
- Select the Cutadapt tool
- Start by selecting Single-end in the "Single-end or Paired-end reads?" menu
- Select the multiple datasets button for this menu
- Cmd-Click for discontinuous multiple selection of the single fastq.gz files (3 datasets)
- Filter Options: set Minimum length to 20
- Read Modification Options: set Quality cutoff to 20
- Output Options: set Report to Yes
- Do not change the other available parameters and click Execute
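To give an intuition for what the two values of 20 do, here is a minimal Python sketch. It assumes the partial-sum approach to 3' quality trimming described in the Cutadapt documentation, followed by a length filter. It is a simplified illustration, not the code Galaxy actually runs: the function names and the example read are hypothetical, and the qualities are assumed to be already decoded Phred scores.

```python
def quality_trim_3prime(qualities, cutoff=20):
    """Return how many bases to keep after trimming the low-quality 3' end."""
    running = 0
    best = 0
    keep = len(qualities)
    # Walk from the 3' end, accumulating (cutoff - quality) for each base.
    for i in reversed(range(len(qualities))):
        running += cutoff - qualities[i]
        if running < 0:
            break  # good bases now outweigh bad ones: stop scanning
        if running > best:
            best = running
            keep = i  # best cut position seen so far
    return keep


def process_read(sequence, qualities, cutoff=20, min_length=20):
    """Trim the 3' end, then discard the read if it became too short."""
    keep = quality_trim_3prime(qualities, cutoff)
    trimmed = sequence[:keep]
    return trimmed if len(trimmed) >= min_length else None


# Hypothetical 30-base read: 25 good bases (Q35) followed by 5 poor ones (Q5).
seq = "ACGTACGTACGTACGTACGTACGTACGTAC"
quals = [35] * 25 + [5] * 5
print(process_read(seq, quals))
```

In this sketch the five poor bases are removed and the remaining 25 bases pass the minimum length of 20. A read that ends up shorter than 20 bases after trimming would be dropped (`None` here).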
Cutadapt with paired-end reads
Repeat the same procedure as above, except that you select Paired-end in step 4:
- Re-run the tool using the re-run button on one Cutadapt instance and just select Paired-end instead of Single-end
- You then have two input boxes, one for file #1 and one for file #2.
- In the box file #1, click the multiple datasets button and carefully select the fastq.gz files with the _1 suffix
- In the box file #2, click the multiple datasets button and carefully select the fastq.gz files with the _2 suffix
- Do not change the other parameters (they are set to the same values as previously because you used the re-run button).
- Click the Execute button
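For readers who want to see the same selection outside the Galaxy interface, the sketch below pairs files by their _1 / _2 tag and builds a plausible cutadapt command line for each pair. Several things are assumptions: the file-naming pattern, the mate file name GSM461181_1_treat_paired.fastq.gz, the trimmed_ output prefix, and the idea that the Galaxy tool issues a comparable command. The options -q, -m, -o and -p are standard cutadapt options. The command is only built here, not executed.

```python
import re

# Assumed naming scheme: <sample>_<1|2>_<rest>.fastq.gz
MATE_PATTERN = re.compile(r"^(?P<sample>[^_]+)_(?P<mate>[12])_(?P<rest>.+\.fastq\.gz)$")


def pair_mates(filenames):
    """Group paired-end files into (read 1, read 2) tuples."""
    mates = {}
    for name in filenames:
        m = MATE_PATTERN.match(name)
        if m is None:
            continue  # not a paired-end file under this naming assumption
        key = (m["sample"], m["rest"])
        mates.setdefault(key, {})[m["mate"]] = name
    pairs = []
    for key in sorted(mates):
        found = mates[key]
        if set(found) != {"1", "2"}:
            raise ValueError(f"incomplete pair for sample {key[0]}")
        pairs.append((found["1"], found["2"]))
    return pairs


def cutadapt_command(read1, read2=None, cutoff=20, min_length=20):
    """Build (but do not run) a cutadapt command as an argument list."""
    cmd = ["cutadapt", "-q", str(cutoff), "-m", str(min_length),
           "-o", "trimmed_" + read1]  # "trimmed_" prefix is hypothetical
    if read2 is not None:
        cmd += ["-p", "trimmed_" + read2, read1, read2]
    else:
        cmd.append(read1)
    return cmd


files = [
    "GSM461181_2_treat_paired.fastq.gz",
    "GSM461181_1_treat_paired.fastq.gz",  # assumed mate of the file above
]
for r1, r2 in pair_mates(files):
    print(" ".join(cutadapt_command(r1, r2)))
```

Raising an error on an incomplete pair mirrors the "carefully select" advice: a missing or mismatched mate is easier to catch before trimming than after.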
Run MultiQC on Cutadapt jobs
- Select the MultiQC tool
- Select Cutadapt/Trim Galore! in the menu "Which tool was used generate logs?"
- Cmd-Select the Report datasets generated by Cutadapt
- Press Execute
- Now, the boring but essential job: carefully rename the Output datasets generated by Cutadapt. To do so, use the Info button at the bottom of the green dataset boxes to check which input each output came from. Example: rename "Cutadapt on data 10 and data 9: Read 2 Output" to GSM461181_2_treat_paired.fastq.gz
- Trash the 11 original fastq.gz files (the unfiltered, untrimmed inputs). This is important to avoid mixing filtered and unfiltered datasets in the next steps.
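The renaming rule in the example can be written down as a small Python helper, which may help when checking names by hand. This is only a sketch: the mapping from history numbers to original file names is hypothetical (in practice it comes from your own history), and the label format is assumed from the single example given above.

```python
import re

# Label format assumed from the example in the text.
LABEL = re.compile(r"^Cutadapt on data (\d+) and data (\d+): Read ([12]) Output$")


def new_name(label, names_by_number):
    """Pick the original file name whose mate tag matches the Read number."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    mate = m[3]
    for number in (int(m[1]), int(m[2])):
        name = names_by_number[number]
        if f"_{mate}_" in name:
            return name
    raise ValueError(f"no input with tag _{mate}_ for {label!r}")


# Hypothetical history numbering, for illustration only.
names = {
    9: "GSM461181_1_treat_paired.fastq.gz",
    10: "GSM461181_2_treat_paired.fastq.gz",
}
print(new_name("Cutadapt on data 10 and data 9: Read 2 Output", names))
```

The helper matches on the _1_ / _2_ tag rather than on the order of the dataset numbers in the label, so it does not depend on which input Galaxy lists first.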