Visualisation of differential expression¶
Now we would like to extract the genes that are most differentially expressed due to the treatment, and then visualize them using a heatmap of the normalized counts and of the Z-score for each sample.
We will proceed in several steps:
- Extract the most differentially expressed genes using the DESeq2 summary file
- Extract the normalized counts for these genes for each sample, using the normalized count file generated by DESeq2
- Plot the heatmap of the normalized counts
- Compute the Z score of the normalized counts
- Plot the heatmap of the Z score of the genes
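The Z-score step in the outline above can be illustrated with a minimal sketch. It assumes the Z-score is computed per gene across samples, as (value - mean) / standard deviation, and that the sample standard deviation (n - 1 denominator) is used. The function name `row_z_scores` and the example counts are hypothetical and not part of the Galaxy workflow.

```python
from statistics import mean, stdev


def row_z_scores(counts):
    """Return the Z-scores of one gene's normalized counts across samples."""
    mu = mean(counts)
    sigma = stdev(counts)  # sample standard deviation (n - 1): an assumption
    if sigma == 0:
        # A gene with identical counts in every sample has no variation to scale
        return [0.0 for _ in counts]
    return [(value - mu) / sigma for value in counts]


# Hypothetical normalized counts for one gene in four samples
example = [10.0, 12.0, 30.0, 28.0]
z = row_z_scores(example)
print(z)
```

After this scaling, each gene has mean 0 and standard deviation 1 across samples, so genes with very different expression levels become comparable in one heatmap.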
Extract the most differentially expressed genes¶
- Select the tool Filter data on any column using simple expressions to extract genes with a significant change in gene expression (adjusted p-value below 0.05) between treated and untreated samples:
  - Filter: the DESeq2 result file
  - With following condition: c7<0.05
The file with the independently filtered results can be used for further downstream analysis: it excludes genes with only a few read counts, since these genes will not be considered significantly differentially expressed.
The generated file contains too many genes (632/STAR, ) to get a meaningful heatmap. Therefore, in the next step, we will keep only the genes with an absolute fold change > 2 (|log2(fold change)| > 1).
- Select the tool Filter data on any column using simple expressions:
  - Filter: the differentially expressed genes (output of the previous Filter tool)
  - With following condition: abs(c3)>1
We now have a table with 84/STAR, /HISAT2 lines corresponding to the most differentially expressed genes. For each gene, we have its id, its mean normalized counts (averaged over all samples from both conditions), its log2FC and other information.
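The two Filter steps above can be sketched outside Galaxy as follows. This is a minimal illustration and not the Filter tool's implementation. It assumes a tab-separated DESeq2 result with the log2 fold change in column 3 and the adjusted p-value in column 7, matching the `c3` and `c7` conditions. The gene names, values, and the function name `filter_rows` are made up.

```python
import csv
import io

# Hypothetical DESeq2-like rows:
# gene id, base mean, log2FC, std. error, Wald statistic, p-value, adjusted p-value
deseq2_text = """\
geneA\t1500.2\t2.10\t0.20\t10.5\t1e-20\t1e-18
geneB\t300.5\t-0.40\t0.10\t-4.0\t1e-4\t0.01
geneC\t80.1\t-1.70\t0.30\t-5.6\t1e-7\t1e-5
geneD\t12.0\t3.00\t1.50\t2.0\t0.04\tNA
"""


def filter_rows(rows, padj_max=0.05, abs_lfc_min=1.0):
    """Keep rows with c7 < padj_max and abs(c3) > abs_lfc_min."""
    kept = []
    for row in rows:
        try:
            log2_fc = float(row[2])  # column 3 (c3)
            padj = float(row[6])     # column 7 (c7)
        except ValueError:
            continue  # skip non-numeric values such as "NA"
        if padj < padj_max and abs(log2_fc) > abs_lfc_min:
            kept.append(row)
    return kept


rows = list(csv.reader(io.StringIO(deseq2_text), delimiter="\t"))
top_genes = filter_rows(rows)
print([row[0] for row in top_genes])  # ['geneA', 'geneC']
```

Using the absolute value keeps both strongly up-regulated and strongly down-regulated genes, which is why `geneC` (log2FC of -1.70) passes the filter.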
We could plot the log2FC for the different genes, but here we would like to look at a heatmap of expression for these genes in the different samples. So we need to extract the normalized counts for these genes.
We will join the normalized count table generated by DESeq2 with the table we just generated, to conserve only the lines corresponding to the most differentially expressed genes.
Extract the normalized counts of the most differentially expressed genes¶
- Create a Pasted Entry from the header line of the Filter output:
  - Copy the header of the final Filter output
  - Using the Upload tool, select Paste/Fetch data and paste the copied data
  - Set the Type to tabular and select Start to upload a new Pasted Entry
- Concatenate datasets tool to add this header line to the Filter output:
  - Select the Concatenate datasets tail-to-head tool
  - Select the Pasted entry dataset
  - + Insert Dataset
  - Select the final Filter output
This ensures that the table of most differentially expressed genes has a header line and can be used in the next step.
- Join the normalized count table generated by DESeq2 with the table we just generated, to conserve only the lines corresponding to the most differentially expressed genes:
  - Select the Join two Datasets side by side on a specified field tool
    - Join: the Normalized counts file (output of DESeq2 tool)
    - using column: Column: 1
    - with: most differentially expressed genes (output of the Concatenate tool)
    - and column: Column: 1
    - Keep lines of first input that do not join with second input: No
    - Keep the header lines: Yes
The generated file has more columns than we need for the heatmap. In addition to the columns with mean normalized counts, there is the log2FC and other information. We need to remove the extra columns.
- Cut tool to extract the columns with the gene ids and normalized counts:
  - Select the Cut columns from a table tool
    - Cut columns: c1-c8
    - Delimited by: Tab
    - From: the joined dataset (output of Join two Datasets tool)
We now have a table with 85 lines (a header line plus the most differentially expressed genes) and the normalized counts for these genes in the 7 samples.
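The Join and Cut steps can be pictured with a small sketch in plain Python. It is only an approximation of the Galaxy tools: here a joined row is the count row followed by the remaining columns of the matching row, which is an assumption of this sketch. The tables, the three-sample layout, and the function names are hypothetical.

```python
# Hypothetical normalized count table (header + one row per gene, three samples)
counts = [
    ["GeneID", "s1", "s2", "s3"],
    ["geneA", "10", "12", "300"],
    ["geneB", "5", "6", "5"],
    ["geneC", "200", "180", "20"],
]

# Hypothetical table of most differentially expressed genes, with its header line
top = [
    ["GeneID", "Base mean", "log2(FC)"],
    ["geneA", "107.3", "2.1"],
    ["geneC", "133.3", "-1.7"],
]


def join_on_first_column(left, right):
    """Keep rows of `left` whose first column also appears in `right`."""
    right_by_id = {row[0]: row for row in right}
    joined = []
    for row in left:
        match = right_by_id.get(row[0])
        if match is not None:  # unmatched rows of `left` are dropped
            joined.append(row + match[1:])
    return joined


def cut_columns(rows, n_columns):
    """Keep only the first `n_columns` columns of every row."""
    return [row[:n_columns] for row in rows]


joined = join_on_first_column(counts, top)
# Gene id + 3 samples here; the tutorial keeps c1-c8 (gene id + 7 samples)
heatmap_table = cut_columns(joined, 4)
for row in heatmap_table:
    print("\t".join(row))
```

The header line survives the join only because both tables carry the same first header field, which is the reason the header was concatenated onto the Filter output beforehand.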
- Plot the heatmap of the normalized counts of these genes for the samples:
  - Select the heatmap2 tool to plot the heatmap:
    - Input should have column headers: the generated table (output of Cut tool)
    - Data transformation: Log2(value+1) transform my data
    - Enable data clustering: Yes
    - Labeling columns and rows: Label columns and not rows
    - Coloring groups: Blue to white to red
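The "Log2(value+1)" transformation selected above can be sketched as follows. This only illustrates the arithmetic and is not the heatmap2 tool's code. The function name `log2_plus_one` and the count values are hypothetical.

```python
import math


def log2_plus_one(rows):
    """Apply log2(value + 1) to every normalized count."""
    return [[math.log2(value + 1) for value in row] for row in rows]


# Hypothetical normalized counts: two genes, three samples
counts = [[0.0, 1.0, 3.0], [7.0, 15.0, 1023.0]]
transformed = log2_plus_one(counts)
print(transformed)  # [[0.0, 1.0, 2.0], [3.0, 4.0, 10.0]]
```

Adding 1 before taking the logarithm keeps zero counts defined (they map to 0), and the logarithm compresses the range so that a few highly expressed genes do not dominate the color scale.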
You should obtain something similar to: