GO
Prepare the datasets for goseq
- Compute (an expression on every row) tool with
  - Add expression: bool(c7<0.05)
  - as a new column to: the DESeq2 result file
- Cut tool with
  - Cut columns: c1,c8
  - Delimited by: Tab
  - From: the output of the Compute tool
- Change Case tool with
  - From: the output of the previous Cut tool
  - Change case of columns: c1
  - Delimited by: Tab
  - To: Upper case
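The three steps above can be mimicked outside Galaxy to see what the resulting table looks like. The following is a minimal Python sketch, not what Galaxy runs internally. It assumes a tab-separated DESeq2 table with no header whose 7th column (c7) holds the adjusted p-value. The function name prepare_de_status, the gene IDs and all values are hypothetical.

```python
import csv
import io

# Hypothetical excerpt of a tab-separated DESeq2 result table (no header).
# Assumption: column 7 (c7) is the adjusted p-value. All values are made up.
DESEQ2_TSV = (
    "FBgn0000001\t1087.0\t-4.15\t0.13\t-30.9\t1e-200\t1e-196\n"
    "FBgn0000002\t640.1\t0.35\t0.21\t1.7\t0.09\t0.31\n"
    "FBgn0000003\t52.3\t1.20\t0.30\t4.0\t6e-05\t0.002\n"
)


def prepare_de_status(tsv_text: str, padj_threshold: float = 0.05) -> list[tuple[str, bool]]:
    """Mimic Compute -> Cut -> Change Case on a DESeq2 table (hypothetical helper)."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    result = []
    for row in rows:
        gene_id = row[0]               # c1
        padj = float(row[6])           # c7 (0-based index 6)
        is_de = padj < padj_threshold  # Compute: bool(c7<0.05), the new column c8
        # Cut keeps c1 and c8; Change Case upper-cases c1
        result.append((gene_id.upper(), is_de))
    return result


for gene_id, is_de in prepare_de_status(DESEQ2_TSV):
    print(f"{gene_id}\t{is_de}")
```

Each output line pairs an upper-cased gene ID with True or False, which is the two-column layout the Compute, Cut and Change Case steps produce.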
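This produces the first goseq input: the upper-cased gene IDs, each with a True/False value marking whether its adjusted p-value (c7 of the DESeq2 output) is below 0.05. The second goseq input is the gene lengths. We can reuse the gene lengths computed by featureCounts after reformatting them slightly.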
- Copy one output of type ...: Feature lengths from the 7 featureCounts runs (STAR/HISAT2) in the history
- Rename it Lengths
- Change Case tool with
  - From: the feature lengths (output of the featureCounts tool)
  - Change case of columns: c1
  - Delimited by: Tab
  - To: Upper case
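For illustration, the Change Case step on the lengths table amounts to upper-casing its first column. Applying the same transformation to both goseq inputs keeps each gene ID spelled identically in the two files.

The following is a minimal sketch, assuming a tab-separated two-column table (gene ID, length in bp). The helper upper_case_column and the sample values are hypothetical.

```python
import csv
import io

# Hypothetical feature-lengths table: gene ID and length in bp, tab-separated.
LENGTHS_TSV = "FBgn0000001\t2307\nFBgn0000002\t1158\nFBgn0000003\t4410\n"


def upper_case_column(tsv_text: str, column: int = 1) -> str:
    """Mimic Change Case: upper-case one 1-based column of a tab-delimited table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        row[column - 1] = row[column - 1].upper()  # only the chosen column changes
        writer.writerow(row)
    return out.getvalue()


print(upper_case_column(LENGTHS_TSV), end="")
```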
We now have the two input files required by goseq.
Perform GO analysis
- goseq tool with
  - Differentially expressed genes file: the first file generated by the Change Case tool in the previous step
  - Gene lengths file: the second file generated by the Change Case tool in the previous step
  - Gene categories: Get categories
  - Select a genome to use: Fruit fly (dm6)
  - Select Gene ID format: Ensembl Gene ID
  - Select one or more categories: GO: Cellular Component, GO: Biological Process, GO: Molecular Function
goseq generates a large table with one row per GO term and the following columns:
Column | Description |
---|---|
category | GO category |
over_rep_pval | p-value for over representation of the term in the differentially expressed genes |
under_rep_pval | p-value for under representation of the term in the differentially expressed genes |
numDEInCat | number of differentially expressed genes in this category |
numInCat | number of genes in this category |
term | detail of the term |
ontology | MF (Molecular Function - molecular activities of gene products), CC (Cellular Component - where gene products are active), BP (Biological Process - pathways and larger processes made up of the activities of multiple gene products) |
p.adjust.over_represented | p-value for over representation of the term in the differentially expressed genes, adjusted for multiple testing with the Benjamini-Hochberg procedure |
p.adjust.under_represented | p-value for under representation of the term in the differentially expressed genes, adjusted for multiple testing with the Benjamini-Hochberg procedure |
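To make the over_rep_pval and under_rep_pval columns concrete, the sketch below computes both tail probabilities for a single category with a plain hypergeometric test. This is a simplification and not goseq's exact method. By default, goseq accounts for gene-length bias with the Wallenius approximation, whereas this sketch assumes every gene is equally likely to be called differentially expressed.

The function name hypergeom_pvals and the example counts are hypothetical. The arguments map to numDEInCat, numInCat, the total number of DE genes and the total number of genes tested.

```python
from math import comb


def hypergeom_pvals(num_de_in_cat: int, num_in_cat: int,
                    total_de: int, total_genes: int) -> tuple[float, float]:
    """Return (over_rep_pval, under_rep_pval) for one category.

    Simplified hypergeometric test (hypothetical helper): every gene is assumed
    equally likely to be called DE, so the length bias goseq corrects is ignored.
    """
    def ways(i: int) -> int:
        # number of ways to pick total_de genes with exactly i inside the category
        return comb(num_in_cat, i) * comb(total_genes - num_in_cat, total_de - i)

    denom = comb(total_genes, total_de)
    upper = min(num_in_cat, total_de)
    # over-representation: P(X >= numDEInCat)
    over = sum(ways(i) for i in range(num_de_in_cat, upper + 1)) / denom
    # under-representation: P(X <= numDEInCat)
    under = sum(ways(i) for i in range(0, num_de_in_cat + 1)) / denom
    return over, under


# Hypothetical counts: 20 genes tested, 5 DE, 4 in the category, 3 of those DE
over_p, under_p = hypergeom_pvals(num_de_in_cat=3, num_in_cat=4, total_de=5, total_genes=20)
print(f"over_rep_pval={over_p:.4f} under_rep_pval={under_p:.4f}")
```

The two tails overlap at exactly numDEInCat, so they do not sum to 1.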
To identify categories that are significantly over- or under-represented at a given p-value cutoff, use the adjusted p-values rather than the raw ones, because many GO terms are tested at once.
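The p.adjust.* columns are Benjamini-Hochberg adjusted. The sketch below shows that step-up computation in the same form as R's p.adjust(method = "BH"), applied to hypothetical raw p-values. The function name benjamini_hochberg is an illustrative choice.

```python
def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    # Walk from the largest to the smallest p-value
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # 1 = smallest p-value, n = largest
        # p * n / rank, kept monotone by a running minimum and capped at 1
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw over_rep_pval values for four GO terms
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

In this example, the third term has a raw p-value of 0.03 but an adjusted value of about 0.053. It would therefore not pass a 0.05 cutoff applied to the adjusted column.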