Comparing two populations¶
One last thing we might ask is what makes two populations distinct.
This could be two conditions or even two clusters. This is possible
with the FindMarkers
function which works almost like FindAllMarkers
since it calls the former. Many of the parameters are equivalent,
however FindMarkers
allows you to perform a differential expression
analysis between two populations which are defined with the ident.1
and ident.2
parameters. By default, these are cell identities present
in the active.ident
but we can select another variable contained in
the metadata using the group.by
parameter. So if we want to study the
impact of gender in the cells of the platelet then we would run:
FindMarkers(objectName, ident.1 = "female", ident.2 = "male", group.by = "sex", subset.ident = "Platelet")
(if we had a "sex" column in the metadata slot). If we want to study
the impact of sex in all cells then we leave the subset.ident
parameter
as default (i.e. NULL
).
Here we will test the difference between our NK and CD8+ T clusters which
were very difficult to differentiate. The results are similar to
FindAllMarkers
with the difference that there is no gene
column
because as a differential analysis it will not be possible to have a
gene overexpressed in both NK and CD8+ T cells.
NK_CD8_diff_markers <- FindMarkers(pbmc_small,
ident.1 = "NK",
ident.2 = "CD8+ T")
## Merge markers results with biomart annotation
NK_CD8_diff_markers_annotated <- merge(x = NK_CD8_diff_markers, #First df to merge
y = annotated_hg19, #Second df to merge
by.x = 0, #Column name of first df used for matching lines, 0 for rownames
by.y = "ensembl_gene_id", #Column name of second df used for matching lines
all.x = TRUE) #Keep all lines from first df even if there is no match with second df
## Filter dataset based on Fold change and p-value adjusted
NK_CD8_diff_markers_annotated_signif <- subset(NK_CD8_diff_markers_annotated,
p_val_adj < 0.05 &
abs(avg_log2FC) >= 0.25) #Filter dataframe based on p_val_adj column
## Sorting results by average log2(Fold Change)
NK_CD8_diff_markers_annotated_signif <- NK_CD8_diff_markers_annotated_signif %>% #Rearrange df with dplyr package
arrange(desc(avg_log2FC)) #Sort lines by descending the column avg_log2FC and by group
## Most DE gene marker for each cluster
kable(NK_CD8_diff_markers_annotated_signif[(c(1:3, (nrow(NK_CD8_diff_markers_annotated_signif)-2):nrow(NK_CD8_diff_markers_annotated_signif))),])
Row.names | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | external_gene_name | description | gene_biotype | chromosome_name | |
---|---|---|---|---|---|---|---|---|---|---|
1 | ENSG00000163453 | 0 | 7.673621 | 0.551 | 0.006 | 0.00e+00 | IGFBP7 | insulin-like growth factor binding protein 7 [Source:HGNC Symbol;Acc:5476] | protein_coding | 4 |
2 | ENSG00000135077 | 0 | 5.656534 | 0.314 | 0.009 | 0.00e+00 | HAVCR2 | hepatitis A virus cellular receptor 2 [Source:HGNC Symbol;Acc:18437] | protein_coding | 5 |
3 | ENSG00000175294 | 0 | 5.513915 | 0.109 | 0.000 | 2.03e-05 | CATSPER1 | cation channel, sperm associated 1 [Source:HGNC Symbol;Acc:17116] | protein_coding | 11 |
224 | ENSG00000167286 | 0 | -3.971611 | 0.083 | 0.885 | 0.00e+00 | CD3D | CD3d molecule, delta (CD3-TCR complex) [Source:HGNC Symbol;Acc:1673] | protein_coding | 11 |
225 | ENSG00000137078 | 0 | -4.379914 | 0.006 | 0.232 | 1.01e-05 | SIT1 | signaling threshold regulating transmembrane adaptor 1 [Source:HGNC Symbol;Acc:17710] | protein_coding | 9 |
226 | ENSG00000172116 | 0 | -5.098008 | 0.019 | 0.368 | 0.00e+00 | CD8B | CD8b molecule [Source:HGNC Symbol;Acc:1707] | protein_coding | 2 |
Here are the results for the three most over-expressed genes in NK cells
(avg_log2FC
positive) and the 3 most over-expressed genes in CD8+ T
cells (avg_log2FC
negative).
You can totally use the different methods of gene cluster analysis on this one.