Skip to content

Cluster Annotation

We start to have a more precise idea of the identity of our clusters! We are going to use Seurat's tutorial which gives us directly the markers of the cell populations that allow us to identify the clusters. This will allow us to confirm or not the different results obtained previously and potentially help us to differentiate certain clusters between them:

kable(data.frame(Marker = c("IL7R, CCR7", "CD14, LYZ", "IL7R, S100A4", "MS4A1", "CD8A", "FCGR3A, MS4A7", "GNLY, NKG7", "FCER1A, CST3", "PPBP"),
                 CellType = c("Naive CD4+ T", "CD14+ Mono", "Memory CD4+", "B cells", "CD8+ T", "FCGR3A+ Mono", "NK", "DC", "Platelet"),
                 Cluster = 0:8), caption = "Identification of clusters based on specific cell types")
Marker CellType Cluster
IL7R, CCR7 Naive CD4+ T 0
CD14, LYZ CD14+ Mono 1
IL7R, S100A4 Memory CD4+ 2
MS4A1 B cells 3
CD8A CD8+ T 4
FCGR3A, MS4A7 FCGR3A+ Mono 5
GNLY, NKG7 NK 6
FCER1A, CST3 DC 7
PPBP Platelet 8

Identification of clusters based on specific cell types

## Retrieve specific markers based on their gene name
markers_pop <- subset(annotated_hg19,
                      subset = external_gene_name %in%
                        c("CCR7",                                #Naive CD4+ T
                          "CD14",                                #CD14+ Mono
                          "S100A4",                              #Memory CD4+
                          "MS4A1",                               #B
                          "CD8A",                                #CD8+ T
                          "FCGR3A",                              #FCGR3A+ Mono
                          "NKG7",                                #NK
                          "CST3",                                #DC
                          "PPBP"))                               #Platelet

## Generate violin plots and stock them into variable
vln_plots <- VlnPlot(pbmc_small,                                 #SeuratObject
                     features = markers_pop$ensembl_gene_id,     #Vector of genes to plot
                     combine = FALSE)                            #Return list of plot objets instead of a combined plot (easier to process)

## Add gene name as title
vln_plots <- lapply(vln_plots,                                   #List of plots
                    add_title_gene_name,                         #Function to apply to the list
                    gene_format = annotated_hg19)                #Fill in the dataframe parameter

## Remove unecessary legend
vln_plots <- lapply(vln_plots, function(plot){
  plot <- plot + theme(legend.position = "none")
  return(plot)
})

## Plot list of plots
grid.arrange(grobs = vln_plots)

So we have managed to identify a large part of the clusters with the enrichment analyses. Now that we know which cluster is which cell type we will rename the cell identities in the Seurat object.

To do this we will use the RenameIdents function where it is provided with a mapping from the old active identities (cluster number: 0, 1, 2,...) to the new identities (names of the different cell types) via a named vector.

## Vector of new cluster labels         #Correspond to cluster :
new_cluster_ids <- c("Naive CD4+ T",    #0
                     "CD14+ Mono",      #1
                     "Memory CD4+",     #2
                     "B",               #3
                     "CD8+ T",          #4
                     "FCGR3A+ Mono",    #5
                     "NK",              #6
                     "DC",              #7
                     "Platelet")        #8

## Create a named vector with the actual cell identifiers
names(new_cluster_ids) <- levels(pbmc_small)

## Renamed cell identities in the Seurat Object
pbmc_small <- RenameIdents(pbmc_small, new_cluster_ids)

## Plot
UMAPPlot(pbmc_small,
         label = TRUE,
         pt.size = 0.5)