Histories for Use Cases 2-1, 2-2
Now that you are more familiar with manipulations in Galaxy from Use Cases 1-1 to 1-4, we will describe the other Use Case analyses more concisely. If you feel you lack skills in basic Galaxy operations (tool usage, copying datasets, etc.), do not hesitate to go back and work through the previous chapters step by step.
Discovery of novel viruses
Here we are going to use Metavisitor workflows to discover new viruses infecting a laboratory colony of Anopheles coluzzii mosquitoes.
- Workflow for Use Case 2-1: takes reads from EBI SRA ERP012577 (small RNAs), assembles contigs, aligns them with blastx against vir2, selects the contigs hitting Dicistroviridae proteins, re-assembles the selected contigs to align them to the Drosophila C virus (DCV) genome, and integrates them into its sequence.
- Workflow for Use Case 2-2: takes reads from EBI SRA ERS977505 (mRNA), assembles contigs, and aligns them with blastx against vir2.
Input data for Use Cases 2-1 and 2-2
As in the previous Use Case 1, the first step is to collect all input data in a history that we will name "Input data for Use Cases 2-1 and 2-2".
- Create a new history
- Rename this history "Input data for Use Cases 2-1 and 2-2"
- For the small RNA sequence datasets (ERP012577) in this study, we are going to use another tool to upload data to the Galaxy Metavisitor server: the "EBI SRA ENA SRA" tool, which is in the "Get data" section of the left tool bar.
  - Click on this tool, enter ERP012577 in the search field that shows up in the European Nucleotide Archive web page, and search. Click on the ERP012577 link. In the column "Submitted files (galaxy)" of the table, click on the first "fastq file 1". This action should send you back to your Galaxy page automatically, where you will see the fastq dataset loading (yellow dataset in the history bar).
  - Repeat the exact same operation for the three other "fastq file 1".
  - In the end, you should have uploaded four fastq datasets corresponding to the sequencing runs "post_infected_rep1.fastq", "post_infected_rep2.fastq", "post_non-infected_rep1.fastq" and "post_non-infected_rep2.fastq".
  - Select a dataset to make sure its datatype is fastqsanger.gz. If it is not, click on the pencil button of the dataset, select the "Datatypes" tab and set the datatype to fastqsanger.gz.
  - Repeat this operation for the other 3 datasets.
- Create a dataset collection as previously explained (step 9) and name it "Small RNA reads ERP012577"
- For the RNA sequence datasets (ERS977505) that will be used in Use Case 2-2, use again the "EBI SRA ENA SRA" tool, which is in the "Get data" section of the left tool bar.
  - Click on this tool, enter ERS977505 in the search field that shows up in the European Nucleotide Archive web page, and search. Click on the ERS977505 link (Sample 1 result found). In the column "Submitted files (galaxy)" of the table, click on the first "fastq file 1". This action should send you back to your Galaxy page automatically, where you will see the fastq dataset loading (yellow dataset in the history bar).
  - Repeat the exact same operation for the other "fastq file 1" and the two "fastq file 2".
  - In the end, you should have uploaded four additional fastq datasets corresponding to the sequencing runs "IP-isoT-1_AGTCAA_L001_R_1.fastq", "IP-isoT-1_AGTCAA_L001_R_2.fastq", "IP-isoT-2_ATGTCA_L002_R_1.fastq" and "IP-isoT-2_ATGTCA_L002_R_2.fastq".
- Create a dataset collection as explained in the previous chapter and name it "long read RNAseq datasets". Note that we are not handling the files as paired-ends, so use the "Build dataset list" command and not the "Build list of dataset pairs" command.
- Use the "Retrieve FASTA from NCBI" tool, paste phix174[title] in the "Query to NCBI in entrez format" field and select "Nucleotide" for the NCBI database. This will upload 154 fasta sequences from phix174.
- Use the wheel icon at the top of the history bar to copy "nucleotide vir2 blast database", "protein vir2 blast database" and "P. berghei" from the history "References" to the current history "Input data for Use Cases 2-1 and 2-2". If you do not remember how to copy datasets between histories, you may read again the explanation here (step 3).
You are now ready to generate the histories for Use Cases 2-1 and 2-2.
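For readers who prefer to inspect the same sources outside the Galaxy interface, the two retrieval steps above can be sketched as plain URL builders. This is a minimal sketch and not what the Galaxy tools necessarily do internally: it assumes the public ENA Portal API `filereport` endpoint and the NCBI E-utilities `esearch` endpoint, and the function names, the selected fields and the `retmax` value are illustrative choices. Whether the sample accession ERS977505 is accepted by `filereport` in the same way as the study accession is also an assumption. The code only builds URLs and performs no network access.

```python
from urllib.parse import urlencode

# Assumed public endpoints; they are not named in this tutorial.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"
NCBI_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def ena_filereport_url(accession):
    """Build a URL that lists the runs and submitted files of an ENA accession (TSV)."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,submitted_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def ncbi_esearch_url(query, database="nuccore", retmax=200):
    """Build an Entrez search URL; 'nuccore' is the Entrez name of the Nucleotide database."""
    params = {"db": database, "term": query, "retmax": retmax}
    return NCBI_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Accessions and query string are the ones used in this chapter.
    for accession in ("ERP012577", "ERS977505"):
        print(ena_filereport_url(accession))
    print(ncbi_esearch_url("phix174[title]"))
```

Opening the printed URLs in a browser should show the files behind each accession and the identifiers matching the phix174 query, which can help to cross-check what was uploaded into the history.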
History for Use Case 2-1
- Stay in the current history "Input data for Use Cases 2-1 and 2-2"!
- In the "Workflow" menu, select the workflow "Metavisitor: Workflow for Use Case 2-1" and directly select "Run" (you may also look at the workflow using the "edit" option).
- Be careful to select "Small RNA reads ERP012577" in step 1 (Input Dataset Collection).
- Be careful to select "P. berghei" in step 2.
- Be careful to select "Retrieve FASTA from NCBI (Nucleotide) with queryString 'phix174[title]'" in step 3.
- In step 4, the option "protein vir2 blast database" is forced, because the workflow expects a protein blast database in this step and only one dataset with this datatype is available in the history.
- Click the "Send results to a new history" checkbox and rename the history to "History for Use Case 2-1".
- Run Workflow!
You may follow the link to the new history when the workflow has started.
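The same workflow launch can in principle be scripted. The sketch below is hedged: it assumes a client object that behaves like BioBlend's `GalaxyInstance` (with `workflows.get_workflows` and `workflows.invoke_workflow`), the function name and its arguments are hypothetical, and the mapping of the form's steps 1 to 4 onto step indexes 0 to 3 is an assumption that should be checked against the actual workflow. The dataset and collection identifiers must be looked up in your own history.

```python
def invoke_use_case_2_1(gi, collection_id, berghei_id, phix_id, vir2_protein_id):
    """Invoke the Use Case 2-1 workflow and send results to a new history.

    `gi` is expected to behave like bioblend.galaxy.GalaxyInstance.
    All *_id arguments are encoded Galaxy identifiers from your own server.
    """
    workflow_name = "Metavisitor: Workflow for Use Case 2-1"
    matches = gi.workflows.get_workflows(name=workflow_name)
    if not matches:
        raise LookupError(f"workflow not found: {workflow_name}")

    # Assumption: the form's steps 1-4 correspond to step indexes 0-3.
    inputs = {
        "0": {"src": "hdca", "id": collection_id},   # Small RNA reads ERP012577
        "1": {"src": "hda", "id": berghei_id},       # P. berghei
        "2": {"src": "hda", "id": phix_id},          # phix174 FASTA from NCBI
        "3": {"src": "hda", "id": vir2_protein_id},  # protein vir2 blast database
    }
    # history_name asks Galaxy to create a new history for the results,
    # mirroring the "Send results to a new history" checkbox.
    return gi.workflows.invoke_workflow(
        matches[0]["id"],
        inputs=inputs,
        inputs_by="step_index",
        history_name="History for Use Case 2-1",
    )
```

With BioBlend installed, `gi` would typically be created with `GalaxyInstance(url=..., key=...)` from `bioblend.galaxy`, using the address of your Galaxy server and your own API key.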
History for Use Case 2-2
- If you are not already there, go back to the history "Input data for Use Cases 2-1 and 2-2".
- In the "Workflow" menu, select the workflow "Metavisitor: Workflow for Use Case 2-2" and directly select "Run" (you may also look at the workflow using the "edit" option).
- Be careful to select "long read RNAseq datasets" in step 1 (Input Dataset Collection).
- In step 2, the option "protein vir2 blast database" is forced, because the workflow expects a protein blast database in this step and only one dataset with this datatype is available in the history.
- Click the "Send results to a new history" checkbox and rename the history to "History for Use Case 2-2".
- Run Workflow.
Re-mapping of the small RNA reads (ERP012577) to the AnCV genome (KU169878).
The previous Workflow for Use Case 2-2 made it possible to assemble a large contig of 8919 nt, which significantly matched structural and non-structural polyproteins of Drosophila C Virus and Cricket Paralysis Virus in blastx alignments (see the dataset "blastx Filter sequences by length on data 17 vs 'protein BLAST database from data 2'" of the history). This large contig corresponds to the genome of a new Anopheles C Virus deposited in the NCBI nucleotide database under accession number KU169878 (see the companion Metavisitor article and Carissimo et al).
Here, we are going to perform a few steps manually, before using another workflow in the history 2-2 to remap the ERP012577 small RNA reads to the AnCV genome.
- Look at the "blast analysis, by subjects" dataset and copy the name of the 8919 nt contig that aligned to DCV and CrPV sequences. Note that the names may vary from one Oases run to another because the Oases algorithm is not totally deterministic. In the companion Metavisitor article, this name was Locus_69_Transcript_1/1_Confidence_0.000_Length_8919.
- Copy this name, find the tool "Pick Fasta sequences with header satisfying a query string" in the Galaxy tool bar, and paste the name in the field "Select sequences with this string in their header" of the tool form. Select the dataset "Oases_optimiser on data 21: Denovo assembled transcripts" as the source file, and run the tool.
- Now, we are going to change the header of the previously extracted fasta sequence using the tool "Regex Find And Replace".
  - Select the previous dataset "Concatenated datasets" as input dataset for this tool. Click on "+ Insert Check". Use >.*Confidence(.*)_Length_8919 (or the equivalent you extracted) as Find Regex and >Anopheles_C_Virus|KU169878_confidence\1 as Replacement.
- Copy the dataset collection "Small RNA reads ERP012577" from the history "Input data for Use Cases 2-1 and 2-2" into the current history "Use Case 2-2". You may have to refresh the history bar to see this collection and the attached datasets appear.
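To see what the header selection and the find/replace pattern above do, here is a minimal Python sketch that mimics both steps on toy data. It is an illustration, not the implementation of the Galaxy tools: the function names are hypothetical, the sequences are placeholders rather than real contigs, and Python's `re` syntax is assumed to be close enough to the tool's regex dialect for this pattern.

```python
import re

# Pattern and replacement as given in the step above.
FIND = r">.*Confidence(.*)_Length_8919"
REPLACEMENT = r">Anopheles_C_Virus|KU169878_confidence\1"


def pick_fasta(fasta_text, query):
    """Return the FASTA records whose header line contains `query`."""
    picked = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            keep = query in line  # decide once per record, on its header
        if keep:
            picked.append(line)
    return "\n".join(picked)


def rename_headers(fasta_text, find=FIND, replacement=REPLACEMENT):
    """Apply the find/replace pattern to header lines only."""
    renamed = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            line = re.sub(find, replacement, line)
        renamed.append(line)
    return "\n".join(renamed)


if __name__ == "__main__":
    # Toy records; the sequences are placeholders, not real contigs.
    contigs = (
        ">Locus_1_Transcript_1/1_Confidence_1.000_Length_300\n"
        "ACGTACGT\n"
        ">Locus_69_Transcript_1/1_Confidence_0.000_Length_8919\n"
        "TTGACCA\n"
    )
    name = "Locus_69_Transcript_1/1_Confidence_0.000_Length_8919"
    print(rename_headers(pick_fasta(contigs, name)))
```

The capture group `(.*)` keeps the confidence value of the original header, so the contig name from the companion article becomes `>Anopheles_C_Virus|KU169878_confidence_0.000`. If your Oases run produced a different contig name, only the `_Length_8919` part of the pattern needs to match for the replacement to apply.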
We are now ready to run the workflow.
- In the "Workflow" menu, pick the workflow "Metavisitor: Workflow for remapping in Use Cases 2-1,2" and select the "run" option.
- In the workflow form, ensure that "Small RNA reads ERP012577" is selected in step 1 and "Regex Find And Replace on data 28" is selected in step 2 (this should be the case if you followed the instructions).
- This time, do not check the box "Send results to a new history" and directly click the "Run workflow" button.
This workflow will provide you with a graphical view of ERP012577 small RNA mapping to the AnCV genome.