LOAD INPUT DATA

For this training, we need three types of datasets:

  • reference sequences that will be used to align sequencing reads (full genome, miRNAs, transposons, etc.)
  • libraries of sequencing reads from small RNAs (for the analysis of piRNAs)
  • libraries of sequencing reads from mRNAs (for differential gene expression analysis)

All these data have been deposited in 2 different repositories. The first one is a so-called Amazon S3 bucket. The second one is a Nextcloud server located at Sorbonne-Université. You may get your input data from either repository.

Get data "by URL"

We are going to focus on one method to upload data into Galaxy, which is applicable when the data are available through a URL (Uniform Resource Locator).

The other methods to upload data in Galaxy are:
  • transferring data from your local machine (the one that is running your web browser) to Galaxy
  • uploading data to your Galaxy FTP account and then transferring these data from your Galaxy FTP directory to one of your Galaxy histories.

We are not going to use these methods in this training, and invite you to look at one of the "Galaxy tours" available in the menu Help ▶ Interactive tours.

1. Single URL, simple trial.

  • Click the Upload Data button at the top-left corner of the Galaxy interface:

  • Stay on the Regular tab and click the Paste/Fetch data button

  • Paste the following URL in the open text field:
    https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=PlacW.fasta
    
  • Type PlacW.fasta in the Name text field (instead of New File)
  • Finally, press the dark-blue Start button.

A dataset should soon appear in your current history and turn green when the upload is complete.
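
If you are curious, what Galaxy does behind the scenes is simply fetch the URL on the server side and register the result under the name you typed. A rough, purely illustrative Python sketch of that step (you do not need to run it):

    # Rough stand-in for the "Paste/Fetch data" upload: fetch the URL and save the
    # result under the same name as the one typed in the Name field.
    import urllib.request

    url = ("https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/"
           "B433xtdmdQqdFYd/download?path=%2F&files=PlacW.fasta")

    urllib.request.urlretrieve(url, "PlacW.fasta")  # download and name the file
    print("PlacW.fasta downloaded")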

2. Upload of reference files as a batch of multiple URLs ➕ Programmatic file naming

Delete the previously uploaded dataset; we are going to re-upload it as part of a batch.

  • Click the Upload Data button at the top-left corner of the Galaxy interface.
  • This time, click the Rule-based tab!
  • Leave Upload data as Datasets and Load tabular data from Pasted Table
  • In the text field Tabular source data to extract collection files and metadata from, paste the following Tabular source data:

Pick the table below that corresponds to your team: 🍩, 🍨 or 🍬.

Reference URLs for 🍬 team
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=dmel-all-r6.18.gtf   dmel-all-r6.18.gtf
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=dmel-all-miscRNA-r6.18.fasta miscRNA
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=PlacW.fasta  PlacW
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=dmel-all-ncRNA-r6.18.fasta   ncRNA
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=dmel-all-miRNA-r6.18.fasta   miRNA
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=dmel-all-intron-r6.18.fasta  introns
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=dmel-all-gene-r6.18.fasta    genes
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=Dmel_piRNA_clusters.fasta    piRNA_clusters
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=Dmel_all-transposon_merge.fasta  all-transposons
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=dmel-all-chromosome-r6.18.fasta  dmel-r6.18
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=dmel-all-transcript-r6.18.fasta  transcripts
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/B433xtdmdQqdFYd/download?path=%2F&files=dmel-all-tRNA-r6.18.fasta    tRNA
Reference URLs for 🍨 team
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/PlacW.fasta   PlacW
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/dmel-all-ncRNA-r6.18.fasta    ncRNA
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/dmel-all-miscRNA-r6.18.fasta  miscRNA
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/dmel-all-miRNA-r6.18.fasta    miRNA
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/dmel-all-intron-r6.18.fasta   introns
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/dmel-all-gene-r6.18.fasta genes
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/dmel-all-chromosome-r6.18.fasta   dmel-r6.18
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/Dmel_piRNA_clusters.fasta piRNA_clusters
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/Dmel_all-transposon_merge.fasta   transposons
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/dmel-all-r6.18.gtf    dmel-all-r6.18.gtf
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/dmel-all-transcript-r6.18.fasta   transcripts
https://analyse-genomes.s3.eu-west-3.amazonaws.com/References/dmel-all-tRNA-r6.18.fasta tRNA
Reference URLs for 🍩 team
https://storage.googleapis.com/analyse-genome-coupon-1/References/PlacW.fasta   PlacW
https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-ncRNA-r6.18.fasta    ncRNA
https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-miscRNA-r6.18.fasta  miscRNA
https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-miRNA-r6.18.fasta    miRNA
https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-intron-r6.18.fasta   introns
https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-gene-r6.18.fasta genes
https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-chromosome-r6.18.fasta   dmel-r6.18
https://storage.googleapis.com/analyse-genome-coupon-1/References/Dmel_piRNA_clusters.fasta piRNA_clusters
https://storage.googleapis.com/analyse-genome-coupon-1/References/Dmel_all-transposon_merge.fasta   transposons
https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-r6.18.gtf    dmel-all-r6.18.gtf
https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-transcript-r6.18.fasta   transcripts
https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-tRNA-r6.18.fasta tRNA
  • Click the Build button
  • In the Build Rules ... panel that opens, open the Rules menu and choose Add/Modify Column Definitions
  • Click Add Definition a first time and select URL. Leave the URL column set to A
  • Click Add Definition a second time, select Name and choose column B for Name (the URL/Name mapping is illustrated in the short sketch after this list)
  • Now, click the Apply button
  • To finish the job, click the dark-blue Upload button
  • After the upload is complete, rename the history "References"
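
The two column definitions simply tell Galaxy that each pasted line pairs a download URL (column A) with the name to give to the resulting dataset (column B). A minimal Python sketch of that mapping, purely for illustration (Galaxy does this for you through the rule builder; the two example lines are taken from the 🍩 team table and assumed to be tab-separated):

    # Column A -> URL to fetch, column B -> name of the dataset to create.
    pasted_lines = [
        "https://storage.googleapis.com/analyse-genome-coupon-1/References/PlacW.fasta\tPlacW",
        "https://storage.googleapis.com/analyse-genome-coupon-1/References/dmel-all-miRNA-r6.18.fasta\tmiRNA",
    ]

    for line in pasted_lines:
        url, name = line.split("\t")   # column A = URL, column B = Name
        print(f"dataset '{name}' will be fetched from {url}")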

🎉 🎊 🎈

3. Upload of small RNA sequencing datasets ➕ Programmatic dataset naming.

First of all, create a new history by clicking the + icon in the history header, and immediately rename the new history "Small RNA sequence datasets".

  • Click the Upload Data button at the top-left corner of the Galaxy interface.
  • Click the Rule-based tab, as we just did with the reference datasets
  • Leave Upload data as Datasets and Load tabular data from Pasted Table
  • In the text field Tabular source data to extract collection files and metadata from, paste the following Tabular source data:
small RNAseq datasets for 🍨
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=GRH-103_R1.fastq.gz  GRH-103
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=GRH-104_R1.fastq.gz  GRH-104
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=GRH-105_R1.fastq.gz  GRH-105
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=GRH-106_R1.fastq.gz  GRH-106

Or

small RNAseq datasets for 🍩
https://analyse-genomes.s3.eu-west-3.amazonaws.com/smRNAseq/GRH-103_R1.fastq.gz GRH-103
https://analyse-genomes.s3.eu-west-3.amazonaws.com/smRNAseq/GRH-104_R1.fastq.gz GRH-104
https://analyse-genomes.s3.eu-west-3.amazonaws.com/smRNAseq/GRH-105_R1.fastq.gz GRH-105
https://analyse-genomes.s3.eu-west-3.amazonaws.com/smRNAseq/GRH-106_R1.fastq.gz GRH-106
small RNAseq datasets for 🍬
https://storage.googleapis.com/analyse-genome-coupon-1/smRNAseq/GRH-103_R1.fastq.gz GRH-103
https://storage.googleapis.com/analyse-genome-coupon-1/smRNAseq/GRH-104_R1.fastq.gz GRH-104
https://storage.googleapis.com/analyse-genome-coupon-1/smRNAseq/GRH-105_R1.fastq.gz GRH-105
https://storage.googleapis.com/analyse-genome-coupon-1/smRNAseq/GRH-106_R1.fastq.gz GRH-106
  • Click the Build button
  • In the Build Rules ... panel that opens, open the Rules menu and choose Add/Modify Column Definitions
  • Click Add Definition a first time and select URL. Leave the URL column set to A
  • Click Add Definition a second time, select Name and choose column B for Name
  • Now, click the Apply button
  • Select the Type "fastqsanger.gz" at the bottom of the panel (see the short note after this step)

  • To finish the job, click the dark-blue Upload button

    🎉 🎊 🎈 🎉 🎊 🎈
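
A short note on the fastqsanger.gz type: these files are gzip-compressed FASTQ files with Sanger (Phred+33) quality encoding, which is why that datatype is selected here. If you ever want to peek at one of them outside Galaxy, a small Python sketch (the file name is only an example):

    # Print the first FASTQ record (4 lines) of a gzip-compressed sequencing file.
    import gzip

    with gzip.open("GRH-103_R1.fastq.gz", "rt") as handle:
        for _ in range(4):
            print(handle.readline().rstrip())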

4. RNAseq datasets (for gene differential expression analysis)

  • Create a new history in Galaxy and rename it RNA sequence datasets
  • Click the Upload Data button at the top-left corner of the Galaxy interface.
  • Click the Rule-based tab, as we just did with the reference datasets
  • Leave Upload data as Datasets and Load tabular data from Pasted Table
  • In the text field Tabular source data to extract collection files and metadata from, paste the following Tabular source data:
RNAseq datasets for 🍬
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=WT1_R1.fastq.gz  WT1
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=WT2_R1.fastq.gz  WT2
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=WT3_R1.fastq.gz  WT3
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=SF1_R1.fastq.gz  SF1
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=SF2_R1.fastq.gz  SF2
https://usegalaxy.sorbonne-universite.fr/nextcloud/index.php/s/LqKb3Qmy8m9RXtk/download?path=%2F&files=SF3_R1.fastq.gz  SF3

Or

RNAseq datasets for 🍩
https://analyse-genomes.s3.eu-west-3.amazonaws.com/RNAseq/WT1_R1.fastq.gz   WT1
https://analyse-genomes.s3.eu-west-3.amazonaws.com/RNAseq/WT2_R1.fastq.gz   WT2
https://analyse-genomes.s3.eu-west-3.amazonaws.com/RNAseq/WT3_R1.fastq.gz   WT3
https://analyse-genomes.s3.eu-west-3.amazonaws.com/RNAseq/SF1_R1.fastq.gz   SF1
https://analyse-genomes.s3.eu-west-3.amazonaws.com/RNAseq/SF2_R1.fastq.gz   SF2
https://analyse-genomes.s3.eu-west-3.amazonaws.com/RNAseq/SF3_R1.fastq.gz   SF3

Or

RNAseq datasets for 🍨
https://storage.googleapis.com/analyse-genome-coupon-1/RNAseq/WT1_R1.fastq.gz   WT1
https://storage.googleapis.com/analyse-genome-coupon-1/RNAseq/WT2_R1.fastq.gz   WT2
https://storage.googleapis.com/analyse-genome-coupon-1/RNAseq/WT3_R1.fastq.gz   WT3
https://storage.googleapis.com/analyse-genome-coupon-1/RNAseq/SF1_R1.fastq.gz   SF1
https://storage.googleapis.com/analyse-genome-coupon-1/RNAseq/SF2_R1.fastq.gz   SF2
https://storage.googleapis.com/analyse-genome-coupon-1/RNAseq/SF3_R1.fastq.gz   SF3
  • Click the Build button
  • In the Build Rules ... panel that opens, open the Rules menu and choose Add/Modify Column Definitions
  • Click Add Definition a first time and select URL. Leave the URL column set to A
  • Click Add Definition a second time, select Name and choose column B for Name
  • Click the Apply button
  • Select the Type "fastqsanger.gz" at the bottom of the panel

  • To finish the job, click the dark-blue Upload button

🎉 🎊 🎈 🎉 🎊 🎈 🎉 🎊 🎈 🎉 🎊 🎈

5. Uncompress datasets

At this stage, we have uploaded the small RNA and RNA sequencing datasets as fastqsanger.gz. To simplify the subsequent analyses, we are going to uncompress all these datasets, whose datatype will therefore become fastqsanger.

Procedure for a single dataset
  1. Go to your small RNA input datasets history (or whatever you named it).
  2. Click on the pencil icon of the first dataset.
  3. Click on the Convert tab, NOT on the Datatype tab.

    Why is 'Convert file' different from 'Change Datatype'?
    • Let's imagine a Galaxy dataset whose name is Hamlet
    • the content of this dataset is:
      To be, or not to be, that is the question:
      
    • Would you agree that the datatype of this dataset is english? I think so.
    • Let's put it all together in the form of:
      @name: Hamlet
      @datatype: english
      @content:
      To be, or not to be, that is the question:
      

    Now, what if you change the Datatype of this dataset from english to french using the edit attribute panel? This →

    @name: Hamlet
    @datatype: french
    @content:
    To be, or not to be, that is the question:
    
    This does not seem correct! Do you agree?

    If instead you Convert this dataset from english to french, you will have this →

    @name: Hamlet
    @datatype: french
    @content:
    Être ou ne pas être, telle est la question
    
    It is looking better, isn't it?

    In contrast, if your starting dataset was like this:

    @name: Hamlet
    @datatype: english
    @content:
    Être ou ne pas être, telle est la question
    
    There, you would "just" change the Datatype of the dataset from english to french and get:
    @name: Hamlet
    @datatype: french
    @content:
    Être ou ne pas être, telle est la question
    

  4. Select Convert compressed file to uncompressed

  5. Click the button to launch the conversion.

A new dataset is created. During the decompression job, its name looks like 5: Convert compressed file to uncompressed on data 1, but when the job finishes, the name of the dataset changes to something more self-explanatory: 5: GRH-103 uncompressed.

Repeat the same procedure for every small RNAseq dataset.
Repeat the same procedure for every RNAseq dataset.

Naturally, you can launch as many jobs as you need at the same time.
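
To connect this with the Hamlet analogy above: Convert compressed file to uncompressed really transforms the content (it gunzips the file) and the datatype follows, whereas changing the datatype alone would only relabel a still-compressed file. In essence, the conversion job does something like this Python sketch (file names are examples):

    # "Convert compressed file to uncompressed": the content itself is transformed.
    import gzip
    import shutil

    with gzip.open("GRH-103_R1.fastq.gz", "rb") as compressed, \
         open("GRH-103_R1.fastq", "wb") as uncompressed:
        shutil.copyfileobj(compressed, uncompressed)   # fastqsanger.gz -> fastqsanger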

When all datasets are decompressed:
  • Delete the compressed datasets (by clicking on the cross icon of each dataset).
  • Rename the uncompressed datasets by removing the uncompressed suffix.
  • Purge the deleted datasets. This is done by clicking the wheel icon of the top history menu, and selecting Purge Deleted Datasets in the Datasets Actions section.

    • ⚠ If you do not perform this last action, the deleted datasets remain on your instance disk!

6. Dataset collections 🌌 👽

If we have enough time, we are going to organize our various datasets using an additional structure layer: the Galaxy Collection.

A Galaxy Collection is a container object which is very convenient for treating multiple equivalent datasets together, such as a list of sequencing datasets, text labels, fasta sequences, etc.

For those of you who are a bit familiar with the Python language, a Galaxy Collection is essentially a dictionary whose keys are the names of the datasets in the collection (in Galaxy these names are referred to as element identifiers) and whose values are the paths to the corresponding datasets. Well, a dictionary, as I said 😜
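
For instance, the small RNA collection that we are about to build could be pictured like this (the paths are invented for the illustration):

    # A Galaxy collection pictured as a Python dictionary: keys are the element
    # identifiers, values are the underlying dataset files (paths are made up).
    small_rna_collection = {
        "GRH-103": "/galaxy/datasets/grh_103_r1.fastq",
        "GRH-104": "/galaxy/datasets/grh_104_r1.fastq",
        "GRH-105": "/galaxy/datasets/grh_105_r1.fastq",
        "GRH-106": "/galaxy/datasets/grh_106_r1.fastq",
    }

    for element_identifier, path in small_rna_collection.items():
        print(element_identifier, "->", path)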

A. Making a collection of the small RNA sequence datasets.

For clarity, we are first going to copy the small RNA sequence datasets from their initial history to a new history.

  • Go to your Small RNA sequence datasets history.
  • Click on the wheel icon of the history top menu

  • Select the submenu Copy Datasets in the section Dataset Actions

  • In the pop-up panel, under Source History:, check the 4 small RNA sequencing datasets
  • In the same pop-up panel, under Destination History:, in the field New history named, write
    small RNAs in collection
    
  • Click the Copy History Items button.
  • Still in the same pop-up panel, at the top, in a green area, you now have a 🔗 to the new history that was created and where the datasets were copied. Click on that link!

    When you copy datasets in that way...

    The new datasets do not actually take any additional space on your disk: Galaxy only creates new links to the existing files.

  • Now that you are in the new history, click on the checkbox icon in the top area of the history.

  • Check the 4 small RNA datasets
  • In the For all selected... menu (also in the top area of the history), select Build Dataset List
  • In the pop-up panel, just write a meaningful name in the field Name, something like
    Small RNA collection
    
  • Press the button Create Collection
What do you see when you click on the name of the new dataset collection? (not on the ✖, please...)

You see the content of the collection, with datasets identified by names called element_identifiers.

Click on the recycling icon, or on the < back to Small RNA collection link, to come back to the normal history view.

What do you see if you click the hidden hyperlink at the top right corner?

You see the actual datasets contained in the Collection. If you click on unhide for each of these datasets, you will actually see both the container collection and the contained datasets!

B. Making 2 collections of RNA sequence datasets.

For RNAseq datasets, collections are also very convenient. However, it is even better to anticipate the type of analysis that you are going to perform. Indeed, you are going to compare 3 "test" (mutant, treated, whatever...) datasets with 3 control datasets.

Therefore, we are going to organise the RNAseq datasets as 2 collections: a WT collection and an SF collection.
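
Pictured with the same dictionary analogy as before, the two collections are the two sides of the comparison (paths invented for the illustration):

    # Two collections = the two groups of the 3 vs 3 comparison (paths are made up).
    wt_collection = {"WT1": "/galaxy/datasets/wt1_r1.fastq",
                     "WT2": "/galaxy/datasets/wt2_r1.fastq",
                     "WT3": "/galaxy/datasets/wt3_r1.fastq"}
    sf_collection = {"SF1": "/galaxy/datasets/sf1_r1.fastq",
                     "SF2": "/galaxy/datasets/sf2_r1.fastq",
                     "SF3": "/galaxy/datasets/sf3_r1.fastq"}

    # A differential expression tool can then receive one collection per condition,
    # with 3 replicates each.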

  • Go back to your RNAseq input datasets history
  • As before, copy the 6 RNAseq datasets to a new history, which you will name RNAseq dataset Collections
  • This time, first create a collection by checking only the three datasets WT1, WT2 and WT3, which you will name:
    WT
    
  • Also create a second collection by checking only the three datasets SF1, SF2 and SF3, which you will name:
    SF
    

This is the end of this training session: you deserve ☕ or 🍺!