Reformat VCF files for visualization in the UCSC genome browser
For visualisation in the UCSC genome browser, lines in the vcf file have to be sorted first in the order of chromosomes and secondly in the order of coordinates.
In the following steps we are going to reorganising the lumpy vcf output to follow these rules.
In addition, since we are going to focus on translocations, we will filter out the other variations we are not interested in.
1. Save the headers of the vcf files using the tool:¶
Select lines that match an expression (Galaxy Version 1.0.1)
- Select lines from:
9: Variant Lumpy Calling
(be careful to toggle theDataset collection
mode) - that:
matching
- the pattern:
^#
2. Save the rest of the vcf files in another Dataset Collection using the same tool:¶
Select lines that match an expression (Galaxy Version 1.0.1)
- Select lines from:
9: Variant Lumpy Calling
(be careful to toggle theDataset collection
mode) - that:
matching
- the pattern:
SVTYPE=BND
Tip
Just re-play the previous tool run by clicking the circular arrow at the bottom of the dataset box,
just changing the pattern from ^#
to SVTYPE=BND
This step allows to kill two birds with the same stone (tool):
- Selecting non-header part of the vcf
- filtering out the variations that are not of type
BND
(Bondaries)
3. Reorder the vcf lines describing the genetic variations using the tool:¶
Sort data in ascending or descending order (Galaxy Version 1.1.1)
- Sort Query**:
16: Select on collection 9
(be careful to toggle theDataset collection
mode) - Number of header lines:
0
- 1: Column selections
- on column:
1
- in:
Ascending order
- Flavor:
Natural/Version sort (-V)
- on column:
- 2: Column selections
- on column:
2
- in:
Ascending order
- Flavor:
Fast numeric sort (-n)
- on column:
- Output unique values:
No
- Ignore case:
No
4. Reassemble the saved headers with the sorted/filtered vcf parts using¶
Concatenate multiple datasets tail-to-head by specifying how (Galaxy Version 1.4.1)
Pay extra attention to the name and the version of the tool, because there is a number of concatenation tools with the same name
- What type of data do you wish to concatenate?:
2 Collections
- Depending on the type of input selected the concatenation options will differ
- Input first collection:
21: Select on collection 12
- Input second collection:
27: Sort on collection 24
- Input first collection:
- Include dataset names?:
No
- Number of lines to skip at the beginning of each concatenation:
0
5. Rename the last generated collection from Concatenation by pairs
→¶
→
Do not forget to press the Enter
Key !!!