exonerate_to_gff
The exonerate_to_gff command converts exonerate output to GFF.
usage
$ ingenannot -v 2 exonerate_to_gff > match_prot.gff
positional arguments:
Gff_genes | Gene Annotation file in GFF/GTF file format
optional arguments:
-h, --help | show this help message and exit
-m {prot,nuc}, --mode {prot,nuc} | Mode: [prot, nuc], default=prot
-p PREFIX, --prefix PREFIX | Add a prefix to the feature name, useful if you ran exonerate in split mode
inputs
Output of exonerate, run with the p2g model and options such as:
exonerate --model p2g --showvulgar no --showalignment no --showquerygff no --showtargetgff yes --percent 80 --ryo "AveragePercentIdentity: %pi\n" protein_db.pep target_genome.fasta
Expected “compliant” output of exonerate:
Command line: [exonerate --model protein2genome --showvulgar no --showalignment no --showtargetgff yes --showquerygff no --minintron 4 --maxintron 5000 --percent 50 --ryo AveragePercentIdentity: %pi\n ../data/UNIPROTKB_no_zymo/UNIPROTKB.Dothideomycetes.15072020.NoZymo.fasta_chunk_0000000 /work/nlapalu/gmove/data/chr/chr_1.fasta]
Hostname: [node021]
# --- START OF GFF DUMP ---
#
#
##gff-version 2
##source-version exonerate:protein2genome:local 2.2.0
##date 2020-08-17
##type DNA
#
#
# seqname source feature start end score strand frame attributes
#
chr_1 exonerate:protein2genome:local gene 3034345 3034872 685 - . gene_id 1 ; sequence tr|W6YK43|W6YK43_COCCA ; gene_orientation .
chr_1 exonerate:protein2genome:local cds 3034345 3034872 . - .
chr_1 exonerate:protein2genome:local exon 3034345 3034872 . - . insertions 0 ; deletions 0
chr_1 exonerate:protein2genome:local similarity 3034345 3034872 685 - . alignment_id 1 ; Query tr|W6YK43|W6YK43_COCCA ; Align 3034873 8 528
# --- END OF GFF DUMP ---
#
AveragePercentIdentity: 71.59
# --- START OF GFF DUMP ---
#
#
##gff-version 2
##source-version exonerate:protein2genome:local 2.2.0
##date 2020-08-17
##type DNA
#
#
# seqname source feature start end score strand frame attributes
#
chr_1 exonerate:protein2genome:local gene 3357525 3358309 868 - . gene_id 1 ; sequence tr|W6YD21|W6YD21_COCCA ; gene_orientation +
chr_1 exonerate:protein2genome:local cds 3358268 3358309 . - .
chr_1 exonerate:protein2genome:local exon 3358268 3358309 . - . insertions 0 ; deletions 0
chr_1 exonerate:protein2genome:local splice5 3358266 3358267 . - . intron_id 1 ; splice_site "GT"
chr_1 exonerate:protein2genome:local intron 3358206 3358267 . - . intron_id 1
chr_1 exonerate:protein2genome:local splice3 3358206 3358207 . - . intron_id 0 ; splice_site "AG"
chr_1 exonerate:protein2genome:local cds 3357525 3358205 . - .
chr_1 exonerate:protein2genome:local exon 3357525 3358205 . - . insertions 9 ; deletions 1
chr_1 exonerate:protein2genome:local similarity 3357525 3358309 868 - . alignment_id 1 ; Query tr|W6YD21|W6YD21_COCCA ; Align 3358310 1 42 ; Align 3358206 15 456 ; Align 3357741 167 114 ; Align 3357627 206 102
# --- END OF GFF DUMP ---
#
AveragePercentIdentity: 72.69
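The layout above pairs each GFF dump with the AveragePercentIdentity line that follows it. As a minimal sketch of how such output could be read (this is not ingenannot's own code; the function name parse_exonerate_blocks and the sample list are hypothetical), the following Python collects the feature rows between the START and END markers and attaches the identity value printed after them:

```python
def parse_exonerate_blocks(lines):
    """Yield (gff_rows, percent_identity) for each GFF dump found in `lines`.

    Hypothetical helper for illustration only. Each row is a list of up to
    nine fields; the ninth (attributes) is kept as one string.
    """
    rows, in_dump, pending = [], False, None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# --- START OF GFF DUMP ---"):
            rows, in_dump = [], True
        elif line.startswith("# --- END OF GFF DUMP ---"):
            # Hold the rows until the identity line for this dump is seen.
            in_dump, pending = False, rows
        elif in_dump and line and not line.startswith("#"):
            # Split on any whitespace, keeping the attribute column intact.
            rows.append(line.split(None, 8))
        elif pending is not None and line.startswith("AveragePercentIdentity:"):
            yield pending, float(line.split(":", 1)[1])
            pending = None


# Abbreviated sample taken from the expected output shown above.
sample = [
    "# --- START OF GFF DUMP ---",
    "##gff-version 2",
    "chr_1 exonerate:protein2genome:local gene 3034345 3034872 685 - . "
    "gene_id 1 ; sequence tr|W6YK43|W6YK43_COCCA ; gene_orientation .",
    "chr_1 exonerate:protein2genome:local exon 3034345 3034872 . - . "
    "insertions 0 ; deletions 0",
    "# --- END OF GFF DUMP ---",
    "#",
    "AveragePercentIdentity: 71.59",
]
blocks = list(parse_exonerate_blocks(sample))
```

A dump that is not followed by an AveragePercentIdentity line is silently dropped in this sketch, which is one reason the --ryo option matters for "compliant" input.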
outputs
Output on stdout:
chr_1 exonerate_to_gff match 3034345 3034872 71.59 - . ID=match.1;Dbxref=exonerate:0;Name=tr|W6YK43|W6YK43_COCCA
chr_1 exonerate_to_gff match_part 3034345 3034872 71.59 - . ID=match.1.0;Parent=match.1;Dbxref=exonerate:tr|W6YK43|W6YK43_COCCA;Target=tr|W6YK43|W6YK43_COCCA 8 184
chr_1 exonerate_to_gff match 3357525 3358309 72.69 - . ID=match.2;Dbxref=exonerate:1;Name=tr|W6YD21|W6YD21_COCCA
chr_1 exonerate_to_gff match_part 3357525 3358205 72.69 - . ID=match.2.1;Parent=match.2;Dbxref=exonerate:tr|W6YD21|W6YD21_COCCA;Target=tr|W6YD21|W6YD21_COCCA 15 167
chr_1 exonerate_to_gff match_part 3358268 3358309 72.69 - . ID=match.2.0;Parent=match.2;Dbxref=exonerate:tr|W6YD21|W6YD21_COCCA;Target=tr|W6YD21|W6YD21_COCCA 1 15
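Each match line in this output is followed by one match_part per exon, linked through the Parent attribute. As a rough sketch for downstream use (the helper names parse_attributes and group_match_parts are hypothetical and not part of ingenannot), the following Python groups the match_part coordinates under their parent match ID:

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split an attribute column of the form key=value;key=value into a dict."""
    return dict(item.split("=", 1) for item in column9.strip().split(";") if item)


def group_match_parts(lines):
    """Map each match ID to the sorted (start, end) spans of its match_part rows."""
    parts = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace; the Target attribute contains spaces,
        # so the ninth column is kept as a single string.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) == 9 and fields[2] == "match_part":
            attrs = parse_attributes(fields[8])
            parts[attrs["Parent"]].append((int(fields[3]), int(fields[4])))
    return {match_id: sorted(spans) for match_id, spans in parts.items()}


# The second match from the output above.
sample = [
    "chr_1 exonerate_to_gff match 3357525 3358309 72.69 - . "
    "ID=match.2;Dbxref=exonerate:1;Name=tr|W6YD21|W6YD21_COCCA",
    "chr_1 exonerate_to_gff match_part 3357525 3358205 72.69 - . "
    "ID=match.2.1;Parent=match.2;Dbxref=exonerate:tr|W6YD21|W6YD21_COCCA;"
    "Target=tr|W6YD21|W6YD21_COCCA 15 167",
    "chr_1 exonerate_to_gff match_part 3358268 3358309 72.69 - . "
    "ID=match.2.0;Parent=match.2;Dbxref=exonerate:tr|W6YD21|W6YD21_COCCA;"
    "Target=tr|W6YD21|W6YD21_COCCA 1 15",
]
grouped = group_match_parts(sample)
```

Splitting on whitespace with a field limit of eight works for both tab-separated files and the space-separated rendering shown here, as long as the first eight columns contain no spaces.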