curation
The curation command generates a ranking of transcripts for manual curation, based on AED scores and associated penalties.
usage
$ ingenannot -v 2 curation genes.gff genes.manual.curation.gff
positional arguments:
  Input                    GFF file with AED tags
  Output                   GFF file with new curation tag
optional arguments:
  -h, --help               show this help message and exit
  --graphout GRAPHOUT      output filename of the graph, default=curation.png
  --graphtitle GRAPHTITLE  output title of the graph, default=AED categories for manual curation
inputs
Gene annotations in GFF/GTF format, with AED scores already added by ingenannot aed.
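Before running the command, it can help to confirm that every transcript actually carries AED tags. The following is a minimal sketch, not part of ingenannot: it assumes a tab-delimited GFF3 file and checks for the three attribute keys that appear on the mRNA lines of the example further down this page (aed_ev_tr, aed_ev_pr, aed_ev_lg). Whether all three are strictly required is an assumption, and the function names are hypothetical.

```python
# Attribute keys seen on the mRNA lines of the example input on this page.
# Treating all three as mandatory is an assumption made for illustration.
AED_KEYS = ("aed_ev_tr", "aed_ev_pr", "aed_ev_lg")


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value;) into a dict."""
    attrs = {}
    for field in column9.strip().strip(";").split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def transcripts_missing_aed(lines, keys=AED_KEYS):
    """Return the IDs of mRNA features lacking at least one of the given keys."""
    missing = []
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Only well-formed, 9-column mRNA records are inspected.
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if any(key not in attrs for key in keys):
            missing.append(attrs.get("ID", "<no ID>"))
    return missing
```

The function accepts any iterable of lines, so an open file handle on genes.gff can be passed directly; an empty result means every mRNA record carries all three tags.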
outputs
We define seven categories of transcript confidence, based on protein and transcriptomic evidence:
cat1: high confidence
cat2: good confidence
cat3: good confidence, supported by one evidence type
cat4: moderate confidence
cat5: high to moderate confidence, with penalty on structure
cat6: bad confidence
cat7: no support, only ab-initio prediction
We expect:
cat1: gene structures validated by very reliable protein and transcript support
cat2: gene structures validated by protein and transcript support, with a lower score for one or the other of these evidence types
cat3: gene structures validated mainly by one evidence type (protein or transcriptomic evidence). May contain false coding gene structures (when supported only by transcriptomic data) or annotation errors inherited from protein databanks (when supported only by protein evidence)
cat4: gene structures with weak evidence support
cat5: gene structures that are difficult to define, penalized for poorly supported junctions
cat6: gene structures with very weak evidence support
cat7: gene structures inferred by ab-initio methods
The outputs are 1) the GFF file annotated with the “curation” category for each transcript and 2) an AED plot: a graphical representation of the AED scores of all transcripts, colored by the associated curation category.
# input gff:
chr_1 ingenannot gene 109690 112065 . - . ID=ZtIPO323_000010;Name=ZtIPO323_000010;locus_tag=ZtIPO323_000010;
chr_1 ingenannot mRNA 109690 112065 . - . ID=ZtIPO323_000010.1;Name=ZtIPO323_000010.1;Parent=ZtIPO323_000010;ev_tr=SCA3419A90.2.1_357-2395;aed_ev_tr=0.2359;ev_tr_penalty=no;ev_pr=None;aed_ev_pr=1.0000;ev_lg=PB.1.1;aed_ev_lg=0.2734;ev_lg_penalty=no;utr_refine_evidence=PB.1.1;product=uncharacterized protein MYCGRDRAFT_88584;Dbxref=InterPro:-,MobiDBLite:mobidb-lite;locus_tag=ZtIPO323_000010;
chr_1 ingenannot exon 109690 112065 . - . ID=exon:ZtIPO323_000010.1;Parent=ZtIPO323_000010.1;locus_tag=ZtIPO323_000010;
chr_1 ingenannot CDS 110070 111146 . - 0 ID=cds:ZtIPO323_000010.1;Parent=ZtIPO323_000010.1;locus_tag=ZtIPO323_000010;
chr_1 ingenannot five_prime_UTR 111147 112065 . - . ID=five_prime_UTR_ZtIPO323_000010.1_001;Parent=ZtIPO323_000010.1;locus_tag=ZtIPO323_000010;
chr_1 ingenannot three_prime_UTR 109690 110069 . - . ID=three_prime_UTR_ZtIPO323_000010.1_001;Parent=ZtIPO323_000010.1;locus_tag=ZtIPO323_000010;
chr_1 ingenannot gene 112203 116391 . + . ID=ZtIPO323_000020;Name=ZtIPO323_000020;locus_tag=ZtIPO323_000020;
chr_1 ingenannot mRNA 112203 116391 . + . ID=ZtIPO323_000020.1;Name=ZtIPO323_000020.1;Parent=ZtIPO323_000020;ev_tr=SRR6215485.1.1;aed_ev_tr=0.0152;ev_tr_penalty=no;ev_pr=None;aed_ev_pr=1.0000;ev_lg=PB.2.2;aed_ev_lg=0.0266;ev_lg_penalty=no;utr_refine_evidence=PB.2.2;product=Structure-specific endonuclease subunit SLX4;Dbxref=InterPro:-,InterPro:IPR000637,InterPro:IPR017956,InterPro:IPR018574,MobiDBLite:mobidb-lite,Pfam:PF09494,ProSitePatterns:PS00354,SMART:SM00384;Ontology_term=GO:0003677,GO:0005634,GO:0006260,GO:0006281,GO:0006355,GO:0033557;locus_tag=ZtIPO323_000020;
chr_1 ingenannot exon 112203 116391 . + . ID=exon:ZtIPO323_000020.1;Parent=ZtIPO323_000020.1;locus_tag=ZtIPO323_000020;
chr_1 ingenannot CDS 112306 116271 . + 0 ID=cds:ZtIPO323_000020.1;Parent=ZtIPO323_000020.1;locus_tag=ZtIPO323_000020;
chr_1 ingenannot five_prime_UTR 112203 112305 . + . ID=five_prime_UTR_ZtIPO323_000020.1_001;Parent=ZtIPO323_000020.1;locus_tag=ZtIPO323_000020;
chr_1 ingenannot three_prime_UTR 116272 116391 . + . ID=three_prime_UTR_ZtIPO323_000020.1_001;Parent=ZtIPO323_000020.1;locus_tag=ZtIPO323_000020;
# output gff:
chr_1 ingenannot gene 109690 112065 . - . ID=ZtIPO323_000010;
chr_1 ingenannot mRNA 109690 112065 . - . ID=ZtIPO323_000010.1;Name=ZtIPO323_000010.1;Parent=ZtIPO323_000010;ev_tr=SCA3419A90.2.1_357-2395;aed_ev_tr=0.2359;ev_tr_penalty=no;ev_pr=None;aed_ev_pr=1.0000;ev_lg=PB.1.1;aed_ev_lg=0.2734;ev_lg_penalty=no;utr_refine_evidence=PB.1.1;product=uncharacterized protein MYCGRDRAFT_88584;Dbxref=InterPro:-,MobiDBLite:mobidb-lite;locus_tag=ZtIPO323_000010;curation=cat6;
chr_1 ingenannot exon 109690 112065 . - . ID=exon:ZtIPO323_000010.1;Parent=ZtIPO323_000010.1;
chr_1 ingenannot CDS 110070 111146 . - 0 ID=cds:ZtIPO323_000010.1;Parent=ZtIPO323_000010.1;
chr_1 ingenannot five_prime_UTR 111147 112065 . - . ID=five_prime_UTR_ZtIPO323_000010.1_001;Parent=ZtIPO323_000010.1;
chr_1 ingenannot three_prime_UTR 109690 110069 . - . ID=three_prime_UTR_ZtIPO323_000010.1_001;Parent=ZtIPO323_000010.1;
chr_1 ingenannot gene 112203 116391 . + . ID=ZtIPO323_000020;
chr_1 ingenannot mRNA 112203 116391 . + . ID=ZtIPO323_000020.1;Name=ZtIPO323_000020.1;Parent=ZtIPO323_000020;ev_tr=SRR6215485.1.1;aed_ev_tr=0.0152;ev_tr_penalty=no;ev_pr=None;aed_ev_pr=1.0000;ev_lg=PB.2.2;aed_ev_lg=0.0266;ev_lg_penalty=no;utr_refine_evidence=PB.2.2;product=Structure-specific endonuclease subunit SLX4;Dbxref=InterPro:-,InterPro:IPR000637,InterPro:IPR017956,InterPro:IPR018574,MobiDBLite:mobidb-lite,Pfam:PF09494,ProSitePatterns:PS00354,SMART:SM00384;Ontology_term=GO:0003677,GO:0005634,GO:0006260,GO:0006281,GO:0006355,GO:0033557;locus_tag=ZtIPO323_000020;curation=cat5;
chr_1 ingenannot exon 112203 116391 . + . ID=exon:ZtIPO323_000020.1;Parent=ZtIPO323_000020.1;
chr_1 ingenannot CDS 112306 116271 . + 0 ID=cds:ZtIPO323_000020.1;Parent=ZtIPO323_000020.1;
chr_1 ingenannot five_prime_UTR 112203 112305 . + . ID=five_prime_UTR_ZtIPO323_000020.1_001;Parent=ZtIPO323_000020.1;
chr_1 ingenannot three_prime_UTR 116272 116391 . + . ID=three_prime_UTR_ZtIPO323_000020.1_001;Parent=ZtIPO323_000020.1;
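To get a quick overview of how many transcripts fall into each category, the curation tag can be tallied from the output file. This is a minimal sketch rather than an ingenannot feature: it assumes a tab-delimited GFF3 output in which the tag appears as curation=catN on mRNA lines, as in the example above, and the function name is hypothetical.

```python
from collections import Counter


def curation_counts(lines):
    """Count mRNA features per value of the 'curation' attribute."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # The curation tag is expected on transcript (mRNA) records only.
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for field in cols[8].split(";"):
            if field.startswith("curation="):
                counts[field.split("=", 1)[1]] += 1
                break
    return counts
```

Passing an open handle on genes.manual.curation.gff returns a Counter keyed by category name, which can then be sorted or printed as needed.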
Output AED plot with curation colors:
