rescue_effectors

The rescue_effector command predicts effector genes that may have been missed during annotation.

rescue_effector analyzes assembled transcripts that were not used in the annotation (each treated as a transcribed gene) to find new candidate effector genes. It uses the effector_predictor module to compute the probability that a protein should be annotated as an effector. Fungal effector genes can be difficult to predict or annotate from evidence sources because of their short length and mono-exonic structure. rescue_effector searches for unannotated transcripts and tests their potential as effector genes. To limit false positives, the ratio between the length of the protein-coding sequence and the length of the associated mRNA is checked against a threshold (--size_ratio, default 0.2). The process clusters overlapping or colinear transcripts and predicts only one gene per cluster, so colinear effectors that fall in the same cluster may be missed.
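The clustering step described above can be pictured with a short Python sketch. This is not the tool's own code: the tuple layout, the function name cluster_overlapping and the overlap rule (plain coordinate overlap on one sequence and strand) are assumptions made for illustration.

```python
from typing import List, Tuple

# (transcript_id, start, end), 1-based inclusive coordinates.
# Hypothetical layout, chosen for this sketch only.
Transcript = Tuple[str, int, int]


def cluster_overlapping(transcripts: List[Transcript]) -> List[List[Transcript]]:
    """Group transcripts whose coordinates overlap.

    All transcripts are assumed to lie on the same sequence and strand.
    """
    clusters: List[List[Transcript]] = []
    cluster_end = -1
    # Sort by start so that each transcript only needs to be compared
    # with the right-most end of the current cluster.
    for tr in sorted(transcripts, key=lambda t: (t[1], t[2])):
        _, start, end = tr
        if clusters and start <= cluster_end:
            clusters[-1].append(tr)
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([tr])
            cluster_end = end
    return clusters


example = [("t1", 100, 500), ("t3", 1200, 1500), ("t2", 400, 900)]
clusters = cluster_overlapping(example)
# t1 and t2 overlap and share a cluster; t3 stands alone.
```

Because only one gene is reported per cluster, two distinct effectors carried by t1 and t2 in this example would yield a single prediction.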

usage

$ ingenannot -v 2 rescue_effector genes.gff transcripts.gff genome.fasta

positional arguments:

Gff_genes

Gene Annotation file in GFF/GTF file format

Gff_transcripts

GFF file of transcript evidence, compressed with bgzip and indexed with tabix

Genome

Genome in FASTA format

optional arguments:

-h, --help

show this help message and exit

--signalp SIGNALP

Path to signalp, default=/usr/local/bin/signalp (from system lookup)

--tmhmm TMHMM

Path to tmhmm, default=/usr/local/bin/tmhmm-2.0c/bin/tmhmm (from system lookup)

--targetp TARGETP

Path to targetp, default=/usr/local/bin/targetp (from system lookup)

--effectorp EFFECTORP

Path to effectorp, default=None (from system lookup)

--signalp_cpos SIGNALP_CPOS

Maximal position of signal peptide cleavage site, default=25

--effectorp_score EFFECTORP_SCORE

Minimal effectorp score, default=0.7

--max_len MAX_LEN

Maximal length of protein in aa, default=300

--min_len MIN_LEN

Minimal length of protein in aa, default=30

--min_intergenic_len MIN_INTERGENIC_LEN

Minimal intergenic length to consider, default=100

--size_ratio SIZE_RATIO

Minimal CDS/mRNA length ratio, default=0.2

--unstranded

Allow analysis of unstranded transcripts, default=False (by default, only stranded transcripts are considered)

--nested

Consider nested proteins, not only first start, default=False

-o OUTPUT, --output OUTPUT

Output Annotation file in GFF file format, default=effectors.gff3
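To show how the numeric thresholds above might combine, here is a minimal Python sketch of a candidate filter. It is an illustration under assumptions, not the tool's implementation: the Candidate class, the passes_filters function, the inclusive comparisons, and the way the CDS length is derived (protein length plus a stop codon, in nucleotides) are all hypothetical. Only the default values are taken from the option list. The tmhmm and targetp checks are left out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container for one candidate protein (illustration only)."""
    protein_len_aa: int
    mrna_len_nt: int
    signalp_cleavage_pos: int
    effectorp_score: float


def passes_filters(
    c: Candidate,
    min_len: int = 30,             # default of min_len
    max_len: int = 300,            # default of max_len
    size_ratio: float = 0.2,       # default of size_ratio
    signalp_cpos: int = 25,        # default of signalp_cpos
    effectorp_score: float = 0.7,  # default of effectorp_score
) -> bool:
    # Assumption: CDS length in nt is the protein length plus a stop codon.
    cds_len_nt = 3 * (c.protein_len_aa + 1)
    # Assumption: the ratio is CDS length over mRNA length, both in nt,
    # and it must be at least size_ratio ("minimal ratio").
    ratio = cds_len_nt / c.mrna_len_nt
    return (
        min_len <= c.protein_len_aa <= max_len
        and ratio >= size_ratio
        and c.signalp_cleavage_pos <= signalp_cpos
        and c.effectorp_score >= effectorp_score
    )


# A 58 aa protein on a hypothetical 600 nt mRNA: ratio 177/600 = 0.295.
short_mrna = Candidate(58, 600, 21, 0.831)
# The same protein on a hypothetical 2000 nt mRNA: ratio 177/2000 = 0.0885.
long_mrna = Candidate(58, 2000, 21, 0.831)
```

Under these assumptions the first candidate is kept and the second is rejected by the size ratio alone, which is the kind of case the ratio is meant to screen out.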

inputs

Gene annotation file in GFF/GTF format; GFF file of transcript evidence, compressed with bgzip and indexed with tabix; genome in FASTA format.

outputs

Output GFF file with the predicted effectors:

# gff file
chr_1       ingenannot-effector-rescue      gene    847863  848039  .       -       .       ID=gene:effector_1;
chr_1       ingenannot-effector-rescue      mRNA    847863  848039  .       -       .       gene_id=MSTRG.188;transcript_id=MSTRG.188.2;signalp=Y;signalp_pos=21;effectorp_score=0.831;tmhmm=0;targetp=S;len_aa=58;ID=mRNA::effector_1;Parent=gene:effector_1;
chr_1       ingenannot-effector-rescue      exon    847863  848039  .       -       .       ID=exon:effector_1_1;Parent=mRNA::effector_1
chr_1       ingenannot-effector-rescue      CDS     847863  848039  .       -       0       ID=cds:effector_1;Parent=mRNA::effector_1
chr_1       ingenannot-effector-rescue      gene    2513243 2513563 .       -       .       ID=gene:effector_2;
chr_1       ingenannot-effector-rescue      mRNA    2513243 2513563 .       -       .       gene_id=MSTRG.666;transcript_id=MSTRG.666.1;signalp=Y;signalp_pos=23;effectorp_score=0.857;tmhmm=0;targetp=S;len_aa=58;ID=mRNA::effector_2;Parent=gene:effector_2;
chr_1       ingenannot-effector-rescue      exon    2513243 2513285 .       -       .       ID=exon:effector_2_1;Parent=mRNA::effector_2
chr_1       ingenannot-effector-rescue      exon    2513373 2513440 .       -       .       ID=exon:effector_2_2;Parent=mRNA::effector_2
chr_1       ingenannot-effector-rescue      exon    2513498 2513563 .       -       .       ID=exon:effector_2_3;Parent=mRNA::effector_2
chr_1       ingenannot-effector-rescue      CDS     2513243 2513285 .       -       1       ID=cds:effector_2;Parent=mRNA::effector_2
chr_1       ingenannot-effector-rescue      CDS     2513373 2513440 .       -       0       ID=cds:effector_2;Parent=mRNA::effector_2
chr_1       ingenannot-effector-rescue      CDS     2513498 2513563 .       -       0       ID=cds:effector_2;Parent=mRNA::effector_2
chr_1       ingenannot-effector-rescue      gene    2591231 2591416 .       +       .       ID=gene:effector_3;
chr_1       ingenannot-effector-rescue      mRNA    2591231 2591416 .       +       .       gene_id=MSTRG.690;transcript_id=MSTRG.690.1;signalp=Y;signalp_pos=19;effectorp_score=0.891;tmhmm=0;targetp=S;len_aa=61;ID=mRNA::effector_3;Parent=gene:effector_3;
chr_1       ingenannot-effector-rescue      exon    2591231 2591416 .       +       .       ID=exon:effector_3_1;Parent=mRNA::effector_3
chr_1       ingenannot-effector-rescue      CDS     2591231 2591416 .       +       0       ID=cds:effector_3;Parent=mRNA::effector_3
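The per-effector evidence is carried in the attribute column of the mRNA lines. The following Python sketch shows one way to read it back. The function names parse_attributes and read_effector_mrnas are hypothetical, and the code assumes that the first eight columns contain no whitespace, so that it works on both tab-separated files and the space-aligned sample shown above.

```python
from typing import Dict, Iterable, List


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF attribute column of 'key=value;' pairs into a dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def read_effector_mrnas(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the attributes of every mRNA feature, skipping comments."""
    records: List[Dict[str, str]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace, keeping the attribute column whole.
        cols = line.split(None, 8)
        if len(cols) == 9 and cols[2] == "mRNA":
            records.append(parse_attributes(cols[8]))
    return records


sample = [
    "# gff file",
    "chr_1 ingenannot-effector-rescue gene 847863 848039 . - . ID=gene:effector_1;",
    "chr_1 ingenannot-effector-rescue mRNA 847863 848039 . - . "
    "gene_id=MSTRG.188;transcript_id=MSTRG.188.2;signalp=Y;signalp_pos=21;"
    "effectorp_score=0.831;tmhmm=0;targetp=S;len_aa=58;"
    "ID=mRNA::effector_1;Parent=gene:effector_1;",
]
records = read_effector_mrnas(sample)
```

Each record is a plain dictionary, so values such as effectorp_score or len_aa arrive as strings and need converting before any numeric comparison.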