isoform_ranking
The isoform_ranking command ranks isoforms based on RNA-Seq coverage from one or more BAM files.
usage
ingenannot -v 2 isoform_ranking transcripts.gff -f file.fof --alt_threshold 0.1 --rescue
positional arguments:
  Gff_transcripts
      Gff file of transcripts
optional arguments:
  -h, --help
      show this help message and exit
  -p PREFIX, --prefix PREFIX
      Prefix for output annotation files in GFF file format, default=isoforms
  -b BAM, --bam BAM
      bam file to analyze
  --paired
      The bam file is paired or not, default=False
  --stranded
      The bam file is stranded or not, default=False
  -f FOF, --fof FOF
      File of bam files, <bam>TAB<type>TAB<stranded>
  --sj_threshold SJ_THRESHOLD
      threshold used as ratio of coverage to keep a junction for ranking, default=0.05
  --cov_threshold COV_THRESHOLD
      threshold of the median used to exclude bases in the coverage count, default=0.05
  --alt_threshold ALT_THRESHOLD
      threshold of the isoform to keep it in the isoform.alternatives.gff, based on junction coverage, default=0.1
  --rescue
      If set, when no transcript was selected due to unsupported junctions, keep at least one, based on the coverage, default=False
  --sj_full
      Junctions supported by only one side will be analyzed as shared junctions; if set, both sides need to overlap the transcript to be considered in ranking, default=False
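The -f/--fof option expects one BAM file per line with three tab-separated columns. As a minimal sketch, the file layout can be checked before launching the command. The `read_fof` helper and the `FofEntry` type are hypothetical names used for illustration, not part of ingenannot, and the column values in the example are placeholders: the accepted values for <type> and <stranded> are not listed here, so the sketch keeps them as raw text.

```python
import csv
import io
from typing import List, NamedTuple


class FofEntry(NamedTuple):
    bam: str
    bam_type: str   # raw text of the <type> column, not interpreted here
    stranded: str   # raw text of the <stranded> column, not interpreted here


def read_fof(handle) -> List[FofEntry]:
    """Read <bam>TAB<type>TAB<stranded> lines and check the column count."""
    entries = []
    for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        # Skipping blank lines is a choice of this sketch, not documented tool behaviour.
        if not row or all(not field.strip() for field in row):
            continue
        if len(row) != 3:
            raise ValueError(
                f"line {lineno}: expected 3 tab-separated fields, got {len(row)}"
            )
        entries.append(FofEntry(*(field.strip() for field in row)))
    return entries


# Placeholder content: TYPE and STRANDED stand for whatever values the tool accepts.
example = "sample1.bam\tTYPE\tSTRANDED\nsample2.bam\tTYPE\tSTRANDED\n"
entries = read_fof(io.StringIO(example))
for entry in entries:
    print(entry.bam, entry.bam_type, entry.stranded)
```

With a real file, `read_fof(open("file.fof"))` would raise a ValueError on the first line that does not have exactly three columns.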
inputs
Gff_transcripts: transcript file in GFF/GTF format.
outputs
Several outputs are expected:
isoforms.ranking.gff: all selected transcripts with rank
isoforms.top.gff: top isoform for all selected transcripts
isoforms.alternatives.gff: top isoform with the best alternative isoforms
isoforms.unclassif.gff: removed isoforms (non-supported junction, below abundance threshold)
isoform_ranking groups together isoforms that have the same structure but possibly different UTRs. As a consequence, only one isoform of each structure is kept in the alternatives isoform file. Let's have a look at the example below:

We use 2 BAM files to analyze the coverage of each isoform. There are 4 isoforms, 2 of which (PB.112.2 and PB.112.3) have the same structure but different UTRs. Both correspond to the major isoform based on splicing coverage, so in the isoforms.ranking.gff file they are ranked at the top as the 2 most probable isoforms, whatever the coordinates of their UTRs. The second major structure is PB.112.4, with a smaller exon, followed by PB.112.1, with a longer second exon. The expected ranking in such a case is therefore: PB.112.2 / PB.112.3, PB.112.4, PB.112.1. To decide between ranks 1 and 2 for PB.112.2 and PB.112.3, a coverage analysis orders the isoforms by their fit to the median coverage. In the end we obtain:

chr_1 ingenannot-isoform-ranking transcript 646708 648848 . - . gene_id "PB.112";transcript_id "PB.112.3";rank "1";
chr_1 ingenannot-isoform-ranking exon 646708 647301 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 647389 647472 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 647529 647573 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 647669 648293 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 648441 648848 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking transcript 646700 648827 . - . gene_id "PB.112";transcript_id "PB.112.2";rank "2";
chr_1 ingenannot-isoform-ranking exon 646700 647301 . - . gene_id "PB.112"; transcript_id "PB.112.2"
chr_1 ingenannot-isoform-ranking exon 647389 647472 . - . gene_id "PB.112"; transcript_id "PB.112.2"
chr_1 ingenannot-isoform-ranking exon 647529 647573 . - . gene_id "PB.112"; transcript_id "PB.112.2"
chr_1 ingenannot-isoform-ranking exon 647669 648293 . - . gene_id "PB.112"; transcript_id "PB.112.2"
chr_1 ingenannot-isoform-ranking exon 648441 648827 . - . gene_id "PB.112"; transcript_id "PB.112.2"
chr_1 ingenannot-isoform-ranking transcript 646716 648836 . - . gene_id "PB.112";transcript_id "PB.112.4";rank "3";
chr_1 ingenannot-isoform-ranking exon 646716 647301 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking exon 647389 647472 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking exon 647541 647573 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking exon 647669 648293 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking exon 648441 648836 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking transcript 646476 648836 . - . gene_id "PB.112";transcript_id "PB.112.1";rank "4";
chr_1 ingenannot-isoform-ranking exon 646476 647301 . - . gene_id "PB.112"; transcript_id "PB.112.1"
chr_1 ingenannot-isoform-ranking exon 647389 647472 . - . gene_id "PB.112"; transcript_id "PB.112.1"
chr_1 ingenannot-isoform-ranking exon 647529 647573 . - . gene_id "PB.112"; transcript_id "PB.112.1"
chr_1 ingenannot-isoform-ranking exon 647669 648299 . - . gene_id "PB.112"; transcript_id "PB.112.1"
chr_1 ingenannot-isoform-ranking exon 648441 648836 . - . gene_id "PB.112"; transcript_id "PB.112.1"
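The grouping of PB.112.2 and PB.112.3 can be checked directly from the exon coordinates listed above. The sketch below assumes that "same structure" means an identical intron chain (the ordered list of introns between exons), so that only the outer transcript boundaries may differ. This is an interpretation for illustration, and the actual comparison performed by isoform_ranking may differ. The `intron_chain` helper is a hypothetical name.

```python
from collections import defaultdict

# Exon coordinates (start, end) copied from the ranking output above.
exons = {
    "PB.112.3": [(646708, 647301), (647389, 647472), (647529, 647573),
                 (647669, 648293), (648441, 648848)],
    "PB.112.2": [(646700, 647301), (647389, 647472), (647529, 647573),
                 (647669, 648293), (648441, 648827)],
    "PB.112.4": [(646716, 647301), (647389, 647472), (647541, 647573),
                 (647669, 648293), (648441, 648836)],
    "PB.112.1": [(646476, 647301), (647389, 647472), (647529, 647573),
                 (647669, 648299), (648441, 648836)],
}


def intron_chain(exon_list):
    """Return the introns between consecutive exons as a hashable tuple."""
    ordered = sorted(exon_list)
    return tuple(
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )


# Transcripts sharing an intron chain fall into the same group.
groups = defaultdict(list)
for transcript_id, exon_list in exons.items():
    groups[intron_chain(exon_list)].append(transcript_id)

for members in groups.values():
    print(members)
# ['PB.112.3', 'PB.112.2']
# ['PB.112.4']
# ['PB.112.1']
```

PB.112.4 forms its own group because its third exon starts at 647541 instead of 647529, and PB.112.1 because its fourth exon ends at 648299 instead of 648293.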
The top isoform is PB.112.3, so the isoforms.top.gff only contains this transcript.

chr_1 ingenannot-isoform-ranking transcript 646708 648848 . - . gene_id "PB.112";transcript_id "PB.112.3";rank "1";
chr_1 ingenannot-isoform-ranking exon 646708 647301 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 647389 647472 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 647529 647573 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 647669 648293 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 648441 648848 . - . gene_id "PB.112"; transcript_id "PB.112.3"
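The relation between the ranking file and the top file can be reproduced with a small filter. This is a sketch that assumes the attribute layout shown in the listings (`key "value";` pairs, with `rank` present on transcript lines only). The helper names `parse_attributes` and `top_isoform_lines` are hypothetical, and the input is a shortened excerpt of the ranking output.

```python
import re

# Matches attributes written as: key "value"
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th column into a dict of attribute names and values."""
    return dict(ATTRIBUTE.findall(field))


def top_isoform_lines(lines):
    """Keep transcript lines with rank "1" and the exon lines of those transcripts."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace, keeping the attribute column in one piece.
        columns = line.rstrip("\n").split(None, 8)
        records.append((columns[2], parse_attributes(columns[8]), line))
    top_ids = {
        attrs["transcript_id"]
        for feature, attrs, _ in records
        if feature == "transcript" and attrs.get("rank") == "1"
    }
    return [line for _, attrs, line in records if attrs.get("transcript_id") in top_ids]


ranking_excerpt = [
    'chr_1 ingenannot-isoform-ranking transcript 646708 648848 . - . gene_id "PB.112";transcript_id "PB.112.3";rank "1";',
    'chr_1 ingenannot-isoform-ranking exon 646708 647301 . - . gene_id "PB.112"; transcript_id "PB.112.3"',
    'chr_1 ingenannot-isoform-ranking transcript 646700 648827 . - . gene_id "PB.112";transcript_id "PB.112.2";rank "2";',
    'chr_1 ingenannot-isoform-ranking exon 646700 647301 . - . gene_id "PB.112"; transcript_id "PB.112.2"',
]

top_lines = top_isoform_lines(ranking_excerpt)
for line in top_lines:
    print(line)
```

Only the two PB.112.3 lines are printed, since PB.112.2 carries rank "2".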
The isoforms.alternatives.gff file contains one version of each selected structure, without the isoforms that differ only by their UTRs, which makes it more suitable for differential expression analysis or annotation of gene isoforms. In this case, the ranking is reordered after removing the UTR isoforms.

chr_1 ingenannot-isoform-ranking transcript 646708 648848 . - . gene_id "PB.112";transcript_id "PB.112.3";rank "1";
chr_1 ingenannot-isoform-ranking exon 646708 647301 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 647389 647472 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 647529 647573 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 647669 648293 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking exon 648441 648848 . - . gene_id "PB.112"; transcript_id "PB.112.3"
chr_1 ingenannot-isoform-ranking transcript 646716 648836 . - . gene_id "PB.112";transcript_id "PB.112.4";rank "2";
chr_1 ingenannot-isoform-ranking exon 646716 647301 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking exon 647389 647472 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking exon 647541 647573 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking exon 647669 648293 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking exon 648441 648836 . - . gene_id "PB.112"; transcript_id "PB.112.4"
chr_1 ingenannot-isoform-ranking transcript 646476 648836 . - . gene_id "PB.112";transcript_id "PB.112.1";rank "3";
chr_1 ingenannot-isoform-ranking exon 646476 647301 . - . gene_id "PB.112"; transcript_id "PB.112.1"
chr_1 ingenannot-isoform-ranking exon 647389 647472 . - . gene_id "PB.112"; transcript_id "PB.112.1"
chr_1 ingenannot-isoform-ranking exon 647529 647573 . - . gene_id "PB.112"; transcript_id "PB.112.1"
chr_1 ingenannot-isoform-ranking exon 647669 648299 . - . gene_id "PB.112"; transcript_id "PB.112.1"
chr_1 ingenannot-isoform-ranking exon 648441 648836 . - . gene_id "PB.112"; transcript_id "PB.112.1"
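The reordering seen in this file can be sketched as follows: walk through the ranked isoforms, keep the first isoform of each structure, then renumber the ranks. The structure labels "A", "B" and "C" are hypothetical stand-ins for whatever the tool uses to compare structures, the `alternatives` function is an illustrative name, and the sketch leaves out the filtering controlled by --alt_threshold.

```python
def alternatives(ranked):
    """Keep the best-ranked isoform of each structure and renumber the ranks.

    ranked: list of (transcript_id, structure_label) in ranking order.
    """
    seen = set()
    kept = []
    for transcript_id, structure in ranked:
        if structure in seen:
            # A better-ranked isoform with the same structure was already kept.
            continue
        seen.add(structure)
        kept.append(transcript_id)
    return [(transcript_id, rank) for rank, transcript_id in enumerate(kept, start=1)]


# Ranking order from isoforms.ranking.gff; PB.112.3 and PB.112.2 share a structure.
ranked = [
    ("PB.112.3", "A"),
    ("PB.112.2", "A"),
    ("PB.112.4", "B"),
    ("PB.112.1", "C"),
]

print(alternatives(ranked))
# [('PB.112.3', 1), ('PB.112.4', 2), ('PB.112.1', 3)]
```

PB.112.2 is dropped as a UTR variant of PB.112.3, so PB.112.4 and PB.112.1 move up from ranks 3 and 4 to ranks 2 and 3, as in the listing above.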
Here no isoform was filtered out, so the isoforms.unclassif.gff file is empty.