soclassif
Command soclassif performs SO classification.
usage
ingenannot -v 2 soclassif file.fof --clustranded --clatype exon
positional arguments:
fof | File of files, <GFF/GTF>TAB<source> |
optional arguments:
-h, --help | show this help message and exit |
--clutype CLUTYPE | Feature type used to clusterize: [gene, cds], default=cds |
--clustranded | Same strand orientation required to cluster features, default=False |
--clatype CLATYPE | Feature type used to classify: [gene, cds], default=cds |
inputs
File of Files (FoF) listing all files to analyze, one per line, in the form <GFF/GTF>TAB<source>. To analyze only the isoforms of a single file, put a single line in the FoF.
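As an illustration of the FoF layout, the following minimal sketch builds such a file from a list of (path, source) pairs. The helper names `fof_text` and `write_fof`, the file names, and the source labels are hypothetical and not part of ingenannot; only the one-line-per-file, tab-separated layout comes from the description above.

```python
def fof_text(entries):
    """Return FoF content: one '<GFF/GTF>TAB<source>' line per entry."""
    lines = []
    for path, source in entries:
        # A tab inside a field would break the two-column layout.
        if "\t" in path or "\t" in source:
            raise ValueError("fields must not contain tab characters")
        lines.append(f"{path}\t{source}\n")
    return "".join(lines)


def write_fof(entries, fof_path):
    """Write the FoF content to fof_path."""
    with open(fof_path, "w") as handle:
        handle.write(fof_text(entries))


# Hypothetical annotation files and source labels.
example_entries = [
    ("annotation_a.gff", "sourceA"),
    ("annotation_b.gtf", "sourceB"),
]
print(fof_text(example_entries), end="")
```

Calling `write_fof(example_entries, "file.fof")` would produce a file that can be passed as the `fof` positional argument.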
outputs
Statistics for each category:
11661 metagenes with only one transcript, not analyzed
Classification:
N:O:O:0
N:N:O:1
N:O:N:0
N:N:N:1
O:N:O:1198
O:N:N:160
O:O:N:384
unclassified:11661
nb classified metagenes with all transcripts sharing the same CDS: 189
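If the statistics need to be reused downstream, the classification block can be turned into a dictionary. The sketch below assumes the report has been captured as plain text in exactly the layout shown above (a `Classification:` line followed by `label:count` lines); the function name `parse_classification` is hypothetical and not part of ingenannot.

```python
def parse_classification(report):
    """Map each class label of the 'Classification:' block to its count."""
    counts = {}
    in_block = False
    for raw in report.splitlines():
        line = raw.strip()
        if line == "Classification:":
            in_block = True
            continue
        if not in_block:
            continue
        # The count is whatever follows the last colon on the line.
        label, sep, value = line.rpartition(":")
        value = value.strip()
        # A line that is not 'label:count' (for example free text with
        # spaces in the label) ends the block.
        if not sep or not value.isdigit() or " " in label:
            break
        counts[label] = int(value)
    return counts


sample_report = """Statistics for each category:
11661 metagenes with only one transcript, not analyzed
Classification:
N:O:O:0
N:N:O:1
O:N:O:1198
unclassified:11661
nb classified metagenes with all transcripts sharing the same CDS: 189
"""
print(parse_classification(sample_report))
```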
Categories as defined by the SO:
Class | definition |
---|---|
N:0:0 | No transcript-pairs share any exon sequence |
N:N:0 | Some transcript-pairs share sequence, but none have common exon boundaries |
N:0:N | Some transcript-pairs share no sequence, others have common exon boundaries |
N:N:N | Some transcript-pairs share no sequence, others have common sequence and exon boundaries |
0:N:0 | All transcript-pairs share sequence in common, but none share exon boundaries |
0:N:N | All transcript-pairs share sequence in common and some share exon boundaries |
0:0:N | All transcript-pairs share some exons in common |
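One way to read the class codes is as three flags over all transcript pairs of a locus: pairs sharing no sequence, pairs sharing sequence but no exon, and pairs sharing an exon. The sketch below follows that reading and is not ingenannot's implementation. It assumes each transcript is a list of inclusive `(start, end)` features on the same strand, and it treats "common exon boundaries" as "at least one identical feature", which is an interpretation. The names `pair_category` and `so_class` are hypothetical.

```python
from itertools import combinations


def pair_category(a, b):
    """Classify one transcript pair.

    Returns 2 if the pair shares an identical feature (same start and end),
    1 if the pair only overlaps in sequence, 0 if there is no overlap.
    """
    if set(a) & set(b):
        return 2
    # Inclusive coordinates: two intervals overlap when each one starts
    # before or at the end of the other.
    overlaps = any(s1 <= e2 and s2 <= e1 for s1, e1 in a for s2, e2 in b)
    return 1 if overlaps else 0


def so_class(transcripts):
    """Build the three-field class code for a locus with >= 2 transcripts."""
    seen = [False, False, False]
    for a, b in combinations(transcripts, 2):
        seen[pair_category(a, b)] = True
    return ":".join("N" if flag else "0" for flag in seen)


# Two isoforms sharing their first feature.
locus = [
    [(100, 200), (300, 400)],
    [(100, 200), (300, 450)],
]
print(so_class(locus))
```

With `--clatype cds` the features would be CDS segments, with `--clatype gene` the complete exon structure; the pairwise logic of the sketch stays the same.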

As described above, the SO classification was originally based on exon boundaries, which can be highly problematic for de novo annotations with poorly defined UTRs. To avoid this problem, you can perform the same classification on CDS coordinates instead, which gives less biased results. The following table summarizes the pros and cons of each classification feature type.
| pros | cons |
---|---|---|
--clatype gene | complete gene structure analysis | too sensitive when the annotation sets diverge (e.g. UTR vs. no UTR) |
--clatype cds | limited to the coding sequence, avoids background noise due to UTRs. Useful when UTRs are poorly predicted. | structure inspection limited to the CDS |
Each analyzed locus is exported to the GFF file of the category it was assigned to.