Add new gene isoforms to your annotations
Here we propose a protocole to add potential isoforms to your canonical gene models. This protocole is based on the extraction of Sqanti3 [1] results.
Workflow:

Steps:
The starting point is the gff file obtained from select_best_gene_models use case. If want to add isoforms on gff coming from another source, perform a validation of your gff with the validate command before.
1) Extract the most representative isoforms from long-read transcripts
If you have both long-read transcripts and short-read data, we recommend annotating isoforms using long-read transcripts based on coverage from short-reads. If short-read data is not available, you can proceed without filtering or use short-read transcripts but results will be of lower quality.
# write your file of files: bam.fof as below, such as "path to bam<tab>paired<tab>stranded"
/tmp/run1.singleton.stranded.bam False True
/tmp/run2.singleton.unstranded.bam False False
/tmp/run3.paired.stranded.bam True True
/tmp/run4.paired.unstranded.bam true False
# your longreads.gff file must look like a gtf file, with several trancript for the same gene, as below:
chr_1 ingenannot-isoform-ranking transcript 140523 145168 . - . gene_id "PB.12"; transcript_id "PB.12.1";
chr_1 ingenannot-isoform-ranking exon 140523 145168 . - . gene_id "PB.12"; transcript_id "PB.12.1";
chr_1 ingenannot-isoform-ranking transcript 140531 145121 . - . gene_id "PB.12"; transcript_id "PB.12.2";
chr_1 ingenannot-isoform-ranking exon 140531 145121 . - . gene_id "PB.12"; transcript_id "PB.12.2";
chr_1 ingenannot-isoform-ranking transcript 140579 144883 . - . gene_id "PB.12"; transcript_id "PB.12.3";
chr_1 ingenannot-isoform-ranking exon 140579 144883 . - . gene_id "PB.12"; transcript_id "PB.12.3";
chr_1 ingenannot-isoform-ranking transcript 140708 144901 . - . gene_id "PB.12"; transcript_id "PB.12.4";
chr_1 ingenannot-isoform-ranking exon 140708 144901 . - . gene_id "PB.12"; transcript_id "PB.12.4";
chr_1 ingenannot-isoform-ranking transcript 202365 205520 . - . gene_id "PB.22"; transcript_id "PB.22.1";
chr_1 ingenannot-isoform-ranking exon 202365 205031 . - . gene_id "PB.22"; transcript_id "PB.22.1";
chr_1 ingenannot-isoform-ranking exon 205203 205520 . - . gene_id "PB.22"; transcript_id "PB.22.1";
chr_1 ingenannot-isoform-ranking transcript 202939 205422 . - . gene_id "PB.22"; transcript_id "PB.22.2";
chr_1 ingenannot-isoform-ranking exon 202939 205422 . - . gene_id "PB.22"; transcript_id "PB.22.2";
chr_1 ingenannot-isoform-ranking transcript 202939 205488 . - . gene_id "PB.22"; transcript_id "PB.22.3";
chr_1 ingenannot-isoform-ranking exon 202939 205031 . - . gene_id "PB.22"; transcript_id "PB.22.3";
chr_1 ingenannot-isoform-ranking exon 205203 205488 . - . gene_id "PB.22"; transcript_id "PB.22.3";
# rank/filter your long-read transcripts based on junction support and coverage
ingenannot -v 2 isoform_ranking longreads.gff -f bam.fof --alt_threshold 0.1
2) Annotate isoforms with SQANTI3
# If your annotation file is in gff3, transform it in gtf with gffread
gffread genes.gff3 -T > genes.gtf
# run sqanti3 with only the best alternatives isoforms
sqanti3_qc.py --gtf isoforms.alternatives.gff genes.gtf genome.fasta --isoAnnotLite
3) Extract the list of isoforms to add from SQANTI3 results
# get list of transcripts to add
# limit to 3 types of annotations without non-coding with file extraction as below:
cut -f1,6,15,30,37 isoforms.alternatives_classification.txt | grep -v -P "TRUE|non_coding" | grep -P "\tgenic\t" | cut -f 1 > lIDS_genic.txt
cut -f1,6,15,30,37 isoforms.alternatives_classification.txt | grep -v -P "TRUE|non_coding" | grep -P "\tnovel_in_catalog\t" | cut -f 1 > lIDS_nic.txt
cut -f1,6,15,30,37 isoforms.alternatives_classification.txt | grep -v -P "TRUE|non_coding" | grep -P "\tnovel_not_in_catalog\t" | cut -f 1 > lIDS_nnic.txt
cat lIDS*.txt > all_IDs.txt
4) Add Isoforms to your canonical gene models
ingenannot -v 2 add_sqanti3_isoforms genes.gff3 isoforms.alternatives.gff all_IDs.txt > genes.isoforms.gff3
5) Analyze the Sequence Ontology Classification of your transcripts
SO Classification will help you to remove some possible artefacts (overlapping, …).
# write your file of files: genes.fof as below, such as "path to GFF/GTF genes<tab>source"
genes.isoforms.gff3 squanti3
# run classification
ingenannot -v 2 soclassif genes.fof --clustranded --clatype exon