check_gff {Rgff} | R Documentation |
Test consistency and order of a GFF file
Description
This function tests the consistency and order of a GFF file.
Usage
check_gff(inFile, fileType = c("AUTO", "GFF3", "GTF"))
Arguments
inFile | Path to the input GFF file
fileType | Format of the input file (AUTO, GFF3 or GTF). With the default, AUTO, the format is determined from the file name.
Details
The following list gives the code and description of each issue detected in GFF3 files:
- NCOLUMNS_EXCEEDED
Input file contains lines with more than 9 fields
- NCOLUMNS_INFERIOR
Input file contains lines with fewer than 9 fields
- TOO_MANY_FEATURE_TYPES
Input file contains too many (more than 100) different feature types
- NO_IDs
ID attribute not found in any feature
- DUPLICATED_IDs
There are duplicated IDs
- ID_IN_MULTIPLE_CHR
The same ID has been found in more than one chromosome
- NO_PARENTs
Parent attribute not found in any feature
- MISSING_PARENT_IDs
There are missing Parent IDs
- PARENT_IN_DIFFERENT_CHR
There are features whose Parent is located in a different chromosome
- PARENT_DEFINED_BEFORE_ID
Feature IDs are referenced in a Parent attribute before being defined as an ID
- NOT_GROUPED_BY_CHR
Features are not grouped by chromosome
- NOT_SORTED_BY_COORDINATE
Features are not sorted by start coordinate
- NOT_VALID_WARNING
File cannot be recognized as valid GFF3. Parsing warnings.
- NOT_VALID_ERROR
File cannot be recognized as valid GFF3. Parsing errors.
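To make the ID and Parent codes above more concrete, here is a minimal base R sketch of what three of them (DUPLICATED_IDs, MISSING_PARENT_IDs and PARENT_DEFINED_BEFORE_ID) mean. It is not the Rgff implementation: the `features` table, its column names and its values are made up for illustration, and the duplicate check is deliberately simplified.

```r
# Toy attribute table: one row per feature, in file order.
# IDs and Parents are invented for illustration only.
features <- data.frame(
  id     = c("mRNA1", "gene1", "exon1", "exon1", "mRNA2"),
  parent = c("gene1", NA,      "mRNA1", "mRNA1", "gene2"),
  stringsAsFactors = FALSE
)

# DUPLICATED_IDs: the same ID appears on more than one feature.
duplicated_ids <- unique(features$id[duplicated(features$id)])

# MISSING_PARENT_IDs: a Parent value that is never defined as an ID.
parents <- features$parent[!is.na(features$parent)]
missing_parents <- setdiff(parents, features$id)

# PARENT_DEFINED_BEFORE_ID: a feature references a Parent whose ID is
# first defined on a later row.
first_def <- match(features$parent, features$id)  # row where the parent ID first appears
row_index <- seq_len(nrow(features))
referenced_early <- unique(
  features$parent[!is.na(first_def) & first_def > row_index]
)

print(duplicated_ids)
print(missing_parents)
print(referenced_early)
```

In this toy table, `mRNA1` names `gene1` as its Parent one row before `gene1` is defined, and `gene2` is never defined at all.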
The following list gives the code and description of each issue detected in GTF files:
- NCOLUMNS_EXCEEDED
Input file contains lines with more than 9 fields
- NCOLUMNS_INFERIOR
Input file contains lines with fewer than 9 fields
- TOO_MANY_FEATURE_TYPES
Input file contains too many (more than 100) different feature types
- NO_GENE_ID_ATTRIBUTE
gene_id attribute not found in any feature
- MISSING_GENE_IDs
There are features without gene_id attribute
- NO_GENE_FEATURES
Gene features are not included in this GTF file
- DUPLICATED_GENE_IDs
There are duplicated gene_ids
- GENE_ID_IN_MULTIPLE_CHR
The same gene_id has been found in more than one chromosome
- NO_TRANSCRIPT_ID_ATTRIBUTE
transcript_id attribute not found in any feature
- MISSING_TRANSCRIPT_IDs
There are features without transcript_id attribute
- NO_TRANSCRIPT_FEATURES
Transcript features are not included in this GTF file
- DUPLICATED_TRANSCRIPT_IDs
There are duplicated transcript_ids
- TRANSCRIPT_ID_IN_MULTIPLE_CHR
The same transcript_id has been found in more than one chromosome
- DUPLICATED_GENE_AND_TRANSCRIPT_IDs
The same ID has been defined as both gene_id and transcript_id
- NOT_GROUPED_BY_CHR
Features are not grouped by chromosome
- NOT_SORTED_BY_COORDINATE
Features are not sorted by start coordinate
- NOT_VALID_WARNING
File cannot be recognized as valid GTF. Parsing warnings.
- NOT_VALID_ERROR
File cannot be recognized as valid GTF. Parsing errors.
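The structural codes shared by both lists (NCOLUMNS_EXCEEDED, NCOLUMNS_INFERIOR, NOT_GROUPED_BY_CHR and NOT_SORTED_BY_COORDINATE) can be illustrated with a short base R sketch. This is not how Rgff implements the checks: the input lines are invented, and treating "sorted by start coordinate" as sorted within each chromosome is an assumption made here.

```r
# Toy tab-separated annotation lines (invented for illustration).
# The last line has only 8 fields.
gff_lines <- c(
  "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=g1",
  "chr2\tsrc\tgene\t50\t90\t.\t+\t.\tID=g2",
  "chr1\tsrc\tgene\t20\t80\t.\t-\t.\tID=g3",
  "chr2\tsrc\tgene\t10\t40\t.\t+\t."
)

fields   <- strsplit(gff_lines, "\t", fixed = TRUE)
n_fields <- lengths(fields)

# NCOLUMNS_EXCEEDED / NCOLUMNS_INFERIOR: lines with more or fewer than 9 fields.
ncolumns_exceeded <- any(n_fields > 9)
ncolumns_inferior <- any(n_fields < 9)

# Column 1 is the chromosome (seqid), column 4 the start coordinate.
chr   <- vapply(fields, `[`, character(1), 1)
start <- as.integer(vapply(fields, `[`, character(1), 4))

# NOT_GROUPED_BY_CHR: a chromosome appears in more than one run of lines.
not_grouped <- length(rle(chr)$values) != length(unique(chr))

# NOT_SORTED_BY_COORDINATE: starts decrease somewhere within a chromosome
# (assumption: the ordering is checked per chromosome).
not_sorted <- any(vapply(split(start, chr), is.unsorted, logical(1)))

print(c(exceeded = ncolumns_exceeded, inferior = ncolumns_inferior,
        not_grouped = not_grouped, not_sorted = not_sorted))
```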
Value
A data frame of detected issues, each with a short code name, a description and an estimated severity. If no issues are detected, the function returns an empty data frame.
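Because the function returns a data frame in both cases, a caller can branch on the number of rows. The sketch below relies only on that property and makes no assumption about column names. `has_issues` is a hypothetical helper, not part of Rgff, and the Rgff call is guarded so the snippet still runs when the package is not installed.

```r
# Hypothetical helper (not part of Rgff): TRUE when the issue table has rows.
has_issues <- function(issues) {
  is.data.frame(issues) && nrow(issues) > 0
}

if (requireNamespace("Rgff", quietly = TRUE)) {
  # Example file shipped with the package, as used in the Examples section.
  test_gff3 <- system.file("extdata", "eden.gff3", package = "Rgff")
  issues <- Rgff::check_gff(test_gff3)

  if (has_issues(issues)) {
    print(issues)
  } else {
    message("No issues detected")
  }
}
```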
Examples
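# Locate the example GFF3 file shipped with the package
test_gff3 <- system.file("extdata", "eden.gff3", package = "Rgff")
# Check the file; the format is detected from the file name (fileType = "AUTO")
check_gff(test_gff3)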