check_gff {Rgff}    R Documentation

Test consistency and order of a GFF file

Description

This function tests the consistency and ordering of the features in a GFF3 or GTF file and reports any issues it detects.

Usage

check_gff(inFile, fileType = c("AUTO", "GFF3", "GTF"))
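In the usage line, fileType defaults to a vector of allowed values. In R this conventionally means that the first element, "AUTO", is the default and that only the listed values are accepted, which is usually enforced with base R's match.arg(). A minimal sketch of that idiom follows, assuming (this page does not confirm it) that check_gff resolves its argument the same way; resolve_file_type is a hypothetical name.

```r
# Hypothetical helper illustrating the match.arg() idiom; not part of Rgff.
resolve_file_type <- function(fileType = c("AUTO", "GFF3", "GTF")) {
  # With the untouched default vector, match.arg() returns its first element;
  # otherwise it accepts exactly one of the listed values and errors on others.
  match.arg(fileType)
}

resolve_file_type()        # "AUTO"
resolve_file_type("GTF")   # "GTF"
```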

Arguments

inFile

Path to the input GFF3 or GTF file.

fileType

Format of the input file: "AUTO", "GFF3" or "GTF". The default, "AUTO", determines the format from the file name.
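This page only states that "AUTO" determines the format from the file name. A minimal sketch of such a rule follows, assuming that .gff3 and .gff mean GFF3, .gtf means GTF, and a trailing .gz is ignored; guess_file_type is a hypothetical helper, and the rule check_gff actually applies may differ.

```r
# Hypothetical sketch: infer the annotation format from a file name.
# Assumed rule: .gff3/.gff -> GFF3, .gtf -> GTF, optional trailing .gz ignored.
guess_file_type <- function(inFile) {
  name <- tolower(basename(inFile))
  name <- sub("\\.gz$", "", name)              # drop a compression suffix
  if (grepl("\\.gff3?$", name)) return("GFF3")  # matches .gff and .gff3
  if (grepl("\\.gtf$", name)) return("GTF")
  stop("cannot determine file type from name: ", inFile)
}

guess_file_type("eden.gff3")
guess_file_type("genes.gtf.gz")
```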

Details

The following list gives the code and description of each issue that can be detected in GFF3 files:

NCOLUMNS_EXCEEDED

Input file contains lines with more than 9 fields

NCOLUMNS_INFERIOR

Input file contains lines with fewer than 9 fields

TOO_MANY_FEATURE_TYPES

Input file contains too many (more than 100) different feature types

NO_IDs

ID attribute not found in any feature

DUPLICATED_IDs

There are duplicated IDs

ID_IN_MULTIPLE_CHR

The same ID has been found in more than one chromosome

NO_PARENTs

Parent attribute not found in any feature

MISSING_PARENT_IDs

There are missing Parent IDs

PARENT_IN DIFFERENT CHR

There are features whose Parent is located in a different chromosome

PARENT_DEFINED_BEFORE_ID

Some feature IDs are referenced in a Parent attribute before being defined as an ID

NOT_GROUPED_BY_CHR

Features are not grouped by chromosome

NOT_SORTED_BY_COORDINATE

Features are not sorted by start coordinate

NOT_VALID_WARNING

The file cannot be recognized as valid GFF3: parsing produced warnings.

NOT_VALID_ERROR

The file cannot be recognized as valid GFF3: parsing produced errors.
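Several of the GFF3 checks above reduce to simple operations on the tab-separated columns and on the ID and Parent attributes. The base R sketch below reproduces a subset of them (column counts, missing or duplicated IDs, Parent references, grouping and sorting) on a character vector of lines. It is not how Rgff implements them: get_attr and sketch_gff3_checks are hypothetical names, MISSING_PARENT_IDs is interpreted here as a Parent value that no line defines as an ID, and sorting is assumed to be checked within each chromosome.

```r
# Hypothetical sketch of a few GFF3 checks listed above; NOT the Rgff code.
get_attr <- function(attrs, key) {
  # Value of `key=value` in a GFF3 column-9 string, or NA when absent
  m <- regmatches(attrs, regexec(paste0("(^|;)", key, "=([^;]*)"), attrs))
  vapply(m, function(x) if (length(x)) x[3] else NA_character_, "")
}

sketch_gff3_checks <- function(lines) {
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]  # skip blanks and ## lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  n <- lengths(fields)
  issues <- character(0)
  if (any(n > 9)) issues <- c(issues, "NCOLUMNS_EXCEEDED")
  if (any(n < 9)) issues <- c(issues, "NCOLUMNS_INFERIOR")

  ok <- fields[n == 9]                        # analyse well-formed lines only
  seqid <- vapply(ok, `[`, "", 1)
  start <- as.integer(vapply(ok, `[`, "", 4))
  attrs <- vapply(ok, `[`, "", 9)
  ids <- get_attr(attrs, "ID")
  parents <- get_attr(attrs, "Parent")

  if (all(is.na(ids))) issues <- c(issues, "NO_IDs")
  # Caveat: GFF3 lets a multi-line feature (e.g. a CDS) repeat its ID; ignored here
  if (anyDuplicated(ids[!is.na(ids)])) issues <- c(issues, "DUPLICATED_IDs")
  if (all(is.na(parents))) issues <- c(issues, "NO_PARENTs")

  # A Parent value may list several comma-separated IDs
  parent_list <- strsplit(parents, ",", fixed = TRUE)
  ref_line <- rep(seq_along(parent_list), lengths(parent_list))
  ref_id <- unlist(parent_list)
  ref_line <- ref_line[!is.na(ref_id)]
  ref_id <- ref_id[!is.na(ref_id)]
  def_line <- match(ref_id, ids)              # first line defining each parent ID
  if (anyNA(def_line)) issues <- c(issues, "MISSING_PARENT_IDs")
  if (any(def_line > ref_line, na.rm = TRUE))
    issues <- c(issues, "PARENT_DEFINED_BEFORE_ID")

  # Grouped: each chromosome forms a single contiguous run of lines
  if (anyDuplicated(rle(seqid)$values)) issues <- c(issues, "NOT_GROUPED_BY_CHR")
  # Sorted: start coordinates non-decreasing within each chromosome (assumption)
  if (any(tapply(start, seqid, is.unsorted), na.rm = TRUE))
    issues <- c(issues, "NOT_SORTED_BY_COORDINATE")
  issues
}
```

For a real file, the input could come from readLines(inFile). Matching each Parent value against the line of its first ID definition lets a single pass detect both undefined parents and parents defined too late.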

The following list gives the code and description of each issue that can be detected in GTF files:

NCOLUMNS_EXCEEDED

Input file contains lines with more than 9 fields

NCOLUMNS_INFERIOR

Input file contains lines with fewer than 9 fields

TOO_MANY_FEATURE_TYPES

Input file contains too many (more than 100) different feature types

NO_GENE_ID_ATTRIBUTE

gene_id attribute not found in any feature

MISSING_GENE_IDs

There are features without a gene_id attribute

NO_GENE_FEATURES

Gene features are not included in this GTF file

DUPLICATED_GENE_IDs

There are duplicated gene_ids

GENE_ID_IN_MULTIPLE_CHR

The same gene_id has been found in more than one chromosome

NO_TRANSCRIPT_ID_ATTRIBUTE

transcript_id attribute not found in any feature

MISSING_TRANSCRIPT_IDs

There are features without a transcript_id attribute

NO_TRANSCRIPT_FEATURES

Transcript features are not included in this GTF file

DUPLICATED_TRANSCRIPT_IDs

There are duplicated transcript_ids

TRANSCRIPT_ID_IN_MULTIPLE_CHR

The same transcript_id has been found in more than one chromosome

DUPLICATED_GENE_AND_TRANSCRIPT_IDs

The same ID has been used as both a gene_id and a transcript_id

NOT_GROUPED_BY_CHR

Features are not grouped by chromosome

NOT_SORTED_BY_COORDINATE

Features are not sorted by start coordinate

NOT_VALID_WARNING

The file cannot be recognized as valid GTF: parsing produced warnings.

NOT_VALID_ERROR

The file cannot be recognized as valid GTF: parsing produced errors.
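Similarly, many of the GTF checks can be sketched from the gene_id and transcript_id attributes and the feature type column. This base R sketch is again hypothetical (gtf_attr and sketch_gtf_checks are illustrative names, not Rgff functions) and makes its own assumptions: duplicated gene_ids and transcript_ids are only counted among "gene" and "transcript" features, since exons legitimately repeat them, and "gene" lines are not required to carry a transcript_id.

```r
# Hypothetical sketch of several GTF checks listed above; NOT the Rgff code.
gtf_attr <- function(attrs, key) {
  # Value of `key "value"` in a GTF column-9 string, or NA when absent
  m <- regmatches(attrs, regexec(paste0('(^|;)[ ]*', key, ' "([^"]*)"'), attrs))
  vapply(m, function(x) if (length(x)) x[3] else NA_character_, "")
}

sketch_gtf_checks <- function(lines) {
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]  # skip blanks and comments
  f <- strsplit(lines, "\t", fixed = TRUE)
  f <- f[lengths(f) == 9]                 # column-count checks omitted here
  seqid <- vapply(f, `[`, "", 1)
  type  <- vapply(f, `[`, "", 3)
  attrs <- vapply(f, `[`, "", 9)
  gene  <- gtf_attr(attrs, "gene_id")
  tx    <- gtf_attr(attrs, "transcript_id")
  is_gene <- type == "gene"
  is_tx   <- type == "transcript"
  issues <- character(0)

  if (all(is.na(gene))) {
    issues <- c(issues, "NO_GENE_ID_ATTRIBUTE")
  } else if (anyNA(gene)) {
    issues <- c(issues, "MISSING_GENE_IDs")
  }
  if (!any(is_gene)) issues <- c(issues, "NO_GENE_FEATURES")
  if (anyDuplicated(gene[is_gene])) issues <- c(issues, "DUPLICATED_GENE_IDs")

  # Assumption: gene lines may lack transcript_id, so only other features are checked
  if (all(is.na(tx))) {
    issues <- c(issues, "NO_TRANSCRIPT_ID_ATTRIBUTE")
  } else if (anyNA(tx[!is_gene])) {
    issues <- c(issues, "MISSING_TRANSCRIPT_IDs")
  }
  if (!any(is_tx)) issues <- c(issues, "NO_TRANSCRIPT_FEATURES")
  if (anyDuplicated(tx[is_tx])) issues <- c(issues, "DUPLICATED_TRANSCRIPT_IDs")

  # A gene_id paired with more than one distinct chromosome
  g <- !is.na(gene)
  pairs <- unique(cbind(gene[g], seqid[g]))
  if (anyDuplicated(pairs[, 1])) issues <- c(issues, "GENE_ID_IN_MULTIPLE_CHR")
  if (length(intersect(gene[g], tx[!is.na(tx)])) > 0)
    issues <- c(issues, "DUPLICATED_GENE_AND_TRANSCRIPT_IDs")
  issues
}
```

Collecting unique (gene_id, chromosome) pairs first means a gene_id repeated on many exons of the same chromosome is not mistaken for a gene spread over several chromosomes.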

Value

A data frame of detected issues, giving a short code name, a description and an estimated severity for each. If no issues are detected, the function returns an empty data frame.

Examples

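# Path to the example GFF3 file installed with the Rgff package
test_gff3 <- system.file("extdata", "eden.gff3", package = "Rgff")
# Returns a data frame of detected issues (empty if none are found)
check_gff(test_gff3)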

[Package Rgff version 0.1.6 Index]