gpml-utility {WayFindR}    R Documentation

Utility Functions to Parse GPML Files

Description

Extract entities of different types from GPML files in order to convert the pathway to a mathematical graph that we can compute on.

Usage

collectEdges(xmldoc)
collectNodes(xmldoc)
collectGroups(xmldoc, allnodes)
collectAnchors(xmldoc)
collectLabels(xmldoc)
collectShapes(xmldoc)

Arguments

xmldoc

Either the name of an XML file meeting the specifications of the Graphical Pathway Markup Language (GPML), or an object of class XMLInternalDocument obtained by running such a file through the xmlParseDoc function of the XML package. (Each of the functions described here calls xmlParseDoc itself if it is given a file name rather than a parsed document.)

allnodes

A data frame containing node information, in the format produced by the collectNodes function.

Details

These functions are primarily intended as utility functions that implement processes required by the main function in the package, GPMLtoIgraph. They have been made accessible to the end user for debugging problematic GPML files or for reusing GPML files in contexts other than the one we focus on in this package.

While the meaning of nodes (known as DataNodes in GPML) and edges (known as Interactions in GPML) should be obvious, some of the other objects are less so. For example, an Anchor in GPML is an invisible object used to allow an edge to point to another edge instead of to a node. That structure isn't allowed in graphs in mathematics or computer science. WayFindR handles this by creating a new node type to represent the anchor position, breaking the target edge into two pieces separated by the anchor, and adding an edge from the source of the anchored edge to the new node.
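The anchor rewiring described above can be sketched on a toy edge table. This is only an illustration under stated assumptions, not the package's implementation: the helper splitAtAnchor, the identifiers (A, B, C, anc1), and the choice to copy the original edge type onto both pieces are all hypothetical.

```r
# Toy edge table in the three-column layout (Source, Target, MIM).
# Edge 1 (A -> B) hosts an anchor "anc1"; edge 2 (C -> anc1) points at it.
# All identifiers and MIM values here are made up for illustration.
edges <- data.frame(Source = c("A", "C"),
                    Target = c("B", "anc1"),
                    MIM    = c("Arrow", "Inhibition"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: replace the host edge by two pieces that meet at the
# anchor, which is now treated as an ordinary node.
splitAtAnchor <- function(edges, anchorId, hostRow) {
  host <- edges[hostRow, ]
  # Assumption: both pieces inherit the edge type of the original edge.
  first  <- data.frame(Source = host$Source, Target = anchorId,
                       MIM = host$MIM, stringsAsFactors = FALSE)
  second <- data.frame(Source = anchorId, Target = host$Target,
                       MIM = host$MIM, stringsAsFactors = FALSE)
  out <- rbind(first, second, edges[-hostRow, ])
  rownames(out) <- NULL
  out
}

rewired <- splitAtAnchor(edges, anchorId = "anc1", hostRow = 1)
print(rewired)
```

After the split, every edge in the table runs from one node to another, so the result is an ordinary directed graph.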

In GPML, a Label is a text box allowing extra information to be placed on a pathway, and a Shape is a graphical display object. The document type definition (DTD) for GPML describes both of these entities as non-semantic, intending them for display purposes only. However, some authors of pathways in the WikiPathways database use such objects as the (usually final, or "leaf") targets of interaction edges. When that happens, the WayFindR package deals with it by creating actual nodes to represent such labels or shapes. Other labels and shapes are ignored.

GPML also treats a Group as a first-class object in its DTD, where it is defined as "A collection of structurally or functionally similar or related pathway elements." The GPML file subclassifies some groups as "Complexes", indicating that they represent physical interactions and bindings between two or more molecules. Other groups may simply indicate that there is a related set of molecules (for example, STAT2 and STAT3) that play the same role at this point in the pathway. WayFindR deals with groups by creating a new node to represent the group as a whole and expanding the component genes into nodes, each with a single "contained" edge that points to the new group node.
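As a rough sketch of the group expansion just described, the following builds a group node and the "contained" edges by hand. The identifiers, labels, and the Type value "Group" are assumptions chosen for illustration; the actual values produced by collectGroups may differ.

```r
# Hypothetical members of one GPML Group, in the collectNodes layout.
members <- data.frame(GraphId = c("n1", "n2"),
                      label   = c("GeneA", "GeneB"),
                      Type    = c("GeneProduct", "GeneProduct"),
                      stringsAsFactors = FALSE)
groupId <- "grp1"  # made-up identifier for the group as a whole

# One new node stands for the whole group ...
groupNode <- data.frame(GraphId = groupId,
                        label   = paste(members$label, collapse = "/"),
                        Type    = "Group",  # assumed type name
                        stringsAsFactors = FALSE)

# ... and each member gets a single "contained" edge pointing to it.
groupEdges <- data.frame(Source = members$GraphId,
                         Target = groupId,
                         MIM    = "contained",
                         stringsAsFactors = FALSE)

# Same list shape as described for collectGroups: nodes plus edges.
groupInfo <- list(nodes = groupNode, edges = groupEdges)
str(groupInfo)
```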

Value

The collectEdges function returns a data frame with three columns (Source, Target, and MIM), where each row describes one edge (or "Interaction" in the GPML terminology) of the pathway/graph. The Source and Target columns are the alphanumeric identifiers of items describing nodes. The MIM column is the edge type in GPML, which often contains terminology based on the "Molecular Interaction Map" standard. When creating an igraph object from a pathway, the first two columns are used as identifiers to define the nodes at each end of the edge.

The collectNodes function returns a data frame with three columns (GraphId, label, and Type), where each row describes a node or vertex of the pathway/graph. The GraphId column is a unique alphanumeric identifier. The label column is a human-readable name for the node, often the official gene symbol. When creating an igraph object from a pathway, the first column is used as the identifier to define the node. Also, the plot method for igraph objects recognizes the term label as a column that defines the text that should be displayed in a node.
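To make the two layouts concrete, here is a minimal sketch that turns hand-made node and edge data frames into an igraph object with igraph::graph_from_data_frame. The toy identifiers, labels, and MIM values are made up, and this is not meant to reproduce what GPMLtoIgraph does internally; the igraph step runs only if that package is installed.

```r
# Toy tables in the documented layouts; all values are illustrative.
nodes <- data.frame(GraphId = c("n1", "n2", "n3"),
                    label   = c("GeneA", "GeneB", "GeneC"),
                    Type    = rep("GeneProduct", 3),
                    stringsAsFactors = FALSE)
edges <- data.frame(Source = c("n1", "n2"),
                    Target = c("n3", "n1"),
                    MIM    = c("mim-conversion", "mim-inhibition"),
                    stringsAsFactors = FALSE)

# Sanity check: every edge endpoint should be a known node identifier.
unknown <- setdiff(c(edges$Source, edges$Target), nodes$GraphId)
stopifnot(length(unknown) == 0)

if (requireNamespace("igraph", quietly = TRUE)) {
  # The first two edge columns name the endpoints; the first vertex column
  # supplies the vertex names. Remaining columns become attributes.
  g <- igraph::graph_from_data_frame(edges, directed = TRUE, vertices = nodes)
  print(igraph::V(g)$label)  # the "label" column is kept as a vertex attribute
  print(igraph::E(g)$MIM)    # the edge type is kept as an edge attribute
}
```

Checking the endpoints first is useful because graph_from_data_frame stops with an error when an edge refers to a vertex that is missing from the vertices table.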

The collectAnchors function returns a list containing a nodes element (in the same format returned by collectNodes) and an edges element (in the same format returned by collectEdges). The collectGroups function returns a list with nodes and edges components, just like the one from collectAnchors.

Both the collectLabels and collectShapes functions return the same kind of data frame that is returned by collectNodes.

Author(s)

Kevin R. Coombes krc@silicovore.com, Polina Bombina pbombina@augusta.edu

Examples

xmlfile <- system.file("pathways/WP3850.gpml", package = "WayFindR")
xmldoc <- XML::xmlParseDoc(xmlfile)
edges <- collectEdges(xmldoc)
nodes <- collectNodes(xmldoc)
anchors <- collectAnchors(xmldoc)
labels <- collectLabels(xmldoc)
shapes <- collectShapes(xmldoc)
groups <- collectGroups(xmldoc, nodes)

[Package WayFindR version 0.1.2 Index]