read.pdb {bio3d} | R Documentation |
Read PDB File
Description
Read a Protein Data Bank (PDB) coordinate file.
Usage
read.pdb(file, maxlines = -1, multi = FALSE, rm.insert = FALSE,
rm.alt = TRUE, ATOM.only = FALSE, hex = FALSE, verbose = TRUE)
read.pdb2(file, maxlines = -1, multi = FALSE, rm.insert = FALSE,
rm.alt = TRUE, ATOM.only = FALSE, verbose = TRUE)
## S3 method for class 'pdb'
print(x, printseq=TRUE, ...)
## S3 method for class 'pdb'
summary(object, printseq=FALSE, ...)
Arguments
file |
a single element character vector containing the name of the PDB file to be read, or the four letter PDB identifier for online file access. |
maxlines |
the maximum number of lines to read before giving up with large files. By default it will read up to the end of input on the connection. |
multi |
logical, if TRUE multiple ATOM records are read for all models in multi-model files and their coordinates returned. |
rm.insert |
logical, if TRUE PDB insert records are ignored. |
rm.alt |
logical, if TRUE PDB alternate records are ignored. |
ATOM.only |
logical, if TRUE only ATOM/HETATM records are stored. Useful for speed enhancements with large files where secondary structure, biological unit and other remark records are not required. |
hex |
logical, if TRUE enable parsing of hexadecimal atom numbers (> 99999) and residue numbers (> 9999) (e.g. from VMD). Note that numbering is assumed to be consecutive (with no missing numbers) and that the hexadecimal values start at atom number 100000 and residue number 10000 and continue to the end of the file. |
verbose |
logical, if TRUE details of the reading process are printed. |
x |
a PDB structure object obtained from read.pdb. |
object |
a PDB structure object obtained from read.pdb. |
printseq |
logical, if TRUE the PDB ATOM sequence will be printed
to the screen. See also |
... |
additional arguments to ‘print’. |
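As an illustration of the numbering scheme that the hex argument refers to, the following base R sketch converts hexadecimal serial strings to integers with strtoi(). It assumes that the writing program switches to hexadecimal once the decimal field overflows; the example strings are made up, and this is not the parsing code used by read.pdb.

```r
## Hypothetical atom serial fields written in hexadecimal once the
## decimal limit of the five-column field (99999) has been passed
hex.serials <- c("186a0", "186a1", "186a2")

## Base R conversion from hexadecimal text to integer values
atom.numbers <- strtoi(hex.serials, base = 16L)
atom.numbers   # 100000 100001 100002

## The same idea for a hypothetical four-column residue number field
res.number <- strtoi("2710", base = 16L)
res.number     # 10000
```

The first hexadecimal values correspond to atom number 100000 and residue number 10000, which matches the starting points described for the hex argument.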
Details
read.pdb is a re-implementation (using Rcpp) of the slower but more tested R implementation of the same function (called read.pdb2 since bio3d-v2.3).
maxlines may be set so as to restrict the reading to a portion of input files. Note that the preferred means of reading large multi-model files is via binary DCD or NetCDF format trajectory files (see the read.dcd and read.ncdf functions).
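The options described above can be tried on the example file shipped with the package. This is a minimal sketch that assumes bio3d is installed; the value of 1000 for maxlines is an arbitrary choice for illustration, and the object names are hypothetical.

```r
library(bio3d)

## Example file included with the package (also used in the Examples section)
f <- system.file("examples/1hel.pdb", package = "bio3d")

## Store only ATOM/HETATM records, skipping other record types
pdb.atoms <- read.pdb(f, ATOM.only = TRUE, verbose = FALSE)

## Restrict reading to the first 1000 lines (arbitrary illustrative limit)
pdb.part <- read.pdb(f, maxlines = 1000, verbose = FALSE)

## The R implementation accepts the same file argument
pdb.r <- read.pdb2(f, verbose = FALSE)

## Compare the number of atom rows stored by each call
c(atoms.only = nrow(pdb.atoms$atom),
  partial    = nrow(pdb.part$atom),
  r.version  = nrow(pdb.r$atom))
```

Comparing the row counts is a quick way to see whether a maxlines limit has truncated the coordinate section.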
Value
Returns a list of class "pdb" with the following components:
atom |
a data.frame containing all atomic coordinate ATOM and HETATM data, with a row per ATOM/HETATM and a column per record type. See below for details of the record type naming convention (useful for accessing columns). |
helix |
‘start’, ‘end’ and ‘length’ of H type sse, where start and end are residue numbers “resno”. |
sheet |
‘start’, ‘end’ and ‘length’ of E type sse, where start and end are residue numbers “resno”. |
seqres |
sequence from SEQRES field. |
xyz |
a numeric matrix of class |
calpha |
logical vector with length equal to |
remark |
a list object containing information taken from 'REMARK'
records of a |
call |
the matched call. |
Note
For both atom and het list components the column names can be used as a convenient means of data access, namely:
Atom serial number “eleno”,
Atom type “elety”,
Alternate location indicator “alt”,
Residue name “resid”,
Chain identifier “chain”,
Residue sequence number “resno”,
Code for insertion of residues “insert”,
Orthogonal coordinates “x”,
Orthogonal coordinates “y”,
Orthogonal coordinates “z”,
Occupancy “o”, and
Temperature factor “b”.
See examples for further details.
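To show how these column names relate to the fixed-width fields of a PDB ATOM record, here is a base R sketch that cuts one record into named values. The column positions follow the PDB format version 3.3 description; the record itself is made up for illustration, the helper field() is hypothetical, and this is not the code read.pdb uses internally.

```r
## Build an illustrative ATOM record piece by piece so that the
## fixed column positions are explicit (values are made up)
line <- paste0(
  "ATOM  ",     # columns  1-6   record name
  "    1",      # columns  7-11  atom serial number
  " ",          # column   12    unused
  " N  ",       # columns 13-16  atom name
  " ",          # column   17    alternate location indicator
  "LYS",        # columns 18-20  residue name
  " ",          # column   21    unused
  "A",          # column   22    chain identifier
  "   1",       # columns 23-26  residue sequence number
  " ",          # column   27    insertion code
  "   ",        # columns 28-30  unused
  "   1.000",   # columns 31-38  x coordinate
  "   2.000",   # columns 39-46  y coordinate
  "   3.000",   # columns 47-54  z coordinate
  "  1.00",     # columns 55-60  occupancy
  " 10.00"      # columns 61-66  temperature factor
)

## Hypothetical helper: extract one field and strip padding
field <- function(from, to) trimws(substr(line, from, to))

## Map each field to the column name used in the 'atom' data.frame
rec <- data.frame(
  eleno  = as.integer(field(7, 11)),
  elety  = field(13, 16),
  alt    = field(17, 17),
  resid  = field(18, 20),
  chain  = field(22, 22),
  resno  = as.integer(field(23, 26)),
  insert = field(27, 27),
  x      = as.numeric(field(31, 38)),
  y      = as.numeric(field(39, 46)),
  z      = as.numeric(field(47, 54)),
  o      = as.numeric(field(55, 60)),
  b      = as.numeric(field(61, 66)),
  stringsAsFactors = FALSE
)
rec
```

Empty fields such as the alternate location indicator come back as empty strings in this sketch; how read.pdb represents missing values may differ.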
Author(s)
Barry Grant
References
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
For a description of PDB format (version 3.3) see:
http://www.wwpdb.org/documentation/format33/v3.3.html.
See Also
atom.select, write.pdb, trim.pdb, cat.pdb, read.prmtop, as.pdb, read.dcd, read.ncdf, read.fasta.pdb, read.fasta, biounit
Examples
## Read a PDB file from the RCSB online database
#pdb <- read.pdb("4q21")
## Read a PDB file from those included with the package
pdb <- read.pdb( system.file("examples/1hel.pdb", package="bio3d") )
## Print a brief composition summary
pdb
## Examine the storage format (or internal *str*ucture)
str(pdb)
## Print data for the first four atoms
pdb$atom[1:4,]
## Print some coordinate data
head(pdb$atom[, c("x","y","z")])
## Or coordinates from the numeric 'xyz' component
#head(pdb$xyz)
## Print C-alpha coordinates (can also use 'atom.select' function)
head(pdb$atom[pdb$calpha, c("resid","elety","x","y","z")])
inds <- atom.select(pdb, elety="CA")
head( pdb$atom[inds$atom, ] )
## The atom.select() function returns 'indices' (row numbers)
## that can be used for accessing subsets of PDB objects, e.g.
inds <- atom.select(pdb,"ligand")
pdb$atom[inds$atom,]
pdb$xyz[inds$xyz]
## See the help page for atom.select() function for more details.
## Not run:
## Print SSE data for helix and sheet,
## see also dssp() and stride() functions
print.sse(pdb)
pdb$helix
pdb$sheet$start
## Print SEQRES data
pdb$seqres
## SEQRES as one letter code
aa321(pdb$seqres)
## Where is the P-loop motif in the ATOM sequence
inds.seq <- motif.find("G....GKT", pdbseq(pdb))
pdbseq(pdb)[inds.seq]
## Where is it in the structure
inds.pdb <- atom.select(pdb,resno=inds.seq, elety="CA")
pdb$atom[inds.pdb$atom,]
pdb$xyz[inds.pdb$xyz]
## View in interactive 3D mode
#view(pdb)
## End(Not run)