mvbutils.packaging.tools {mvbutils} | R Documentation |
How to create & maintain packages with mvbutils
Description
This document covers:
- using mvbutils to create a new package from scratch;
- using mvbutils to maintain a package you've created (e.g. edit it while using it);
- converting an existing package into mvbutils-compatible format;
- how to customize the package-creation process.
For clarity, the simplest usage is presented first in each case. For how to do things differently, first look further down this document, then in the documentation for pre.install and perhaps doc2Rd.
You need to understand cd and fixr before trying any of this.
Setting up a package from scratch
First, the simplest case: suppose you have some pure R code and maybe data that you'd like to make into a package called "Splendid". The bare-minimum steps you need are:
- Make sure all the code & data lives in a single task called "Splendid".
- cd to the task above "Splendid".
- maintain.packages( Splendid)
- pre.install( Splendid). This will create a "source package" in a subdirectory of Splendid's task directory. The subdirectory will be called "Splendid". Make sure you have all the R build tools installed and on your path; see "R-exts" for details (and NB that if you need to install LaTeX, then google MiKTeX & choose a minimal install).
- install.pkg( Splendid) to do what you'd expect. On Windows, you can alternatively first do build.pkg.binary( Splendid), then use R's menus to "Packages/Install from local zip files".
- library(Splendid); your package will be loaded for use, and is also ready for live-editing.
Your package will probably just about work now, but the result won't yet be perfect. The additional steps you'll likely need are these:
- Sort out the Description file or object (see below)
- Provide Documentation and metadata (see below)
- Sort out any C/Fortran source code, pre-compiled code, demos, and other additional files (see pre.install)
- Move any subtasks of Splendid to one level up the task hierarchy (see maintain.packages)
Once you have set up "Splendid" so that maintain.packages works, you won't need to cd directly into "Splendid" again, which is good, because you're not allowed to.
Glossary
Task package is a folder with at least an ".RData" file, linked into the cd hierarchy. It contains master copies of the objects in your package, plus perhaps a few other objects required to build the package (e.g. stand-alone items of documentation).
In-memory task package is an environment in the current R session that contains an image of the task package. Objects in it are never used directly, only as templates for editing. It is loaded by maintain.packages, and Save.pos uses it to update the task package (usually automatic).
Source package is a folder containing, yes, an R-style source package. It is created initially by pre.install, and subsequently by patch.install or pre.install.
Installed package is a folder containing, yes, an R-style installed package. It is always created from the source package, initially by install.pkg and subsequently by patch.install or install.pkg.
Loaded package is the in-memory version of an installed package, loaded by library.
Tarball package is a zipped-up version of a source package, for distro on non-Windows-Mac platforms or submission to CRAN and subsequent installation via "R CMD INSTALL". Usually it will not contain DLLs of any low-level code, just the source low-level code. It is created by build.pkg.
Binary package is a special zipped-up version for distro to Windows or Macs that includes actual DLLs, for installation via e.g. the "Packages/Install from local ZIP" menu. It is created by build.pkg.binary.
Built package is a tarball package or binary package.
Converting an existing package
Suppose you already have a source package "hardway", and would like to try maintaining it via mvbutils. You'll need to create a task package, then create a new version of the source package, then re-install it. The first step is to call unpackage( hardway) to create the task package "hardway" in a subdirectory of the current task. Plain-text documentation will be attached to functions, or stored as ".doc" text objects. All functions and documentation must thereafter be edited using fixr. The full sequence is something like:
# Create task package in subdirectory of current:
unpackage( "path/to/existing/source/package/hardway")
#
# Load image into memory:
maintain.packages( hardway)
#
# Make new version of source package:
pre.install( hardway, ...) # use dir= to control where new source pkg goes
#
install.pkg( hardway) # or build.pkg.binary( hardway) followed by "install from local zip file" menu
#
library( hardway)
# off yer go
If you get problems after maintain.packages, you might need unmaintain.package( hardway) to clear out the in-memory copy of the new task package.
Documentation and metadata
Documentation for functions can be stored as plain text just after a function's source code, as described in flatdoc. Just about anything will do; you don't absolutely have to follow the conventional structure of R help if you are really in a hurry. However, the easiest way to add kosheR but skeletal documentation to your function brilliant is fixr( brilliant, new.doc=TRUE); again, see flatdoc and doc2Rd if you want to understand what's going on. The format is almost exactly as displayed in plain-text help, i.e. from help(..., help_type="text"). My recommendation is to just start writing something that looks reasonable, and see if it works. To quickly test the ultimate appearance, you can use e.g. docotest(..Splendid$brilliant). More generally, run patch.install(Splendid) which, as explained in Maintaining a package below, updates everything for your package including the help system, so you can then just do ?brilliant. If you run into problems with writing documentation for your functions, then refer to doc2Rd for further details of format, such as how to document several functions in the same file.
You can also provide three other types of documentation, for: (i) general use of your package (please do! it helps the user a lot; packages where the doco PDF consists only of an alphabetical list of functions/objects are a pain); (ii) more specific aspects of usage that are not tied to individual functions, such as this file; and (iii) datasets. These types of documentation should be stored in the package as text objects whose name ends in ".doc"; examples of the three types could be "Splendid.package.doc", "glitzograms.with.Splendid.doc", and "earlobes.doc" if you have a dataset earlobes. See doc2Rd for format details.
You must document every function and dataset that the user will see, but you don't need to document any others. The foregoing applies iff your package has a Namespace, which it must for R 2.14 up.
Description file or object
When you first create a package from a task via pre.install, there probably won't be any DESCRIPTION information, so mvbutils will create a default "DESCRIPTION" file in your task folder, which it then copies to the source package. However, the default won't be what you really want, as you'll realize if you type library( help=Splendid). You can either manually edit the default "DESCRIPTION" file, or you can use fixtext(Splendid.DESCRIPTION, pkg="Splendid") to create a text object in your task package, which you then populate with the contents of the default "DESCRIPTION" file, and then edit. If a Splendid.DESCRIPTION object exists, mvbutils will use it in preference to a file; I find this tidier, because more of the package metadata lives in a single place, viz. inside the task package.
Apart from the obvious changes needed to the default "DESCRIPTION" file or text object, the most important fields to add are "Imports:" (or "Depends:" for packages that are pre-R2.14 and that also don't have a namespace), to say what other packages are needed by "Splendid". The DESCRIPTION file/text should rarely need to be updated, since the "autoversion" feature (see pre.install doco) can be used to take care of version numbering. The most common reason to change the DESCRIPTION is probably to add/remove packages in "Imports"; at present, this pretty much requires you to unload & reload the package, but I may try to expedite this in future versions.
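To make the shape of the DESCRIPTION text concrete, here is a minimal sketch using only base R's write.dcf and read.dcf. It does not use mvbutils at all; the field values (version, title, imported packages) are made-up placeholders, and the file is written to a temporary folder rather than to a real task folder.

```r
# Hypothetical field values, purely for illustration
desc <- data.frame(
  Package = "Splendid",
  Version = "1.0.0",
  Title   = "What the Package Does",
  Imports = "stats, utils",
  stringsAsFactors = FALSE
)

# Write the fields in DESCRIPTION (DCF) format to a temporary file
desc_path <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc, file = desc_path)

# Read back just the "Imports" field to check what other packages are declared
imports <- read.dcf(desc_path, fields = "Imports")[1, "Imports"]
print(imports)
```

The text in the temporary file is the kind of content you would paste into a Splendid.DESCRIPTION text object or into the default "DESCRIPTION" file.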
Vignettes
In time, I plan to get mvbutils working nicely with knitr. At present (Jan 2013), the easiest way to create vignettes with mvbutils is to produce your own "homebrewed" PDFs however you prefer, and put them into the "inst/doc" folder. pre.install/patch.install will sort them out and link them into the help system. To provide more information than the filename, use fixtext to create a text object in your task package called e.g. mypack.VIGNETTES, with lines as follows:
my.first.vignette: Behold leviathan, mate
my.second.vignette: What a good idea, to write a vignette
As a very experimental feature, you can also include R code for a homebrewed vignette, via a file with the same name but extension ".R" also in "inst/doc". Users can access it as normal for vignette code, via edit( vignette( "my.first.vignette", package="mypack")) or via doing something to system.file( file.path( "doc", "my.first.vignette.R"), package="mypack").
You can put full-on Sweave-style vignettes into a "vignettes" folder, and they should be set up correctly in the source package. Currently, though, they are not re-installed by patch.install; you need to use build.pkg and install.pkg (partly defeating the point of these package-building utilities).
Very technical details about homebrewed vignettes
"Rnw stubs" are created for all homebrewed vignettes so that the help system finds them. A rudimentary index will be created for vignettes not mentioned in <<mypack>>.VIGNETTES
. If you create your own "inst/doc/index.html" file, this takes precedence over the versions created by mvbutils, so that <<mypack>>.VIGNETTES is not used.
Namespace
Usually this is automatic. pre.install etc automatically creates a "NAMESPACE" file for your package, ensuring inter alia that all documented objects are user-visible. To load DLLs, add a .onLoad function that contains the body code of generic.dll.loader in package mvbutils (thus avoiding dependence on mvbutils). For more complicated fiddling, see Customizing package creation.
Packages without namespaces (pre-R 2.14)
Namespaces only became compulsory with R 2.14. If you're setting up your package in an earlier version of R, mvbutils will not create a namespace unless it finds a .onLoad function. To trigger namespacing, just create a .onLoad with this definition: function( libname, pkgname) {}.
Maintaining a package
Once you have successfully gotten your "Splendid" package installed and loaded the first time, you should rarely need to call install.pkg or build.pkg etc again, except when you are about to distribute to others. In your own work, after calling maintain.packages and library in an R session, you can modify, add and delete functions, datasets, and documentation in your package via the standard functions fixr, move, and rm.pkg (or directly), and these changes will mostly be immediately manifested in the loaded package within your R session; this is "live editing". The changes are made first to the in-memory task package, which will be called e.g. ..Splendid, and then propagated to the loaded package. Don't try to manipulate the loaded package's namespace directly. See maintain.packages for details.
To update the installed package (on disk), call patch.install( Splendid); this also calls pre.install to update the source package, updates the help system in the current session, and does a few other synchronizations. You need to call patch.install before quitting R to ensure that the changes are manifest in the loaded package the next time you start R; otherwise they will only exist in the in-memory task package, and won't be callable.
Troubleshooting
In rare cases, you may find that maintain.packages( Splendid) fails. If that happens, there won't be a ..Splendid environment, which means you can't fix whatever caused the load failure. The load failure is (invariably in my experience) caused by a hidden attempt to load a namespaced package, which is failing for yet another reason, usually something in its .onLoad; that package might or might not be "Splendid" itself. If you can work out what other package is trying to load itself, say badpack, you can temporarily get round the problem by making use of the character vector partial.namespaces, which lives in the "mvb.session.info" search environment, as follows:
partial.namespaces <<- c( partial.namespaces, "badpack")
That will prevent execution of badpack:::.onLoad. Consequently badpack won't be properly loaded, but at least the task package will be loaded into ..Splendid, so that you can make a start on the problem. If you can't work out which package is causing the trouble, try
partial.namespaces <<- "EVERY PACKAGE"
After that, no namespaced package will load properly, so remember to clear partial.namespaces <<- NULL before resuming normal service.
Occasionally (usually during patch.install), you might see R errors like "cannot allocate vector of size 4.8Gb". I think this happens when some internal cache gets out-of-synch. It doesn't seem to cause much damage to the installed package, but once it's happened in an R session, it tends to happen again. I usually quit & restart R.
You might also find find.lurking.envs useful, via eapply( ..Splendid, find.lurking.envs); this will show any functions (or other things) in ..Splendid that have accidentally acquired a non-standard environment such as a namespace, which can trigger a "hidden" package load attempt. The environment for all functions in ..Splendid should probably be .GlobalEnv; the environments in the loaded package will be different, of course.
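The idea of a "lurking" environment can be illustrated with base R alone. The sketch below is not find.lurking.envs and does much less than it (it looks only at the top-level environment of each function); the environment toy and the objects in it are invented stand-ins for an in-memory task package such as ..Splendid.

```r
# Toy stand-in for an in-memory task package (hypothetical objects)
toy <- new.env()
toy$good <- function(x) x + 1
environment(toy$good) <- globalenv()              # the expected environment
toy$lurker <- function(x) x * 2
environment(toy$lurker) <- asNamespace("stats")   # simulated accident
toy$some.data <- 1:3

# Name of each function's environment; NULL for non-functions
fun_envs <- eapply(toy, function(obj) {
  if (is.function(obj)) environmentName(environment(obj)) else NULL
})

# Keep only functions whose environment is not the global environment
suspects <- Filter(function(nm) !is.null(nm) && nm != "R_GlobalEnv", fun_envs)
print(names(suspects))
```

Any function reported this way would be a candidate for having its environment reset before the task package is saved.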
It's rare to need to manually inspect either the source package or the installed package. But if you do, then spkg helps for the former, e.g. dir( spkg( mypack)); and system.file helps for the latter, e.g. system.file( package="mypack"), or system.file( file.path( "help", "AnIndex"), package="mypack").
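Since "mypack" is only a placeholder, here is the same kind of inspection run against the base package stats, which is present in any standard R installation. Only base R's system.file and dir are used; nothing here depends on mvbutils.

```r
# Root folder of an installed package (here the base package "stats")
pkg_root <- system.file(package = "stats")

# Path to a specific file inside it; "" is returned if the file is absent
index_file <- system.file(file.path("help", "AnIndex"), package = "stats")

# List the top-level contents of the installed package
print(dir(pkg_root))
print(index_file)
```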
Distributing and checking
build.pkg calls R CMD build to create a "tarball" of the package (a ".tar.gz" file), which is the appropriate format for distribution to Unix folk and submission to CRAN. build.pkg.binary creates a binary package (a ".zip" file), suitable for Windows or Macs. check.pkg runs R CMD check (but see next paragraph for a quicker alternative), which is required by CRAN and sometimes useful at other times. These .pkg functions are pretty simple wrappers to the R CMD tools with similar names. However, for those with imperfect memories and limited time, there are enough arcane and mutable nuances with the "raw" R CMD commands (including the risk of inadvertently deleting existing installations) to make the wrappers in mvbutils useful.
Various functions in the tools package can be used to quickly check specific aspects of an installed package, without needing a full-on, and slow, R CMD check. In particular, I sometimes use
codoc( spkg( mypack)) # also spkg( "mypack"), spkg( ..mypack)
undoc( spkg( ..mypack))
Nothing is printed unless a problem is found, so a blank result is good news! It's also possible to run other tools such as checkTnF and checkFF similarly.
By default, mvbutils adds code to the source package to circumvent the CRAN checks for "no visible function/binding", which I consider to be a waste of time; for example, unless circumvented they generate 338 false positives for package mvbutils. If for some reason you actually want these checks, see "Overriding defaults" in pre.install.
Folders and different R versions
Life can get complicated when there are several versions of R around, particularly when they require different package formats at source or build or install time (eg R 2.10, 2.12, R 3.0). install.pkg etc do their best to simplify this for you. You won't normally need to know the details unless you are trying to maintain several versions of your package for different versions of R for distribution to other people who use those different R versions. But if you do need to know the details, then the default folder structure is as follows. If the task package lives in folder "mypack", then the source package is created by pre.install in "mypack/mypack", and the built package(s) will go into folders such as "mypack/R2.15" depending on what R version is running.
Note that your task package can only ever have one version; if different behaviour is required for different R versions, then you need to code this up in your functions, or via some trickery in .onLoad.
Built packages
Building comes first: the tarballed/zipped packages from build.pkg and build.pkg.binary are placed in a folder parallel to the source package, with a name of the form "Rx.y". mvbutils tries to be sensible about what "x.y" should be. It will never be newer than the running R version. It will never be older than the most recent major R version that required mandatory package rebuilds (eg R 3.0 and R 2.12). If one or more folders already exist that satisfy those properties, the highest-numbered one will be used. If not, a new folder will be created with the current R major version (eg R 2.15.3 will trigger a folder "R2.15"). You can create your own "Rx.y" folder, for instance if the current version of your package requires an R feature only found in R version "x.y". Also, mvbutils knows which R versions change the format of built packages, and will create a new folder for such a version if required.
The default behaviour is therefore that build.pkg<.binary> will keep building into the same folder. For example, if at some point a "mypack/R2.12" folder was created, then that's where all builds will be sent regardless of the running R version, until you either manually create a "mypack/Rx.y" folder that's closer to the running R version, or the latter hits 3.0 which automatically triggers the creation of a new "mypack/R3.0" folder. Thanks to the "autoversion" feature of pre.install, the version number of the build will change whenever <pre/patch>.install is used. (Note that old built packages are not removed until/unless you explicitly call cull.old.builds, although it's "good housekeeping" to do the latter occasionally.) By manually creating a new "Rx.y" folder when necessary, you can ensure that there won't be any updates to built packages for R older than "x.y", which gives a kind of "checkpoint" feature; your built packages for older versions of R (ie for distribution to users of those older R versions) won't be accidentally zapped by cull.old.builds housekeeping, and you can be sure that old code running under old versions of R will still work.
What this does not let you do easily, is use your current R version to create updated versions of your package for R-versions that pre-date the most up-to-date "Rx.y" folder. For example, if you are running R3.0, there is guaranteed to be an "R3.0" folder, so calling build.pkg<.binary> won't build new packages in an "R2.15" folder. Again, usually this doesn't matter, because new "Rx.y" folders are only rarely created automatically, so builds will tend to stay in the same folder and the newest version will be accessible to all. But sometimes it is a hassle... Nevertheless, I have managed to maintain parallel versions of my packages across the R2.15-R3.0 change, by (sequentially) running two R versions and calling build.pkg<.binary> from each. (Note that build.pkg<.binary> can only build in the format of the running R version; you can't "cross-build" for different built formats from the same R session.)
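The folder-selection rules described in this section can be sketched in base R as follows. This is only an illustration of the rules as stated, not the code mvbutils actually runs; the function name choose_build_folder and its arguments are hypothetical, and the list of rebuild-mandating versions is simply taken from the examples in the text (R 2.12 and R 3.0).

```r
choose_build_folder <- function(existing, running,
                                rebuild_versions = c("2.12", "3.0")) {
  # Reduce e.g. "2.15.3" to its "x.y" part, "2.15"
  parts <- strsplit(as.character(running), ".", fixed = TRUE)[[1]]
  running_xy <- numeric_version(paste(parts[1:2], collapse = "."))

  # Lower bound: latest rebuild-mandating version not newer than the running R
  rebuilds <- numeric_version(rebuild_versions)
  rebuilds <- rebuilds[rebuilds <= running_xy]
  lower <- if (length(rebuilds)) max(rebuilds) else numeric_version("0.0")

  # Only consider folder names of the form "Rx.y"
  existing <- grep("^R[0-9]+\\.[0-9]+$", existing, value = TRUE)
  if (!length(existing)) {
    return(paste0("R", as.character(running_xy)))
  }

  # Use the highest existing folder within [lower, running_xy], if there is one
  versions <- numeric_version(sub("^R", "", existing))
  ok <- versions >= lower & versions <= running_xy
  chosen <- if (any(ok)) max(versions[ok]) else running_xy
  paste0("R", as.character(chosen))
}

# The examples from the text
choose_build_folder(character(0), "2.15.3")         # no folders yet
choose_build_folder("R2.12", "2.15.3")              # keeps using the old folder
choose_build_folder("R2.12", "3.0.1")               # R 3.0 forces a new folder
choose_build_folder(c("R2.12", "R2.15"), "2.15.3")  # highest eligible folder
```

Passing the running version as an argument (rather than calling getRversion() inside) makes it easy to try out what would happen under a different R version.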
Source packages
R occasionally demands a change in source package format, as opposed to built package format (as with R 3.0). (IIRC one example is R 2.10, with the change in helpfile format.) Then you face the problem of how to keep several source packages. This can be controlled by options("mvbutils.sourcepkgdir.postfix"), which is appended to the name of the folder where your source package will be created and used for building or installing. The default is the empty string "", so that the default source package folder for "mypack" is "mypack/mypack". To allow for multiple source package versions, you could put something like this in your .First or ".Rprofile":
if( getRversion() >= numeric_version( '4.0')) { # New source package format
  options( mvbutils.sourcepkgdir.postfix='[R4]')
}
Everything should then work automatically; all source-package operations will refer to "mypack/mypack[R4]" if you are running version 4 or above, or to "mypack/mypack" if you are running an earlier R version, and you should never really need to know the source package foldername yourself (build.pkg etc do it all for you). This depends on you setting the option yourself, and has not been tested yet. Eventually I may hardwire the feature automatically into mvbutils (or is it better for each source package to go into an appropriate built-package folder? but that sounds a bit like version hell).
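To see how the postfix changes the folder name, here is a small base-R sketch. The helper source_pkg_dir is hypothetical and only mimics the naming rule described above (task folder, then package name plus postfix); it is not how mvbutils itself computes the path.

```r
# Hypothetical helper: builds the source-package folder name from the option
source_pkg_dir <- function(task_dir, pkg) {
  postfix <- getOption("mvbutils.sourcepkgdir.postfix", default = "")
  file.path(task_dir, paste0(pkg, postfix))
}

# With the postfix set, as in the .Rprofile snippet above
old <- options(mvbutils.sourcepkgdir.postfix = "[R4]")
with_postfix <- source_pkg_dir("mypack", "mypack")

# Restore the previous setting and look again
options(old)
without_postfix <- source_pkg_dir("mypack", "mypack")

print(with_postfix)
print(without_postfix)
```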
Customizing package creation
You can customize many aspects of the mvbutils package-creation process, by adding a function pre.install.hook.Splendid to your package. See pre.install for further details.