How to create an R package (including documentation, vignettes and a manual) – Dominik Schauer


Recently I created my first R package. While doing this I realized that there seems to be no single resource that covers all of the following in just one article:

  • how to create a package from scratch
  • how to add function documentation
  • how to add a vignette
  • how to add a reference manual

So this is exactly what I cover in this post. Don’t get me wrong, there are lots of great resources, and I point to the ones I used along the way.

To create your first R package you will need two packages, “devtools” and “roxygen2”.

install.packages("devtools")
install.packages("roxygen2")
library(devtools)
library(roxygen2)
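If you rerun this setup regularly, a small guard avoids reinstalling packages that are already present. The following is only a sketch: the helper name missing_packages is my own invention, and the installation step is restricted to interactive sessions so that scripts never trigger a download by accident.

```r
# Hypothetical helper: return those entries of `pkgs` that are not installed.
# system.file() returns "" for a package that cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(c("devtools", "roxygen2"))

# Only install when a person is sitting in front of the console
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```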

1) Creating a package body

After installing these, you can create a frame for the contents of your package by calling

setwd("your/package/directory")
devtools::create("yourpackagename")

This will create a folder called “yourpackagename” in the directory “your/package/directory”. This folder will contain:

  • a sub-folder called “R”
  • a file called .gitignore
  • a file called .Rbuildignore
  • a file called DESCRIPTION
  • a file called NAMESPACE
  • an R project called “yourpackagename”
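The layout listed above can be verified from R. This is a minimal sketch under my own assumptions: check_skeleton is a hypothetical helper, it only looks at the three entries every package needs, and the demonstration builds a stand-in folder by hand instead of calling devtools::create().

```r
# Hypothetical helper: report which of the expected entries exist in a package folder
check_skeleton <- function(pkg_dir) {
  expected <- c("R", "DESCRIPTION", "NAMESPACE")
  present <- file.exists(file.path(pkg_dir, expected))
  setNames(present, expected)
}

# Demonstration on a throwaway folder that mimics the layout
demo_dir <- file.path(tempdir(), "yourpackagename")
dir.create(file.path(demo_dir, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, c("DESCRIPTION", "NAMESPACE")))

print(check_skeleton(demo_dir))
```

For a real package you would call check_skeleton("your/package/directory/yourpackagename") instead.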


You can achieve a similar result by calling the function “package.skeleton()” from the package “utils”, but I won’t go into that here.

2) Adding Functions

This will most likely take the most time, although it is not really related to the documentation work that is the topic of this post. The core purpose of your package is to provide R functions. To give your package this ability, write R files containing these functions and place them inside the sub-folder yourpackagename/R. Here is an example I copied from Fong Chun Chan’s Blog:

load_mat <- function(infile){   
  in.dt <- data.table::fread(infile, header = TRUE)
  in.dt <- in.dt[!duplicated(in.dt[, 1]), ]
  in.mat <- as.matrix(in.dt[, -1, with = FALSE])
  rownames(in.mat) <- unlist(in.dt[, 1, with = FALSE])  
  in.mat 
}

Save this as load_mat.R in your R directory. For demonstration purposes I will keep it to just one file and one function, but you can include multiple R files and each R file can contain multiple R functions. So for example your file could also look like this:

load_mat <- function(infile){   
  in.dt <- data.table::fread(infile, header = TRUE)
  in.dt <- in.dt[!duplicated(in.dt[, 1]), ]
  in.mat <- as.matrix(in.dt[, -1, with = FALSE])
  rownames(in.mat) <- unlist(in.dt[, 1, with = FALSE])  
  in.mat 
}

load_mat2 <- function(infile) {
  ...
}

In general, try to group related functions into the same .R file (e.g. if you have a bunch of loading functions then putting them in R/load.R would be a good idea). One important thing to note here is that you need to add the @export tag above your function to indicate that this function should be “exposed” to users. For example:

#' @export
load_mat <- function(infile){   
  in.dt <- data.table::fread(infile, header = TRUE)
  in.dt <- in.dt[!duplicated(in.dt[, 1]), ]
  in.mat <- as.matrix(in.dt[, -1, with = FALSE])
  rownames(in.mat) <- unlist(in.dt[, 1, with = FALSE])  
  in.mat 
}

The #' @export syntax is actually a roxygen2 tag, which we will discuss in Section 3. It ensures that the load_mat() function gets added to the NAMESPACE (when you run devtools::document()), which marks it as exposed. In case you wonder what the NAMESPACE is, you don’t need to worry about it at this point: as Section 3.2 shows, devtools::document() updates it for you.


3) Documenting Functions

This always seemed like the most intimidating step to me. I’m here to tell you: it’s super quick. The roxygen2 package makes everything amazing and simple. The way it works is that you add special comments above each function, which are later compiled into the correct format for package documentation. The details can be found in the roxygen2 documentation; I will just provide an example for our load_mat() function.

3.1 Writing the Documentation

So how do you get that nice documentation in R when you type ?load_mat? We can use roxygen2, which provides a very simple way of documenting our functions and then produces the man/load_mat.Rd file, which is what we see when we type ?load_mat. Both Hilary (Step 3: Add documentation) and Hadley (Object documentation) discuss this at length and I refer you to their pages.

#' Load a Matrix
#'
#' This function loads a file as a matrix. It assumes that the first column
#' contains the rownames and the subsequent columns are the sample identifiers.
#' Any rows with duplicated row names will be dropped with the first one being
#' kept.
#'
#' @param infile Path to the input file
#' @return A matrix of the infile
#' @export
load_mat <- function(infile){   
  in.dt <- data.table::fread(infile, header = TRUE)
  in.dt <- in.dt[!duplicated(in.dt[, 1]), ]
  in.mat <- as.matrix(in.dt[, -1, with = FALSE])
  rownames(in.mat) <- unlist(in.dt[, 1, with = FALSE])  
  in.mat 
}

3.2 Processing the Documentation

Once you’ve got your documentation completed, you can simply run:

devtools::document()

This will generate the load_mat.Rd file from your annotations and put it in the automatically created yourpackagename/man folder:

% Generated by roxygen2 (4.1.0): do not edit by hand
% Please edit documentation in R/load_mat.R
\name{load_mat}
\alias{load_mat}
\title{Load a Matrix}
\usage{
load_mat(infile)
}
\arguments{
\item{infile}{Path to the input file}
}
\value{
A matrix of the infile
}
\description{
This function loads a file as a matrix. It assumes that the first column
contains the rownames and the subsequent columns are the sample identifiers.
Any rows with duplicated row names will be dropped with the first one being
kept.
}
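To preview how such an .Rd file reads without installing the package, you can render it to plain text with tools::Rd2txt(), which ships with base R. The snippet below is a sketch that writes a shortened copy of the file above to a temporary location; in a real package you would point it at man/load_mat.Rd instead.

```r
# A shortened copy of the generated Rd file, written to a temporary location
rd_lines <- c(
  "\\name{load_mat}",
  "\\alias{load_mat}",
  "\\title{Load a Matrix}",
  "\\usage{",
  "load_mat(infile)",
  "}",
  "\\arguments{",
  "\\item{infile}{Path to the input file}",
  "}",
  "\\value{",
  "A matrix of the infile",
  "}",
  "\\description{",
  "This function loads a file as a matrix.",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# Render the Rd file as plain text, roughly what a text-mode help page shows
txt_file <- tempfile(fileext = ".txt")
tools::Rd2txt(rd_file, out = txt_file)
help_text <- readLines(txt_file)
cat(help_text, sep = "\n")
```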

You will get one .Rd file for each documented function in your R package. Executing document() also updates the NAMESPACE file in the main directory.
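To see what ended up in the NAMESPACE, you can read the file and pick out the export() directives that roxygen2 writes, one per line. This is a rough sketch: exported_names is a hypothetical helper of mine, and the demonstration uses a hand-written stand-in for the generated file.

```r
# Hypothetical helper: list the names found in export(...) lines of a NAMESPACE file
exported_names <- function(namespace_file) {
  lines <- readLines(namespace_file)
  export_lines <- grep("^export\\(", lines, value = TRUE)
  sub("^export\\((.*)\\)$", "\\1", export_lines)
}

# Stand-in for the NAMESPACE file of our one-function package
ns_file <- tempfile()
writeLines(c("# Generated by roxygen2: do not edit by hand", "", "export(load_mat)"), ns_file)

print(exported_names(ns_file))  # prints [1] "load_mat"
```

In your package you would call exported_names("yourpackagename/NAMESPACE").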


3.3 Installing your package (Bonus)

Go to the directory that contains the package folder and run:

setwd("..")
devtools::install("yourpackagename")

You need to run this from the parent directory that contains the yourpackagename folder, which is why setwd("..") comes first. Installing the package really is as simple as that!

Now you have a real, live, functioning R package. For example, try typing ?load_mat. You should see the standard help page pop up!
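Besides the help page, you can try the function itself on a small file. The sketch below repeats load_mat() so that it runs on its own, uses a made-up tab-separated input, and only calls the function if data.table is installed.

```r
# Copy of load_mat() from above so that the snippet runs without the package
load_mat <- function(infile) {
  in.dt <- data.table::fread(infile, header = TRUE)
  in.dt <- in.dt[!duplicated(in.dt[, 1]), ]
  in.mat <- as.matrix(in.dt[, -1, with = FALSE])
  rownames(in.mat) <- unlist(in.dt[, 1, with = FALSE])
  in.mat
}

# Made-up input: the row name "a" appears twice, so its second row is dropped
demo_file <- tempfile(fileext = ".tsv")
writeLines(c("gene\ts1\ts2", "a\t1\t2", "b\t3\t4", "a\t5\t6"), demo_file)

if (requireNamespace("data.table", quietly = TRUE)) {
  demo_mat <- load_mat(demo_file)
  print(demo_mat)  # two rows ("a", "b") and two columns ("s1", "s2")
}
```

With the package installed, library(yourpackagename) followed by load_mat(demo_file) should behave the same way.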


4) Creating Vignettes

Vignettes are extremely important to give people a high-level understanding of what your R package can do. To get started with a vignette, you can use the devtools::use_vignette() function (recent devtools releases no longer ship this helper; it now lives in the usethis package as usethis::use_vignette()). For instance,

devtools::use_vignette("introduction")

This will create a vignettes/introduction.Rmd file. This is an R Markdown template that you can fill with the steps needed to use your package. It doesn’t reuse the comments you have written earlier or any other information from your package: the function only gives you a template, and you still have to write the vignette yourself.

In case you are using RStudio, executing this function will automatically open introduction.Rmd. Clicking on “Knit” will then render the R Markdown file to an HTML page and place it next to the .Rmd file in the yourpackagename/vignettes folder.

You can also create a PDF version of your Vignette. To do this you simply have to change a single line in the Rmarkdown file. Just replace

output: rmarkdown::html_vignette

with

output: rmarkdown::pdf_document

and click again on “Knit”.
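If you prefer not to edit the file by hand, the same one-line change can be scripted. This is only a sketch: use_pdf_output is a hypothetical helper of mine, and the demonstration works on a stripped-down stand-in for the vignette template, not on the real file.

```r
# Hypothetical helper: switch the output line of a vignette from HTML to PDF
use_pdf_output <- function(rmd_file) {
  lines <- readLines(rmd_file)
  lines <- sub("^output: rmarkdown::html_vignette[[:space:]]*$",
               "output: rmarkdown::pdf_document", lines)
  writeLines(lines, rmd_file)
  invisible(lines)
}

# Stripped-down stand-in for vignettes/introduction.Rmd
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c("---",
             "title: \"Introduction\"",
             "output: rmarkdown::html_vignette",
             "---",
             "",
             "Vignette text goes here."), rmd_file)

use_pdf_output(rmd_file)
print(readLines(rmd_file)[3])  # prints [1] "output: rmarkdown::pdf_document"
```

Keep in mind that knitting to PDF needs a working LaTeX installation on your machine.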


5) Editing the DESCRIPTION

The DESCRIPTION file is used in two different places: in your automatically generated manual and on the CRAN package page (in case the package gets published on CRAN). You can read about the DESCRIPTION file in detail here and I strongly suggest that you do so. The most important bits are the following. The file is basically a text file that defines some parameters which represent meta information about the package, such as the author, the required packages, the publishing date and more. “Devtools” manages some of these for you, but others need to be adjusted manually. Follow the link above to see what can and what needs to be done.
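Because the DESCRIPTION is a plain "Field: value" file, you can also read and change it from R with read.dcf() and write.dcf() from base R. The sketch below works on a temporary file with placeholder values that I made up; the only field taken from this post is Imports, which lists data.table because load_mat() calls data.table::fread().

```r
desc_file <- file.path(tempdir(), "DESCRIPTION")

# Placeholder values; replace them with the details of your own package
fields <- rbind(c(
  Package     = "yourpackagename",
  Title       = "What the Package Does (One Line)",
  Version     = "0.0.0.9000",
  Description = "What the package does (one paragraph).",
  Imports     = "data.table"
))
write.dcf(fields, file = desc_file)

# Read the file back as a character matrix and change a single field
desc <- read.dcf(desc_file)
desc[1, "Version"] <- "0.1.0"
write.dcf(desc, file = desc_file)

cat(readLines(desc_file), sep = "\n")
```

Editing the file in a text editor works just as well; the scripted route is mostly useful when you bump the version often.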

6) Creating a PDF reference manual

As mentioned above, the DESCRIPTION file is used to create the PDF manual. And once again creating the file itself is the easy part. By writing the function documentation and editing the DESCRIPTION you already prepared everything and just need to run a single command:

setwd("your/path/yourpackagename")
system("R CMD Rd2pdf . --title=\"Package yourpackagename\" --output=./manual.pdf --force --no-clean --internals")

The setwd() call ensures that your working directory is the main folder of the package, e.g. /yourpackagename/. The command will then create a file named manual.pdf in that folder.


Internally, R first creates a LaTeX document and related files to generate the PDF. These intermediate files are usually deleted, but you may want to change something manually. Including --no-clean in the command skips the deletion, so you will also find a folder named .Rd2pdfxxxxx in your main directory. This folder contains the same PDF under another name (Rd2.pdf) and, more importantly, the file Rd2.tex.


You can then open this file in your preferred LaTeX editor. In my case that is Texmaker. When you just open the file and try to compile it, you will get an error message telling you that “Rd.sty” is missing. Rd.sty comes with base R since it is needed for the creation of package documentation. Its location depends on the OS you are using; in a standard installation it sits below R_HOME/share/texmf, in the sub-folder tex/latex. You just need to go to this directory and copy the file into your yourpackagename/.Rd2pdfxxxxx folder. Now you can edit the Rd2.tex file and compile it to a custom PDF reference manual.
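The copy step can also be done from R, which saves you from hunting for R_HOME. This is a sketch under two assumptions: that your installation keeps Rd.sty in share/texmf/tex/latex, and that the helper name copy_rd_sty (my own) is acceptable. The demonstration uses a temporary folder, with .Rd2pdf12345 as a made-up stand-in for the real .Rd2pdfxxxxx folder.

```r
# Location of Rd.sty inside the R installation (assumes the standard layout)
rd_sty <- file.path(R.home("share"), "texmf", "tex", "latex", "Rd.sty")

# Hypothetical helper: copy Rd.sty into every .Rd2pdf* folder inside pkg_dir
copy_rd_sty <- function(pkg_dir) {
  build_dirs <- list.files(pkg_dir, pattern = "^\\.Rd2pdf",
                           all.files = TRUE, full.names = TRUE)
  if (length(build_dirs) == 0 || !file.exists(rd_sty)) {
    return(invisible(character(0)))  # nothing to do
  }
  targets <- file.path(build_dirs, "Rd.sty")
  file.copy(rd_sty, targets, overwrite = TRUE)
  invisible(targets)
}

# Demonstration on a throwaway folder
demo_pkg <- file.path(tempdir(), "rd2pdf_demo")
dir.create(file.path(demo_pkg, ".Rd2pdf12345"), recursive = TRUE, showWarnings = FALSE)
copied <- copy_rd_sty(demo_pkg)
print(basename(copied))
```

For your own package, copy_rd_sty("your/path/yourpackagename") would do the same after the Rd2pdf command has run.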


7) Making the package a GitHub repository (Bonus)

At this point, you have already done what the introduction of this post promised! You may additionally want to put your package on GitHub for others to access it. This is not a post about learning to use git and GitHub though. For that I recommend Karl Broman’s Git/GitHub Guide. The benefit of putting your package onto GitHub, however, is that you can use the devtools install_github() function to install your new package directly from the GitHub page. For example, you might run

devtools::install_github("yourgithubusername/yourpackagename")

8) Do you have an R file I can download?

I do. Here it is.
