Understanding base documentation functions


On ? and help()

I am working on a project dealing with documentation in R and recently did a deep dive into how ? and help() work. This post summarizes what I’ve learned about these functions, first briefly discussing how they “work” in a general sense, then going through their implementations line by line to understand the functions at a low level.

How they “work”

The ? operator is just a convenience function, allowing users to retrieve documentation on objects specified in a variety of ways. Below I’ve included a few examples, all of which bring up the same help page, showcasing how flexible ? is:

?anova
?anova()
?anova(lm(speed ~ dist, cars))
?anova(stop())
?"anova"
?stats::anova

It achieves this flexibility by using functions like substitute() and eval() to capture and inspect its unevaluated input, eventually leading to a call to help(), help.search(), .helpForCall(), or .tryHelp().

Apart from help.search(), which powers ??, these paths all end up relying on help(), so it is sufficient to look into the help() function to understand what is going on. help() mainly uses the functions loadedNamespaces(), find.package(), and utils:::index.search() to locate the relevant package files and documentation. Again, help() has been implemented to be very flexible, accepting arguments in a variety of forms.

How they work: the nitty-gritty

?

First, let’s look at the definition of ?:

`?`
function (e1, e2) 
{
    if (missing(e2)) {
        type <- NULL
        topicExpr <- substitute(e1)
    }
    else {
        type <- substitute(e1)
        topicExpr <- substitute(e2)
    }
    search <- (is.call(topicExpr) && topicExpr[[1L]] == "?")
    if (search) {
        topicExpr <- topicExpr[[2L]]
        if (is.call(te <- topicExpr) && te[[1L]] == "?" && is.call(te <- topicExpr[[2L]]) && 
            te[[1L]] == "?") {
            cat("Contacting Delphi...")
            flush.console()
            Sys.sleep(2 + stats::rpois(1, 2))
            cat("the oracle is unavailable.\nWe apologize for any inconvenience.\n")
            return(invisible())
        }
    }
    if (is.call(topicExpr) && (topicExpr[[1L]] == "::" || topicExpr[[1L]] == 
        ":::")) {
        package <- as.character(topicExpr[[2L]])
        topicExpr <- topicExpr[[3L]]
    }
    else package <- NULL
    if (search) {
        if (is.null(type)) 
            return(eval(substitute(help.search(TOPIC, package = PACKAGE), 
                list(TOPIC = as.character(topicExpr), PACKAGE = package))))
        else return(eval(substitute(help.search(TOPIC, fields = FIELD, 
            package = PACKAGE), list(TOPIC = as.character(topicExpr), 
            FIELD = as.character(type), PACKAGE = package))))
    }
    else {
        if (is.null(type)) {
            if (is.call(topicExpr)) 
                return(.helpForCall(topicExpr, parent.frame()))
            topic <- if (is.name(topicExpr)) 
                as.character(topicExpr)
            else e1
            return(eval(substitute(help(TOPIC, package = PACKAGE), 
                list(TOPIC = topic, PACKAGE = package))))
        }
        else {
            type <- if (is.name(type)) 
                as.character(type)
            else e1
            topic <- if (is.name(topicExpr)) 
                as.character(topicExpr)
            else {
                if (is.call(topicExpr) && identical(type, "method")) 
                  return(.helpForCall(topicExpr, parent.frame(), 
                    FALSE))
                e2
            }
            if (type == "package") 
                package <- topic
            h <- .tryHelp(topicName(type, topic), package = package)
            if (is.null(h)) {
                if (is.language(topicExpr)) 
                  topicExpr <- deparse(topicExpr)
                stop(gettextf("no documentation of type %s and topic %s (or error in processing help)", 
                  sQuote(type), sQuote(topicExpr)), domain = NA)
            }
            h
        }
    }
}
<bytecode: 0x55ed6ac2cbb8>
<environment: namespace:utils>

That’s a big wall of code. We’re going to go through the definition in chunks to better understand what is going on.


Immediately, there’s something I didn’t know before: ? is a function of two arguments! After reading through the documentation, it looks like the optional second argument, e2, supports the binary form type?topic. This form is used for documentation of S4 classes and methods and for package overviews (e.g. class?Matrix or package?utils). For most use cases, e2 is never specified, so the if (missing(e2)) condition evaluates to TRUE. This means type is almost always NULL, and topicExpr holds the quoted e1 (the only argument supplied to ?).

function (e1, e2)
{
    if (missing(e2)) {
        type <- NULL
        topicExpr <- substitute(e1)
    }
    else {
        type <- substitute(e1)
        topicExpr <- substitute(e2)
    }

Above, note the use of substitute(). Advanced R covers how substitute() is used for quoting, and that is exactly what’s going on here. It captures the unevaluated argument e1 and assigns it to the variable topicExpr. (Technically, substitute() returns a “parse tree”: a call, a name, or a constant.) The rest of the code picks apart topicExpr to determine what documentation to serve up.
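To see this first chunk in action, here is a hypothetical function, peek(). It copies only the argument-capturing logic of ? and returns what it captured. It is a sketch for illustration, not part of utils. It also shows that the binary form type?topic is simply a two-argument call to ? (str2lang() is used so the binary form is parsed exactly as it would be at the console).

```r
# Hypothetical stand-in for the first chunk of `?`: it only captures arguments
peek <- function(e1, e2) {
  if (missing(e2)) {
    type <- NULL
    topicExpr <- substitute(e1)
  } else {
    type <- substitute(e1)
    topicExpr <- substitute(e2)
  }
  list(type = type, topicExpr = topicExpr)
}

# Unary form: stop() is never run, because e1 is quoted rather than evaluated
unary <- peek(anova(stop()))
str(unary)

# Binary form: `package?utils` is just a two-argument call to `?`
binary_call <- str2lang("package?utils")
as.list(binary_call)
binary <- peek(package, utils)   # the arguments `?` receives for package?utils
str(binary)
```

This is also why ?anova(stop()) in the examples above does not throw an error: the call is inspected, never run.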


Next, we define a Boolean variable, search:

    search <- (is.call(topicExpr) && topicExpr[[1L]] == "?")

Note that search is TRUE whenever (1) topicExpr is an unevaluated function call and (2) the first element of that call is the symbol ?. Why does this matter? It turns out that ?? is not a function: it is the composition of two ? operators! For example, when you run ??tibble you are actually executing `?`(?tibble).

So, search is TRUE when the double question mark has been used. This makes sense, as ?? is used for a more general search of the documentation (using the help.search() function, as we’ll see later).
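A hypothetical helper, is_search(), reproduces the search test on its own. Its argument is captured with substitute() rather than evaluated, so passing ?tibble does not actually open any documentation (and tibble does not even need to be installed):

```r
# Hypothetical helper: the `search` test from `?`, applied to a captured argument
is_search <- function(e1) {
  topicExpr <- substitute(e1)
  # Comparing a symbol with a string works because R deparses the symbol first
  is.call(topicExpr) && topicExpr[[1L]] == "?"
}

is_search(?tibble)   # TRUE: this is what the outer `?` sees for ??tibble
is_search(tibble)    # FALSE: a bare name, as for ?tibble
is_search(tibble())  # FALSE: a call, but not to `?`
```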


Next up is an Easter Egg. Yup, an Easter Egg. But first, if search is TRUE, the inner ? is stripped off by replacing topicExpr with its second element (the argument of the inner ? call):

    if (search) {
        topicExpr <- topicExpr[[2L]]
        if (is.call(te <- topicExpr) && te[[1L]] == "?" && is.call(te <- topicExpr[[2L]]) && 
            te[[1L]] == "?") {
            cat("Contacting Delphi...")
            flush.console()
            Sys.sleep(2 + stats::rpois(1, 2))
            cat("the oracle is unavailable.\nWe apologize for any inconvenience.\n")
            return(invisible())
        }
    }

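Now, the Easter Egg. The nested conditional that starts with is.call(te <- topicExpr) evaluates to TRUE if there were four (or more) nested ? functions. In this case, there is a pause of a few seconds (Sys.sleep() with a random Poisson component). Then a message about the Oracle of Delphi is printed to the console: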

????sum
## Contacting Delphi...the oracle is unavailable.
## We apologize for any inconvenience.
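The depth counting can be made explicit with a hypothetical helper, q_depth(), that peels off unary ? calls one at a time. This is a sketch for illustration, not part of utils. The outer ? of ????sum receives the remaining three levels as e1. One level is stripped for search, and the Easter-egg condition then checks for two more.

```r
# Hypothetical helper: count how many unary `?` calls wrap an expression
q_depth <- function(expr) {
  depth <- 0L
  # length 2 means a unary call: `?` plus a single argument
  while (is.call(expr) && length(expr) == 2L &&
         identical(expr[[1L]], as.name("?"))) {
    depth <- depth + 1L
    expr <- expr[[2L]]
  }
  depth
}

q_depth(quote(sum))       # 0
q_depth(quote(??sum))     # 2: the search form
q_depth(quote(????sum))   # 4: enough to reach the Easter egg
```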

Moving on, we now deal with the double and triple colon operators (:: and :::):

    if (is.call(topicExpr) && (topicExpr[[1L]] == "::" || topicExpr[[1L]] == 
        ":::")) {
        package <- as.character(topicExpr[[2L]])
        topicExpr <- topicExpr[[3L]]
    }
    else package <- NULL

In the presence of these operators, we assign the relevant package name (as a string) to the package variable and the topic itself to the topicExpr variable. Below is an example of how the pieces of such a call are indexed:

topicExpr <- substitute(ggplot2::geom_point)

topicExpr[[1]]
## `::`
topicExpr[[2]]
## ggplot2
topicExpr[[3]]
## geom_point

If there is no colon operator designating the desired package, the final else branch sets package to NULL.
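The same logic can be wrapped in a hypothetical helper, split_topic(). It shows how :: and ::: are both handled, and what happens without a prefix. Unlike ?, which keeps topicExpr as an expression, this sketch converts the topic to a string for readability:

```r
# Hypothetical helper mirroring the :: / ::: handling in `?`
split_topic <- function(topicExpr) {
  if (is.call(topicExpr) &&
      (topicExpr[[1L]] == "::" || topicExpr[[1L]] == ":::")) {
    list(package = as.character(topicExpr[[2L]]),   # e.g. "stats"
         topic = as.character(topicExpr[[3L]]))     # e.g. "anova"
  } else {
    list(package = NULL, topic = as.character(topicExpr))
  }
}

str(split_topic(quote(stats::anova)))
str(split_topic(quote(utils:::index.search)))
str(split_topic(quote(anova)))   # no prefix: package is NULL
```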


Finally, it’s time to actually access the documentation. First, the case of search being TRUE (??) is taken care of:

    if (search) {
        if (is.null(type)) 
            return(eval(substitute(help.search(TOPIC, package = PACKAGE), 
                list(TOPIC = as.character(topicExpr), PACKAGE = package))))
        else return(eval(substitute(help.search(TOPIC, fields = FIELD, 
            package = PACKAGE), list(TOPIC = as.character(topicExpr), 
            FIELD = as.character(type), PACKAGE = package))))
    }

We see that the function searching through the documentation is help.search(). The variables we have defined thus far are passed as its arguments, and we’re done.
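Because substitute() is given a list here, it works as a template rather than as a quoting tool. The sketch below builds the same call that ??knn would produce (assuming no package prefix) but does not evaluate it, so no search is actually run:

```r
topicExpr <- quote(knn)   # what is left in topicExpr after ??knn strips the inner `?`
package <- NULL           # no pkg:: prefix was given

# TOPIC and PACKAGE are placeholders replaced by the values in the list
cl <- substitute(help.search(TOPIC, package = PACKAGE),
                 list(TOPIC = as.character(topicExpr), PACKAGE = package))
cl
# help.search("knn", package = NULL)
# eval(cl) would run the actual search
```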


Now we take care of the case where search is FALSE and type is NULL. Remember, type is NULL whenever the argument e2 is not supplied, which is the most common use case.

    else {
        if (is.null(type)) {
            if (is.call(topicExpr)) 
                return(.helpForCall(topicExpr, parent.frame()))
            topic <- if (is.name(topicExpr)) 
                as.character(topicExpr)
            else e1
            return(eval(substitute(help(TOPIC, package = PACKAGE), 
                list(TOPIC = topic, PACKAGE = package))))
        }

A few interesting things to note. First, we deal with the scenario where topicExpr is a call. In that case the function used to access documentation is the unexported utils:::.helpForCall(). I haven’t dug through its body, but I think this is what allows users to execute code like ?sum() (instead of the more typical ?sum). However, I’ve noticed that it doesn’t work for everything; for example, try running ?c().

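Next comes the main way ? leads to documentation. This is how code like ?sum is evaluated: topicExpr is converted to a string and passed to help(). Notice that substitute() is used differently than before. Here it acts as a template: the placeholders TOPIC and PACKAGE are replaced with the values in the list passed as its second argument, and eval() then runs the resulting call. (The help.search() calls in the previous chunk use the same pattern.) A likely reason for this indirection is that help() itself calls substitute() on its package argument. A plain help(topic, package = package) would therefore make help() see the symbol package and convert it to the string "package".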
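This possible reason for the template style can be checked with a small sketch. A hypothetical show_pkg() copies only help()’s handling of package. Two hypothetical wrappers then pass a value to it, one directly and one via the template. Passing the value through directly lets substitute() inside show_pkg() capture the symbol package, while the template inlines the value first:

```r
# Hypothetical function copying only help()'s handling of `package`
show_pkg <- function(package = NULL) {
  if (!missing(package))
    if (is.name(y <- substitute(package)))
      package <- as.character(y)
  package
}

# Naive wrapper: show_pkg() captures the symbol `package`, not its value
wrapper_naive <- function(package) show_pkg(package = package)

# Template wrapper: the value is spliced into the call before it is evaluated
wrapper_template <- function(package) {
  eval(substitute(show_pkg(package = PACKAGE), list(PACKAGE = package)))
}

wrapper_naive("stats")      # "package"
wrapper_template("stats")   # "stats"
show_pkg(ggplot2)           # "ggplot2": a bare name is converted, never evaluated
```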


The rest of the code deals with the case where type and topic were specified by the e1 and e2 arguments, respectively. It mostly repeats what we’ve seen before, with the addition of the unexported helpers utils:::.tryHelp() and utils:::topicName().

        else {
            type <- if (is.name(type)) 
                as.character(type)
            else e1
            topic <- if (is.name(topicExpr)) 
                as.character(topicExpr)
            else {
                if (is.call(topicExpr) && identical(type, "method")) 
                  return(.helpForCall(topicExpr, parent.frame(), 
                    FALSE))
                e2
            }
            if (type == "package") 
                package <- topic
            h <- .tryHelp(topicName(type, topic), package = package)
            if (is.null(h)) {
                if (is.language(topicExpr)) 
                  topicExpr <- deparse(topicExpr)
                stop(gettextf("no documentation of type %s and topic %s (or error in processing help)", 
                  sQuote(type), sQuote(topicExpr)), domain = NA)
            }
            h
        }
    }
}
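For the binary form, the topic name passed to .tryHelp() is built by topicName(). A quick check suggests it pastes topic and type together following the alias convention used in Rd files, so package?utils ends up looking for the alias "utils-package". This relies on unexported utils internals that may change between R versions, and topic_alias is just an illustrative variable name:

```r
# utils:::topicName() is unexported; this relies on internal behavior
topic_alias <- utils:::topicName("package", "utils")
topic_alias

# help() can find that alias directly, which is roughly what .tryHelp() attempts
h <- help(topic_alias, package = "utils")
length(h)
```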

So, what have we learned? ? is a convenience function serving as a wrapper around functions like help() and help.search(). It does a lot of quoting so that users can refer to topics with the syntax they would expect (e.g. ?ggplot2::geom_point, ??knn, ?sum(), ?sum).

help()

Now that we have a good grasp on what’s going on with ?, let’s see how help() works. Let’s start by printing the function’s definition:

help
function (topic, package = NULL, lib.loc = NULL, verbose = getOption("verbose"), 
    try.all.packages = getOption("help.try.all.packages"), help_type = getOption("help_type")) 
{
    types <- c("text", "html", "pdf")
    help_type <- if (!length(help_type)) 
        "text"
    else match.arg(tolower(help_type), types)
    if (!missing(package)) 
        if (is.name(y <- substitute(package))) 
            package <- as.character(y)
    if (missing(topic)) {
        if (!is.null(package)) {
            if (interactive() && help_type == "html") {
                port <- tools::startDynamicHelp(NA)
                if (port <= 0L) 
                  return(library(help = package, lib.loc = lib.loc, 
                    character.only = TRUE))
                browser <- if (.Platform$GUI == "AQUA") {
                  get("aqua.browser", envir = as.environment("tools:RGUI"))
                }
                else getOption("browser")
                browseURL(paste0("http://127.0.0.1:", port, "/library/", 
                  package, "/html/00Index.html"), browser)
                return(invisible())
            }
            else return(library(help = package, lib.loc = lib.loc, 
                character.only = TRUE))
        }
        if (!is.null(lib.loc)) 
            return(library(lib.loc = lib.loc))
        topic <- "help"
        package <- "utils"
        lib.loc <- .Library
    }
    ischar <- tryCatch(is.character(topic) && length(topic) == 
        1L, error = function(e) FALSE)
    if (!ischar) {
        reserved <- c("TRUE", "FALSE", "NULL", "Inf", "NaN", 
            "NA", "NA_integer_", "NA_real_", "NA_complex_", "NA_character_")
        stopic <- deparse1(substitute(topic))
        if (!is.name(substitute(topic)) && !stopic %in% reserved) 
            stop("'topic' should be a name, length-one character vector or reserved word")
        topic <- stopic
    }
    paths <- index.search(topic, find.package(if (is.null(package)) 
        loadedNamespaces()
    else package, lib.loc, verbose = verbose))
    try.all.packages <- !length(paths) && is.logical(try.all.packages) && 
        !is.na(try.all.packages) && try.all.packages && is.null(package) && 
        is.null(lib.loc)
    if (try.all.packages) {
        for (lib in .libPaths()) {
            packages <- .packages(TRUE, lib)
            packages <- packages[is.na(match(packages, .packages()))]
            paths <- c(paths, index.search(topic, file.path(lib, 
                packages)))
        }
        paths <- paths[nzchar(paths)]
    }
    structure(unique(paths), call = match.call(), topic = topic, 
        tried_all_packages = try.all.packages, type = help_type, 
        class = "help_files_with_topic")
}
<bytecode: 0x55ed6cd5bc70>
<environment: namespace:utils>

Of course, we’ll break this down into more digestible chunks again.


First, help() determines which format of documentation to serve. Help can be served in three forms: text, html, and pdf. By default, help() reads the global option "help_type". If that option is unset (NULL or empty), it falls back to "text". Otherwise the value is lowercased and matched against the three choices with match.arg(), which also accepts unambiguous abbreviations.

function (topic, package = NULL, lib.loc = NULL, verbose = getOption("verbose"),
  try.all.packages = getOption("help.try.all.packages"), help_type = getOption("help_type")) 
{
    types <- c("text", "html", "pdf")
    help_type <- if (!length(help_type)) 
        "text"
    else match.arg(tolower(help_type), types)
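Isolated in a hypothetical helper, resolve_help_type(), the matching behaves like this. The helper is a sketch that copies the three lines above; it is not part of utils:

```r
# Hypothetical helper reproducing help()'s help_type handling
resolve_help_type <- function(help_type) {
  types <- c("text", "html", "pdf")
  if (!length(help_type)) "text"               # NULL or empty: fall back to text
  else match.arg(tolower(help_type), types)    # case-insensitive, partial matching
}

resolve_help_type(NULL)     # "text"
resolve_help_type("HTML")   # "html"
resolve_help_type("p")      # "pdf"
```

Anything that matches none of the three choices, such as "word", makes match.arg() throw an error.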

Next, if the package argument is supplied, help() quotes it with substitute() and checks whether the result is a bare name (as in help(package = ggplot2)). If it is, the name is converted to a string for later use.

    if (!missing(package)) 
        if (is.name(y <- substitute(package))) 
            package <- as.character(y)
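This substitute() step, together with the similar handling of topic further down, makes bare names and strings interchangeable. A quick check follows (assuming the stats package is installed, as it is in any standard R installation). The call attributes of the two results differ, so only the underlying paths are compared:

```r
# The first call relies on substitute() to turn the bare names into strings
h1 <- help(anova, package = stats)
h2 <- help("anova", package = "stats")

# as.vector() drops the attributes (call, topic, class, ...) before comparing
identical(as.vector(h1), as.vector(h2))   # TRUE
attr(h1, "topic")                          # "anova"
```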

Now, we deal with the scenario where topic is not specified. This is not the typical case: topic is the first formal argument of help(), so when you run help(geom_point) you’re setting topic = geom_point. Handling a missing topic, however, is what allows calls like help(package = ggplot2).

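In the chunk below, help() figures out how to show a package overview given the user’s environment. In an interactive session with HTML help, it starts the dynamic help server and opens the package’s index page in a browser. Otherwise (or if the server fails to start), it calls library(). It turns out that in addition to loading/attaching packages, library() returns information about a package (in an object of class "packageInfo") when its help argument is specified. If only lib.loc is given, help() lists the packages in that library. If nothing at all is given, it falls back to the documentation for help() itself.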

    if (missing(topic)) {
        if (!is.null(package)) {
            if (interactive() && help_type == "html") {
                port <- tools::startDynamicHelp(NA)
                if (port <= 0L) 
                  return(library(help = package, lib.loc = lib.loc, 
                    character.only = TRUE))
                browser <- if (.Platform$GUI == "AQUA") {
                  get("aqua.browser", envir = as.environment("tools:RGUI"))
                }
                else getOption("browser")
                browseURL(paste0("http://127.0.0.1:", port, "/library/", 
                  package, "/html/00Index.html"), browser)
                return(invisible())
            }
            else return(library(help = package, lib.loc = lib.loc, 
                character.only = TRUE))
        }
        if (!is.null(lib.loc)) 
            return(library(lib.loc = lib.loc))
        topic <- "help"
        package <- "utils"
        lib.loc <- .Library
    }

Note that this code also uses the lib.loc argument, which specifies the location of the R library trees on the user’s machine. By default its value is NULL, which means the libraries returned by .libPaths() are used.
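The fallbacks can be observed directly. The sketch below calls library(help = ) the same way help() does. It then calls help() with no arguments to reach the final fallback. info and h are just illustrative variable names:

```r
# library(help = ) does not attach anything; it returns package metadata
info <- library(help = "stats", character.only = TRUE)
class(info)        # "packageInfo"

# With neither topic nor package, help() looks up topic "help" in utils
h <- help()
attr(h, "topic")   # "help"
```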


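Next is some simple cleaning-up of topic, which at this point has a value (either supplied by the user or set to "help" above). Note that the first check is wrapped in tryCatch(). is.character(topic) forces evaluation of topic, so a call like help(foo), where no object foo exists, would otherwise throw an “object not found” error. Instead, the error is turned into FALSE, and help() falls back to deparsing the unevaluated expression.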

    ischar <- tryCatch(is.character(topic) && length(topic) == 
        1L, error = function(e) FALSE)
    if (!ischar) {
        reserved <- c("TRUE", "FALSE", "NULL", "Inf", "NaN", 
            "NA", "NA_integer_", "NA_real_", "NA_complex_", "NA_character_")
        stopic <- deparse1(substitute(topic))
        if (!is.name(substitute(topic)) && !stopic %in% reserved) 
            stop("'topic' should be a name, length-one character vector or reserved word")
        topic <- stopic
    }

After this chunk, we know that topic is a string of length 1.
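To exercise both paths through this chunk, here is a hypothetical copy of it wrapped in a function, normalize_topic(). It is a sketch for experimentation only, not part of utils. It uses deparse1(), which requires R 4.0.0 or later; no_such_object_xyz is a deliberately undefined name.

```r
# Hypothetical copy of help()'s topic clean-up, for experimentation only
normalize_topic <- function(topic) {
  # Evaluating topic may fail (e.g. an undefined name), so errors count as "not a string"
  ischar <- tryCatch(is.character(topic) && length(topic) == 1L,
                     error = function(e) FALSE)
  if (!ischar) {
    reserved <- c("TRUE", "FALSE", "NULL", "Inf", "NaN",
                  "NA", "NA_integer_", "NA_real_", "NA_complex_", "NA_character_")
    stopic <- deparse1(substitute(topic))   # fall back to the unevaluated expression
    if (!is.name(substitute(topic)) && !stopic %in% reserved)
      stop("'topic' should be a name, length-one character vector or reserved word")
    topic <- stopic
  }
  topic
}

normalize_topic("anova")             # "anova": already a length-one string
normalize_topic(anova)               # "anova": not a string, so the name is deparsed
normalize_topic(no_such_object_xyz)  # "no_such_object_xyz": the lookup error is caught
normalize_topic(NULL)                # "NULL": a reserved word
```

A general expression such as normalize_topic(1 + 2) is neither a name nor a reserved word, so it hits the stop() branch.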


The next step is to use the unexported function utils:::index.search() to search the relevant packages for topic. This means searching either every package in loadedNamespaces() or only the specified package.

    paths <- index.search(topic, find.package(if (is.null(package)) 
        loadedNamespaces()
    else package, lib.loc, verbose = verbose))

Below, I’ve included (truncated) output from a few of these functions so that you can see what’s going on:

loadedNamespaces()[1:4]
## [1] "grDevices" "digest"    "R6"        "jsonlite"

find.package(loadedNamespaces())[1:4]
## [1] "/usr/lib/R/library/grDevices"                           
## [2] "/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/digest"  
## [3] "/usr/local/lib/R/site-library/R6"                       
## [4] "/home/ubuntu/R/x86_64-pc-linux-gnu-library/4.1/jsonlite"

utils:::index.search("anova", find.package(loadedNamespaces()))
## [1] "/usr/lib/R/library/stats/help/anova"
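Passing a single package directory restricts the search to that package. The sketch below also shows that an unknown topic yields an empty vector, which is exactly the situation the try.all.packages logic reacts to. It relies on the unexported utils:::index.search(), whose behavior may differ across R versions; no_such_topic_xyz is a made-up topic name.

```r
stats_dir <- find.package("stats")

# A known topic: the path points into <package>/help/<Rd file>
hit <- utils:::index.search("anova", stats_dir)
basename(dirname(dirname(hit)))   # "stats": the package the Rd file lives in

# An unknown topic: nothing is found
miss <- utils:::index.search("no_such_topic_xyz", stats_dir)
length(miss)                      # 0
```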

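Next, the try.all.packages argument is cleaned up. This chain of && conditions makes try.all.packages TRUE only if all of the following hold:

(1) index.search() found nothing.
(2) The option is a logical TRUE (not NA).
(3) Neither package nor lib.loc was supplied.

In every other case it becomes FALSE. This keeps the search of all installed packages as a last resort (according to the documentation, setting try.all.packages to TRUE might cause performance issues).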

By default (including when help() is called from ?), the "help.try.all.packages" option is FALSE, so this rarely matters in practice.

    try.all.packages <- !length(paths) && is.logical(try.all.packages) && 
        !is.na(try.all.packages) && try.all.packages && is.null(package) && 
        is.null(lib.loc)
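To check how this chain behaves, here is a hypothetical helper, should_try_all(). It copies the expression verbatim into a function of its inputs (a sketch, not part of utils):

```r
# Hypothetical helper with the same && chain help() uses
should_try_all <- function(paths, try.all.packages, package = NULL, lib.loc = NULL) {
  !length(paths) && is.logical(try.all.packages) &&
    !is.na(try.all.packages) && try.all.packages &&
    is.null(package) && is.null(lib.loc)
}

should_try_all(character(), TRUE)                     # TRUE: nothing found, option on
should_try_all("some/help/path", TRUE)                # FALSE: a match was already found
should_try_all(character(), NA)                       # FALSE: NA is rejected
should_try_all(character(), NULL)                     # FALSE: option unset
should_try_all(character(), TRUE, package = "stats")  # FALSE: a package was named
```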

Here is where try.all.packages is used. If it is TRUE, index.search() looks for topic in every installed package that is not already attached, across all of the .libPaths() libraries. The results are appended to paths.

    if (try.all.packages) {
        for (lib in .libPaths()) {
            packages <- .packages(TRUE, lib)
            packages <- packages[is.na(match(packages, .packages()))]
            paths <- c(paths, index.search(topic, file.path(lib, 
                packages)))
        }
        paths <- paths[nzchar(paths)]
    }
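The package filter can be run on its own. This sketch repeats the loop above but only collects the candidate package directories instead of calling index.search(); not_attached is a made-up variable name. The is.na(match(...)) idiom essentially amounts to setdiff(installed, .packages()) here.

```r
not_attached <- character()
for (lib in .libPaths()) {
  installed <- .packages(all.available = TRUE, lib.loc = lib)
  # Same filter help() uses: keep packages that are not on the search path
  candidates <- installed[is.na(match(installed, .packages()))]
  not_attached <- c(not_attached, file.path(lib, candidates))
}

length(not_attached)
head(basename(not_attached))
```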

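Finally, we have the end of help(). This code does not display anything itself. It returns the unique paths as a character vector of class "help_files_with_topic", with attributes recording the call, the topic, whether all packages were tried, and the help type. If everything has gone correctly, R will then find a way to show you the corresponding documentation. (If try.all.packages is TRUE, the output instead lists the installed packages that contain the topic.) In RStudio, for example, the documentation file will appear in the “Help” pane.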

    structure(unique(paths), call = match.call(), topic = topic, 
        tried_all_packages = try.all.packages, type = help_type, 
        class = "help_files_with_topic")
}

Why does this structure() call result in the documentation being displayed? As far as I can tell, it comes down to the print method of the "help_files_with_topic" class, utils:::print.help_files_with_topic(). That method runs when the returned object is autoprinted at the console. I haven’t been able to find great documentation on the details of that method, though.
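One way to see the print method at work is to assign the result of help() instead of letting it autoprint. Nothing is displayed until the object is printed. This is a sketch: the method itself is unexported from utils, so only its existence is checked here, not its internals.

```r
h <- help("anova", package = "stats")   # assigned, so nothing is displayed yet
class(h)                                 # "help_files_with_topic"
names(attributes(h))                     # includes call, topic, tried_all_packages, type

# The S3 method that autoprinting dispatches to (registered, but unexported)
m <- getS3method("print", "help_files_with_topic")
is.function(m)                           # TRUE
# print(h) would now render the page according to attr(h, "type")
```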


Try it for yourself! Run the following code; it should bring up the documentation for stats::anova():

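temp_pkgs <- find.package(
  loadedNamespaces(),   # package = NULL branch: search every loaded namespace
  lib.loc = NULL, verbose = getOption("verbose")
)

# Paths to the Rd files whose aliases include "anova"
temp_path <- utils:::index.search("anova", temp_pkgs)

# match.call() only works inside a function, so supply the call by hand
structure(unique(temp_path), call = quote(help("anova")), topic = "anova", 
    tried_all_packages = FALSE, type = "html", 
    class = "help_files_with_topic")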

So, what have we learned? Documentation for functions of loaded packages is accessed via a combination of loadedNamespaces(), find.package(), and utils:::index.search(). To reach documentation in packages that are not attached, help() (when try.all.packages is TRUE) uses .libPaths(), .packages(), and utils:::index.search().

We also know that the rendering of help pages is managed via the "help_files_with_topic" class. But this merits further research: it’s still pretty unclear to me how it works.