Sept. 7-11, 2015

Writing and calling functions in R.

R Functions skeleton

Source : Hands-On Programming with R

  • The reserved word function is used to declare a function in R.
  • It is followed by parentheses containing the list of formal arguments (or parameters). They are a property of the function, whereas the actual (or calling) arguments can vary each time you call the function.
  • The statements within the curly braces form the body of the function (braces are optional if the body contains only a single expression).
  • This function object is bound to a name via the assignment operator.

R Functions skeleton

A simple example:

identity <- function(x = NULL) {
  x
}

identity
## function(x = NULL) {
##   x
## }
formals(identity)
## $x
## NULL
body(identity)
## {
##     x
## }

If you want to learn about a function, and to learn programming in general, inspect the code of experienced programmers!

In RStudio, write gl() in the editor, place the cursor on the function name and press F2.

Can you figure out what it does?…

What happens is a function call

“To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.”
— John Chambers

Every operation in R is a function call, even those you would least expect. For example:

x <- 1; y <- 3
x + y
## [1] 4
`+`(x, y)
## [1] 4

Also:

`<-`("a", 1) # actually assign() is more powerful
a
## [1] 1
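
The comment above mentions assign(). As a minimal sketch of what it adds over `<-`, assign() accepts a name that is computed at run time, and get() retrieves an object by name. The variable names sample_1, sample_2 and sample_3 are made up for illustration:

```r
# Build variable names at run time and bind a value to each of them.
for (i in 1:3) {
  assign(paste0("sample_", i), i * 10)
}

sample_2
## [1] 20

# get() is the counterpart of assign(): it looks up an object by its name.
get("sample_3")
## [1] 30
```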

By the way, functions can return invisible values, which are not printed by default when you call the function. The assignment operator is one such function. You can force an invisible value to be printed by enclosing the call in parentheses:

a <- 2 # nothing is printed
(a <- 2)
## [1] 2
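
Your own functions can behave the same way with invisible(). A minimal sketch, in which the function name QuietSquare is hypothetical:

```r
# Hypothetical helper: returns its result without auto-printing it.
QuietSquare <- function(x) {
  invisible(x^2)
}

QuietSquare(3)        # nothing is printed
(QuietSquare(3))      # parentheses force printing
## [1] 9
res <- QuietSquare(3) # the value is still returned and can be stored
res
## [1] 9
```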

However, the fact that assignment returns the assigned value makes it possible to use it in constructs such as:

rm(x) # removing previous instance of object x
exists("x") # make sure x does not exist anymore
## [1] FALSE
y <- mean(x <- round(runif(4, max = 10)))
x
## [1]  2 10  5  4

Even when you type a variable name in the interpreter and press enter, you actually run a function:

eval(a)
## [1] 2
a
## [1] 2

A function call


Source: Advanced R

h <- function(x) {
  a <- 2
  x + a
}
h(x = 1)
## [1] 3
  • R creates a runtime or execution environment, a sort of cocoon.
  • A set of variables is created in it by binding the names (the tags) of the formal parameters to the values of the call arguments.
  • Function arguments are only evaluated (in the calling environment) when they are actually used: this is lazy evaluation.
  • The code is executed within the runtime/execution environment.
  • When the function stops, because it returns or for another reason (an error), whatever is inside this cocoon is lost unless something is done explicitly to export objects or to modify external objects.
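
Two of these points can be checked directly. The sketch below uses two hypothetical functions, Lazy() and Fresh(): the first shows that an unused argument is never evaluated, the second that nothing survives from one call to the next.

```r
# Lazy evaluation: 'y' is never used, so its expression is never evaluated.
Lazy <- function(x, y) {
  x * 2
}
Lazy(1, stop("this error is never raised"))
## [1] 2

# Each call gets a fresh execution environment: 'counter' never exists
# when the body starts, so it is recreated from 0 at every call.
Fresh <- function() {
  if (!exists("counter", inherits = FALSE)) counter <- 0
  counter <- counter + 1
  counter
}
Fresh()
## [1] 1
Fresh()
## [1] 1
```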

Note, for example, what happens here:

h <- function(x) {
  a <- 2
  x + a
}
a <- 0
h(x = 1)
## [1] 3
a
## [1] 0

a is not modified by the call to h()! This is the cocoon effect: variables outside the execution environment are protected (to some extent).

Function calls: arguments matching rules

  • It is not necessary to specify all the arguments to R functions (because of default values).
  • Unnamed arguments and partial argument names are also accepted.

Because of this fuzziness, it is important to be clear about which argument corresponds to which formal parameter of the function.
This argument matching process follows rules:

  • named arguments are matched first (exact names, then partial names)
  • unnamed arguments are matched in the order given, to any unmatched formal arguments, in the order they appear in the function declaration.
# These calls are all equivalent
sample(x = 1:10, size = 10, replace = TRUE)
sample(1:10, 10, TRUE)
sample(x = 1:10, replace = TRUE, s = 10)
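
To watch the matching rules at work without any randomness, here is a small sketch. ShowArgs() is a hypothetical function whose only job is to report which value ended up in which formal argument:

```r
# Hypothetical function: reports how the call arguments were matched.
ShowArgs <- function(first, second, third) {
  paste0("first=", first, ", second=", second, ", third=", third)
}

# 'third' is matched exactly, 'sec' partially matches 'second',
# and the unnamed 1 goes to the only remaining formal argument, 'first'.
ShowArgs(third = 3, sec = 2, 1)
## [1] "first=1, second=2, third=3"
```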

Function calls: arguments matching rules

A few more rules and pieces of advice for calling a function:

  • Only omit the name of the first 1-2 arguments, the most commonly used.
  • Avoid using positional matching for obscure arguments, and only use readable abbreviations with partial matching.
  • Named arguments should always come after unnamed arguments. If a function uses … (discussed in more detail below), you can only specify arguments listed after … with their full name.
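
The last rule is easy to trip over. As an illustration, paste() declares its sep argument after ..., so an abbreviated name is not matched to it:

```r
# Full name: 'sep' is matched and used as the separator.
paste("a", "b", sep = "-")
## [1] "a-b"

# Abbreviated name: 'se' is not matched to 'sep'. It is captured by ...
# and pasted like any other string, with the default separator " ".
paste("a", "b", se = "-")
## [1] "a b -"
```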

Writing functions : arguments declaration et al.

args(write) # Let's display the formal arguments of a function:
## function (x, file = "data", ncolumns = if (is.character(x)) 1 else 5, 
##     append = FALSE, sep = " ") 
## NULL
  • In your functions, think carefully about the order of the arguments in the definition.
  • Default arguments can be expressed relative to the value of other arguments.
  • Default arguments are evaluated inside the function.

How to specify default values for optional arguments in your functions:

  • Set an appropriate value for the formal argument in the function declaration with "=". Note that default values can be defined in terms of other arguments.
  • Set the value to NULL and check in the body whether the argument was supplied with is.null(). Do something if TRUE (assign a default value, issue a warning, …).
  • Similarly, check whether the argument was supplied with missing(param). Drawback: the declaration does not make it obvious to the user that the argument is optional and that something will happen if it is missing.
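
The three approaches can be compared side by side. This is only a sketch: Scale1(), Scale2() and Scale3() are hypothetical functions that all center a vector on its mean unless another center is supplied.

```r
# 1. Default set in the declaration, here in terms of another argument.
Scale1 <- function(x, center = mean(x)) x - center

# 2. NULL default, resolved in the body with is.null().
Scale2 <- function(x, center = NULL) {
  if (is.null(center)) center <- mean(x)
  x - center
}

# 3. No default in the declaration, detected in the body with missing().
Scale3 <- function(x, center) {
  if (missing(center)) center <- mean(x)
  x - center
}

Scale1(1:3)
## [1] -1  0  1
Scale2(1:3)
## [1] -1  0  1
Scale3(1:3)
## [1] -1  0  1
Scale3(1:3, center = 1)
## [1] 0 1 2
```

Only the first declaration tells the user, through args() or the help page, what the default actually is.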

Writing functions : arguments declaration et al.

The ... or ellipsis argument:

  • This argument will match any arguments not otherwise matched, and can be easily passed on to other functions.
  • This is useful if you want to collect arguments to call another function inside its body, but you don’t want to freeze their possible names in advance.
  • It is very common in plotting functions and in those of the apply family (discussed later).
SquareRnorm <- function(n, ...) {
  rnorm(n = n, ...) ^ 2
}

SquareRnorm(1)
## [1] 0.193066
SquareRnorm(1, mean = 200)
## [1] 39645.03
SquareRnorm(1, sd = 1000)
## [1] 507550.2

Writing functions : arguments declaration et al.

This is one of the first few lines of the body of colMeans():

if (!is.array(x) || length(dn <- dim(x)) < 2L) stop("'x' must be an array of at least two dimensions")

It is indeed typical at the beginning of a function to test that arguments (here x) meet certain conditions that are absolutely required for proper execution of the function.

If these conditions are not met, execution is stopped. Notice the || operator, which is often used in this context.
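
The same pattern can be used in your own functions. A minimal sketch, in which SafeLog() and its error message are hypothetical:

```r
# Hypothetical function that refuses to run on unsuitable input.
SafeLog <- function(x) {
  # || stops at the first TRUE condition, so any(x <= 0) is only
  # evaluated once x is known to be numeric.
  if (!is.numeric(x) || any(x <= 0)) {
    stop("'x' must be a numeric vector of strictly positive values")
  }
  log(x)
}

SafeLog(1)
## [1] 0

# try() lets the script continue after the error has been raised.
res <- try(SafeLog("a"), silent = TRUE)
class(res)
## [1] "try-error"
```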

Inside a function body you can use variables that are defined outside of the function's execution environment: R will look for them (scoping rules apply). But this is not recommended because it is not very transparent for the user. It is better to define a parameter through which the value is passed to the function.
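
The difference shows in the following sketch. The names rate, AddTaxImplicit() and AddTaxExplicit() are made up for illustration:

```r
rate <- 0.5   # defined outside of any function

# Relies on the free variable 'rate': the result silently depends on
# whatever 'rate' happens to be in the calling session.
AddTaxImplicit <- function(price) {
  price * (1 + rate)
}

# Same computation, but the dependency is visible in the declaration.
AddTaxExplicit <- function(price, rate) {
  price * (1 + rate)
}

AddTaxImplicit(100)
## [1] 150
AddTaxExplicit(100, rate = 0.5)
## [1] 150
```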

Return value and side effects

The result of the evaluation of the last expression in a function becomes the return value, the result of invoking the function.

A function can exit and return a value prematurely using the return(valueObject) function:

Bet <- function() {                  # Notice that it has no formal argument
  outcome <- sample(c(TRUE, FALSE), 1)
  if (!outcome) return("Loser!")      # Early termination
  cat("You are an oracle!! Here is your reward.\n")
  "100000000 Euros"
}
Bet()
## [1] "Loser!"

Return value and side effects

Functions can return only a single object. But this is not a limitation because you can return a list containing any number of objects.

sixnum <- function(x) {
  if (missing(x)) x <- runif(10)
  fiveNum <- fivenum(x)
  count <- length(x)
  list(TukeyFiveNumber = fiveNum, CountOfObs = count)
}
str(sixnum())
## List of 2
##  $ TukeyFiveNumber: num [1:5] 0.0606 0.219 0.4187 0.6665 0.8666
##  $ CountOfObs     : int 10
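
Once a function returns a list, the caller picks out individual results by name. A minimal sketch, using a hypothetical variant of the function above (here called Summarise()) with a fixed input so that the result is reproducible:

```r
# Hypothetical variant of sixnum() without the random default input.
Summarise <- function(x) {
  list(TukeyFiveNumber = fivenum(x), CountOfObs = length(x))
}

res <- Summarise(1:9)
res$CountOfObs             # extract one element with $
## [1] 9
res[["TukeyFiveNumber"]]   # or with [[ ]]
## [1] 1 3 5 7 9
```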

Return value and side effects

The function setwd() has a side effect (changing the working directory) and an (invisible) return value: a character string giving the working directory in effect before the change.

oldWd <- setwd(.libPaths()[1])
getwd()
## [1] "/home/cunnac/R/x86_64-pc-linux-gnu-library/3.2"
oldWd
## [1] "/media/cunnac/DONNEES/CUNNAC/Lab-Related/Communications/Teaching/R_trainning_module/slides/RmdFilesLatestVersions"
setwd(oldWd)

What is going on here?

Many graphical functions have no meaningful return value (plot(1:10), for instance, returns NULL invisibly) and are used only for their side effect, which is to draw a plot on the current graphics device, as we will see later. Apart from these, most R functions are called for their return value.
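
The return value of setwd() is what makes it possible to undo its side effect. One common way to do this inside a function is on.exit(). The sketch below assumes a hypothetical helper, ListFilesIn(), and uses tempdir() only because that directory always exists:

```r
# Hypothetical helper: lists the files of a directory by temporarily
# moving there, and always restores the caller's working directory.
ListFilesIn <- function(dir) {
  oldWd <- setwd(dir)       # side effect + invisible return value
  on.exit(setwd(oldWd))     # undo the side effect when the function exits
  list.files()
}

before <- getwd()
files <- ListFilesIn(tempdir())
identical(getwd(), before)  # the working directory is unchanged
## [1] TRUE
```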

Return value and side effects

It is advisable when you write your own functions to keep them as "pure" as possible by avoiding undesirable side effects.

For example, what happens here?

x <- 1
Increment <- function(by = 1) {x <<- x + by ; cat("Incremented 'x'!")}
Increment(by = 2)
## Incremented 'x'!
x
## [1] 3

Note the use of the "super assignment" operator that will reassign x in a parent environment of the function's execution environment. Use with caution…

It may be better to write Increment() differently to make it explicit what it is doing and not to mess with outside variables.

x <- 1
BetterIncrement <- function(numb, by = 1) {numb + by}
x <- BetterIncrement(numb = x, by = 2)
x
## [1] 3

Exercise

Source: Advanced R

Clarify the following list of odd function calls (use the help pages):

x <- sample(replace = TRUE, 20, x = c(1:10, NA))
y <- runif(min = 0, max = 1, 20)
cor(m = "k", y = y, u = "p", x = x)

What does this function return if called with no argument? With f(ls())? Why? Which principle does it illustrate?

f <- function(x = ls()) {
  a <- 1
  x
}
# ls() evaluated inside f:
f()
## [1] "a" "x"

# ls() evaluated in global environment:
f(ls())
##  [1] "a"               "Bet"             "BetterIncrement"
##  [4] "f"               "h"               "identity"       
##  [7] "Increment"       "oldWd"           "sixnum"         
## [10] "SquareRnorm"     "x"               "y"

What does this function return? Why? Which principle does it illustrate?

z <- 1
f2 <- function(x = z) {
  z <- 100
  x
}
f2()
## [1] 100
z
## [1] 1

Exercise

Create a virtual cubic die embedded in a function. Each time you toss the die (invoke the function) it will return a value between 1 and 6.

RunDice <- function() {    # How did you call YOUR function BTW?
  ceiling(6*runif(1)) # could have been done with sample()
}
RunDice()
## [1] 5

What does ceiling() do? What is it used for here?
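
As the comment in the code suggests, sample() gives a more direct solution. A possible sketch, in which the function name RollDie is hypothetical:

```r
# Draw one value at random among the six faces of the die.
RollDie <- function() {
  sample(1:6, size = 1)
}

# Roll the die many times and check that every result is a valid face.
rolls <- replicate(1000, RollDie())
all(rolls %in% 1:6)
## [1] TRUE
```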

Exercise

Adapted from: Un peu d'R

Write a function that computes the perimeter and surface of a rectangle using l1 and l2, the measures of its sides.
It should return these values in a data frame together with the width and length.

Untangle <- function(l1 = NULL, l2 = NULL) {
  if (is.null(l1) && is.null(l2)) {l1 <- l2 <- 0} # what happens if called without arguments
  if (is.null(l1)) l1 <- l2                       # deal with the square issue
  if (is.null(l2)) l2 <- l1
  p <- (l1 + l2)*2
  s <- l1*l2
  data.frame(width = min(l1,l2), length = max(l1,l2), # decide on what is w and what is l
       perimeter = p, surface = s)
}
# Let's test the function:
Untangle()
##   width length perimeter surface
## 1     0      0         0       0
Untangle(2)
##   width length perimeter surface
## 1     2      2         8       4
Untangle(l1 = 2)
##   width length perimeter surface
## 1     2      2         8       4
Untangle(l2 = 2)
##   width length perimeter surface
## 1     2      2         8       4
Untangle(l1 = 1 , l2 = 2)
##   width length perimeter surface
## 1     1      2         6       2
Untangle(l1 = 2 , l2 = 1)
##   width length perimeter surface
## 1     1      2         6       2