Sept. 7-11, 2015

Mathematical and statistical computations on numeric and logical vectors: basic functions and operators

An overview of built-in basic mathematical and statistical functions

  • Arithmetic operators: just like on your pocket calculator:
x <- c(2, -1, 0) ; y <- 1:3
+y
## [1] 1 2 3
-x # change the sign
## [1] -2  1  0
x + y # addition
## [1] 3 1 3
x - y # subtraction
## [1]  1 -3 -3
x * y # multiplication
## [1]  2 -2  0
x / y # division
## [1]  2.0 -0.5  0.0
x ^ y # power
## [1] 2 1 0
x %% y # modulo
## [1] 0 1 0
x %/% y # integer division
## [1]  2 -1  0
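
The modulo and integer-division operators go together: in R, %/% rounds the quotient down (towards minus infinity) and %% returns a remainder with the sign of the divisor, so that x equals (x %/% y) * y + x %% y. A small check, reusing the same x and y (the names q and r are only for illustration):

```r
x <- c(2, -1, 0) ; y <- 1:3
q <- x %/% y   # integer quotient, rounded down
r <- x %% y    # remainder, same sign as the divisor y
all(x == q * y + r)
## [1] TRUE
-7 %/% 2       # floor(-3.5)
## [1] -4
-7 %% 2        # -7 == (-4) * 2 + 1
## [1] 1
```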

If you want a larger sample:

?Arithmetic 
?sqrt
?complex
?log
?Trig
?round
?Special # factorial, number of combinations of k elements in n...

R is stuffed with lots of useful functions to calculate esoteric things.
Take a look at the cheatsheet and use Google if you have specific needs.

  • Summary functions: work on vectors and return a single (scalar) value, as a vector of length one.
x <- 10:5
mean(x)
## [1] 7.5
median(x)
## [1] 7.5
sd(x) # standard deviation
## [1] 1.870829
max(x)
## [1] 10
min(x) # see also range() and the element-wise ("parallel") pmin() and pmax()
## [1] 5
sum(x)
## [1] 45
prod(x)
## [1] 151200
length(x) # more a 'language' function but is a summary as well
## [1] 6
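
As a quick illustration of the functions mentioned in the comment above, here is a minimal sketch of range(), pmin() and pmax(). Unlike min() and max(), the "parallel" versions compare vectors element by element and return a vector:

```r
x <- 10:5
range(x)                       # c(min(x), max(x))
## [1]  5 10
pmin(x, 7)                     # element-wise minimum of x and 7 (7 is recycled)
## [1] 7 7 7 7 6 5
pmax(c(1, 5, 3), c(4, 2, 3))   # element-wise maximum of two vectors
## [1] 4 5 3
```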

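Many of the functions returning a summary of a vector have an na.rm argument, FALSE by default, that decides what to do when the vector contains NAs: with na.rm = FALSE the result is NA as soon as one value is missing, while na.rm = TRUE drops the NAs before computing.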
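
A short example of the effect of na.rm, using a made-up vector z:

```r
z <- c(4, NA, 6)
mean(z)                 # the NA propagates to the result
## [1] NA
mean(z, na.rm = TRUE)   # the NA is dropped before averaging
## [1] 5
sum(z, na.rm = TRUE)
## [1] 10
length(z)               # length() has no na.rm argument: the NA is counted
## [1] 3
sum(!is.na(z))          # number of non-missing values
## [1] 2
```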

  • Window functions
    • Unlike summary/aggregation functions (e.g. sum()), a window function takes n inputs and returns n values.
    • Unlike simple vectorised functions like cos(), the computation of the ith returned value depends on more elements of the input vector than the ith one alone.
    • Offset functions dplyr::lead() and dplyr::lag() return values 'shifted' by one position relative to the index of the input vector.
x <- 1:5
dplyr::lead(x)
## [1]  2  3  4  5 NA
dplyr::lag(x)
## [1] NA  1  2  3  4
    • Cumulative aggregates
cumsum(x)
## [1]  1  3  6 10 15
cummin(c(2:4, 0, 3:6))
## [1] 2 2 2 0 0 0 0 0
    • Ranking functions: return the ranks of the values in x (to be discussed later).
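
If dplyr is not installed, the same shifted vectors can be built with base R. This is only a sketch of the idea for a shift of one position, not how dplyr implements it. cumprod() and cummax() are two more cumulative aggregates from base R:

```r
x <- 1:5
c(NA, head(x, -1))         # same values as dplyr::lag(x)
## [1] NA  1  2  3  4
c(tail(x, -1), NA)         # same values as dplyr::lead(x)
## [1]  2  3  4  5 NA
cumprod(x)                 # cumulative product
## [1]   1   2   6  24 120
cummax(c(1, 3, 2, 5, 4))   # running maximum
## [1] 1 3 3 5 5
```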

Exercise

  • Calculate 5 to the power 3.
5^3
## [1] 125
  • What are the absolute values of x - 0.5 * pi when x is 1, 2, 3, 4 and 1 million?
x <- c(1:4, 1e6)
abs(x-0.5*pi)
## [1] 5.707963e-01 4.292037e-01 1.429204e+00 2.429204e+00 9.999984e+05
  • Let's take the sequence of numbers:
x <- c(2, 4, 6, 8)

Using dplyr::lag(), calculate the difference between each element of x and the element immediately before. Is this an arithmetic sequence?

x - dplyr::lag(x)
## [1] NA  2  2  2
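
Base R also offers diff(), which returns the same successive differences without the leading NA. One possible way to test for an arithmetic sequence is to check that all differences are the same (for non-integer data, an exact comparison like this one can be fooled by rounding errors):

```r
x <- c(2, 4, 6, 8)
diff(x)                        # successive differences
## [1] 2 2 2
length(unique(diff(x))) == 1   # TRUE when the common difference is constant
## [1] TRUE
```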

Relational Operators

  • Return logical vectors giving the result of element-wise comparisons (equality, inequality, ordering) between numeric or character vectors:
x < y: is x less than y?  
x > y: is x greater than y?  
x <= y: is x less than or equal to y?  
x >= y: is x greater than or equal to y?  
x == y: is x equal to y?  
x != y: is x not equal to y?
  • These binary operators are vectorized and apply the vector recycling rules
1:2 == (1+1)*(1:2)/2 # vectorized
## [1] TRUE TRUE
1:2 == 1:4 # recycling if not of same length
## [1]  TRUE  TRUE FALSE FALSE
  • Exact sameness vs. value equality
2==2L # values are coerced to a common type and tested
## [1] TRUE
identical(2, 2L) # tests whether the two objects are exactly the same; here the types differ (double vs. integer)
## [1] FALSE
NA == 2L # makes sense...
## [1] NA

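  • More about equality testing: precision
    all.equal(...) tests near equality: numeric differences smaller than a tolerance (about 1.5e-8 by default) are ignored. It returns TRUE or a description of the differences, so wrap it in isTRUE() when you need a single logical value.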
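
Two classic cases where == is defeated by floating-point rounding while all.equal() is not:

```r
sqrt(2)^2 == 2
## [1] FALSE
isTRUE(all.equal(sqrt(2)^2, 2))
## [1] TRUE
0.1 + 0.2 == 0.3
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))
## [1] TRUE
```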

  • Handy functions in connection to relational operators and logical vectors.
    • which() gives the TRUE indices of a logical object, allowing for array indices.
which(c(FALSE, TRUE, FALSE, TRUE))
## [1] 2 4
    • match() returns a vector of the positions of (first) matches of its first argument in its second.
match(x = c("c", "f"), table = letters, nomatch = 0) # index of first matches of x in table
## [1] 3 6
match(x = 2, table = c(1,2,3,1,2,3))
## [1] 2
match(x = c(1,2,3,1,2,3), table = 2, nomatch = 0)
## [1] 0 1 0 0 1 0
match(x = c(1,2,3,1,2,3), table = c(1,2), nomatch = 0)
## [1] 1 2 0 1 2 0

    • The %in% operator determines whether each value in the left operand can be matched with one of the values in the right operand and returns a logical vector. It is very handy for subsetting (as we will see a bit later).
# all return the same thing:
match(x = c(1,2,3,1,2,3), table = c(1,2), nomatch = 0) > 0
## [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
c(1,2,3,1,2,3) %in% c(1,2) 
## [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
is.element(c(1,2,3,1,2,3), c(1,2))
## [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
    • all(), any(): see the exercise below.
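
As a small preview of how these helpers combine (subsetting itself is covered later), here is a sketch using the same vector as above:

```r
x <- c(1, 2, 3, 1, 2, 3)
x[x %in% c(1, 2)]       # keep the elements found in the right operand
## [1] 1 2 1 2
which(x %in% c(1, 2))   # their positions in x
## [1] 1 2 4 5
x[!x %in% c(1, 2)]      # %in% is evaluated before !, so this keeps the others
## [1] 3 3
```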

Exercise

  • Run this and try to understand the results:
0 == "0"
0 == FALSE
  • What should then be the result of "0" == FALSE?
"0" == FALSE
## [1] FALSE
  • What about "2" == 2?
"2" == 2
## [1] TRUE
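
These results follow from coercion: when the two operands have different types, == converts them to the more general one, in the order logical < integer < double < character (see ?Comparison). The conversions involved can be inspected directly:

```r
as.character(0)       # 0 == "0" compares "0" with "0"
## [1] "0"
as.numeric(FALSE)     # 0 == FALSE compares 0 with 0
## [1] 0
as.character(FALSE)   # "0" == FALSE compares "0" with "FALSE"
## [1] "FALSE"
as.character(2)       # "2" == 2 compares "2" with "2"
## [1] "2"
"2.0" == 2            # but "2.0" is not the same string as "2"
## [1] FALSE
```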

Exercise

Let's take the vectors v and w defined below:

v <- c(3,5)
w <- c(1,5,3,8)

I ran v >= w and got the following result: [1] TRUE FALSE TRUE TRUE.
Is there something wrong with my R engine? Why?

v >= w
## [1]  TRUE  TRUE  TRUE FALSE
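
To see where this result comes from, the recycling of the shorter vector can be made explicit with rep() (a sketch of what R does implicitly):

```r
v <- c(3, 5)
w <- c(1, 5, 3, 8)
rep(v, length.out = length(w))        # v recycled to the length of w
## [1] 3 5 3 5
rep(v, length.out = length(w)) >= w   # same result as v >= w
## [1]  TRUE  TRUE  TRUE FALSE
```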

Exercise

Using the sum() function on logical vectors, how would you emulate the all() and any() functions?
Test your code with x and y below.

x <- c(TRUE, TRUE, TRUE)
y <- c(FALSE, TRUE, FALSE, FALSE)
# all()
sum(x) == length(x)
## [1] TRUE
identical(sum(x) == length(x), all(x))
## [1] TRUE
sum(y) == length(y)
## [1] FALSE
# any()
sum(y) != 0
## [1] TRUE
any(y)
## [1] TRUE

# but, if you try that with
w <- c(FALSE, TRUE, NA, FALSE)
sum(w) != 0
## [1] NA
any(w)
## [1] TRUE

# and
sum(w) == length(w)
## [1] NA
all(w)
## [1] FALSE

# -> Need to be aware of missing values!!!
# -> A little code testing never hurts...
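
One possible way to make the emulation agree with all() and any() when values are missing is sketched below. The helper names any_sum() and all_sum() are hypothetical, chosen for this example only:

```r
# any_sum() and all_sum() are hypothetical helper names
any_sum <- function(v) {
  if (sum(v, na.rm = TRUE) > 0) return(TRUE)     # at least one TRUE
  if (anyNA(v)) NA else FALSE                    # no TRUE: NA if something is missing
}
all_sum <- function(v) {
  if (sum(!v, na.rm = TRUE) > 0) return(FALSE)   # at least one FALSE
  if (anyNA(v)) NA else TRUE                     # no FALSE: NA if something is missing
}
w <- c(FALSE, TRUE, NA, FALSE)
identical(any_sum(w), any(w))
## [1] TRUE
identical(all_sum(w), all(w))
## [1] TRUE
identical(any_sum(c(FALSE, NA)), any(c(FALSE, NA)))   # both are NA
## [1] TRUE
```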

Operator precedence

  • Operator precedence rules define the order in which the operators of an expression are evaluated.
  • For example, in the mathematical expressions below, × has precedence over +. Parentheses force their content to be calculated first:
    • 2+3×4=14
    • (2+3)×4=20
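  • Here is the table of R's unary and binary operators, listed in precedence groups from highest to lowest priority. Type ?Syntax for the version of this table in the documentation.
      :: :::             access variables in a namespace
      $ @                component / slot extraction
      [ [[               indexing
      ^                  exponentiation (right to left)
      - +                unary minus and plus
      :                  sequence operator
      %any%              special operators (including %% and %/%)
      * /                multiply, divide
      + -                (binary) add, subtract
      < > <= >= == !=    ordering and comparison
      !                  negation
      & &&               and
      | ||               or
      ~                  as in formulae
      -> ->>             rightwards assignment
      <- <<-             assignment (right to left)
      =                  assignment (right to left)
      ?                  help (unary and binary)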

Do not hesitate to use parentheses to delineate more clearly the order in which things are to happen.
It helps you and the people who read your code understand the underlying order.
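
A few short cases where precedence is easy to get wrong, each shown next to a parenthesized variant for comparison:

```r
-2^2                   # ^ binds tighter than unary minus: -(2^2)
## [1] -4
(-2)^2
## [1] 4
1:3 * 2                # : binds tighter than *: (1:3) * 2
## [1] 2 4 6
1:(3 * 2)
## [1] 1 2 3 4 5 6
!TRUE & FALSE          # ! binds tighter than &: (!TRUE) & FALSE
## [1] FALSE
!(TRUE & FALSE)
## [1] TRUE
TRUE | FALSE & FALSE   # & binds tighter than |: TRUE | (FALSE & FALSE)
## [1] TRUE
```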

Exercise:

Try to guess the result of the evaluation of these expressions. Advice: in nested operations like these, first resolve the inner operations and then work outwards.

2*1 + 2 >= 5

2 * (1+2) >= 5

!c("C", "logic") %in% c("A", "C", "D", "C")
c("C", "logic") %in% c("A", "C", "D", "C")

2*1+2 >= 5 & c("C", "logic") %in% c("A", "C", "D", "C")

!TRUE | 2*1+2 >= 5 & c("C", "logic") %in% c("A", "C", "D", "C")
FALSE | 2*(1+2) >= 5 & c("C", "logic") %in% c("A", "C", "D", "C")

# Note: in R >= 4.3.0, || and && signal an error when an operand has length greater than one
FALSE || 2*(1+2) >= 5 & c("C", "logic") %in% c("A", "C", "D", "C")
(FALSE || 2*1+2 >= 5) & c("C", "logic") %in% c("A", "C", "D", "C")