To subset a vector x, use an indexing vector idx (which can be a scalar) placed within the [] operator; idx refers to the elements of x that should be returned: x[idx].
idx can be one of three types: integer (positions), logical, or character (names).
To illustrate these alternatives, let’s first create an integer vector:
x <- 101:105 # the ':' is a shorthand for the function seq() that we will see later.
# Name the elements
names(x) <- c("A", "B", "C", "D", "E")
# OR
x <- setNames(101:105, c("A", "B", "C", "D", "E"))
x
## A B C D E
## 101 102 103 104 105
Source : Hands-On Programming with R
x[c(2, 3, 3)] # several elements, the same position(s) can occur several times
x[3:5] # generate a sequence of numerics and subset with it.
x[c(1, 105)] # out of bound index values generate NAs
x[-2] # Everything but the second element.
NB: One cannot mix positive and negative indexes.
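A minimal sketch of the rule above. The name x0 is a hypothetical copy of the example vector, introduced so that x itself is left untouched; tryCatch() is used only to capture the error message instead of stopping the script.

```r
# Hypothetical copy of the example vector
x0 <- setNames(101:105, c("A", "B", "C", "D", "E"))

# Negative indexes drop the given positions
kept <- x0[-c(2, 4)]
kept

# Mixing signs in one index is an error; capture its message as a string
msg <- tryCatch(x0[c(-1, 2)], error = function(e) conditionMessage(e))
msg
```

The exact wording of the message may differ between R versions, so it is not reproduced here.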
Select all elements of x but the first.
... length() function
Logical vectors keep elements at positions corresponding to TRUE, recycled if necessary without warning.
Source : Hands-On Programming with R
x[c(TRUE, FALSE, FALSE, TRUE, TRUE)]
## A D E
## 101 104 105
x[c(T, F)] # if logical vector is too short, it is recycled.
## A C E
## 101 103 105
v <- x[c(T, T, T, T, T, T, T)]
v # if logical vector is too long, NAs are returned
## A B C D E <NA> <NA>
## 101 102 103 104 105 NA NA
Logical indexing is often employed to select specific subsets meeting some condition of interest:
idx <- x > 102 & x <= 104 # logical vector
idx
## A B C D E
## FALSE FALSE TRUE TRUE FALSE
x[idx]
## C D
## 103 104
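Conditions computed on data with missing values deserve some care. The sketch below uses a hypothetical vector y0 (not part of the examples above) to show how an NA in the logical index ends up in the result, and how which() can be used to keep only the positions that are TRUE.

```r
y0 <- c(5, NA, 12, 8)      # hypothetical data with one missing value
cond <- y0 > 6             # the comparison propagates the NA
cond

withNA <- y0[cond]         # the NA position is returned as NA
withNA

noNA <- y0[which(cond)]    # which() returns only the TRUE positions
noNA
```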
Select all values in x that are ±1 standard deviation away from the mean.
Select all elements in v that are not NA.
When you get the chance, type apropos("^is.") and demo("is.things") to get a sense of the many functions that allow you to test the nature of R objects.
Character vectors select elements with matching names. Note that partial matching is not allowed.
x[c("A", "A", "D")]
## A A D
## 101 101 104
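A short sketch of what happens when a name does not match exactly. The vectors x0 and named are hypothetical and only serve the illustration.

```r
x0 <- setNames(101:105, c("A", "B", "C", "D", "E"))  # hypothetical copy of x

unknown <- x0["Z"]        # no element is named "Z": the result is NA
unknown

swapped <- x0[c("E", "A")]  # the result follows the order of the index
swapped

named <- c(alpha = 1, beta = 2)
partial <- named["al"]    # "al" is only a prefix of "alpha": no match, NA
partial
```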
Can you ‘emulate’ name indexing by calling match() on the vector of names?
idx <- match(c("A", "A", "D"), names(x))
idx
## [1] 1 1 4
x[idx]
## A A D
## 101 101 104
Name indexing is also very handy for creating look-up tables (in the form oldValue = “new value”) to recode a variable:
v <- c("three", "four", "one", "two", "three", "one", "four") # Vector to be recoded.
lookUp <- c(one = "un", two = "deux", three = "trois", four = "quatre") # look-up table
v
## [1] "three" "four" "one" "two" "three" "one" "four"
unname(lookUp[v]) # recoding and getting rid of names
## [1] "trois" "quatre" "un" "deux" "trois" "un" "quatre"
Can you think of another way to do that with utilities for factors?
w <- as.factor(v)
levels(w)
## [1] "four" "one" "three" "two"
levels(w) <- c("quatre", "un", "trois", "deux")
as.character(w)
## [1] "trois" "quatre" "un" "deux" "trois" "un" "quatre"
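One thing to keep in mind with look-up tables: a value that has no entry in the table is recoded as NA. The sketch below assumes a hypothetical input v2 containing such a value, and shows one possible way to fall back to the original value.

```r
lookUp <- c(one = "un", two = "deux", three = "trois", four = "quatre")
v2 <- c("two", "five", "one")    # hypothetical input; "five" has no entry

recoded <- unname(lookUp[v2])
recoded                          # the unknown key comes back as NA

# Keep the original value wherever the table had no match
missingKey <- is.na(recoded)
recoded[missingKey] <- v2[missingKey]
recoded
```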
All subsetting operators can be combined with assignment to modify selected values of the input vector. The rest of the vector is unaffected.
oldX <- x
x[1] <- 1
x < 103
## A B C D E
## TRUE TRUE FALSE FALSE FALSE
x[ x < 103 ] <- 0
x
## A B C D E
## 0 0 103 104 105
x[1:4] <- 0:1 # The right-hand side vector is recycled once to match the length of the subsetted vector
x
## A B C D E
## 0 1 0 1 105
x[1:2] <- 10:14 # WARNING + the subset of x is modified with the first elements of the replacement vector
## Warning in x[1:2] <- 10:14: number of items to replace is not a multiple of
## replacement length
x
## A B C D E
## 10 11 0 1 105
x[c(1, NA)] <- c(1, 2) # You CAN'T combine integer indices with NA
## Error in x[c(1, NA)] <- c(1, 2): NAs are not allowed in subscripted assignments
x
## A B C D E
## 10 11 0 1 105
x[c(T, F, NA)] <- 1000 # in logical indices NA are treated as false
x
## A B C D E
## 1000 11 0 1000 105
To delete elements, just subset what you want and re-assign the name of your object to it.
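A minimal sketch of deletion by re-assignment, using a hypothetical copy x0 of the example vector. Each line keeps what is wanted and overwrites x0 with the result.

```r
x0 <- setNames(101:105, c("A", "B", "C", "D", "E"))  # hypothetical copy

x0 <- x0[-2]                  # delete by position (second element)
x0 <- x0[names(x0) != "D"]    # delete by name
x0 <- x0[x0 < 105]            # delete by condition
x0
```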
Assignment with a logical vector is widely used as a substitute for for-if or ifelse() constructs (described later).
x <- 1:4
isOddX <- as.logical(x %% 2) # modulo 2 is not 0
x[which(isOddX)] # odd numbers
## [1] 1 3
x[isOddX] <- x[isOddX] + 1 # do something about odd numbers
x
## [1] 2 2 4 4
This will be illustrated in some exercises further down the road!
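As a rough comparison, the sketch below applies the same change once with logical assignment and once with ifelse(). The names x0, a and b are hypothetical; integer literals (1L) are used so that both results keep the integer type.

```r
x0 <- 1:6
isOdd <- x0 %% 2 == 1

# Logical assignment: only the selected elements are modified
a <- x0
a[isOdd] <- a[isOdd] + 1L

# ifelse(): builds a whole new vector from the two alternatives
b <- ifelse(isOdd, x0 + 1L, x0)

identical(a, b)
```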
The rev() function returns a reversed version of its argument.
rev(LETTERS)
## [1] "Z" "Y" "X" "W" "V" "U" "T" "S" "R" "Q" "P" "O" "N" "M" "L" "K" "J"
## [18] "I" "H" "G" "F" "E" "D" "C" "B" "A"
Can you think of a way to reverse the LETTERS vector without this function?
How would you append the value(s) in x at the right end of vector v? There is a tedious way and a simple one to do that.
x <- 5:8
v <- 1:4
?Syntax
How would you insert the values of x at a specific location within v, rather than at the end?
Use append().
If you are curious, it can be interesting to look at what append() is actually doing with the F2 key (in RStudio).
How can you extract consonants with the vector of vowels? Tip: the built-in constant letters contains the 26 lower-case letters of the Roman alphabet.
It is often necessary to generate regular sequences or patterns of values, for example when you want to assign replicated levels of factors to experimental units.
In R there are at least two base functions to do this kind of work:
seq(from, to, by, length.out, along.with, ...), which is a generalization of the from:to operator.
seq(from = 1, to = 6)
## [1] 1 2 3 4 5 6
seq(from = 1, to = 6, by = 2)
## [1] 1 3 5
seq(from = 1, by = 2, length.out = 3)
## [1] 1 3 5
seq(along.with = v)
## [1] 1 2 3 4
seq(5)
## [1] 1 2 3 4 5
seq(length.out = 4)
## [1] 1 2 3 4
rep(x, times, each, length.out)
s <- c("a", "b", "c")
rep(s, times = 2)
## [1] "a" "b" "c" "a" "b" "c"
rep(s, each = 2)
## [1] "a" "a" "b" "b" "c" "c"
rep(s, times = 1:length(s))
## [1] "a" "b" "b" "c" "c" "c"
rep(s, each = 3, times = 2)
## [1] "a" "a" "a" "b" "b" "b" "c" "c" "c" "a" "a" "a" "b" "b" "b" "c" "c"
## [18] "c"
rep(s, each = 2, length.out = 4)
## [1] "a" "a" "b" "b"
rep(s, each = 2, length.out = 10)
## [1] "a" "a" "b" "b" "c" "c" "a" "a" "b" "b"
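The 1:length(s) idiom used above works for non-empty vectors but behaves unexpectedly when the vector is empty. A small sketch, assuming a hypothetical empty vector named empty, comparing it with the base functions seq_along() and seq_len():

```r
empty <- character(0)        # hypothetical empty vector

backwards <- 1:length(empty) # 1:0 counts down instead of being empty
backwards

seq_along(empty)             # empty integer vector, as intended
seq_len(length(empty))       # same result from a length

s <- c("a", "b", "c")
pattern <- rep(s, times = seq_along(s))  # same pattern as with 1:length(s)
pattern
```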
gl() generates factors by specifying the pattern of their levels.
Write the expressions that generated the following patterns:
1 2 3 1 2 3 1 2 3
4 3 2 1 4 3 2 1 4 3 2 1
1 1 1 2 2 2 3 3 3 4 4 4
"un" "un" "un" "deux" "deux" "deux" "deux" "deux" "deux"
1.0 1.0 1.5 1.5 2.0 2.0 2.5 2.5 1.0 1.0 1.5 1.5 2.0 2.0 2.5 2.5
Let’s take:
x <- seq(1, 20, by = 2)
x
## [1] 1 3 5 7 9 11 13 15 17 19
Extract every third element of x. Do this using both a logical and an integer index.
Integer indexing:
Logical indexing:
R boasts one of the best random number generators and offers functions to easily generate random numbers from various distributions.
Density (function prefix d), cumulative distribution function (p), quantile function (q) and random variate generation (r) functions for many standard probability distributions are available in the stats package. Look at ?Distributions.
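To make the four prefixes concrete, here is a minimal sketch for the normal distribution (dnorm, pnorm, qnorm, rnorm), using the default mean = 0 and sd = 1. The variable names are hypothetical.

```r
dens <- dnorm(0)         # d: density at 0
prob <- pnorm(1.96)      # p: P(X <= 1.96)
quant <- qnorm(0.975)    # q: value below which 97.5% of the mass lies
draws <- rnorm(3)        # r: three random draws

# The p and q functions are inverses of each other
roundTrip <- pnorm(qnorm(0.3))
all.equal(roundTrip, 0.3)
```

The same four prefixes exist for other distributions, for example dunif(), punif(), qunif() and runif() for the uniform distribution.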
set.seed() sets the seed of R's random number generator, which is useful for creating simulations or random objects that can be reproduced.
An example, generating random samples from a normal distribution:
set.seed(124)
rnorm(n = 5 , mean = 10, sd = 3)
## [1] 5.844788 10.114970 7.710910 10.636918 14.276614
set.seed(421)
rnorm(n = 5 , mean = 10, sd = 3)
## [1] 12.41448 11.70813 13.04790 13.85125 10.22423
set.seed(124)
rnorm(n = 5 , mean = 10, sd = 3)
## [1] 5.844788 10.114970 7.710910 10.636918 14.276614
Or from a uniform distribution:
runif(n = 5, min = 0, max = 10)
## [1] 7.717069 8.568504 7.581080 8.503020 4.092967
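Reproducibility can also be checked programmatically rather than by eye. A small sketch with hypothetical variable names: the same seed gives identical draws, while continuing without resetting the seed gives new ones.

```r
set.seed(124)
first <- rnorm(n = 5, mean = 10, sd = 3)

set.seed(124)                      # reset to the same seed
second <- rnorm(n = 5, mean = 10, sd = 3)

third <- rnorm(n = 5, mean = 10, sd = 3)  # no reset: the stream continues

identical(first, second)   # same seed, same numbers
identical(first, third)    # different part of the random stream
```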
The sample() function is used to draw a random sample from a given population. It can sample with or without replacement, controlled by the replace argument (the default is FALSE).
A few examples:
sample(x = month.abb, size = 5)
## [1] "Jan" "Jul" "Aug" "Oct" "Dec"
sample(x = month.abb, size = 13)
## Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
sample(x = month.abb, size = 13, replace = TRUE )
## [1] "Aug" "Aug" "Jan" "May" "May" "Mar" "Nov" "Apr" "Oct" "May" "Oct"
## [12] "Apr" "Jun"
sample(x = c(0,1), size = 20, replace = TRUE, prob = c(0.1, 0.9))
## [1] 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
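Two further behaviours of sample() are worth knowing; the sketch below uses hypothetical names. Called without size, it returns a random permutation. And when x is a single number n >= 1, sampling takes place from 1:n rather than from the one-element vector, which can be surprising when the population is built programmatically.

```r
set.seed(1)

perm <- sample(month.abb)     # no size given: a random permutation
perm

pop <- c(10)                  # hypothetical population of length one
draws <- sample(pop, size = 3, replace = TRUE)  # samples from 1:10
draws

# Sampling positions instead avoids the special case
safe <- pop[sample.int(length(pop), size = 3, replace = TRUE)]
safe
```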
To sort a vector, use sort().
x <- c(13,5,12,5)
sort(x, decreasing = TRUE)
## [1] 13 12 5 5
order() is actually more flexible in the sense that it allows sorting objects based on several sorting keys. We will use it for data frames later.
In contrast to sort(), it does not return the sorted input but a vector of integers representing the indices of the elements of the input. These indices are permuted to reflect the increasing or decreasing order of the input object.
Let’s see an example…
someMonths <- c(sample(x = month.abb, size = 13, replace = TRUE ), NA)
someMonths
## [1] "Mar" "Mar" "May" "Mar" "Sep" "Mar" "Jul" "Jun" "Jun" "Jan" "Apr"
## [12] "Jul" "Jan" NA
idx <- order(someMonths, na.last = FALSE, decreasing = FALSE) # Note the optional arguments!!
idx
## [1] 14 11 10 13 7 12 8 9 1 2 4 6 3 5
someMonths[idx]
## [1] NA "Apr" "Jan" "Jan" "Jul" "Jul" "Jun" "Jun" "Mar" "Mar" "Mar"
## [12] "Mar" "May" "Sep"
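A minimal sketch of sorting on several keys, with two hypothetical parallel vectors grp and val. The first key is sorted in increasing order; negating the numeric second key sorts it in decreasing order within each group.

```r
grp <- c("b", "a", "b", "a", "a")   # hypothetical first key
val <- c(2, 3, 1, 1, 2)             # hypothetical second key

idx2 <- order(grp, -val)   # grp ascending, then val descending
idx2
grp[idx2]
val[idx2]

# With a single key, indexing by order() reproduces sort()
identical(sort(val), val[order(val)])
```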
rank() returns the sample ranks of the values in a vector:
x <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5)
names(x) <- letters[1:11]
rank(x, ties.method = "first")
## a b c d e f g h i j k
## 4 1 6 2 7 11 3 10 8 5 9
rank(x, ties.method = "average")
## a b c d e f g h i j k
## 4.5 1.5 6.0 1.5 8.0 11.0 3.0 10.0 8.0 4.5 8.0
### ALWAYS BE AWARE OF HOW TIES ARE HANDLED!! ###
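rank() and order() are closely related. The sketch below, on a hypothetical unnamed copy x0 of the values above, shows that applying order() twice gives the ranks with ties broken by position, and that the tie-handling method changes the result only for tied values.

```r
x0 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5)   # hypothetical unnamed copy

rFirst <- rank(x0, ties.method = "first")
viaOrder <- order(order(x0))    # position of each element in the sorted vector
all.equal(rFirst, viaOrder)

rMin <- rank(x0, ties.method = "min")
tied <- x0 %in% x0[duplicated(x0)]   # TRUE for values occurring more than once
all(rFirst[!tied] == rMin[!tied])    # untied values get the same rank
```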
R includes some handy set operations, including these:
Function | Description |
---|---|
union(x,y) | Union of the sets x and y |
intersect(x,y) | Intersection of the sets x and y |
setdiff(x,y) | Set difference between x and y, consisting of all elements of x that are not in y |
setequal(x,y) | Test for equality between x and y |
is.element(el, set) ; c %in% y | Membership, testing whether c is an element of the set y |
choose(n,k) | Number of possible subsets of size k chosen from a set of size n |
Note that x and y should be vectors of the same mode, preferably with no duplicated values. Duplicates are not returned.
Here are some simple examples of using these functions:
x <- 1:10
y <- c(3:6, 12, 12, 15, 18)
union(x, y)
## [1] 1 2 3 4 5 6 7 8 9 10 12 15 18
intersect(x, y)
## [1] 3 4 5 6
setdiff(x, y)
## [1] 1 2 7 8 9 10
setdiff(y, x)
## [1] 12 15 18
is.element(2, x)
## [1] TRUE
is.element(y, x)
## [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
let <- letters[1:2]
union(y, let) # Note the implicit type coercion
## [1] "3" "4" "5" "6" "12" "15" "18" "a" "b"
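Because the set functions drop duplicates, subsetting with %in% is the alternative when repeated values must be kept. A short sketch with a hypothetical copy y0 of y, plus one call to choose():

```r
y0 <- c(3:6, 12, 12, 15, 18)     # hypothetical copy of y, with a duplicate

viaSet <- setdiff(y0, 1:10)      # set difference: duplicates removed
viaSet

viaIn <- y0[!(y0 %in% 1:10)]     # logical subsetting: duplicates kept
viaIn

nPairs <- choose(5, 2)           # number of pairs that can be drawn from 5 items
nPairs
```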
Create a character vector populated with 10 month names randomly sampled (with replacement) from the built-in variable month.name.
Replace the values in this vector with the numbers of the corresponding months (e.g. March with 3).
Source: R for Biologists - Prof. Daniel Wegmann
Create a numerical vector f containing the elements 1, −1, 2, −2, ..., 100, −100.
Create a vector of 100 elements that contains the numbers 1, 2 and 3 in random order, but with twice as many 1s as 2s or 3s.
Source: R for Biologists - Prof. Daniel Wegmann
Create two vectors x and y containing 1000 random numbers normally distributed with sd=1 and mean=0 and mean=1, respectively.
... (x[i], y[i]) where y[i]>x[i].
... y that are larger than the largest value in x.
... x that are larger than the 200th smallest value in y and less than two standard deviations away from the mean of x.
Create a vector z with all 999 differences between the neighboring elements of x such that z[1]=x[2]-x[1], z[2]=x[3]-x[2], ...