Indexing Things in R

As Naaman pointed out, I took a couple of things for granted in my last tutorial. I assumed you know what a variable is, what a function is, and that you are comfortable typing into a command-line console. Oh, and that you knew what R is. For this tutorial, I will still make those assumptions. Now let’s say you did everything in the previous tutorial post, you’re looking at that flashing cursor, and you wonder…what did I set already? The function ls(…) will List Objects currently loaded in memory.

> ls()
[1] "myline.fit" "x"          "x2"         "y"          "y2"

See?  There’s everything we defined in the past session.  Now, if we could only remember what these things are…there’s a function for that too called class(…):

> class(x)
[1] "numeric"
> class(y)
[1] "numeric"
> class(myline.fit)
[1] "lm"

Here we see that x and y are of the class “numeric” and myline.fit is an “lm”, or linear model. Notice if you just have a number, that’s also of class “numeric”:

> class(9)
[1] "numeric"
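If you want to poke at this a little more, here is a small sketch. It assumes x holds the same values shown in the transcripts in this post (it is re-created here so the snippet runs on its own), and it uses length(…), is.vector(…) and identical(…), which are base R functions not covered in the text:

```r
# x is re-created with the values printed elsewhere in this post
x <- c(1, 3, 6, 9, 12)

class(x)            # "numeric"
class(9)            # "numeric"

length(x)           # 5
length(9)           # 1: a bare number is a vector with one element

is.vector(9)        # TRUE
identical(9, c(9))  # TRUE: wrapping it in c(...) changes nothing
```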

So, R doesn’t really make a strong distinction between a number and a list of numbers; a single number is just a vector with one element. (Let’s call it a vector, because a list is technically something different in R.) You can see this in the way R distributes operations across the whole vector if the thing that is “numeric” has more than one element.  Take a look at this:

> a <- 5
> a - 1
[1] 4
> x
[1]  1  3  6  9 12
> x - 1
[1]  0  2  5  8 11
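Subtraction is not special here. As a rough sketch (x and y are re-created with the values printed in this post, and the name doubled is just something made up for the example), other operators and comparisons are applied element by element too:

```r
# Values copied from the transcripts in this post
x <- c(1, 3, 6, 9, 12)
y <- c(1.5, 2, 7, 8, 15)

doubled <- x * 2   # 2 6 12 18 24
x^2                # 1 9 36 81 144
x > 5              # FALSE FALSE TRUE TRUE TRUE

# Two vectors of the same length are combined position by position
x + y              # 2.5 5 13 17 27
```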

For the variable a, subtracting 1 gives us 4.  However, when we subtract 1 from x, where x is a vector, R actually subtracts 1 from every element in the vector.  If you’re an old school LISP hack like me, then you’ll be very excited, but I’m getting a little ahead of myself.  So, what if you just want an individual number from the vector?  R uses a standard ‘array index’ scheme except, unlike most computer languages you’ve likely seen (C, Java, Python)…it starts counting at 1 and not 0.  Check it:

> x
[1]  1  3  6  9 12
> x[0]
numeric(0)
> x[1]
[1] 1
> x[2]
[1] 3

We see that x[0] is numeric(0), which is an empty numeric vector: it has the right type but zero elements in it.  x[1] is the first element.  x[2] is the second.  We can also see how many items are in there with length(…), and notice we get an NA when we go past the right boundary.

> length(x)
[1] 5
> x[6]
[1] NA
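Here is a small sketch of the boundary cases in one place. The variable names (first, last, empty, past_end) are made up for illustration, and is.na(…) is a base R function that has not been introduced in the text:

```r
x <- c(1, 3, 6, 9, 12)

first <- x[1]                  # 1: counting starts at 1
last  <- x[length(x)]          # 12: the last element sits at index length(x)

empty <- x[0]                  # numeric(0)
length(empty)                  # 0: a numeric vector with nothing in it

past_end <- x[length(x) + 1]   # NA: one step past the right boundary
is.na(past_end)                # TRUE
```

Testing with is.na(…) is the safer habit, since comparing something to NA with == just gives you NA back.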

NA means ‘Not Available’.  Now be careful because if you think a negative value is out of range, you’re mistaken.  For example, x[-1] means show me x EXCEPT for the first element.  Looky here:

> x
[1]  1  3  6  9 12
> x[-1]
[1]  3  6  9 12
> x[-2]
[1]  1  6  9 12
> x[-6]
[1]  1  3  6  9 12
> x[-10]
[1]  1  3  6  9 12
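Negative indices also work with more than one position at a time. A quick sketch, assuming the same x as in the transcript:

```r
x <- c(1, 3, 6, 9, 12)

x[-1]           # 3 6 9 12
x[-c(1, 2)]     # 6 9 12: drop the first two elements
x[-(1:3)]       # 9 12: drop a whole range
x[-length(x)]   # 1 3 6 9: drop the last element

# Careful with the parentheses: -1:3 is the sequence -1 0 1 2 3,
# which is not the same thing as -(1:3)
-1:3            # -1 0 1 2 3
```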

Yes, I’d call that not obvious.  Notice -6 and -10 don’t change the vector as there is no 6th or 10th element to remove.  If we start to think of things as vectors of stuff, it gets neat.  If you want the first three elements, you can call a range by startingNumber:endingNumber.

> x[1:3]
[1] 1 3 6
> x[3:5]
[1]  6  9 12

And if you want, say, just the 2nd and 4th elements, you can put a numeric vector in there:

> x[c(2,4)]
[1] 3 9

Remember our friend c(…)?  It returns a vector of numbers.  We can simply pass that into the array index and get the 2nd and 4th elements.  And you can mix and match ranges with single numbers.  This works because 1:3 is itself evaluated to the vector 1 2 3, and c(…) then combines it with everything else into one flat vector:

> c(1:3, 5)
[1] 1 2 3 5
> x[c(1:3, 5)]
[1]  1  3  6 12
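Since the thing inside the brackets is just a vector of positions, you can build it any way you like. A minimal sketch (idx is a made-up name, and seq(…) is a base R function not used elsewhere in this post):

```r
x <- c(1, 3, 6, 9, 12)

idx <- c(1:3, 5)       # 1 2 3 5
x[idx]                 # 1 3 6 12

x[c(5, 1)]             # 12 1: results come back in the order you ask
x[c(2, 2, 4)]          # 3 3 9: asking twice gives you the element twice
x[seq(1, 5, by = 2)]   # 1 6 12: every other element
```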

Things can get messy fast, but R won’t let you mix negative indices with positive ones:

> x
[1]  1  3  6  9 12
> y
[1]  1.5  2.0  7.0  8.0 15.0
> c(x, y)
 [1]  1.0  3.0  6.0  9.0 12.0  1.5  2.0  7.0  8.0 15.0
> z <- c(x, y)
> z
 [1]  1.0  3.0  6.0  9.0 12.0  1.5  2.0  7.0  8.0 15.0
> z[c(3:5, 8)]
[1]  6  9 12  7
> z[c(1, 3:5, 8:9)]
[1]  1  6  9 12  7  8
> z[c(-1, 3:5, 8:9)]
Error in z[c(-1, 3:5, 8:9)] :
  only 0's may be mixed with negative subscripts
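If you ever need to guard against that error, or you really did want "keep these, drop those", here is one possible sketch. It re-creates z from the values in the transcript; result and keep are made-up names, and tryCatch(…) and inherits(…) are base R functions not covered in this post. The exact wording of the error message may differ between R versions, so the sketch only checks that an error happened:

```r
x <- c(1, 3, 6, 9, 12)
y <- c(1.5, 2, 7, 8, 15)
z <- c(x, y)

# Catch the error instead of letting it stop the session
result <- tryCatch(z[c(-1, 3:5)], error = function(e) e)
inherits(result, "error")   # TRUE

# Keeping and dropping have to be separate steps
keep <- z[c(3:5, 8:9)]      # 6 9 12 7 8
z[-c(1, 10)]                # 3 6 9 12 1.5 2 7 8
```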

Whew…our first error message.  OK, so let’s make an empty vector and then add stuff to it, leaving some blanks:

> v <- vector()
> v
logical(0)
> v[1] <- 2
> v[2] <- 4
> v
[1] 2 4
> v[6] <- 12
> v
[1]  2  4 NA NA NA 12

See how R just padded some NAs in there so it could set the 6th element?
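A couple of details are easy to miss in that transcript, so here is a small sketch that spells them out. The name w is made up for the example, and which(…) and numeric(…) are base R functions not used elsewhere in this post:

```r
v <- vector()
class(v)          # "logical": an empty vector() starts out logical

v[1] <- 2
class(v)          # "numeric": storing a number converts the vector

v[2] <- 4
v[6] <- 12
length(v)         # 6
is.na(v)          # FALSE FALSE TRUE TRUE TRUE FALSE
which(is.na(v))   # 3 4 5: the positions R had to pad

# If you already know you want numbers, you can start with an
# empty numeric vector instead
w <- numeric(0)
class(w)          # "numeric"
```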

> c(1,2,3,4,5) -> a
> a
[1] 1 2 3 4 5
> a[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
[1] 1 3 5
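You rarely have to type out the TRUE and FALSE values by hand. As a sketch (keep is a made-up name, and %% is R's remainder operator, which has not come up in this post), a comparison on the vector produces them for you:

```r
a <- c(1, 2, 3, 4, 5)

keep <- a %% 2 == 1   # TRUE FALSE TRUE FALSE TRUE
a[keep]               # 1 3 5: same result as the hand-typed version

a[a > 2]              # 3 4 5: elements greater than 2
which(a > 2)          # 3 4 5: the positions where the test is TRUE
sum(a > 2)            # 3: how many elements pass the test
```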

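Notice we can also pass in a vector of TRUE and FALSE values as a ‘switch’ for each position: elements lined up with TRUE are shown, and elements lined up with FALSE are skipped.  (The -> in the first line is just <- pointing the other way; it assigns the value on its left to the name on its right.)  Next time, we’ll throw in an extra dimension…just to make things interesting.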
