Wednesday, April 17, 2013

New edition of Fleagle's "Primate Adaptation and Evolution"

At last!  There is a beautifully updated version of Fleagle's seminal Primate Adaptation and Evolution.  I just took my first look, and it appears to include all of the many important fossil discoveries since the last edition.  There are also many new images, and the format is larger, so all the images are much larger.  This is great!!

Tuesday, April 16, 2013

Do the same thing to a bunch of variables with lapply()


It is extremely common to have a dataframe containing a bunch of variables, and to want to do the exact same thing to every one of them.

For instance, let's say we have a dataframe that has a bunch of limb bone measurements of different animals, and we want to see if they are related to a categorical predictor variable after controlling for the body mass of the animal.


set.seed(500)

categories <- factor(rep(c("A","B","C"),33))
BM <- rnorm(99,mean=100,sd=15)

myData<-data.frame(categories = categories,
              BM = BM,
              var1 = 0.05 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var2 = 0.1 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var3 = 0.2 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var4 = 0.4 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var5 = 0.9 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var6 = 0.99 * BM + as.numeric(categories) + rnorm(99,sd=1)
              )

rm(categories)
rm(BM)
head(myData)


##   categories     BM  var1  var2  var3  var4   var5   var6
## 1          A 114.53 4.769 12.23 23.87 46.29 104.91 114.57
## 2          B 129.48 9.345 14.21 25.78 55.14 117.26 129.48
## 3          C 113.29 9.604 13.26 25.09 48.38 105.23 116.08
## 4          A 100.46 6.600 12.77 22.17 41.95  93.13 100.06
## 5          B 114.24 7.857 13.86 23.52 46.84 104.18 114.92
## 6          C  91.35 7.289 11.64 22.45 41.04  83.08  92.77


Plotting all our variables against body mass - the long way


We have created 6 variables that are all related to our 3-level categorical variable.  They are also increasingly strongly related to body mass (the slope on BM rises from 0.05 for var1 to 0.99 for var6 while the noise stays the same), which we can see in a plot.  Your first inclination might be to set up a plotting space with room for 6 plots, and then type out each plot command, like so:

par(mfrow=c(2,3))
with(myData,plot(var1~BM,main="var1",pch=16,xlab="BodyMass",ylab="var1"))
with(myData,plot(var2~BM,main="var2",pch=16,xlab="BodyMass",ylab="var2"))
with(myData,plot(var3~BM,main="var3",pch=16,xlab="BodyMass",ylab="var3"))
with(myData,plot(var4~BM,main="var4",pch=16,xlab="BodyMass",ylab="var4"))
with(myData,plot(var5~BM,main="var5",pch=16,xlab="BodyMass",ylab="var5"))
with(myData,plot(var6~BM,main="var6",pch=16,xlab="BodyMass",ylab="var6"))

That's not too bad with just 6 variables, but it would be annoying with 30 variables. And what if we want to change something about the way we are doing the plot?  We would have to change each one of the plotting commands, which I am way too lazy to do.  Only the variable name changes each time; everything else is exactly the same.  We have a clear case here for replacing our 6 plot commands with a single use of `lapply()`. Note: there are reasons (many of them stylistic) to avoid explicit `for()` loops in R.  Here is a good introduction to using the apply family of R functions.
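As a rough sketch of what that single `lapply()` call could look like: the names `varNames` and `plotOne` below are made up for illustration, and the first few lines just rebuild a dataframe shaped like `myData` so the snippet runs on its own. If you already have `myData` from above, you can skip that part.

```r
# Rebuild a dataframe shaped like myData (only so this sketch is self-contained)
set.seed(500)
categories <- factor(rep(c("A", "B", "C"), 33))
BM <- rnorm(99, mean = 100, sd = 15)
slopes <- c(var1 = 0.05, var2 = 0.1, var3 = 0.2,
            var4 = 0.4, var5 = 0.9, var6 = 0.99)
myData <- data.frame(
  categories = categories,
  BM = BM,
  lapply(slopes, function(b) b * BM + as.numeric(categories) + rnorm(99, sd = 1))
)

# Pick out the columns we want to plot (hypothetical helper name: varNames)
varNames <- grep("^var", names(myData), value = TRUE)

# One function that plots a single variable against body mass
# (hypothetical helper name: plotOne)
plotOne <- function(v) {
  plot(x = myData$BM, y = myData[[v]],
       main = v, pch = 16, xlab = "BodyMass", ylab = v)
}

# Same 2 x 3 plotting space as before, but one call instead of six
par(mfrow = c(2, 3))
plots <- lapply(varNames, plotOne)  # plot() returns NULL, so this is a list of NULLs
```

Now a change to the plot (say, a different `pch`) only has to be made once inside `plotOne`, and the same code handles 30 variables as easily as 6.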
