Wednesday, April 17, 2013

New edition of Fleagle's "Primate Adaptation and Evolution"

At last!  There is a beautifully updated version of Fleagle's seminal Primate Adaptation and Evolution.  I just took my first look, and it appears to include all of the many important fossil discoveries since the last edition.  Also, there are many new images, and the format is larger, meaning that all the images are much larger.  This is great!!

Tuesday, April 16, 2013

Do the same thing to a bunch of variables with lapply()


It is extremely common to have a dataframe containing a bunch of variables, and to want to do the exact same thing to all of these variables.

For instance, let's say we have a dataframe that has a bunch of limb bone measurements of different animals, and we want to see if they are related to a categorical predictor variable after controlling for the body mass of the animal.


set.seed(500)

categories <- factor(rep(c("A","B","C"),33))
BM <- rnorm(99,mean=100,sd=15)

myData<-data.frame(categories = categories,
              BM = BM,
              var1 = 0.05 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var2 = 0.1 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var3 = 0.2 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var4 = 0.4 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var5 = 0.9 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var6 = 0.99 * BM + as.numeric(categories) + rnorm(99,sd=1)
              )

rm(categories)
rm(BM)
head(myData)


##   categories     BM  var1  var2  var3  var4   var5   var6
## 1          A 114.53 4.769 12.23 23.87 46.29 104.91 114.57
## 2          B 129.48 9.345 14.21 25.78 55.14 117.26 129.48
## 3          C 113.29 9.604 13.26 25.09 48.38 105.23 116.08
## 4          A 100.46 6.600 12.77 22.17 41.95  93.13 100.06
## 5          B 114.24 7.857 13.86 23.52 46.84 104.18 114.92
## 6          C  91.35 7.289 11.64 22.45 41.04  83.08  92.77


Plotting all our variables against body mass - the long way


We have created 6 variables that are all correlated with our 3-level categorical variable.  They also have an increasing correlation with body mass, which we can see in a plot.  Your first inclination might be to set up a plotting space with room for 6 plots, and then type out each plot command, like so.  

par(mfrow=c(2,3))
with(myData,plot(var1~BM,main="var1",pch=16,xlab="BodyMass",ylab="var1"))
with(myData,plot(var2~BM,main="var2",pch=16,xlab="BodyMass",ylab="var2"))
with(myData,plot(var3~BM,main="var3",pch=16,xlab="BodyMass",ylab="var3"))
with(myData,plot(var4~BM,main="var4",pch=16,xlab="BodyMass",ylab="var4"))
with(myData,plot(var5~BM,main="var5",pch=16,xlab="BodyMass",ylab="var5"))
with(myData,plot(var6~BM,main="var6",pch=16,xlab="BodyMass",ylab="var6"))

That's not too bad with just 6 variables, but would be annoying with 30 variables. And what if we want to change something about the way we are doing the plot?  We will have to change each one of the plotting commands....which I am way too lazy to do.  Only the variable name changes each time, everything else is exactly the same.  We have a clear case here for replacing our 6 plot commands with a single use of `lapply()`. Note: there are reasons (many of them stylistic) to avoid explicit `for()` loops in R.  Here is a good introduction to using the apply family of R functions.
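
A minimal sketch of how the six plot commands might collapse into a single `lapply()` call, and how the same trick could fit one model per variable. The first few lines rebuild `myData` only so the block runs on its own. The names `slopes`, `varNames`, `fits`, and `pvals` are introduced here for illustration and are assumptions, not part of the original analysis.

```r
# Rebuild the example data so this block is self-contained
set.seed(500)
categories <- factor(rep(c("A", "B", "C"), 33))
BM <- rnorm(99, mean = 100, sd = 15)
slopes <- c(var1 = 0.05, var2 = 0.1, var3 = 0.2,
            var4 = 0.4, var5 = 0.9, var6 = 0.99)
myData <- data.frame(
  categories = categories,
  BM = BM,
  lapply(slopes, function(b) b * BM + as.numeric(categories) + rnorm(99, sd = 1))
)

# Pick out the measurement columns by name (assumes they all start with "var")
varNames <- grep("^var", names(myData), value = TRUE)

# One plot per variable: only the variable name changes between calls
par(mfrow = c(2, 3))
invisible(lapply(varNames, function(v) {
  plot(x = myData$BM, y = myData[[v]], main = v, pch = 16,
       xlab = "BodyMass", ylab = v)
}))

# Same idea for models: each variable ~ body mass + category
fits <- lapply(varNames, function(v) {
  lm(reformulate(c("BM", "categories"), response = v), data = myData)
})
names(fits) <- varNames

# p-value for the category term in each model, after accounting for BM
pvals <- sapply(fits, function(f) anova(f)["categories", "Pr(>F)"])
print(pvals)
```

To change how every plot looks, you would now edit the single `plot()` call inside the function rather than six separate lines.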


Monday, February 25, 2013

Top 6 reasons you need to be using RStudio


RStudio provides you with tools to make your work more productive.
New to R? -  check out these resources for getting staRted


RStudio is an awesome tool that can help you do your work better and faster.  In technical terms, RStudio is a cross-platform integrated development environment (IDE) for the R statistical language.  If you aren't a programmer type, that description may not mean much to you, and so maybe you think that means you shouldn't be using it.  In this post I will try to convince you that even (or especially) casual users of R should be using RStudio.  

Click to enlarge: RStudio in action.

Top 6 reasons to use RStudio

6. the joys of window docking

Ever get really annoyed wondering where the hell the text file went that contains your R code? Or wonder where the plot is that you KNOW should have appeared (it was behind your web browser window ;-))? These sorts of problems are eliminated by using RStudio because all the relevant windows are docked together.  In the image above, the source file containing your code is in the upper left....those commands appear when you run them in the R interpreter on the bottom left.  Plots and help files appear in the bottom right.  The upper right shows your graphical workspace. It is all docked together in a single application window.


5. the graphical workspace

Sure, you can type ls() in the interpreter and get the following back: [1] "df".  But isn't it much nicer to see a full list of objects in my workspace in the upper right panel of the RStudio window? Now I actually know something about df, and by just clicking on a name it pops up with a graphical view of that object.....nice!

Click to enlarge: graphical view of a dataframe from the workspace
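
For comparison, here is a rough sketch of what you would type at the console to get similar information without the workspace panel. The data frame `df` below is a made-up stand-in (its columns are assumptions for illustration), since the post does not show what the real one contains.

```r
# Hypothetical stand-in for the 'df' object in the workspace
df <- data.frame(id = 1:3, mass = c(10.2, 11.5, 9.8))

ls()       # names of the objects in the workspace
str(df)    # compact summary: dimensions, column names, types
head(df)   # first few rows of the data
```

The workspace panel gives you roughly this at a glance, without having to ask for it.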

4. full-featured text editor

This includes everything you would expect from a text editor: syntax highlighting, parenthesis and bracket matching, find/replace with regular expressions, etc. Learning how to use a good text editor makes a huge difference!

3. cross-platform interface

Cross-platform means far fewer nitpicky details to remember when going from Windows to Mac or vice versa.  The software works the same on all platforms.  

2. tab-completion of filenames, function names and arguments

Okay....this may blow your mind.  When you hit the tab-key, RStudio will try to autocomplete for you.  This can save a lot of time and typos.  For instance, you should NEVER type out the filename "myFile123_FromThatDayICollectedAllThisData.txt", when you could just type "myFile123" plus the tab-key, and RStudio will complete it for you (assuming that file is in your working directory).

Even more amazing is using the tab key after typing a function. You probably don't have all the possible arguments to every R function memorized....but RStudio does.  For example, hitting tab from within lm() shows you the possible arguments you can supply to this function, and what they mean (note your cursor must be within the parentheses).

Click to enlarge: auto-completion and function help using the tab key
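
If you are curious what the tab key is looking up, base R can list a function's arguments directly. This is only a sketch of the console equivalent, not a description of how RStudio implements completion; `lmArgs` is a name made up for this example.

```r
# Show the full argument list of lm(), including defaults
args(lm)

# Just the argument names, as a character vector
lmArgs <- names(formals(lm))
print(lmArgs)

# Typing ?lm at the console opens the help page describing each argument
```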

1. seamless Rmarkdown and knitr integration

For me, this is the most revolutionary part of the RStudio ecosystem.  RStudio (with knitr) allows you to seamlessly document what you are doing while you are doing it.  It also provides a streamlined way of saving figures and output. Instead of writing code in a normal text file, you write it in a simple markdown format.  You embed your R code within this markdown file, so your code is mixed in with your comments about it.  With a single click, you can turn this markdown file into an html document that includes any output from your R code (text or figures) right in context with your normal text. The figures are saved in a single folder, and are named in a sensible way.  RMarkdown and knitr in RStudio are a real step forward in documenting analyses.....check back for a later detailed post on how this works.

Final note: Some of these tools are present in other IDEs for R, but RStudio provides a really excellent full package (and nobody can touch RStudio in terms of #1). Trust me....you will do more data analysis faster and with fewer headaches if you use RStudio!

Friday, February 22, 2013

Can molecular phylogenies be trusted?

iguanian on the left with simple tongue, scleroglossan on the right with forked tongue.  Losos et al., 2012.

We live in an age in which DNA has a powerful mystique...and for good reason. The technology for sequencing DNA has revolutionized several fields, including phylogenetic systematics. Molecular phylogeny has become the de facto standard for revealing the true evolutionary relationships between organisms. Molecular phylogenies are thought to be inherently more trustworthy for a couple of reasons.  For one, there is less room to argue over trait definitions when we are dealing with DNA base pairs versus morphology. ATGC is unambiguous, whereas first molar length may differ based on how length is measured or the wear stage of teeth.

Molecular phylogenies have revealed a couple of apparent truths about evolutionary relationships that we couldn't readily have guessed from the anatomy. One is that whales are more closely related to cows than cows are to horses. Or in other words, Cetartiodactyla is a natural group of organisms that includes cetaceans (whales) as well as artiodactyls (hoofed-critters with an even number of toes....i.e. cloven-hoofed animals).  This group excludes perissodactyls (hoofed-critters with an odd number of toes....like horses).  DNA tells us that Cetartiodactyla is a valid phylogenetic group, but it would have been really hard to come up with from anatomy alone....just think of all the basic similarities between cows and horses that would lead you to believe they are closely related to the exclusion of Flipper.

This thing is a closer cousin to a cow than a cow is to a horse. 

Which brings me to a recent discussion by Losos and colleagues in Science about the vast and irreconcilable differences between cutting-edge molecular phylogenies and exhaustive morphological phylogenies for lizards. Based on massively detailed morphological studies, there is a great deal of evidence that iguanians are the most primitive group of lizards, distinct from scleroglossans (other lizards). The scleroglossans share a huge number of derived morphological traits, the most obvious of which is a forked tongue. But recent molecular data suggest that iguanians are nested high within the group of scleroglossans, implying that all of the supposedly primitive traits shared by iguanians are in fact evolutionary reversals.  This is a tough pill to swallow, because
"the synapomorphies of scleroglossans inferred as lost by iguanians in the molecular tree come from many functionally different parts of anatomy. These traits have disparate embryological origins and growth patterns, discounting general explanations based on development. Furthermore, iguanians have diverse lifestyles, ranging from large herbivorous iguanas to ant-eating horned lizards and gliding dragons. It is hard to see how this multifaceted suite of characteristics could reflect adaptation to an overall iguanian lifestyle." Losos et al., 2012:1429
So we are left with a conundrum in which the molecular and morphological data are saying fundamentally different things...and the morphological data set appears to be as good as it gets, with great care having been taken to address the problems inherent in morphological data sets. Which do we trust?  Losos and colleagues suggest that the molecular data could well be wrong.  They argue that natural selection acts on the molecular level as well, so we have to consider the possibility that convergence/homoplasy might be obscuring true phylogenetic patterns in the molecular data. 

Their main point is that we shouldn't just accept the molecular data as the truth, without looking at all the evidence.  What do you think?

Thanks to Claud Bramblett for pointing out this article.


Thursday, February 7, 2013

Getting staRted with R.

As a PhD student and researcher, I often hear friends and colleagues say that they want to learn R, but that the learning curve is so steep that they can't seem to get started.  It's true that learning any tool as powerful as R can be confusing at first, especially if you are not accustomed to typing commands in a terminal.  That said, there are TONS of resources available for learning R.  This post describes some of the resources that I have found most useful in my jouRney.

Online Resources
  • O'Reilly Code School TryR - this is a truly fantastic online interactive introduction to learning basic skills in R.  Warning: the tutorial has a persistent pirate metaphor.
  • twotorials - 2 minute videos that teach you how to do simple tasks in R. "got two minutes? Learn some statistical programming in R.  Its easy, free, and FUN!"
  • Quick-R is a fast way to learn what you need to get started.  I found this site after I had learned the basics, but it really seems great.  Thanks for reminding me about this resource, Tom.
  • flowingdata tutorials - these are focussed tutorials on specific topics, like dealing with charts in R
  • R-bloggers is an aggregator that brings together hundreds of blogs on R, including this one. I highly recommend subscribing to their RSS feed to keep up with the latest.
  • google.com - it seems obvious.....but google is your best friend.  Getting a weird error message? Try copying and pasting it into google.  Chances aRe that hundreds of other people have had the same error and you might find some help.  
  • Can't figure out your problem on the google?  Reach out to the R user community. Caveat: Power R users, especially the developers, are among the smartest people I have ever been in contact with.  They are active in the community and they can answer your questions!  And they will, within minutes or hours, but only if you ask the question in the correct way.  For instance, if you say "omg....why am I getting this error"....you may only hear crickets.  If you instead distill the problem down to its simplest form, ideally with a self-contained example, you are very likely to have your problem solved.  It takes some effort to formulate your question clearly, and often you figure out what went wrong in the process.
    • Stack Overflow - there is an extremely active R user base on Stack Overflow.  Questions get answered very quickly...and users can vote on the best answer, so you can avoid wading through unhelpful answers (in the event there are any). 
    • You can also try emailing the R help mailing list. This is another very active community which includes many of the core developers of R.  For best results -- especially on this mailing list -- try not to submit poorly formulated questions. It helps to read their posting guide first. 
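
To make the "self-contained example" advice above concrete, here is a rough sketch of the kind of snippet that tends to get answered. The data are invented for illustration, and `toy` and `fit` are hypothetical names; the point is that anyone can paste it into a fresh R session and see exactly what you see.

```r
# A tiny made-up dataset that reproduces the situation you are asking about
toy <- data.frame(x = c(1.2, 2.4, 3.1, 4.8),
                  y = c(2.1, 3.9, 6.2, 9.7))

# dput() prints a text version of the object that others can paste directly
dput(toy)

# The smallest piece of code that shows the behaviour in question
fit <- lm(y ~ x, data = toy)
print(coef(fit))

# Include your R version and loaded packages with the question
sessionInfo()
```

If your real data are large or private, running `dput()` on a few rows, or on a small simulated stand-in, is usually enough.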

Books
Below are links to five of the R books that I have gotten the most out of over the years.  These are all across the spectrum.....from aimed at the total beginner, to covering more specialized and advanced topics.  Have I left off your favorite R book?  Let me know in the comments. 

  • Introductory Statistics with R (Statistics and Computing) - This book provides a good introduction to basic statistical concepts with an emphasis on how to do the analyses in R. A good place to start for a beginner with little stats knowledge.
  • R For Dummies - Another good entry point for a complete beginner.  This book has the typical "straight-talk" tone of the dummies series. If you like other books in this series this might be for you.
  • The R Book - Now in its second edition, Crawley's R book is a must for serious users of R for heavy-duty statistical analysis.  Includes exhaustive coverage of base graphics, as well as great chapters on the art of linear modeling.  Not a particularly fun read...but an important reference. Also doubles as a door stop.
  • ggplot2: Elegant Graphics for Data Analysis (Use R!) - ggplot2 is revolutionizing graphics in R. This book is really important for understanding how the concept of the ggplot2 package differs from base graphics in R.  It is a totally different world, and this book really helps you understand that.
  • The Art of R Programming: A Tour of Statistical Software Design - So you want to go beyond merely doing statistical tests, and take advantage of R's rich features as a programming language?  This book is for you. It will help you understand how to unlock the power of R to do heavy lifting.

Shameless commerce disclosure: if you purchase one of these books through one of the above links, I will receive a small referral fee from Amazon.

Tuesday, January 22, 2013

Primate Origins Revisited

Artist's reconstruction of C. simpsoni about to nosh on a fruit while grasping a terminal branch.  Illustration by Doug Boyer.

There is a new review paper out in the American Journal of Primatology on competing ecological explanations for the origins of primates.  The title "Rethinking Primate Origins Again" is a reference to the classic 1974 paper by Matt Cartmill entitled "Rethinking Primate Origins", in which Cartmill introduced the Nocturnal Visual Predation hypothesis for primate origins. Cartmill's big idea stemmed from the observation that many of the features shared by all living primates --- especially features of the visual system such as convergent eye orbits --- are also found among nocturnal visual predators such as felid carnivores and owls. In this view, the earliest primates are hypothesized to have been small nocturnal visual predators of insects.

A felid carnivore exhibiting orbital convergence.

The authors of the new review paper favor a slightly different view of primate origins.  They believe that the evolution of flowering plants (angiosperms) is the key to understanding primate adaptations.  Rather than nocturnal predation as the driving force, Sussman and colleagues hypothesize that the earliest primates evolved to exploit newly available resources of flowering plants (flowers, fruit, and insects attracted to them) in a fine-branch setting. This idea is known as the Angiosperm Coevolution hypothesis.

This should be easy to resolve, right?  We should just look at the fossil record of the earliest primates and see whether they are insectivores --- possessing visual adaptations for nocturnal visual predation --- or whether they lack these features and instead possess dietary adaptations for plant resources.  

Well, it's not so easy as it turns out. The problem is that we can't all agree on which fossils are the earliest primates.

The debate centers around a group of mammals known as plesiadapiformes, and one species in particular, Carpolestes simpsoni. Now, pretty much everybody agrees that the first primates of modern aspect arrive on the scene about 55 million years ago, and that many of these guys ate lots of insects.  However, plesiadapiformes are an earlier group of mammals that share some features with living primates but do not have all of the derived features characterizing living primates.  C. simpsoni is one plesiadapiform that has grasping hands and feet and ate fruit but lacks the visual specializations characterizing modern primates.

So....if you think C. simpsoni is a stem primate, then it is of HUGE importance to primate origins because it shows that primates acquired some of their modern features related to fruit eating in terminal branches BEFORE they acquired the visual features that Cartmill claimed were related to nocturnal visual predation of insects.  This would support the Angiosperm Coevolution hypothesis.  However, if you think C. simpsoni has no special relationship to modern primates, then this critter simply has nothing to add to the debate on primate origins.

The affinities of plesiadapiformes are an OLD (like....paleo-old) debate. It is a complicated question that brings up a lot of difficult issues:

  • how to distinguish features inherited from a common ancestor from features evolved in parallel
  • how to deal with the fact that living primates represent only those species that have survived extinction, and thus...
  • the taxonomic question of how to define primates based on anatomical characteristics with a limited sample of living primates

This new review paper won't be the end of the debate by any means, but it is a nice summary of the competing ideas.  I look forward to the replies from the other side of this debate, and to the new fossils that will, no doubt, be brought to bear on this question.

Thanks to Brett Nachman for pointing out this article. 


Friday, January 18, 2013

Amazing resource for primate photos and illustrations

The Nash Collection of Primates in Art and Illustration is an extensive collection of (mostly) copyright-free depictions of primates.  It is searchable by scientific or common name and includes some beautiful and fascinating images.  A companion website is the PrimateImages: Natural History Collection, which includes tons of photos with varying copyright protection.

Side Note:  Yes, I am aware of Google's Image Search capabilities. What I like is that these collections are curated.....so the taxonomic identifications should be trustworthy, and you actually know the source of the image (illustrator or photographer).

A Slow Loris on the cover of Life Magazine in 1951.  Inexplicably depicted in a coffee mug. 

Monday, November 12, 2012

Australopithecus bahrelghazali: cow-man of the Pliocene?




Following on the recent announcement that Paranthropus boisei was eating lots of grass and/or some other kind of C4 resource by the Pleistocene, we now have isotopic evidence that Australopithecus bahrelghazali was eating a diet with a similar isotopic composition by about 3 million years ago in north-central Africa (Chad).  Keep in mind that the Chadian australopith is gracile, similar to A. afarensis, so it lacked the mega-chewing apparatus of P. boisei.

It remains to be seen what kind of C4 resources A. bahrelghazali was eating....it could be grass leaves and stems, or maybe the carbohydrate rich tubers and corms of many sedges.  Dental microwear data  should help to resolve this.  

Any way you slice it, it is clear that early hominin diets were diverse....much more diverse than we have previously appreciated. 

Monday, October 22, 2012

Beautiful Visualization of Tree of Life

OneZoom tree zoomed into the primates.

I just discovered this awesome new web project to visualize the tree of life. It is called OneZoom, and it provides a beautiful way to see the tree of life all at once.  It is like Google Maps in that at broad zoom levels it shows very little detail, but as you zoom in on a clade, it gives you more and more detail.  At the species level, it gives you conservation details, plus a link to the Wikipedia page to learn more.  What a great tool for teaching primate systematics! Or any other systematics for that matter.

Species level entry