Tuesday, November 19, 2013

musings on academic publishing

I am in the process of revising two submitted papers that come from my dissertation research.  Without further ado, I give you my musings on academic publishing.



Wednesday, November 13, 2013

faster for() loops in R

One of the most common ways people write for() loops is to create an empty results vector and then concatenate each result with the previous (and growing) results vector, like the following.  (Note: wrapping an expression in the function system.time() evaluates the expression and returns a summary of how long it took, in seconds.)

x <- c()
system.time(
  for(i in 1:40000){
    x<-c(x,i) #here i is combined with previous contents of x
  }
)

   user  system elapsed 
  2.019   0.082   2.100 

It is MUCH faster to create an empty results vector of the correct size and modify its elements in place.  This saves R from having to copy an ever-growing object around in memory. In short....it seems that what R is slow at is repeatedly allocating memory for objects.

x<-numeric(40000) #empty numeric vector
system.time(
  for(i in 1:40000){
    x[i] <- i #changing value of particular element of x
  }
)
   user  system elapsed 
  0.066   0.001   0.067 

The second method is over 31 times faster on my machine.
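If you want to rerun the comparison on your own machine, here is a minimal sketch that wraps both approaches in functions. The function names grow_vector() and fill_vector() are made up for this example, and your timings will differ from mine.

```r
n <- 40000

# Approach 1: grow the results vector one element at a time
grow_vector <- function(n) {
  x <- c()
  for (i in seq_len(n)) {
    x <- c(x, i)  # copies the whole vector on every iteration
  }
  x
}

# Approach 2: preallocate the full vector, then fill it in place
fill_vector <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    x[i] <- i
  }
  x
}

t_grow <- system.time(grown <- grow_vector(n))
t_fill <- system.time(filled <- fill_vector(n))

# Both approaches produce the same values
all.equal(grown, filled)

# Rough speed-up factor (machine dependent; Inf if the fast version is too quick to time)
t_grow["elapsed"] / t_fill["elapsed"]
```

For this toy case you would really just write seq_len(n) and skip the loop entirely; the loops are only here to isolate the cost of growing a vector.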

PS.  This post was inspired by Hadley Wickham's much more technical and in-depth coverage of memory usage in R.

Friday, October 18, 2013

New Dmanisi cranium increases variation, questions early Homo species diversity

The team working at Dmanisi just published the description of their "Skull 5", the fifth well-preserved skull from this important 1.8 million year old site in the Republic of Georgia.

Skull 5: credit  Georgia National Museum, AP
This skull is super important because it is quite different from the other four Dmanisi skulls: it has a massive face and jaw, and just a tiny brain at about 550cc. Thus, the authors argue that the Dmanisi sample, from a single place and a relatively short time interval, encompasses virtually all of the variation found in early Homo specimens from Africa. This would imply that the proposed species diversity in early Homo is not really species diversity at all, unless you want to split the Dmanisi specimens into multiple species.

The specimen is getting TONS of well-deserved press attention, but much of the coverage is predictably sensationalist....claiming that this skull overturns what we thought we knew about the human family tree.  A more accurate description might be that this skull offers fresh evidence which is lending new support to a long-standing idea in paleoanthropology (i.e. that early Homo is characterized best as one species).

Saturday, September 7, 2013

Juvenile ape cranium from Miocene of China

Credit: Xue-Ping Ji, Yunnan Institute of Cultural Relics and Archaeology

A team of researchers working at Shuitangba, a site in the Miocene of China, just announced a pretty awesome juvenile cranium of the Miocene ape genus Lufengpithecus.  This skull is important because of its relatively young age (about 6 million years), and because it doesn't closely resemble orangutans. Many researchers have considered Lufengpithecus to be closely related to modern orangutans, but if true, then Lufengpithecus should bear a striking resemblance to modern orangs by 6 million years ago. The authors argue that this isn't the case, and that Lufengpithecus doesn't appear similar to any modern apes.  Very cool fossil!

Monday, September 2, 2013

Academic Phylogeny of Physical Anthropology


Liza Shapiro, Brett Nachman and I have just launched a new website called Academic Phylogeny of Physical Anthropology. We are creating an interactive online genealogy of physical anthropology PhDs.  The idea was hatched over lunch in the department. We were discussing the great paper by Elizabeth Kelley and Robert Sussman in which they trace the academic genealogy of field primatologists. We were lamenting the fact that there wasn't a comparable tree for other sub-specializations within physical anthropology. We decided that tracing academic history is important, and that creating an online version driven by user submissions was the best way to build out the tree.

Please check out the site.  If you are a physical anthropology PhD (or know somebody who is), make sure their name appears. If it doesn't appear, be sure to add it!

Tuesday, June 18, 2013

Early human evolution and the myth of the Paleo Diet

hominin looking out on savanna

One of the most visible fad diets lately (at least where I live) is the so-called Paleo Diet.  If you haven't heard of it, you can think of the Paleo Diet as pop evolutionary psychology for foodies. Essentially, the idea is that our bodies are adapted to the diet of our "pre-agricultural, hunter gatherer ancestors", and that we should try to mimic this diet as much as possible, because it is more natural.  

The basic idea is sensible enough, right? I mean, to the extent that the diet recommends doing away with the heavily processed carbs which are modern technological innovations, I think we can all get on board (leaving aside critical questions regarding poverty and food access for the moment).  However, the picture gets a little less clear when we dig deeper into questions like "which ancestors are we talking about" and "what did those ancestors actually eat"? That's where the science comes in.  

caveman eating burger and fries

In the June 3rd issue of PNAS, there were three important papers on the stable isotopic evidence for early hominin diet, especially hominins in East Africa. These are dense technical papers that provide invaluable direct evidence regarding the diets of our early ancestors.  I won't try to summarize these papers; I just want to pick out three broad points that they drive home for me:
  1. Apes eat mostly ripe fruits and leafy greens, but human ancestors shifted their diets away from ape-like diets very early on.
  2. The diets of human ancestors are notably variable when compared with other animals.  
  3. Some foods that our ancestors have probably eaten for millions of years would be forbidden by the so-called Paleo Diet.
Point 1: While the early hominin species A. anamensis had a carbon isotope signature dominated by C3 resources (Cerling et al, 2013), it is clear that by the time A. afarensis (Lucy's species) came around, they were eating lots of C4 (Wynn et al, 2013).  This is interesting, because it means that a dietary transition away from ape-like diets dominated by C3 vegetation occurred really early in human evolution (around 3.4 Ma) in a species that most researchers agree is directly ancestral to us.  This dietary shift seems really interesting when you consider that even chimpanzees that routinely live in open savanna environments don't eat much C4 vegetation (Schoeninger et al, 1999).  Sponheimer and colleagues (2013) verify that this shift occurred in multiple species through time, and that there is a weak trend towards more C4 as time goes on.  

Point 2: Human ancestors had remarkably varied diets!  Wynn and colleagues (2013) show that different A. afarensis individuals had isotope values that ranged from the neighborhood of committed browsers nearly (but not quite) up to that of committed grazers. Clearly early hominins were eating lots of different things.  This is consistent with a previous study suggesting that different species of the genus Paranthropus were eating remarkably different diets, even though their jaw anatomy is extremely similar.

Point 3: Stable isotope analysis can't tell us exactly which C4 plants hominins were eating.  But a likely candidate C4 food item for early humans is the underground storage organs of certain C4 plants. These starchy roots and tubers would be gritty and fibrous, but would include ample carbohydrates and water.  The idea that early hominins relied on these so-called underground storage organs (USOs) goes way back in paleoanthropology (Hatley and Kappelman, 1980) and had a bit of a revival in the last decade (Laden and Wrangham, 2005).  It is ironic that practitioners of the Paleo Diet swear off starchy tubers like potatoes when tubers have likely been an important part of hominin diets for the last 3.4 million years!!!

Conclusion: It is really hard to know what our earliest ancestors ate. The best science is starting to paint a picture, though, and it is clear that leaving behind the ape-like diet of leafy greens and fruits was an integral part of human evolution from very early days.

If we were to consider recent ancestors (say in the last 50,000 years), it would be clear that hunting and gathering human populations have made a living from every kind of diet you can imagine (think near vegetarians on one extreme and people eating tons of whale blubber on the other)! It is far from clear what it means to "eat like a caveman", and this idea has much more to do with selling diet books than it does with the science of figuring out what our ancestors actually ate. Thankfully, we have lots of careful scientists doing the difficult task of figuring it out!

References:


Hatley T, and Kappelman J. 1980. Bears, pigs, and Plio-Pleistocene hominids: a case for the exploitation of belowground food resources. Human Ecology 8:371–387.

Laden G, and Wrangham R. 2005. The rise of the hominids as an adaptive shift in fallback foods: Plant underground storage organs (USOs) and australopith origins. Journal of Human Evolution 49:482–498.

Schoeninger MJ, Moore J, and Sept JM. 1999. Subsistence strategies of two “savanna” chimpanzee populations: the stable isotope evidence. American Journal of Primatology 49:297–314.


Wednesday, April 17, 2013

New edition of Fleagle's "Primate Adaptation and Evolution"

At last!  There is a beautifully updated version of Fleagle's seminal Primate Adaptation and Evolution.  I just took my first look, and it appears to include all of the many important fossil discoveries since the last edition.  Also, there are many new images, and the format is larger, meaning that all the images are much larger.  This is great!!

Tuesday, April 16, 2013

Do the same thing to a bunch of variables with lapply()


It is extremely common to have a dataframe containing a bunch of variables, and to do the exact same thing to all of these variables.

For instance, let's say we have a dataframe that has a bunch of limb bone measurements of different animals, and we want to see if they are related to a categorical predictor variable after controlling for the body mass of the animal.


set.seed(500)

categories <- factor(rep(c("A","B","C"),33))
BM <- rnorm(99,mean=100,sd=15)

myData<-data.frame(categories = categories,
              BM = BM,
              var1 = 0.05 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var2 = 0.1 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var3 = 0.2 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var4 = 0.4 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var5 = 0.9 * BM + as.numeric(categories) + rnorm(99,sd=1),
              var6 = 0.99 * BM + as.numeric(categories) + rnorm(99,sd=1)
              )

rm(categories)
rm(BM)
head(myData)


##   categories     BM  var1  var2  var3  var4   var5   var6
## 1          A 114.53 4.769 12.23 23.87 46.29 104.91 114.57
## 2          B 129.48 9.345 14.21 25.78 55.14 117.26 129.48
## 3          C 113.29 9.604 13.26 25.09 48.38 105.23 116.08
## 4          A 100.46 6.600 12.77 22.17 41.95  93.13 100.06
## 5          B 114.24 7.857 13.86 23.52 46.84 104.18 114.92
## 6          C  91.35 7.289 11.64 22.45 41.04  83.08  92.77


Plotting all our variables against body mass - the long way


We have created 6 variables that are all correlated with our 3-level categorical variable.  They also have an increasing correlation with body mass, which we can see in a plot.  Your first inclination might be to set up a plotting space with room for 6 plots, and then type out each plot command, like so.  

par(mfrow=c(2,3))
with(myData,plot(var1~BM,main="var1",pch=16,xlab="BodyMass",ylab="var1"))
with(myData,plot(var2~BM,main="var2",pch=16,xlab="BodyMass",ylab="var2"))
with(myData,plot(var3~BM,main="var3",pch=16,xlab="BodyMass",ylab="var3"))
with(myData,plot(var4~BM,main="var4",pch=16,xlab="BodyMass",ylab="var4"))
with(myData,plot(var5~BM,main="var5",pch=16,xlab="BodyMass",ylab="var5"))
with(myData,plot(var6~BM,main="var6",pch=16,xlab="BodyMass",ylab="var6"))

That's not too bad with just 6 variables, but would be annoying with 30 variables. And what if we want to change something about the way we are doing the plot?  We will have to change each one of the plotting commands....which I am way too lazy to do.  Only the variable name changes each time; everything else is exactly the same.  We have a clear case here for replacing our 6 plot commands with a single use of `lapply()`. Note: there are reasons (many of them stylistic) to avoid explicit `for()` loops in R.  Here is a good introduction to using the apply family of R functions.  
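Here is a minimal sketch of what the `lapply()` version could look like. It rebuilds a stand-in for myData so that it runs on its own; the helper names (slopes, varNames, models) are made up for this example, and the random values will not match the table above exactly because they are drawn in a different order.

```r
# Rebuild a stand-in for the myData dataframe so this sketch is self-contained
set.seed(500)
categories <- factor(rep(c("A", "B", "C"), 33))
BM <- rnorm(99, mean = 100, sd = 15)
myData <- data.frame(categories = categories, BM = BM)

# Same slopes as in the dataframe built earlier (names here are illustrative)
slopes <- c(var1 = 0.05, var2 = 0.1, var3 = 0.2,
            var4 = 0.4, var5 = 0.9, var6 = 0.99)
for (v in names(slopes)) {
  myData[[v]] <- slopes[[v]] * BM + as.numeric(categories) + rnorm(99, sd = 1)
}

varNames <- paste0("var", 1:6)

# One plotting command, applied to each variable name in turn
par(mfrow = c(2, 3))
invisible(lapply(varNames, function(v) {
  plot(myData$BM, myData[[v]], main = v, pch = 16,
       xlab = "BodyMass", ylab = v)
}))

# The same pattern fits one model per variable:
# each variable ~ body mass + category
models <- lapply(varNames, function(v) {
  lm(reformulate(c("BM", "categories"), response = v), data = myData)
})
names(models) <- varNames
```

Now changing something about the plot (say, the point character) means editing one line instead of six, and going from 6 variables to 30 only means changing varNames.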


Monday, February 25, 2013

Top 6 reasons you need to be using RStudio


RStudio provides you with tools to make your work more productive.
New to R? -  check out these resources for getting staRted


RStudio is an awesome tool that can help you do your work better and faster.  In technical terms, RStudio is a cross-platform integrated development environment (IDE) for the R statistical language.  If you aren't a programmer type, that description may not mean much to you, and so maybe you think that means you shouldn't be using it.  In this post I will try to convince you that even (or especially) casual users of R should be using RStudio.  

RStudio in action.

Top 6 reasons to use RStudio

6. the joys of window docking

Ever get really annoyed wondering where the hell the text file went that contains your R code? Or wonder where the plot is that you KNOW should have appeared (it was behind your web browser window ;-))? These sorts of problems are eliminated by using RStudio because all the relevant windows are docked together.  In the image above, the source file containing your code is in the upper left....those commands appear when you run them in the R interpreter on the bottom left.  Plots and help files appear in the bottom right.  The upper right shows your graphical workspace. It is all docked together in a single application window.


5. the graphical workspace

Sure, you can type ls() in the interpreter and get the following back: [1] "df".  But isn't it much nicer to see a full list of the objects in my workspace in the upper right panel of the RStudio window? Now I actually know something about df, and by just clicking on a name, a graphical view of that object pops up.....nice!

Graphical view of a dataframe from the workspace.

4. full-featured text editor

This includes everything you sh/would expect from a text editor: syntax highlighting, parenthesis and bracket matching, find/replace with regular expressions, etc etc. Learning how to use a good text editor makes a huge difference!

3. cross-platform interface

Cross-platform means far fewer nitpicky details to remember when going from Windows to Mac or vice versa.  The software works the same on all platforms.  

2. tab-completion of filenames, function names and arguments

Okay....this may blow your mind.  When you hit the tab-key, RStudio will try to autocomplete for you.  This can save a lot of time and typos.  For instance, you should NEVER type out the filename "myFile123_FromThatDayICollectedAllThisData.txt", when you could just type "myFile123" plus the tab-key, and RStudio will complete it for you (assuming that file is in your working directory). 

Even more amazing is using the tab key after typing a function. You probably don't have all the possible arguments to every R function memorized....but RStudio does.  For example, hitting tab from within lm() shows you the possible arguments you can supply to this function, and what they mean (note that your cursor must be within the parentheses).

Auto-completion and function help using the tab key.

1. seamless Rmarkdown and knitr integration

For me, this is the most revolutionary part of the RStudio ecosystem.  RStudio (with knitr) allows you to seamlessly document what you are doing while you are doing it.  It also provides a streamlined way of saving figures and output. Instead of writing code in a normal text file, you write it in a simple markdown format.  You embed your R code within this markdown file, so your code is mixed in with your comments about it.  With a single click, you can turn this markdown file into an html document that includes any output from your R code (text or figures) right in context with your normal text. The figures are saved in a single folder, and are named in a sensible way.  RMarkdown and knitr in RStudio are a real step forward in documenting analyses.....check back for a later detailed post on how this works. 

Final note: Some of these tools are present in other IDEs for R, but RStudio provides a really excellent full package (and nobody can touch RStudio in terms of #1). Trust me....you will do more data analysis faster and with fewer headaches if you use RStudio!

Friday, February 22, 2013

Can molecular phylogenies be trusted?

iguanian on the left with simple tongue, scleroglossan on the right with forked tongue.  Losos et al., 2012.

We live in an age in which DNA has a powerful mystique...and for good reason. The technology for sequencing DNA has revolutionized several fields, including phylogenetic systematics. Molecular phylogeny has become the de facto standard for revealing the true evolutionary relationships between organisms. Molecular phylogenies are thought to be inherently more trustworthy for a couple of reasons.  For one, there is less room to argue over trait definitions when we are dealing with DNA base pairs versus morphology. ATGC is unambiguous, whereas first molar length may differ based on how length is measured or the wear stage of the teeth. 

Molecular phylogenies have revealed a couple of apparent truths about evolutionary relationships that we couldn't readily have guessed from the anatomy. One is that whales are more closely related to cows than cows are to horses. Or in other words, Cetartiodactyla is a natural group of organisms that includes cetaceans (whales) as well as artiodactyls (hoofed-critters with an even number of toes....i.e. cloven-hoofed animals).  This group excludes perissodactyls (hoofed-critters with an odd number of toes....like horses).  DNA tells us that Cetartiodactyla is a valid phylogenetic group, but it would have been really hard to come up with from anatomy alone....just think of all the basic similarities between cows and horses that would lead you to believe they are closely related to the exclusion of Flipper.

This thing is a closer cousin to a cow than a cow is to a horse. 

Which brings me to a recent discussion by Losos and colleagues in Science about the vast and irreconcilable differences between cutting-edge molecular phylogenies and exhaustive morphological phylogenies for lizards. Based on massively detailed morphological studies, there is a great deal of evidence that iguanians are the most primitive group of lizards, distinct from scleroglossans (other lizards). The scleroglossans share a huge number of derived morphological traits, the most obvious of which is a forked tongue. But recent molecular data suggest that iguanians are nested high within the group of scleroglossans, implying that all of the supposedly primitive traits shared by iguanians are in fact evolutionary reversals.  This is a tough pill to swallow, because 
"the synapomorphies of scleroglossans inferred as lost by iguanians in the molecular tree come from many functionally different parts of anatomy. These traits have disparate embryological origins and growth patterns, discounting general explanations based on development. Furthermore, iguanians have diverse lifestyles, ranging from large herbivorous iguanas to ant-eating horned lizards and gliding dragons. It is hard to see how this multifaceted suite of characteristics could reflect adaptation to an overall iguanian lifestyle." Losos et al, 2012:1429
So we are left with a conundrum in which the molecular and morphological data are saying fundamentally different things...and the morphological data set appears to be as good as it gets, with great care having been taken to address the problems inherent in morphological data sets. Which do we trust?  Losos and colleagues suggest that the molecular data could well be wrong.  They argue that natural selection acts on the molecular level as well, so we have to consider the possibility that convergence/homoplasy might be obscuring true phylogenetic patterns in the molecular data. 

Their main point is that we shouldn't just accept the molecular data as the truth, without looking at all the evidence.  What do you think?

Thanks to Claud Bramblett for pointing out this article.


Thursday, February 7, 2013

Getting staRted with R.

As a PhD student and researcher, I often hear friends and colleagues say that they want to learn R, but that the learning curve is so steep that they can't seem to get started.  It's true that learning any tool as powerful as R can be confusing at first, especially if you are not accustomed to typing commands in a terminal.  That said, there are TONS of resources available for learning R.  This post describes some of the resources that I have found most useful in my jouRney.

Online Resources
  • O'Reilly Code School TryR - this is a truly fantastic online interactive introduction to learning basic skills in R.  Warning: the tutorial has a persistent pirate metaphor.
  • twotorials - 2 minute videos that teach you how to do simple tasks in R. "got two minutes? Learn some statistical programming in R.  Its easy, free, and FUN!"
  • Quick-R is a fast way to learn what you need to get started.  I found this site after I had learned the basics, but it really seems great.  Thanks for reminding me about this resource, Tom.
  • flowingdata tutorials - these are focussed tutorials on specific topics, like dealing with charts in R
  • R-bloggers is an aggregator that brings together hundreds of blogs on R, including this one. I highly recommend subscribing to their RSS feed to keep up with the latest. 
  • google.com - it seems obvious.....but google is your best friend.  Getting a weird error message? Try copying and pasting it into google.  Chances aRe that hundreds of other people have had the same error and you might find some help.  
  • Can't figure out your problem on the google?  Reach out to the R user community. Caveat: Power R users, especially the developers, are among the smartest people I have ever been in contact with.  They are active in the community and they can answer your questions!  And they will, within minutes or hours, but only if you ask the question in the correct way.  For instance, if you say "omg....why am I getting this error"....you may only hear crickets.  If you instead distill the problem down to its simplest form, ideally with a self-contained example, you are very likely to have your problem solved.  It takes some effort to formulate your question clearly, and often you figure out what went wrong in the process. 
    • Stack Overflow - there is an extremely active R user base on Stack Overflow.  Questions get answered very quickly...and users can vote on the best answer, so you can avoid wading through unhelpful answers (in the event there are any). 
    • You can also try emailing the R help mailing list. This is another very active community which includes many of the core developers of R.  For best results -- especially on this mailing list -- try not to submit poorly formulated questions. It helps to read their posting guide first. 
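To make the "self-contained example" advice concrete, here is a minimal sketch of the kind of thing you might paste into a question. The dataframe df and its contents are made up for illustration; swap in the smallest slice of your own data that still shows the problem.

```r
# A tiny, made-up dataframe standing in for your real data
df <- data.frame(group = c("A", "B", "A", "B"),
                 value = c(1.2, 3.4, 2.2, NA))

# dput() prints R code that recreates the object exactly,
# so helpers can copy and paste it straight into their own session
dput(df)

# The smallest command that reproduces the surprising behaviour
result <- mean(df$value)
result  # NA, because one value is missing

# What you expected instead (state this in words in your question too)
expected <- mean(df$value, na.rm = TRUE)

# Your R version and loaded packages often matter to the people helping you
sessionInfo()
```

Half the time, boiling the problem down to something this small reveals the answer before you even hit send.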

Books
Below are links to five of the R books that I have gotten the most out of over the years.  They span the spectrum.....from books aimed at the total beginner to books covering more specialized and advanced topics.  Have I left off your favorite R book?  Let me know in the comments. 

Introductory Statistics with R (Statistics and Computing)
This book provides a good introduction to basic statistical
concepts with an emphasis on how to do the analyses in R.
A good place to start for a beginner with little stats knowledge.
R For Dummies
Another good entry point for a complete beginner.  This book
has the typical "straight-talk" tone of the dummies series.
If you like other books in this series this might be for you.
The R Book
Now in its second edition, Crawley's R book is a must for
serious users of R for heavy-duty statistical analysis.  Includes
exhaustive coverage of base graphics, as well as great chapters
on the art of linear modeling.  Not a particularly fun read...but
an important reference. Also doubles as a door stop. 
ggplot2: Elegant Graphics for Data Analysis (Use R!)
ggplot2 is revolutionizing graphics in R. This book is really
important for understanding how the concept of the ggplot2
package differs from base graphics in R.  It is a totally different
world, and this book really helps you understand that. 
The Art of R Programming: A Tour of Statistical Software Design
So you want to go beyond merely doing statistical tests, and take
advantage of R's rich features as a programming language?  This
book is for you. It will help you understand how to unlock the
power of R to do heavy lifting.

Shameless commerce disclosure: if you purchase one of these books through one of the above links, I will receive a small referral fee from amazon. 

Tuesday, January 22, 2013

Primate Origins Revisited

Artist's reconstruction of C. simpsoni about to nosh on a fruit while grasping a terminal branch.  Illustration by Doug Boyer. 

There is a new review paper out in the American Journal of Primatology on competing ecological explanations for the origins of primates.  The title "Rethinking Primate Origins Again" is a reference to the classic 1974 paper by Matt Cartmill entitled "Rethinking Primate Origins", in which Cartmill introduced the Nocturnal Visual Predation hypothesis for primate origins. Cartmill's big idea stemmed from the observation that many of the features shared by all living primates --- especially features of the visual system such as convergent eye orbits --- are also found among nocturnal visual predators such as felid carnivores and owls. In this view, the earliest primates are hypothesized to have been small nocturnal visual predators of insects.

A felid carnivore exhibiting orbital convergence.
The authors of the new review paper favor a slightly different view of primate origins.  They believe that the evolution of flowering plants (angiosperms) is the key to understanding primate adaptations.  Rather than nocturnal predation as the driving force, Sussman and colleagues hypothesize that the earliest primates evolved to exploit newly available resources of flowering plants (flowers, fruit, and insects attracted to them) in a fine branches setting. This idea is known as the Angiosperm Coevolution hypothesis.

This should be easy to resolve, right?  We should just look at the fossil record of the earliest primates and see whether they are insectivores --- possessing visual adaptations for nocturnal visual predation --- or whether they lack these features and instead possess dietary adaptations for plant resources.  

Well, it's not so easy as it turns out. The problem is that we can't all agree on which fossils are the earliest primates.

The debate centers around a group of mammals known as plesiadapiformes, and one species in particular, Carpolestes simpsoni. Now, pretty much everybody agrees that the first primates of modern aspect arrive on the scene about 55 million years ago, and that many of these guys ate lots of insects.  However, plesiadapiformes are an earlier group of mammals that share some features with living primates but do not have all of the derived features characterizing living primates.  C. simpsoni is one plesiadapiform that has grasping hands and feet and ate fruit but lacks the visual specializations characterizing modern primates.

So....if you think C. simpsoni is a stem primate, then it is of HUGE importance to primate origins because it shows that primates acquired some of their modern features related to fruit eating in terminal branches BEFORE they acquired the visual features that Cartmill claimed were related to nocturnal visual predation of insects.  This would support the Angiosperm Coevolution hypothesis.  However, if you think C. simpsoni has no special relationship to modern primates, then this critter simply has nothing to add to the debate on primate origins.

The affinities of plesiadapiformes are an OLD (like....paleo-old) debate. It is a complicated question that brings up a lot of difficult issues:

  • how to distinguish features inherited from a common ancestor from features evolved in parallel
  • how to deal with the fact that living primates represent only those species that have survived extinction, and thus...
  • the taxonomic question of how to define primates based on anatomical characteristics with a limited sample of living primates
This new review paper won't be the end of the debate by any means, but it is a nice summary of the competing ideas.  I look forward to the replies from the other side of this debate, and to the new fossils that will, no doubt, be brought to bear on this question.

Thanks to Brett Nachman for pointing out this article. 


Friday, January 18, 2013

Amazing resource for primate photos and illustrations

The Nash Collection of Primates in Art and Illustration is an extensive collection of (mostly) copyright free depictions of primates.  It is searchable by scientific or common name and includes some beautiful and fascinating images.  A companion website is the PrimateImages: Natural History Collection, which includes tons of photos with varying copyright protection.

Side Note:  Yes, I am aware of Google's Image Search capabilities. What I like about these collections is that they are curated.....so the taxonomic identifications should be trustworthy, and you actually know the source of the image (illustrator or photographer).

A Slow Loris on the cover of Life Magazine in 1951.  Inexplicably depicted in a coffee mug.