Applied biostatistical analysis using R

This text is designed for upper-level undergraduate or graduate students intending to pursue a career in the biological or biomedical sciences. It began like so many other texts on applied statistics. After teaching a graduate course in applied biostatistics for a number of years, I became frustrated at not finding a text that fit my conceptual and pedagogical approach to the topic.¹ In particular, I was looking for three things. First, I wanted a text that struck a balance between enough depth to communicate the nuts and bolts of what various tests are doing and enough breadth to serve as a useful resource as the student progressed in his or her career. Second, I wanted a text that would convey enough of what I call the "classical approach to data analysis" that readers would feel competent to evaluate the current literature and standardized methodologies, but that would also introduce more contemporary approaches to making inferences from biological data. Finally, in my opinion, we have reached a point at which it is impossible to adequately train practitioners of modern statistics without also discussing how to implement concepts and approaches on the computer. Modern approaches rely on advanced computation, yet they are too important to leave out of an introductory text! Thus, I wanted a text that would demonstrate concepts using software powerful enough for readers to rely on for the rest of their careers. R is the software (more specifically, the language) that I have chosen to accomplish this. By introducing readers simultaneously to the concepts necessary for sound statistical analysis and to R, the lingua franca of statistics, this text aims to provide a solid foundation on which applied scientists can build a career of rigorous quantitative thinking.

In any biostatistics course, there is a tension between the level of mathematical detail that can be communicated and the desire to make the material widely accessible to biologists. This text leans heavily toward the accessibility end of that spectrum. Emphasis is placed on a conceptual understanding of the material (glossing over much of the calculus) and on the ability to implement concepts in software. As a result, only a general understanding of college-level algebra is required. Personal experience tells me that many graduate students in the biological sciences are more than ready to get their boot camp experience of "stats" over with as soon as possible so that they can move on to the real business of becoming practicing scientists. (Years later, they spend much time and effort catching up on what they wish they had learned in graduate school.) This notion is reinforced by the fact that, in many cases, biology graduate students simply do not have the option of taking several years of graduate-level mathematics and statistics, even in the unlikely case that they wanted to. Thus, part of the impetus for writing this text is the need of practicing biologists for a practical introduction to statistics within graduate programs that frequently lack the flexibility to allow more than a semester or two (at most) of "stats". Having said this, the entire domain of statistics obviously rests on the foundation of mathematical probability. My advice to students is to get as much training in quantitative methods as possible.

Stephen B. Cox


  1. This is in no way an indictment of any other available text. After all, why should a text that fits my own personal style already exist?