2.1 Some definitions

Before going on, we need to define some concepts. These concepts form the foundation for the statistical tests and approaches we will discuss later on. To help illustrate the concepts and guide discussion, they will be presented in the context of an actual ecological question. Phytotelmata are small bodies of water that are contained within plants. They occur in pitcher plants, holes in trees, and anywhere else that water can pool within a plant. In fact, they can be considered entire ecosystems, containing a surprising diversity of species. In the tropics, phytotelmata of bromeliads (Fig. 2.1) provide aquatic habitats that are important in the life history of many invertebrate organisms. Because a primary source of energy in these systems is detritus, one might hypothesize that bromeliads that contain more detritus are able to support more species of detritivore invertebrates. Given this setting, consider the following concepts.

Figure 2.1: Picture of bromeliads from the Luquillo Experimental Forest of Puerto Rico. Photo taken from www.elyunque.com

  1. Population - the complete set of items (usually biological entities) that are theoretically possible and that satisfy the criteria of your domain of interest. This is a deceptively simple concept, and failure to explicitly consider it can often lead to inappropriate conclusions. In fact, care should be taken to ensure that the study design and data analysis are appropriately matched to the population about which inferences are intended to be made. For the question above, what is the population of interest? Is it all bromeliads in the world? Is it bromeliads within a particular forest? Is it one particular species of bromeliad? Answers to questions such as these will help determine the level of sampling that will be necessary to allow inferences to be made.

  2. Parameter - an attribute of a population that is of interest to the researcher. Parameters capture or describe some aspect of the population and are frequently quantitative measures of an effect that we are interested in. By convention, Greek letters are used to represent parameters. Common examples are $\mu $ (the mean) and $\sigma $ (the standard deviation) - terms that will be defined later on. For our example, the slope of the relationship between detritus biomass and the number of detritivore species (represented as $\beta _{1}$) may be a good quantitative measure that captures our question of interest. However, an important thing to remember is that parameters are attributes of populations, and, as a result, are almost impossible to ever truly know. What is the slope of that relationship for all bromeliads in the world that we are interested in? We will never know, but we can attempt to estimate what it is.

  3. Sample - the subset of the population that you actually observe and measure. Ideally, this sample will be a random subset of independent observations, although there are specific cases in which random sampling may not be the most efficient. Many times, this sample comes from an experiment that has been designed to address a specific question. In fact, the topic of 'experimental design' is really a discussion of how experiments can be designed to ensure that the data (see below) they produce are an adequate representation of the population of interest. The degree to which the sample reflects that population of interest determines the validity of any conclusions drawn from analyzing the sample, and if the sample is not a good representation, no amount of statistical mumbo jumbo will save you from possibly getting erroneous results. For our example, we may choose to sample a number of bromeliads, chosen at random from all possible bromeliads within a defined area like the Luquillo Experimental Forest of Puerto Rico. (If this were the case, how would you describe the population of interest?)

  4. Statistic - an attribute of a sample that is of interest to the researcher. Basically, statistics are estimates of parameters, and they are usually represented with Latin letters. Thus, if we are really interested in $\mu $ (the population mean), we might focus attention on the statistic $\bar y$ (the sample mean). Another common example is the statistic $s$, the sample standard deviation, which is an estimate of $\sigma $ (see Fig. 2.2). For the bromeliad data, assuming that the relationship between detritivore richness and detritus biomass is linear, we will let $b_{1}$ be the slope of the relationship in our sample. It estimates $\beta _{1}$, the true slope in the population.

    Figure 2.2: Illustration of the relationships between statistics, characteristics of a sample, and parameters, characteristics of the statistical population of interest.

  5. Variable - any characteristic that assumes different values (i.e., is not constant), depending on certain conditions. This is the characteristic or characteristics that you, as a researcher, have decided to measure. In our example, we are interested in two variables: the biomass of detritus in each bromeliad and the number of detritivore species in each bromeliad. Variables can be classified further as follows.

    1. Qualitative variable - any variable that takes on a variety of levels that do not have any specific quantitative meaning. These are sometimes referred to as categorical variables, and common examples include species, sex, etc.

    2. Quantitative variable - any variable whose various states can be quantified in a numerical fashion. Quantitative variables can be further described by the scale of measurement that is used.

      1. Ratio scale - there are two primary characteristics of a ratio scale. First, the intervals between measurements are constant. So, for example, for our biomass measurements, which can be measured in grams, the difference between 5 and 3 grams is the same as the difference between 13 and 11 grams. Second, a true zero point exists. This zero point acts as an anchor to our measurement scale and lets us speak about a biomass of 10 grams being twice as big as a biomass of 5 grams, etc. In fact, most of the variables that we measure and deal with on a day-to-day basis are ratio scale variables.

      2. Interval scale - the interval scale is similar to the ratio scale except that it does not contain a true zero point. To illustrate, consider the degrees on a compass, where the zero point (due North) is arbitrarily defined. Other measurements that occur on a circular scale (e.g., time of day) are typically measured on an interval scale. Finally, temperature, as biologists usually record it (in degrees Celsius or Fahrenheit), is measured on an interval scale. The zero point depends on which units are used, so it does not make sense to say that 40 degrees is twice as hot as 20 degrees.

      3. Ordinal scale - an ordinal variable takes on a variety of levels that do not have a specific mathematical relationship; however, there is a specific ordering to the levels. Grades (A, B, C, D, or F) are a classic example. Similarly, if we ranked bromeliads by the amount of detritus they contain into the groups Low, Medium, and High, the variable “groups” would be an ordinal scale variable.

    3. Continuous vs. Discrete variables - regardless of the scale being used, quantitative variables can be either discrete or continuous. A discrete variable can take on only a set of separate, countable values. Many times, these are whole numbers, and a variable that represents a count is the classic example. For example, the number of detritivore species in each bromeliad is a discrete variable. On the other hand, a continuous variable can take on (at least theoretically) any value within a range, so infinitely many values are possible between any two measurements. The biomass of detritus within bromeliads could be considered a continuous variable.

  6. Data - these are the specific values of each of the variables of interest that are observed within the sample. Once we have defined our population and gathered the sample, the individual values of detritivore species richness and detritus biomass in the sampled bromeliads, as a whole, would be our data.
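To make the distinction between parameters and statistics concrete, the definitions above can be sketched with a small simulation in Python. Everything in this sketch is an assumption made for illustration, not real bromeliad data: the population size (5000), the intercept (2.0), slope (0.15), and noise used to generate species richness, the sample size (30), and the helper name `ls_slope` are all invented. The population slope is taken here to be the least-squares slope computed over the entire simulated population, which is only possible because we built the population ourselves.

```python
import random
import statistics


def ls_slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs (hypothetical helper)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Population: every bromeliad in an imaginary forest (all numbers are invented).
population = []
for _ in range(5000):
    biomass_g = rng.uniform(0.0, 50.0)  # continuous variable, ratio scale (grams)
    # Discrete variable: a count of detritivore species, never below zero.
    richness = max(0, round(rng.gauss(2.0 + 0.15 * biomass_g, 1.5)))
    population.append((biomass_g, richness))

# Parameters: attributes of the whole population. In a real study these are
# unknown; we can compute them here only because the population is simulated.
pop_richness = [y for _, y in population]
mu = statistics.fmean(pop_richness)       # population mean
sigma = statistics.pstdev(pop_richness)   # population standard deviation
beta_1 = ls_slope(population)             # population slope

# Sample: the random subset we actually observe. Its values are the data.
sample = rng.sample(population, 30)

# Statistics: attributes of the sample that estimate the parameters.
sample_richness = [y for _, y in sample]
y_bar = statistics.fmean(sample_richness)  # estimates mu
s = statistics.stdev(sample_richness)      # estimates sigma (n - 1 denominator)
b_1 = ls_slope(sample)                     # estimates beta_1

print(f"mean:  mu = {mu:.2f}, y_bar = {y_bar:.2f}")
print(f"sd:    sigma = {sigma:.2f}, s = {s:.2f}")
print(f"slope: beta_1 = {beta_1:.3f}, b_1 = {b_1:.3f}")
```

Changing the seed or the sample size changes the statistics but not the parameters. That is the point of the distinction: the parameters are fixed attributes of the population, while the statistics vary from one sample to the next.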