Tuesday, 15 July 2014

Random variables: directions, turtles and rocks... by Murray Lark

Murray Lark is our hero and master of spatial statistical methodology for earth sciences. Here's another insight into the random variables he came across whilst exercising his craft and publishing his latest paper...

Back in the late 1960s an American marine biologist captured 76 turtles and took them out to sea where he released them and noted the direction in which each swam away. You can see the directions in Figure 1 below.  This is a 'rose diagram,' which is a sort of histogram for directional data. It shows that most of the turtles headed north-east by east (the direction of home), but several headed 180 degrees the other way (as if they knew where they wanted to go, but had the map upside down).
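
A rose diagram just counts how many directions fall into each sector of the compass. The minimal Python sketch below does only that binning step. The function name `rose_counts` and the 30-degree sector width are illustrative choices, not anything used in the original study. The bearings are the ten made-up values from the thought experiment later in this post, not the real turtle data.

```python
def rose_counts(bearings, sector_width=30):
    """Count compass bearings (degrees) in each sector of a rose diagram.

    Sector k covers [k * sector_width, (k + 1) * sector_width) degrees.
    """
    if 360 % sector_width != 0:
        raise ValueError("sector_width must divide 360 exactly")
    counts = [0] * (360 // sector_width)
    for b in bearings:
        # Reduce to [0, 360) so that 360 and -2 land in the right sector;
        # the final modulo guards against float rounding up to exactly 360.
        idx = int((b % 360) // sector_width) % len(counts)
        counts[idx] += 1
    return counts


width = 30
bearings = [0, 2, 358, 355, 7, 1, 358, 0, 355, 357]
for k, n in enumerate(rose_counts(bearings, width)):
    print(f"{k * width:3d}-{(k + 1) * width:3d} deg: {'#' * n}")
```

The two occupied sectors (0-30 and 330-360 degrees) sit side by side around north on the diagram, even though they are at opposite ends of the list of counts. This is the wrap-around problem discussed below.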

Figure 1
This data set has become a classic in directional statistics, the methods used for variables which are measured as angles.  Such data are common in the earth sciences.  Some examples are the direction in which the bedding planes of a sedimentary rock dip, the direction of ocean waves, and the orientation, in the horizontal plane, of faults in rocks or of cracks in a drying soil.

Directions are not as easy to analyse as you might think.  Imagine a repetition of the turtle experiment where the direction of home was due north (zero degrees).  Our turtles are all better navigators than their predecessors, so we get the following ten directions: 0, 2, 358, 355, 7, 1, 358, 0, 355, 357.  None of the turtles deviates by more than seven degrees from the way home, but what is their average direction?  If you simply compute the arithmetic mean of these ten values you get about 179 degrees, which is almost due south.  The reason is that angular data don't behave like ordinary real numbers, or even like real numbers with a maximum and minimum.  On the scale of compass bearings the numbers "wrap around": zero degrees is equivalent to 360, so two observations of 358 and 2 degrees are very similar.
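
One standard way round the problem is to treat each bearing as a unit-length arrow, average the arrows, and take the direction of the resulting vector. The Python sketch below compares this with the naive average for the ten made-up bearings above. The function name `circular_mean_deg` is hypothetical. It also returns the length of the mean arrow (often written R-bar), which is close to 1 when the directions agree closely.

```python
import math


def circular_mean_deg(bearings):
    """Mean direction and mean resultant length of compass bearings (degrees)."""
    # Represent each bearing as a unit vector (east, north) and average the vectors.
    east = sum(math.sin(math.radians(b)) for b in bearings) / len(bearings)
    north = sum(math.cos(math.radians(b)) for b in bearings) / len(bearings)
    # Bearing of the average vector, measured clockwise from north, in [0, 360).
    mean_dir = math.degrees(math.atan2(east, north)) % 360
    # Length of the average vector: near 1 when the directions agree closely.
    r_bar = math.hypot(east, north)
    return mean_dir, r_bar


bearings = [0, 2, 358, 355, 7, 1, 358, 0, 355, 357]
naive = sum(bearings) / len(bearings)
mean_dir, r_bar = circular_mean_deg(bearings)
print(f"arithmetic mean: {naive:.1f} deg")   # 179.3, almost due south
print(f"circular mean:   {mean_dir:.1f} deg, R = {r_bar:.3f}")  # about 359.3, just west of north
```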

Directional statistics has to deal with this tricky behaviour.  One tool of the trade is the von Mises distribution.  A statistical distribution is a mathematical function that we can use to compute the probability that a random variable will fall in a particular interval.  The von Mises distribution can be used for data which are "wrapped around" the circle.  However, like its relative, the bell-shaped "normal" distribution that we use for non-directional random variables, it has a single peak or mode, so it cannot describe data such as the turtle directions in Figure 1, which have two.
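
For reference, the von Mises density for a direction θ with mean direction μ and concentration κ is f(θ) = exp(κ cos(θ - μ)) / (2π I0(κ)), where I0 is the modified Bessel function of the first kind of order zero. The sketch below evaluates it in plain Python, with a power series for I0 so that no extra library is needed (SciPy users could use `scipy.stats.vonmises` instead). It then checks numerically that the density integrates to one around the circle and has just one peak. The values of μ and κ are arbitrary and do not come from the turtle data.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I0(x) from its power series sum_k ((x/2)^k / k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (x / 2) ** 2 / (k * k)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """von Mises density at angle theta (radians), mean direction mu, concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * bessel_i0(kappa))


mu, kappa = math.radians(60), 2.0   # illustrative values only
n = 3600
grid = [2 * math.pi * i / n for i in range(n)]
dens = [von_mises_pdf(t, mu, kappa) for t in grid]

# Total probability around the circle (the rectangle rule is very accurate
# for smooth periodic functions).
total = sum(dens) * 2 * math.pi / n
# Count local maxima, treating the grid as circular so the last point neighbours the first.
peaks = sum(1 for i in range(n) if dens[i] > dens[i - 1] and dens[i] > dens[(i + 1) % n])
print(f"total probability = {total:.6f}, number of peaks = {peaks}")
```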
At BGS we have been exploring some alternative distributions for circular data in collaboration with a colleague from the CSIRO in Australia. 

Figure 2
One distribution of considerable interest is called the projected normal distribution.  While the mathematical account of this distribution is a bit complex, it is not difficult to understand intuitively.  Imagine that we treat the location where the turtles were released as the origin of our map, with coordinates {0,0}.  We can generate a random direction from the projected normal distribution by drawing a random coordinate pair {x,y} from a joint (bivariate) normal distribution.  The random direction is that of the line which joins the origin of the map to the random point.
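
The recipe in the paragraph above translates almost line for line into a simulation. In the Python sketch below, the function name `sample_projected_normal` and the parameter values are hypothetical, chosen only for illustration rather than taken from the fitted model. A point {x,y} is drawn from a bivariate normal distribution, with x pointing east and y north, and its compass bearing from the origin is recorded.

```python
import math
import random


def sample_projected_normal(n, mean, cov, seed=None):
    """Draw n compass bearings (degrees) from a projected normal distribution.

    mean = (mean_east, mean_north); cov = ((var_e, cov_en), (cov_en, var_n)).
    """
    rng = random.Random(seed)
    (s_ee, s_en), (_, s_nn) = cov
    # Cholesky factor of the 2x2 covariance matrix, used to correlate two N(0,1) draws.
    l11 = math.sqrt(s_ee)
    l21 = s_en / l11
    l22 = math.sqrt(s_nn - l21 ** 2)
    bearings = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mean[0] + l11 * z1              # east coordinate of the random point
        y = mean[1] + l21 * z1 + l22 * z2   # north coordinate of the random point
        # Bearing of the line from the origin {0,0} to {x,y}, clockwise from north.
        bearings.append(math.degrees(math.atan2(x, y)) % 360)
    return bearings


# Hypothetical parameters: a cloud of points centred a little north of the
# release site and stretched along the north-south axis.
sample = sample_projected_normal(1000, mean=(0.0, 0.5), cov=((0.2, 0.0), (0.0, 2.0)), seed=42)
north = sum(1 for b in sample if b < 90 or b > 270)
print(f"{north} of {len(sample)} simulated turtles headed into the northern half")
```

A cloud centred close to the origin and stretched along one axis tends to give two opposite bulges of directions, the larger one on the side towards which the centre is offset. A single-peaked distribution such as the von Mises cannot produce this shape.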

Figure 2 shows the projected normal distribution fitted to the turtle data.  The area between the red line and the black circle is one, and the area between the red line and the black circle over some range of angles (e.g. between North and East) is the probability that a turtle swims in a direction within that range.  Notice the large bulge in the distribution in the direction of home, and the rather smaller bulge 180 degrees away.
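
The probability for a range of directions can also be computed directly from the density. Integrating the bivariate normal along each ray from the origin gives a closed form: with u the unit vector for bearing θ, A = uᵀΣ⁻¹u, B = uᵀΣ⁻¹μ, C = μᵀΣ⁻¹μ and D = B/√A,

f(θ) = exp(-C/2) [1 + D Φ(D)/φ(D)] / (2π A √|Σ|),

where Φ and φ are the standard normal distribution and density functions. The Python sketch below codes this formula and integrates it numerically over a sector. The function names and parameter values are hypothetical, and the parameters are not the values fitted to the turtle data.

```python
import math


def projected_normal_pdf(theta, mean, cov):
    """Projected normal density at bearing theta (radians, clockwise from north).

    mean = (mean_east, mean_north); cov = ((var_e, cov_en), (cov_en, var_n)).
    """
    (a, b), (_, d) = cov
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))   # inverse covariance matrix
    u = (math.sin(theta), math.cos(theta))              # unit vector (east, north)

    def quad(p, q):
        # Bilinear form p^T inv q
        return sum(p[i] * inv[i][j] * q[j] for i in range(2) for j in range(2))

    A, B, C = quad(u, u), quad(u, mean), quad(mean, mean)
    D = B / math.sqrt(A)
    Phi = 0.5 * (1.0 + math.erf(D / math.sqrt(2.0)))    # standard normal CDF at D
    # exp(-C/2) * D * Phi / phi(D), rewritten so that both exponents stay <= 0
    # (C - D**2 >= 0 by the Cauchy-Schwarz inequality).
    core = math.exp(-0.5 * C) + D * math.sqrt(2 * math.pi) * Phi * math.exp(-0.5 * (C - D * D))
    return core / (2 * math.pi * A * math.sqrt(det))


def sector_probability(start_deg, end_deg, mean, cov, n=2000):
    """Probability that a bearing lies between start_deg and end_deg (midpoint rule)."""
    width = math.radians(end_deg - start_deg) / n
    start = math.radians(start_deg)
    return width * sum(projected_normal_pdf(start + (i + 0.5) * width, mean, cov) for i in range(n))


mean, cov = (0.0, 0.5), ((0.2, 0.0), (0.0, 2.0))   # hypothetical, not fitted values
print(f"P(whole circle)  = {sector_probability(0, 360, mean, cov):.4f}")
print(f"P(North to East) = {sector_probability(0, 90, mean, cov):.4f}")
```

As a sanity check, a bivariate normal centred on the origin with equal, uncorrelated variances gives every direction the same density, 1/(2π). The sketch then returns a probability of 0.25 for the quarter circle between North and East.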

There is an open access paper which presents our work with the projected normal distribution, its comparison with some alternative models and a new related distribution. 

By the way, I have never been able to find out whether the marine biologist rescued the turtles that set off in the wrong direction.
