A Level

# Data analysis

## 4. Data analysis

Sophisticated data analysis will help you spot patterns, trends and relationships in your results. Data analysis can be qualitative and/or quantitative, and may include statistical tests. An example of a statistical test is outlined below.

## Sediment analysis

### (a) Size of coarse sediments: mean size

Coarse sediments are pebbles and cobbles. If you have measured the a, b and c axes using calipers or a ruler, you could calculate the mean pebble size for each sample site.

"Mean size" = (a + b + c)/3 for each pebble
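The formula above can be applied to every pebble at a site and then averaged. The following is a minimal sketch; the axis measurements and the names `mean_pebble_size` and `site_pebbles` are made up for illustration.

```python
def mean_pebble_size(a, b, c):
    """Mean size of one pebble from its a, b and c axis lengths (same units)."""
    return (a + b + c) / 3

# Hypothetical measurements in mm for one sample site: (a, b, c) per pebble
site_pebbles = [(62, 45, 30), (80, 51, 22), (55, 40, 28)]

# Mean size of each pebble, then the mean of those values for the site
pebble_means = [mean_pebble_size(a, b, c) for a, b, c in site_pebbles]
site_mean = sum(pebble_means) / len(pebble_means)
print(round(site_mean, 1))
```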

### (b) Size of fine sediments: phi sizes

Fine sediments are clay, silt and sand. Once you have sieved the sediment, you can calculate phi sizes. Use the conversion table if you do not have the phi sizes already.

| Sediment size (mm) | Phi size |
|---|---|
| 1.00 | 0 |
| 0.50 | 1 |
| 0.25 | 2 |
| 0.13 | 3 |
| 0.06 | 4 |
| 0.03 | 5 |
| 0.01 | 6 |
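If you need a phi size that is not in the table, the phi scale is defined as the negative base-2 logarithm of the grain diameter in millimetres. The sketch below assumes that standard definition; the function name `phi_size` is illustrative. The sizes in the table appear to be shortened to two decimal places, so the example uses the exact halving sequence (1, 1/2, 1/4 and so on).

```python
import math

def phi_size(diameter_mm):
    """Phi size from grain diameter in mm: phi = -log2(diameter)."""
    return -math.log2(diameter_mm)

# Exact class boundaries: each sieve size is half the previous one
for diameter in [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]:
    print(diameter, phi_size(diameter))
```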

Calculate the percentage mass of sediment in each phi size category. For example, if the total mass is 100 g and the mass of material at 5-10 mm is 20 g, then 20% of the total mass of sediment is 5-10 mm in diameter. The results can be presented in a number of ways:

• a histogram with % mass on the y-axis and sediment size on the x-axis
• pie charts to show changes along the transect, which might be overlaid on a map or aerial photograph
• a scattergraph to show how mean sediment size varies with distance along the beach (see below).
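The percentage mass calculation can be sketched as follows. The sieve masses are hypothetical and the name `percentage_mass` is illustrative; substitute your own results for each sample site.

```python
def percentage_mass(masses):
    """Convert a {phi class: mass in g} mapping to {phi class: % of total mass}."""
    total = sum(masses.values())
    return {phi: 100 * mass / total for phi, mass in masses.items()}

# Hypothetical sieve results for one sample: phi class -> mass retained (g)
sieve_mass_g = {0: 5.0, 1: 15.0, 2: 40.0, 3: 25.0, 4: 10.0, 5: 5.0}

percent = percentage_mass(sieve_mass_g)
for phi, value in percent.items():
    print(f"phi {phi}: {value:.1f}%")
```

The resulting percentages are the values you would plot on the y-axis of the histogram or use as pie chart segments.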

Alternatively, use semi-logarithmic graph paper to plot a cumulative frequency graph of phi against mass. Plot phi size on the linear x-axis. Plot the cumulative mass of sediment on the logarithmic y-axis.

On your finished graph, find the phi size values at 16% and 84% cumulative mass. Use these figures in the following formula

("phi at 84% mass" - "phi at 16% mass")/2

Use the following table to interpret the result:

| Result | Interpretation |
|---|---|
| < 0.35 | very well sorted |
| 0.35 - 0.5 | well sorted |
| 0.5 - 0.7 | moderately well sorted |
| 0.7 - 1.0 | moderately sorted |
| 1.0 - 2.0 | poorly sorted |
| 2.0 - 4.0 | very poorly sorted |
| > 4.0 | extremely poorly sorted |
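As an alternative to reading values off a hand-drawn graph, the sketch below estimates the phi sizes at 16% and 84% cumulative mass by straight-line interpolation between data points, then applies the formula and the interpretation table. Linear interpolation is a simplifying assumption here, not the graphical method described above. The cumulative data are hypothetical, and a result that falls exactly on a boundary is placed in the higher band, which is also an assumption.

```python
def phi_at_percent(phis, cumulative_percent, target):
    """Linearly interpolate the phi size at a target cumulative % mass."""
    points = list(zip(phis, cumulative_percent))
    for (p0, c0), (p1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return p0 + (target - c0) * (p1 - p0) / (c1 - c0)
    raise ValueError("target lies outside the cumulative curve")

def sorting_value(phi16, phi84):
    """(phi at 84% mass - phi at 16% mass) / 2."""
    return (phi84 - phi16) / 2

def interpret_sorting(value):
    """Look up the description from the interpretation table."""
    bands = [
        (0.35, "very well sorted"),
        (0.5, "well sorted"),
        (0.7, "moderately well sorted"),
        (1.0, "moderately sorted"),
        (2.0, "poorly sorted"),
        (4.0, "very poorly sorted"),
    ]
    for upper_limit, label in bands:
        if value < upper_limit:
            return label
    return "extremely poorly sorted"

# Hypothetical cumulative % mass at each phi size for one sample
phis = [0, 1, 2, 3, 4, 5]
cumulative = [5, 20, 60, 85, 95, 100]

phi16 = phi_at_percent(phis, cumulative, 16)
phi84 = phi_at_percent(phis, cumulative, 84)
result = sorting_value(phi16, phi84)
print(round(result, 2), interpret_sorting(result))
```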

### (c) Zingg's shape classification

The analysis of the shape of coarse sediments can be divided into 4 categories: shape, sphericity, flatness and roundness.

The raw data needed for each pebble are the lengths of the a, b and c axes.

Calculate the ratio b ÷ a

Calculate the ratio c ÷ b

Now classify each pebble into one of the four groups shown in the table

| Type of pebble | b ÷ a | c ÷ b |
|---|---|---|
| Sphere | > 0.67 | > 0.67 |
| Disc | > 0.67 | < 0.67 |
| Rod | < 0.67 | > 0.67 |
| Blade | < 0.67 | < 0.67 |
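The classification can be sketched as a short function. The fourth Zingg group is the blade, where both ratios are below 0.67. The function name `zingg_class` and the example measurements are illustrative, and treating a ratio of exactly 0.67 as "below" is an assumption, since the table does not say which side the boundary belongs to.

```python
def zingg_class(a, b, c, threshold=0.67):
    """Classify a pebble from its a (long), b (intermediate) and c (short) axes."""
    b_over_a = b / a
    c_over_b = c / b
    if b_over_a > threshold:
        return "sphere" if c_over_b > threshold else "disc"
    return "rod" if c_over_b > threshold else "blade"

# Hypothetical pebbles, axis lengths in mm
print(zingg_class(50, 45, 40))  # both ratios high
print(zingg_class(50, 45, 15))  # high b/a, low c/b
print(zingg_class(90, 30, 27))  # low b/a, high c/b
print(zingg_class(90, 40, 10))  # both ratios low
```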

### (d) Krumbein's Index of Sphericity

The raw data needed for each pebble are the lengths of the a, b and c axes.

For each stone, calculate Krumbein's Index as follows

"Krumbein's Index" = ((bc)/a^2)^(1/3)

Krumbein's Index (K) must be between 0 and 1. K = 1 for a perfectly spherical pebble. The lower the value of K, the less spherical the pebble.
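A minimal sketch of the calculation is shown below. The function name and the example axis lengths are illustrative.

```python
def krumbein_sphericity(a, b, c):
    """Krumbein's Index: cube root of (b * c) / a^2, with a as the longest axis."""
    return (b * c / a**2) ** (1 / 3)

# A pebble with equal axes is perfectly spherical, so K = 1
print(krumbein_sphericity(10, 10, 10))

# A hypothetical elongated pebble (mm) gives a lower K
print(round(krumbein_sphericity(80, 40, 20), 2))
```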

### (e) Cailleux’s Flatness Index

The raw data needed for each pebble are the lengths of the a, b and c axes.

For each stone, calculate Cailleux’s Flatness Index as follows

"Flatness Index" = (a + b) / (2c) xx 100

A perfectly equidimensional particle has a Flatness Index of 100. The index increases without limit as the particle becomes flatter.
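The same calculation as a short sketch, with an illustrative function name and made-up measurements:

```python
def flatness_index(a, b, c):
    """Cailleux's Flatness Index: (a + b) / (2c) x 100."""
    return (a + b) / (2 * c) * 100

# Equal axes give the minimum value of 100
print(flatness_index(10, 10, 10))

# A hypothetical flat pebble (mm) gives a much higher value
print(flatness_index(80, 60, 10))
```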

### (f) Cailleux’s Roundness Index

The raw data needed for each pebble are:

• the length of the longest axis (l)
• the radius of curvature of the sharpest angle (r)

For each stone, calculate Cailleux's Roundness Index as follows

"Roundness Index" = (2r)/l xx 1000

Roundness Index = 1000 for a perfectly spherical pebble. The lower the Roundness Index is, the more angular the pebble.

Cailleux's Roundness Index may be presented using box and whisker plots.
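The sketch below calculates the index for a set of pebbles and then the five values a box and whisker plot displays (minimum, lower quartile, median, upper quartile, maximum). The measurements are hypothetical. The quartiles use the `inclusive` method of Python's `statistics.quantiles`, which is one of several quartile conventions and may differ slightly from the one you use by hand.

```python
import statistics

def roundness_index(r, l):
    """Cailleux's Roundness Index from radius of curvature r and longest axis l."""
    return 2 * r / l * 1000

# Hypothetical (r, l) measurements in mm for one sample site
pebbles = [(5, 80), (12, 60), (8, 64), (20, 50), (3, 75)]
indices = sorted(roundness_index(r, l) for r, l in pebbles)

# Five-number summary for a box and whisker plot
q1, median, q3 = statistics.quantiles(indices, n=4, method="inclusive")
summary = (indices[0], q1, median, q3, indices[-1])
print(summary)
```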

## Chi-squared test

Chi squared is a statistical test used to test whether there is a significant difference between observed and expected values (goodness of fit), or a significant association between two variables.

chi^2 = ∑ (O-E)^2 / E

The chi squared test can only be used if:

• the data are in the form of frequencies in a number of categories (i.e. nominal data)
• there are more than 20 observations in total
• the observations are independent: one observation does not affect another.

There are 3 steps to take when using the chi squared test:

### Step 1. State the null hypothesis

There is no significant association between _______ and _______

### Step 2. Calculate the chi squared statistic

chi^2 = ∑ (O-E)^2 / E

chi^2 = chi squared statistic

O = Observed values

E = Expected values

### Step 3. Test the significance of the result

Compare your calculated value of chi^2 against the critical value for chi^2 at a confidence level of 95% (significance level p = 0.05) and the appropriate degrees of freedom.

"Degrees of freedom" = ("number of rows" – 1) xx ("number of columns" – 1)

If chi squared is equal to or greater than the critical value, REJECT the null hypothesis. There is a SIGNIFICANT difference between the observed and expected values.

If Chi Squared is less than the critical value, ACCEPT the null hypothesis. There is NO SIGNIFICANT difference between the observed and expected values.

### Worked example

A Geography student is comparing two deposits (Deposit A and Deposit B) that he suspects have different modes of origin. He has measured the orientation of the long axis of stones in each deposit. The orientations were placed into 4 categories: 0-45°, 46-90°, 91-135° and 136-180°.

Here are the results. Geographers call them the Observed Values.

| Long axis orientation (°) | Deposit A | Deposit B | SUM |
|---|---|---|---|
| 0-45 | 33 | 38 | 71 |
| 46-90 | 55 | 48 | 103 |
| 91-135 | 47 | 32 | 79 |
| 136-180 | 26 | 39 | 65 |
| SUM | 161 | 157 | 318 |

### Step 1. State the null hypothesis

There is no significant association between stone orientation and location of deposit.

### Step 2. Calculate the chi squared statistic

It is best to break this down into a number of smaller steps.

(a) Calculate the Expected Values using the formula

"Expected value" = ("row total" xx "column total")/"grand total"

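| Long axis orientation (°) | Deposit A: O | Deposit A: E | Deposit B: O | Deposit B: E |
|---|---|---|---|---|
| 0-45 | 33 | 35.9 | 38 | 35.1 |
| 46-90 | 55 | 52.1 | 48 | 50.9 |
| 91-135 | 47 | 40.0 | 32 | 39.0 |
| 136-180 | 26 | 32.9 | 39 | 32.1 |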
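The expected values can be checked with a short script. This is a sketch using the observed values from the worked example; the function name `expected_values` and the dictionary layout are illustrative choices.

```python
# Observed values from the worked example: orientation -> (Deposit A, Deposit B)
observed = {
    "0-45": (33, 38),
    "46-90": (55, 48),
    "91-135": (47, 32),
    "136-180": (26, 39),
}

def expected_values(table):
    """Expected value = (row total x column total) / grand total, for every cell."""
    row_totals = {label: sum(row) for label, row in table.items()}
    column_totals = [sum(column) for column in zip(*table.values())]
    grand_total = sum(column_totals)
    return {
        label: tuple(row_totals[label] * ct / grand_total for ct in column_totals)
        for label in table
    }

expected = expected_values(observed)
for label, row in expected.items():
    print(label, [round(value, 1) for value in row])
```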

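(b) Calculate (O-E), (O-E)^2 and (O-E)^2/E

Observed (O) and Expected (E) values have been copied from the table above. The remaining columns are calculated from the unrounded expected values and then rounded to 1 decimal place.

Deposit A

| O | E | (O-E) | (O-E)^2 | (O-E)^2/E |
|---|---|---|---|---|
| 33 | 35.9 | -2.9 | 8.7 | 0.2 |
| 55 | 52.1 | 2.9 | 8.1 | 0.2 |
| 47 | 40.0 | 7.0 | 49.0 | 1.2 |
| 26 | 32.9 | -6.9 | 47.7 | 1.5 |

Deposit B

| O | E | (O-E) | (O-E)^2 | (O-E)^2/E |
|---|---|---|---|---|
| 38 | 35.1 | 2.9 | 8.7 | 0.2 |
| 48 | 50.9 | -2.9 | 8.1 | 0.2 |
| 32 | 39.0 | -7.0 | 49.0 | 1.3 |
| 39 | 32.1 | 6.9 | 47.7 | 1.5 |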

(c) Find the sum of the (O-E)^2/E column

chi^2 = ∑ (O-E)^2 / E

chi^2 = 0.2 + 0.2 + 1.2 + 1.5 + 0.2 + 0.2 + 1.3 + 1.5

chi^2 = 6.2 (working with unrounded values; the rounded terms shown above sum to 6.3)

### Step 3. Test the significance of the result

Calculate degrees of freedom

"Degrees of freedom" = ("number of rows" – 1) xx ("number of columns" – 1)

In this example "Degrees of freedom" = (4-1) xx (2-1) = 3

Choose a significance level, e.g. p = 0.05. This means that, if the null hypothesis were true, chance alone would produce a result at least this extreme on no more than 5% of the occasions the field test is carried out.

Compare the result with the critical value in the table. If the calculated value is equal to or greater than the critical value in the table, the null hypothesis must be rejected.

At 3 degrees of freedom at p=0.05, the critical value is 7.81

Since our calculated value of 6.2 < 7.81, the null hypothesis is not rejected.

There is no significant association between stone orientation and location of deposit.
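The whole worked example can be reproduced in a few lines. This is a sketch rather than a replacement for showing your working: the function name `chi_squared` is illustrative, the observed values come from the worked example, and the critical value of 7.81 is the one quoted above for 3 degrees of freedom at p = 0.05.

```python
# Observed values: one row per orientation class, columns are Deposit A, Deposit B
observed = [
    [33, 38],
    [55, 48],
    [47, 32],
    [26, 39],
]

def chi_squared(table):
    """Return the chi squared statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    degrees_of_freedom = (len(table) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom

CRITICAL_VALUE = 7.81  # from the text: 3 degrees of freedom, p = 0.05

statistic, dof = chi_squared(observed)
reject_null = statistic >= CRITICAL_VALUE
print(round(statistic, 1), dof, reject_null)
```

Because the script keeps full precision throughout, its statistic may differ by about 0.1 from a total built from rounded table entries.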

## Nearest neighbour index

The nearest neighbour index (NNI) is a summary statistic that shows how evenly distributed points are over space.

There are three steps to take when calculating the nearest neighbour index:

Step 1. Plot the points on a map. Give each point a unique number.

Step 2. Measure the distance between each point and its nearest neighbour.

Step 3. Calculate the NNI statistic

"NNI" = 2 bar D sqrt(n/A)

• bar D = mean nearest neighbour distance
• n = total number of points
• A = area

The result ranges from 0 to 2.15.

Clustered distribution = 0; random distribution = 1.0; regular distribution = 2.15.
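If you have grid coordinates for each point, steps 2 and 3 can be sketched in code. The coordinates below are hypothetical (four points on the corners of a 10 m square inside a 20 m x 20 m study area), and the function names are illustrative. The area must be in the square of the unit used for distance, for example m^2 when distances are in metres.

```python
import math

def nearest_neighbour_distances(points):
    """For each (x, y) point, the distance to the closest other point."""
    distances = []
    for i, (x1, y1) in enumerate(points):
        nearest = min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances

def nearest_neighbour_index(distances, area):
    """NNI = 2 x mean nearest neighbour distance x sqrt(n / A)."""
    n = len(distances)
    mean_distance = sum(distances) / n
    return 2 * mean_distance * math.sqrt(n / area)

# Hypothetical evenly spaced points (m) in a 400 m^2 study area
points = [(5, 5), (15, 5), (5, 15), (15, 15)]
distances = nearest_neighbour_distances(points)
print(nearest_neighbour_index(distances, area=400))
```

The evenly spaced points give a value towards the regular end of the scale, as expected.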

### Worked example

Drumlins are sometimes found in large numbers. A 1500m x 1500m square was chosen in an area east of Newtownards in County Down. 16 drumlins of various sizes were found in the area.

Step 1. Plot the points on a map.

Step 2. Measure the distance between each point and its nearest neighbour.

| Drumlin number | Number of closest other drumlin | Distance (m) |
|---|---|---|
| 1 | 2 | 305 |
| 2 | 6 | 210 |
| 3 | 4 | 168 |
| 4 | 3 | 168 |
| 5 | 4 | 308 |
| 6 | 2 | 210 |
| 7 | 2 | 566 |
| 8 | 9 | 142 |
| 9 | 10 | 98 |
| 10 | 9 | 98 |
| 11 | 12 | 160 |
| 12 | 11 | 160 |
| 13 | 11 | 237 |
| 14 | 12 | 222 |
| 15 | 14 | 214 |
| 16 | 15 | 459 |
| SUM | | 3725 |

Step 3. Calculate the NNI statistic. It is best to break this down into a number of smaller steps.

"NNI" = 2 bar D sqrt(n/A)

bar D = 3725/16 = 232.81 m

A = 1500 xx 1500 = 2250000 m^2

"NNI" = 2 xx 232.81 xx sqrt(16/2250000) = 1.24

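In conclusion, an NNI of 1.24 is much closer to 1.0 (random) than to 2.15 (regular), so the distribution of drumlins in the area can be described as largely random, with a slight tendency towards regularity.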
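The drumlin calculation can be checked with the short sketch below, which uses the 16 distances from the table and the 1500 m x 1500 m study area. The variable names are illustrative.

```python
import math

# Nearest neighbour distances (m) for drumlins 1 to 16, from the table
distances_m = [305, 210, 168, 168, 308, 210, 566, 142,
               98, 98, 160, 160, 237, 222, 214, 459]

area_m2 = 1500 * 1500          # study area in square metres
n = len(distances_m)           # total number of points
mean_distance = sum(distances_m) / n

nni_value = 2 * mean_distance * math.sqrt(n / area_m2)
print(round(nni_value, 2))
```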