A Level

# Data analysis

## 4. Data analysis

Sophisticated data analysis will help you spot patterns, trends and relationships in your results. Data analysis can be qualitative and/or quantitative, and may include statistical tests. An example of a statistical test is outlined below.

## Water balance

The water balance of a drainage basin is calculated as

P − Q − G − ΔS − E = 0

• P is precipitation
• Q is stream discharge
• G is groundwater discharge
• ΔS is change in storage
• E is evapotranspiration

Use secondary data and extrapolated primary data to calculate the annual water balance.
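The calculation can be sketched in a few lines of Python. This is only an illustration: the function names and the annual totals (in mm) are made up for the example, not real catchment data. Because the equation sums to zero, it can be rearranged to estimate one term that you could not measure, here evapotranspiration.

```python
def water_balance_residual(p, q, g, delta_s, e):
    """Return P - Q - G - dS - E. A balanced budget gives 0."""
    return p - q - g - delta_s - e


def estimate_evapotranspiration(p, q, g, delta_s):
    """Rearrange P - Q - G - dS - E = 0 to give E = P - Q - G - dS."""
    return p - q - g - delta_s


# Hypothetical annual totals in mm, for illustration only.
precipitation = 1000.0
stream_discharge = 550.0
groundwater_discharge = 50.0
storage_change = 0.0

evapotranspiration = estimate_evapotranspiration(
    precipitation, stream_discharge, groundwater_discharge, storage_change
)
print(evapotranspiration)  # 400.0

# Substituting the estimate back in should close the balance.
residual = water_balance_residual(
    precipitation, stream_discharge, groundwater_discharge,
    storage_change, evapotranspiration,
)
print(residual)  # 0.0
```

All five terms must use the same units over the same time period (for example mm per year).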

## Analysing drainage basins: circularity and elongation

Both the size and the shape of the drainage basin are important in determining the runoff response to rainfall. The circularity and elongation of the drainage basin can be assessed using secondary data collected from maps. Useful maps of small drainage basins are available at the Environment Agency’s Catchment Data Explorer.

### (a) Circularity ratio

The circularity ratio is the ratio of the basin area to the area of a circle with the same perimeter as the basin. The ratio runs from 1.0 (for a perfect circle) towards zero (the less circular the basin is).

"Circularity ratio" = "Basin area"/"Area of circle with same basin perimeter"

Using the maths of a circle (radius, area and circumference), this can be expressed as

"Circularity ratio" = (4 xx π xx A)/P^2

• A = basin area
• P = basin perimeter

To calculate the circularity ratio, use map data to find the

• area of the drainage basin
• length of the drainage basin perimeter

Make sure that the same units (metres and square metres, or kilometres and square kilometres) are used for length and area.

### Worked example

The River Redlake drainage basin in Shropshire is shown below.

The Redlake drainage basin has an area of 27.215 km^2 and a perimeter length of 42.55 km. What is its circularity ratio?

"Circularity ratio" = (4 xx π xx A)/P^2

"Circularity ratio" = (4 xx π xx 27.215)/(42.55)^2

"Circularity ratio" = 0.19

Therefore the drainage basin is not at all circular. Of course this is fairly obvious from the map, but it would be a useful calculation if you were comparing two or more basins.
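If you are comparing several basins, the same calculation can be scripted. The sketch below is a minimal Python version of the formula above. The function name is chosen for this example, and the area and perimeter are the Redlake values from the worked example.

```python
import math


def circularity_ratio(area, perimeter):
    """Basin area divided by the area of a circle with the same perimeter.

    Area and perimeter must use matching units (e.g. km^2 and km).
    """
    return (4 * math.pi * area) / perimeter ** 2


# River Redlake: area 27.215 km^2, perimeter 42.55 km
redlake = circularity_ratio(area=27.215, perimeter=42.55)
print(round(redlake, 2))  # 0.19

# Sanity check: a circle of radius 1 has area pi and perimeter 2*pi,
# so its circularity ratio should be 1.
print(round(circularity_ratio(math.pi, 2 * math.pi), 2))  # 1.0
```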

### (b) Elongation ratio

The elongation ratio is the ratio of the diameter of a circle with the same area as the basin to the maximum basin length. The maximum basin length is a straight line from the mouth of a stream to the furthest point on the watershed of its basin.

The ratio runs from 1.0 (for a perfect circle) towards zero (the more elongated the basin is).

"Elongation ratio" = 1/L xx sqrt(4/π xx A)

• A = basin area
• L = maximum basin length
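The formula can also be written as a short Python function, which is handy when comparing several basins. This is a sketch only: the function name is chosen for this example, and the inputs are the Redlake figures used in the worked example that follows.

```python
import math


def elongation_ratio(area, max_length):
    """Diameter of a circle with the basin's area, divided by basin length.

    Area and length must use matching units (e.g. km^2 and km).
    """
    equivalent_diameter = math.sqrt(4 * area / math.pi)
    return equivalent_diameter / max_length


# River Redlake: area 27.215 km^2, maximum basin length 15.95 km
redlake = elongation_ratio(area=27.215, max_length=15.95)
print(round(redlake, 2))  # 0.37
```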

### Worked example

The River Redlake drainage basin in Shropshire is shown below.

The Redlake drainage basin has an area of 27.215 km^2 and a maximum basin length of 15.95 km (shown by the red line). What is its elongation ratio?

"Elongation ratio" = 1/L xx sqrt(4/π xx A)

"Elongation ratio" = 1/15.95 xx sqrt(4/π xx 27.215)

"Elongation ratio" = 0.37

## Mann Whitney U test

Mann Whitney U is a statistical test that is used to test whether there is a significant difference between the medians of two sets of data.

The Mann Whitney U test can only be used if there are at least 6 values in each set of data. The data do not need to be paired or normally distributed.

There are 3 steps to take when using the Mann Whitney U test.

### Step 1. State the null hypothesis

There is no significant difference between _______ and _______

### Step 2. Calculate the Mann Whitney U statistic

U_1= n_1 xx n_2 + 0.5 n_2 (n_2 + 1) - ∑ R_2

U_2 = n_1 xx n_2 + 0.5 n_1 (n_1 + 1) - ∑ R_1

• n_1 is the number of values of x_1
• n_2 is the number of values of x_2
• R_1 is the ranks given to x_1
• R_2 is the ranks given to x_2

### Step 3. Test the significance of the result

Take U as the smaller of U_1 and U_2. Compare U against the critical value for U at a confidence level of 95% (significance level of p = 0.05).

If U is equal to or smaller than the critical value (p = 0.05), then REJECT the null hypothesis. There is a SIGNIFICANT difference between the 2 data sets.

If U is greater than the critical value, then ACCEPT the null hypothesis. There is NOT a significant difference between the 2 data sets.
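The decision rule can be expressed as a short Python sketch. The function name is chosen for this example, and the critical value has to be looked up in a table of critical values for your own sample sizes. The numbers used here are the ones from the worked example later in this section.

```python
def compare_with_critical_value(u_1, u_2, critical_value):
    """Return (U, reject_null) where U is the smaller of u_1 and u_2.

    The null hypothesis is rejected when U is equal to or smaller
    than the critical value.
    """
    u = min(u_1, u_2)
    reject_null = u <= critical_value
    return u, reject_null


# Values from the worked example: U_1 = 55.5, U_2 = 8.5, critical value 13
u, reject_null = compare_with_critical_value(55.5, 8.5, 13)
print(u, reject_null)  # 8.5 True
```

Note that, unlike many other statistical tests, a smaller U gives a more significant result.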

### Worked example

A geographer was interested in whether there was a difference in soil infiltration rates for two land-uses within a small drainage basin: an area of deciduous woodland and an area of evergreen woodland.

Eight readings of infiltration rate were taken at random locations in each site. Here are the results.

| Infiltration rate (cm / minute) in deciduous woodland | Infiltration rate (cm / minute) in evergreen woodland |
| --- | --- |
| 0.5 | 0.2 |
| 0.7 | 0.1 |
| 0.4 | 0.4 |
| 0.4 | 0.4 |
| 0.6 | 0.3 |
| 0.5 | 0.5 |
| 0.6 | 0.2 |
| 0.3 | 0.2 |

### Step 1. State the null hypothesis

There is no significant difference in infiltration rate between deciduous woodland and evergreen woodland within the drainage basin

### Step 2. Calculate Mann Whitney U statistic.

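(a) Give each result a rank. Rank all 16 results together, from the lowest (rank 1) to the highest (rank 16). Tied values share the mean of the ranks they occupy: for example, the three values of 0.2 occupy ranks 2, 3 and 4, so each is given rank 3. Then calculate the sum of the ranks for each of the two columns.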

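| Deciduous woodland: infiltration rate (cm / minute) | Rank | Evergreen woodland: infiltration rate (cm / minute) | Rank |
| --- | --- | --- | --- |
| 0.5 | 12 | 0.2 | 3 |
| 0.7 | 16 | 0.1 | 1 |
| 0.4 | 8.5 | 0.4 | 8.5 |
| 0.4 | 8.5 | 0.4 | 8.5 |
| 0.6 | 14.5 | 0.3 | 5.5 |
| 0.5 | 12 | 0.5 | 12 |
| 0.6 | 14.5 | 0.2 | 3 |
| 0.3 | 5.5 | 0.2 | 3 |
| SUM | 91.5 | SUM | 44.5 |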
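Ranking by hand is where mistakes most often creep in, so it can be worth checking the table with a short script. The Python sketch below pools both columns, ranks them from lowest to highest, and gives tied values the mean of the ranks they occupy. The function and variable names are chosen for this example.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward.

    Tied values all receive the mean of the ranks they occupy.
    """
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        # Find the end of the run of values tied with ordered[i].
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # The run occupies ranks i+1 .. j, whose mean is (i + 1 + j) / 2.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


deciduous = [0.5, 0.7, 0.4, 0.4, 0.6, 0.5, 0.6, 0.3]
evergreen = [0.2, 0.1, 0.4, 0.4, 0.3, 0.5, 0.2, 0.2]

# Rank all 16 readings together, then split the ranks back into columns.
ranks = average_ranks(deciduous + evergreen)
deciduous_ranks = ranks[:len(deciduous)]
evergreen_ranks = ranks[len(deciduous):]

print(sum(deciduous_ranks), sum(evergreen_ranks))  # 91.5 44.5
```

A quick check: the two rank sums should add up to n(n + 1)/2, where n is the total number of readings. Here 91.5 + 44.5 = 136 = 16 × 17 / 2.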

(b) Calculate ∑R_1 and ∑R_2

∑R_1 is the sum of the ranks in the first column (deciduous woodland) = 91.5

∑R_2 is the sum of the ranks in the second column (evergreen woodland) = 44.5

n_1 = 8 and n_2 = 8

(c) Calculate U_1 and U_2

U_1 = 8 xx 8 + 0.5 xx 8 xx (8 + 1) – 44.5 = 55.5

U_2 = 8 xx 8 + 0.5 xx 8 xx (8 + 1) – 91.5 = 8.5
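The substitution can be checked with a few lines of Python. This sketch follows the Step 2 formulas exactly as they are written in this guide. The function name is chosen for this example.

```python
def mann_whitney_u(n_1, n_2, sum_r1, sum_r2):
    """Return (U_1, U_2) from the sample sizes and the two rank sums."""
    u_1 = n_1 * n_2 + 0.5 * n_2 * (n_2 + 1) - sum_r2
    u_2 = n_1 * n_2 + 0.5 * n_1 * (n_1 + 1) - sum_r1
    return u_1, u_2


# n_1 = n_2 = 8, sum of ranks 91.5 (deciduous) and 44.5 (evergreen)
u_1, u_2 = mann_whitney_u(8, 8, 91.5, 44.5)
print(u_1, u_2)  # 55.5 8.5
```

As a check on the arithmetic, U_1 and U_2 should always add up to n_1 xx n_2. Here 55.5 + 8.5 = 64 = 8 xx 8.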

### Step 3. Test the significance of the result

In this example, U_1 = 55.5 and U_2 = 8.5

U is the smaller of the two values, so U=8.5

The critical value at p=0.05 significance level for n_1=8 and n_2=8 is 13. Since our calculated value of 8.5 is smaller than 13, the null hypothesis can be rejected.

In conclusion, there is a significant difference in infiltration rate between deciduous and evergreen woodland within the drainage basin.