Geography Fieldwork

A Level

4. Data analysis

Sophisticated data analysis will help you spot patterns, trends and relationships in your results. Data analysis can be qualitative and/or quantitative, and may include statistical tests. An example of a statistical test is outlined below.

Recurrence interval

Reports of flooding sometimes use the phrase "a 100-year flood". This does not mean that such floods happen only once every 100 years. Instead, it means that, on the basis of past records, a flood of that size has a 1 in 100 (1%) chance of happening in any given year.
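The idea can be illustrated with a short Python sketch. The function names are hypothetical, and the second function assumes that each year is independent of the others, which is a simplification for real rivers.

```python
def annual_probability(recurrence_interval_years):
    """Chance that a flood of this size is equalled or exceeded in any one year."""
    return 1 / recurrence_interval_years


def probability_at_least_once(recurrence_interval_years, period_years):
    """Chance of at least one such flood in a period.

    Assumption: every year is independent of the others.
    """
    p = annual_probability(recurrence_interval_years)
    return 1 - (1 - p) ** period_years


print(annual_probability(100))                         # 0.01, i.e. 1% each year
print(round(probability_at_least_once(100, 100), 2))   # 0.63
```

Under that assumption, a "100-year flood" has roughly a 63% chance of happening at least once in any 100-year period, so it may occur more than once in a century or not at all.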

A recurrence interval is an estimate of how often a river is expected to reach a particular level of flow.

`"recurrence interval" = (N+1)/M`

  • `N` = number of years for which data has been collected
  • `M` = rank (known as the magnitude number)
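As a minimal sketch, the formula could be written as a small Python function. The function name and the range check on the rank are illustrative assumptions, not part of the method itself.

```python
def recurrence_interval(n_years, rank):
    """Return (N + 1) / M for a record of n_years and a flow ranked `rank`."""
    if not 1 <= rank <= n_years:
        raise ValueError("rank must lie between 1 and the number of years")
    return (n_years + 1) / rank


# With 23 years of data, the largest flow (rank 1) and the 8th largest flow:
print(recurrence_interval(23, 1))  # 24.0
print(recurrence_interval(23, 8))  # 3.0
```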

Worked example

Peak flow data from 1991-92 to 2013-14 was obtained for the River Thames at Eynsham, Oxfordshire from the National River Flow Archive.

Hydrological year Peak flow (cumecs)
1991-1992 33.07
1992-1993 81.635
1993-1994 78.484
1994-1995 79.532
1995-1996 74.867
1996-1997 40.358
1997-1998 72.413
1998-1999 83.066
1999-2000 77.624
2000-2001 91.572
2001-2002 62.028
2002-2003 91.796
2003-2004 55
2004-2005 50.9
2005-2006 49
2006-2007 102.054
2007-2008 87.587
2008-2009 75.795
2009-2010 60.135
2010-2011 51.896
2011-2012 66.552
2012-2013 97.989
2013-2014 107.355

(a) Rank the peak flow column from highest (1) to lowest (23)

Hydrological year Peak flow (cumecs) Rank (magnitude number)
1991-1992 33.07 23
1992-1993 81.635 8
1993-1994 78.484 10
1994-1995 79.532 9
1995-1996 74.867 13
1996-1997 40.358 22
1997-1998 72.413 14
1998-1999 83.066 7
1999-2000 77.624 11
2000-2001 91.572 5
2001-2002 62.028 16
2002-2003 91.796 4
2003-2004 55 18
2004-2005 50.9 20
2005-2006 49 21
2006-2007 102.054 2
2007-2008 87.587 6
2008-2009 75.795 12
2009-2010 60.135 17
2010-2011 51.896 19
2011-2012 66.552 15
2012-2013 97.989 3
2013-2014 107.355 1

(b) Calculate the recurrence interval for each peak flow

`"recurrence interval" = (N+1)/M`

Hydrological year Peak flow (cumecs) Rank (magnitude number) Recurrence interval
1991-1992 33.07 23 1.04
1992-1993 81.635 8 3.00
1993-1994 78.484 10 2.40
1994-1995 79.532 9 2.67
1995-1996 74.867 13 1.85
1996-1997 40.358 22 1.09
1997-1998 72.413 14 1.71
1998-1999 83.066 7 3.43
1999-2000 77.624 11 2.18
2000-2001 91.572 5 4.80
2001-2002 62.028 16 1.50
2002-2003 91.796 4 6.00
2003-2004 55 18 1.33
2004-2005 50.9 20 1.20
2005-2006 49 21 1.14
2006-2007 102.054 2 12.00
2007-2008 87.587 6 4.00
2008-2009 75.795 12 2.00
2009-2010 60.135 17 1.41
2010-2011 51.896 19 1.26
2011-2012 66.552 15 1.60
2012-2013 97.989 3 8.00
2013-2014 107.355 1 24.00
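Steps (a) and (b) can also be carried out in a few lines of Python. This is a sketch only: the variable and function names are hypothetical, and it assumes no two years share the same peak flow (true for this data set), so tied ranks are not handled.

```python
# Peak flow (cumecs) for each hydrological year, copied from the table above.
peak_flows = {
    "1991-1992": 33.07, "1992-1993": 81.635, "1993-1994": 78.484,
    "1994-1995": 79.532, "1995-1996": 74.867, "1996-1997": 40.358,
    "1997-1998": 72.413, "1998-1999": 83.066, "1999-2000": 77.624,
    "2000-2001": 91.572, "2001-2002": 62.028, "2002-2003": 91.796,
    "2003-2004": 55.0, "2004-2005": 50.9, "2005-2006": 49.0,
    "2006-2007": 102.054, "2007-2008": 87.587, "2008-2009": 75.795,
    "2009-2010": 60.135, "2010-2011": 51.896, "2011-2012": 66.552,
    "2012-2013": 97.989, "2013-2014": 107.355,
}


def recurrence_table(flows):
    """Rank flows from highest (1) to lowest and attach (N + 1) / M to each year."""
    n = len(flows)
    ordered = sorted(flows.items(), key=lambda item: item[1], reverse=True)
    table = {}
    for rank, (year, flow) in enumerate(ordered, start=1):
        table[year] = (flow, rank, round((n + 1) / rank, 2))
    return table


table = recurrence_table(peak_flows)
for year in peak_flows:
    flow, rank, interval = table[year]
    print(f"{year}  {flow:8.3f}  {rank:2d}  {interval:5.2f}")
```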

Flood frequency curve

This is a graph of river flow on the y-axis plotted against recurrence interval on the x-axis. The x-axis is plotted on a logarithmic scale.

For example, peak flow data from 1991-92 to 2013-14 was obtained for the River Thames at Eynsham, Oxfordshire from the National River Flow Archive.

Flood frequency curve.
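To show what a logarithmic x-axis does to the data, the sketch below converts a few recurrence intervals into their positions along a base-10 logarithmic axis. The function name is hypothetical, and the four points are the three largest and the smallest peak flows from the worked example.

```python
import math

# (recurrence interval in years, peak flow in cumecs) from the worked example.
points = [(24.0, 107.355), (12.0, 102.054), (8.0, 97.989), (1.04, 33.07)]


def log_axis_position(recurrence_interval):
    """Horizontal position of a point on a base-10 logarithmic axis."""
    return math.log10(recurrence_interval)


for interval, flow in points:
    print(f"x = {log_axis_position(interval):.2f}, y = {flow} cumecs")
```

On this scale, recurrence intervals of 1, 10 and 100 years are equally spaced, which stops the many low-interval points from bunching together at the left of the graph.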

Chi-squared test

Chi squared is a statistical test used to find out whether there is a significant difference, goodness of fit or association between observed and expected values.

`chi^2 = ∑ (O-E)^2 / E`

The chi squared test can only be used if:

  • the data are in the form of frequencies in a number of categories (i.e. nominal data)
  • there are more than 20 observations in total
  • the observations are independent: one observation does not affect another

There are 3 steps to take when using the chi squared test

Step 1. State the null hypothesis

There is no significant association between _______ and _______

Step 2. Calculate the chi squared statistic

`chi^2 = ∑ (O-E)^2 / E`

  • `chi^2` = chi squared statistic
  • `O` = Observed values
  • `E` = Expected values
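The formula translates directly into code. The sketch below is one possible way to write it. The function name is an assumption, and the counts in the example are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
def chi_squared(observed, expected):
    """Sum (O - E)^2 / E over matching observed and expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 50 observations in two categories, with 25 expected
# in each category if there were no difference between them.
print(chi_squared([30, 20], [25, 25]))  # (5^2 / 25) + (5^2 / 25) = 2.0
```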

Step 3. Test the significance of the result

Compare your calculated value of `chi^2` against the critical value for `chi^2` at a confidence level of 95% (a significance level of `p = 0.05`) and the appropriate degrees of freedom.

`"Degrees of freedom" = ("number of rows" – 1) xx ("number of columns" – 1)`

If Chi Squared is equal to or greater than the critical value REJECT the null hypothesis. There is a SIGNIFICANT difference between the observed and expected values.

If Chi Squared is less than the critical value, ACCEPT the null hypothesis. There is NO SIGNIFICANT difference between the observed and expected values.
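The decision rule can be sketched in Python as follows. The function names are hypothetical. The dictionary holds the standard critical values of chi squared at `p = 0.05` for 1 to 5 degrees of freedom, given to three decimal places; a full statistical table would be needed for larger tables of data.

```python
# Standard chi-squared critical values at p = 0.05, keyed by degrees of freedom.
CRITICAL_VALUES_P05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def degrees_of_freedom(n_rows, n_columns):
    """(number of rows - 1) x (number of columns - 1)."""
    return (n_rows - 1) * (n_columns - 1)


def reject_null(chi_squared_value, n_rows, n_columns):
    """True if the statistic is equal to or greater than the critical value."""
    critical = CRITICAL_VALUES_P05[degrees_of_freedom(n_rows, n_columns)]
    return chi_squared_value >= critical


# A table with 5 rows and 2 columns has 4 degrees of freedom.
print(degrees_of_freedom(5, 2))   # 4
print(reject_null(2.0, 5, 2))     # False: 2.0 is below 9.488
print(reject_null(12.5, 5, 2))    # True: 12.5 is above 9.488
```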

Worked example

A Geography student is investigating whether people's previous experience of floods has affected their preparedness for future floods. She surveyed householders in a coastal town, taking a stratified sample from householders that were flooded in winter 2013/4 and householders that were not flooded. One of the questions she asked was:

Statement: If floods affected my house in the next year, I would be able to cope.

Answers: Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree

Here are the results. Geographers call them the Observed Values.

Possible answers Flooded Not flooded SUM
Strongly agree 5 8 13
Agree 7 16 23
Neither 5 5 10
Disagree 6 5 11
Strongly disagree 2 1 3
SUM 25 35 60

Step 1. State the null hypothesis

There is no significant association between past experience of flooding and preparedness for floods in the future.

Step 2. Calculate the chi squared statistic

It is best to break this down into a number of smaller steps.

(a) Calculate the Expected Values using the formula

`"Expected value" = ("row total" xx "column total")/"grand total"`

Possible answers Flooded (O) Flooded (E) Not flooded (O) Not flooded (E)
Strongly agree 5 5.4 8 7.6
Agree 7 9.6 16 13.4
Neither 5 4.2 5 5.8
Disagree 6 4.6 5 6.4
Strongly disagree 2 1.3 1 1.8
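The expected values can be checked with a short Python sketch. The names used here are hypothetical. The results are printed to two decimal places, so they can be compared with the table above, which rounds to one decimal place.

```python
# Observed values from the survey: one (flooded, not flooded) pair per answer.
observed = {
    "Strongly agree": (5, 8),
    "Agree": (7, 16),
    "Neither": (5, 5),
    "Disagree": (6, 5),
    "Strongly disagree": (2, 1),
}


def expected_values(table):
    """Return row total x column total / grand total for every cell."""
    column_totals = [sum(column) for column in zip(*table.values())]
    grand_total = sum(column_totals)
    return {
        answer: tuple(sum(row) * total / grand_total for total in column_totals)
        for answer, row in table.items()
    }


for answer, row in expected_values(observed).items():
    print(answer, [f"{value:.2f}" for value in row])
```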

(b) Calculate `(O-E)`, `(O-E)^2` and `(O-E)^2/E`

Observed (`O`) and Expected (`E`) values have been copied from the table above.

Flooded
`O` `E` `(O-E)` `(O-E)^2` `(O-E)^2//E`
5 5.4 -0.4 0.16 0.03
7 9.6 -2.6 6.76 0.70
5 4.2 0.8 0.64 0.15
6 4.6 1.4 1.96 0.43
2 1.3 0.7 0.49 0.38
Not flooded
`O` `E` `(O-E)` `(O-E)^2` `(O-E)^2//E`
8 7.6 0.4 0.16 0.02
16 13.4 2.6 6.76 0.50
5 5.8 -0.8 0.64 0.11
5 6.4 -1.4 1.96 0.31
1 1.8 -0.8 0.64 0.36

(c) Find the sum of the `(O-E)^2/E` column

`chi^2 = ∑ (O-E)^2 / E`

`chi^2 = 0.03+0.70+0.15+0.43+0.38+0.02+0.50+0.11+0.31+0.36`

`chi^2 = 2.99`

Step 3. Test the significance of the result

Calculate degrees of freedom

`"Degrees of freedom" = ("number of rows" – 1) xx ("number of columns" – 1)`

In this example `"Degrees of freedom" = (5-1) xx (2-1) = 4`

Choose a significance level, e.g. `p=0.05`. This means that, if the null hypothesis were true, chance alone would produce a result this extreme in no more than 5% of the occasions the field test is carried out.

Compare the result with the critical value from a table of chi squared critical values. If the calculated value is equal to or greater than the critical value, the null hypothesis must be rejected.

At 4 degrees of freedom and `p=0.05`, the critical value is `9.49`

Since our calculated value of `chi^2` is less than the critical value of `9.49`, the null hypothesis is not rejected.

There is no significant association between past experience of flooding and preparedness for floods in the future.
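As a cross-check, the whole worked example can be run from the observed values in one go. This is a sketch with hypothetical names, and the critical value of 9.49 is typed in by hand rather than looked up.

```python
# Observed values: one (flooded, not flooded) pair per possible answer.
observed = [(5, 8), (7, 16), (5, 5), (6, 5), (2, 1)]
CRITICAL_VALUE = 9.49  # 4 degrees of freedom, p = 0.05


def chi_squared_from_table(rows):
    """Chi-squared statistic for a table of observed frequencies."""
    column_totals = [sum(column) for column in zip(*rows)]
    grand_total = sum(column_totals)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for o, column_total in zip(row, column_totals):
            e = row_total * column_total / grand_total  # expected value
            statistic += (o - e) ** 2 / e
    return statistic


statistic = chi_squared_from_table(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
verdict = "reject" if statistic >= CRITICAL_VALUE else "do not reject"
print(f"chi-squared = {statistic:.2f}, df = {df}: {verdict} the null hypothesis")
```

Because the expected values are not rounded here, the statistic can differ a little from a hand calculation that rounds each expected value to one decimal place. The conclusion is unaffected.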