A Level

# Data analysis

## 4. Data analysis

Sophisticated data analysis will help you spot patterns, trends and relationships in your results. Data analysis can be qualitative and/or quantitative, and may include statistical tests. An example of a statistical test is outlined below.

## Chi-squared test

Chi-squared is a statistical test used to test for a significant difference, goodness of fit or association between observed and expected values.

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$

The chi-squared test can only be used if:

• the data are in the form of frequencies in a number of categories (i.e. nominal data)
• there are more than 20 observations in total
• the observations are independent: one observation does not affect another

There are 3 steps to take when using the chi-squared test.

### Step 1. State the null hypothesis

There is no significant association between _______ and _______

### Step 2. Calculate the chi squared statistic

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$

$\chi^2$ = chi-squared statistic

$O$ = Observed values

$E$ = Expected values
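The formula can be sketched in a few lines of Python. This is only an illustration: the function name `chi_squared` and the counts passed to it are made up for the example and do not come from the fieldwork.

```python
def chi_squared(observed, expected):
    """Return the sum of (O - E)^2 / E over paired observed and expected values."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, invented only to show how the function is called
statistic = chi_squared([12, 8], [10, 10])
print(round(statistic, 2))
```

Each category contributes one term to the sum, so the larger the gap between O and E in any category, the larger the statistic becomes.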

### Step 3. Test the significance of the result

Compare your calculated value of $\chi^2$ against the critical value for $\chi^2$ at a confidence level of 95% (a significance level of p = 0.05) and the appropriate degrees of freedom.

$$\text{Degrees of freedom} = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$$

If Chi Squared is equal to or greater than the critical value REJECT the null hypothesis. There is a SIGNIFICANT difference between the observed and expected values.

If Chi Squared is less than the critical value, ACCEPT the null hypothesis. There is NO SIGNIFICANT difference between the observed and expected values.

### Worked example

A Geography student is investigating whether people's previous experience of coastal floods has affected their preparedness for future floods. She surveyed householders in a coastal town, taking a stratified sample from householders that were flooded in winter 2013/4 and householders that were not flooded. One of the questions she asked was:

Statement: If coastal floods affected my house in the next year, I would be able to cope.

Answers: Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree

Here are the results. Geographers call them the Observed Values.

| Possible answers | Flooded | Not flooded | SUM |
|---|---|---|---|
| Strongly agree | 5 | 8 | 13 |
| Agree | 7 | 16 | 23 |
| Neither | 5 | 5 | 10 |
| Disagree | 6 | 5 | 11 |
| Strongly disagree | 2 | 1 | 3 |
| SUM | 25 | 35 | 60 |

### Step 1. State the null hypothesis

There is no significant association between past experience of flooding and preparedness for floods in the future.

### Step 2. Calculate the chi squared statistic

It is best to break this down into a number of smaller steps.

(a) Calculate the Expected Values using the formula

$$\text{Expected value} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

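| Possible answers | Flooded O | Flooded E | Not flooded O | Not flooded E |
|---|---|---|---|---|
| Strongly agree | 5 | 5.4 | 8 | 7.6 |
| Agree | 7 | 9.6 | 16 | 13.4 |
| Neither | 5 | 4.2 | 5 | 5.8 |
| Disagree | 6 | 4.6 | 5 | 6.4 |
| Strongly disagree | 2 | 1.3 | 1 | 1.8 |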
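As a rough sketch, the expected values can be generated from the observed table rather than worked out cell by cell. The function name `expected_values` is an assumption made for this example. The values are printed to two decimal places; the table above shows them rounded to one.

```python
def expected_values(observed):
    """Return expected values (row total x column total / grand total) for each cell."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


# Observed values from the survey: one row per answer, columns are Flooded, Not flooded
observed = [[5, 8], [7, 16], [5, 5], [6, 5], [2, 1]]

for row in expected_values(observed):
    print("  ".join(f"{value:.2f}" for value in row))
```

A useful check is that the expected values in each column add up to the same column total as the observed values (25 and 35 here).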

(b) Calculate $(O-E)$, $(O-E)^2$ and $(O-E)^2/E$

Observed (O) and Expected (E) values have been copied from the table above.

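Flooded

| O | E | (O-E) | (O-E)^2 | (O-E)^2/E |
|---|---|---|---|---|
| 5 | 5.4 | -0.4 | 0.16 | 0.03 |
| 7 | 9.6 | -2.6 | 6.76 | 0.70 |
| 5 | 4.2 | 0.8 | 0.64 | 0.15 |
| 6 | 4.6 | 1.4 | 1.96 | 0.43 |
| 2 | 1.3 | 0.7 | 0.49 | 0.38 |

Not flooded

| O | E | (O-E) | (O-E)^2 | (O-E)^2/E |
|---|---|---|---|---|
| 8 | 7.6 | 0.4 | 0.16 | 0.02 |
| 16 | 13.4 | 2.6 | 6.76 | 0.50 |
| 5 | 5.8 | -0.8 | 0.64 | 0.11 |
| 5 | 6.4 | -1.4 | 1.96 | 0.31 |
| 1 | 1.8 | -0.8 | 0.64 | 0.36 |

(c) Find the sum of the $(O-E)^2/E$ column

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$

$$\chi^2 = 0.03+0.70+0.15+0.43+0.38+0.02+0.50+0.11+0.31+0.36$$

$$\chi^2 = 2.99$$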
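Steps (b) and (c) can be checked with a short script. This is a sketch only: the names `pairs` and `contribution` are invented for the example, and the O and E values are the ones tabulated above (E rounded to one decimal place). Worked from those values, the contributions sum to roughly 2.99.

```python
# (O, E) pairs from the worked example: Flooded column first, then Not flooded
pairs = [
    (5, 5.4), (7, 9.6), (5, 4.2), (6, 4.6), (2, 1.3),
    (8, 7.6), (16, 13.4), (5, 5.8), (5, 6.4), (1, 1.8),
]


def contribution(observed, expected):
    """Return (O - E)^2 / E for a single cell of the table."""
    return (observed - expected) ** 2 / expected


contributions = [contribution(o, e) for o, e in pairs]
chi2 = sum(contributions)

for (o, e), c in zip(pairs, contributions):
    print(f"O={o:>2}  E={e:>4}  (O-E)^2/E={c:.2f}")
print(f"chi-squared = {chi2:.2f}")
```

Using unrounded expected values instead of the one-decimal values in the table gives a slightly different total (about 3.06), which does not change the outcome of the test.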

### Step 3. Test the significance of the result

Calculate degrees of freedom

$$\text{Degrees of freedom} = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$$

In this example, $\text{Degrees of freedom} = (5-1) \times (2-1) = 4$

Choose a significance level, e.g. p=0.05. This means that chance should only account for the results in up to 5% of occasions the field test is carried out.

Compare the result with the critical value in the table. If the calculated value is equal to or greater than the critical value in the table, the null hypothesis must be rejected.

At 4 degrees of freedom at p=0.05, the critical value is 9.49

Since our calculated value is less than 9.49, the null hypothesis is not rejected.

There is no significant association between past experience of flooding and preparedness for floods in the future.
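Step 3 can also be sketched in code. The critical values below are the standard chi-squared values at p = 0.05 for 1 to 5 degrees of freedom, as found in published tables. The names `CRITICAL_VALUES_P05`, `degrees_of_freedom` and `reject_null` are assumptions made for this example.

```python
# Critical values of chi-squared at p = 0.05, keyed by degrees of freedom
CRITICAL_VALUES_P05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}


def degrees_of_freedom(n_rows, n_columns):
    """Return (number of rows - 1) x (number of columns - 1)."""
    return (n_rows - 1) * (n_columns - 1)


def reject_null(chi2, n_rows, n_columns):
    """Return True if chi2 is equal to or greater than the critical value at p = 0.05."""
    critical = CRITICAL_VALUES_P05[degrees_of_freedom(n_rows, n_columns)]
    return chi2 >= critical


# The survey table has 5 rows (answers) and 2 columns (Flooded, Not flooded)
print(degrees_of_freedom(5, 2))
# 2.99 is the sum of the rounded (O-E)^2/E contributions in the worked example
print(reject_null(2.99, 5, 2))
```

Any calculated value below 9.49 gives the same outcome for this table: the null hypothesis is not rejected.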

## Hudson's equation

Hudson’s equation is used in civil engineering. It calculates the maximum mass (the Hudson's value) that can be moved by the impact of storm waves. Hudson's equation is used to assess the effectiveness of riprap, a commonly used method of hard defence at the coast.

If the mass of the boulder is less than the Hudson's value, the boulder can be moved during a storm. This would mean the coastal defence is ineffective and may become dangerous.

### Step 1. Calculate the volume and mass of the boulder

Measure the length, width and height of the boulder in metres.

$$\text{volume} = \text{length} \times \text{width} \times \text{height}$$

$$\text{mass} = \text{density} \times \text{volume}$$

For example, if a boulder is 5 m long, 4 m wide and 3 m high

$$\text{volume} = 5 \times 4 \times 3 = 60 \text{ m}^3$$

Granite used in riprap has a density of 2.64 tonnes/m$^3$

$$\text{mass} = 2.64 \times 60$$

$$\text{mass} = 158.4 \text{ tonnes}$$

(Note that a tonne is the same as a megagram.)
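The same calculation as a minimal sketch, useful when several boulders have been measured. The function name `boulder_mass` is made up for this example. Like the calculation above, it treats the boulder as a rectangular block.

```python
GRANITE_DENSITY = 2.64  # tonnes per cubic metre, as quoted above


def boulder_mass(length, width, height, density=GRANITE_DENSITY):
    """Return the mass in tonnes of a boulder measured in metres."""
    volume = length * width * height  # cubic metres
    return density * volume


print(round(boulder_mass(5, 4, 3), 1))  # 158.4
```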

### Step 2. Calculate Hudson's value

The formula calculates the maximum mass that could be moved.

$$\text{Mass} = \frac{D \times H^3}{3 \times (D-1)^3 \times \cot A}$$

• $D$ = density of boulder (granite density = 2.64 tonnes/m$^3$)
• $H$ = height of storm wave (use secondary data sources such as Wavenet to find the maximum storm wave associated with the location)
• $A$ = angle of rest of the riprap

Cotangent means the reciprocal of the tangent. This means the cotangent of angle $A$ is the same as $1/\tan A$.
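Most calculators and programming languages have no cotangent key or function, so the reciprocal of the tangent is used instead. The sketch below follows the formula as written above. The function name `hudson_mass` is an assumption for this example, and the angle is taken in degrees.

```python
import math


def hudson_mass(density, wave_height, angle_degrees):
    """Return the maximum mass (tonnes) that could be moved, using the formula above."""
    # math.tan expects radians, so convert the angle of rest from degrees first
    cotangent = 1 / math.tan(math.radians(angle_degrees))
    return (density * wave_height ** 3) / (3 * (density - 1) ** 3 * cotangent)
```

Because wave height is cubed, doubling the wave height multiplies the Hudson's value by eight.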

For example, for the 60 m$^3$ boulder above, assuming a 3 m storm wave and an angle of rest of 45° (the cotangent of 45° is 1)

$$\text{Mass} = \frac{2.64 \times 3^3}{3 \times (2.64-1)^3 \times \cot 45^\circ}$$

$$\text{Mass} = \frac{71.28}{3 \times 4.41 \times 1} \approx 5.4 \text{ tonnes}$$

The mass calculated by Hudson's equation (5.4 tonnes) is less than the measured mass of the boulder (158.4 tonnes). This boulder will not be moved during a storm in which wave height reaches 3 m. But will the riprap still be effective if storm waves increase in height in the future? See the most recent projections.
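One way to explore that question is to rearrange the formula for $H$ and find the wave height at which the Hudson's value equals the mass of the boulder. This is a sketch under the same assumptions as the example (granite density of 2.64 tonnes/m$^3$, angle of rest of 45°). The function name `critical_wave_height` is invented for this illustration, and the result is not a projection.

```python
import math


def critical_wave_height(boulder_mass, density, angle_degrees):
    """Return the wave height (m) at which the Hudson's value equals boulder_mass (tonnes)."""
    cotangent = 1 / math.tan(math.radians(angle_degrees))
    # Rearranged from Mass = (D x H^3) / (3 x (D - 1)^3 x cot A)
    wave_height_cubed = boulder_mass * 3 * (density - 1) ** 3 * cotangent / density
    return wave_height_cubed ** (1 / 3)


height = critical_wave_height(158.4, 2.64, 45)
print(f"The boulder could be moved by waves of about {height:.1f} m or higher")
```

For the 158.4 tonne boulder this comes out at a little over 9 m, which can then be compared with recorded and projected storm wave heights for the location.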