International baccalaureate math sl investigation_the correlation between lung cancer incidents and the mean amount of smoked cigarettes by felix dyrek
1. Candidate name: Felix Dyrek Candidate number: 001528-031
Investigation
Math Studies
Working Title:
The correlation between lung cancer
incidents and the mean amount of
smoked cigarettes
Candidate Name: Felix Dyrek
Candidate number: 001528-031
School name: Kolegium Europejskie
School number: 001528
Assignment supervisor: Katarzyna Nosalska
Introduction:
The aim of my investigation is to find out whether there is a correlation between the total number of lung cancer incidents and the tobacco consumption of men and women in six countries.
Hypothesis:
My hypothesis is that the rate of lung cancer incidents is higher in countries with higher tobacco consumption than in countries with lower tobacco consumption.
Method:
In order to investigate the correlation between lung cancer incidents and tobacco consumption, I collected data from the tobacco industry and from various (lung) cancer institutions. The next step is to verify the collected data and compile statistics. To calculate the correlation, the following mathematical methods are used:
Pearson's correlation coefficient
The χ² (chi-squared) test
Raw Data:
Table 1: Lung cancer incident rates per 100,000 people
Country     Total incidents   Female incidents   Male incidents
China       93                27                 66
Japan       60                13                 47
Thailand    87                37                 50
Sweden      35                13                 22
Poland      85                21                 64
UK          73                22                 51
Chart 1: Lung cancer incident rates per 100,000 people (total, female and male) in China, Japan, Thailand, Sweden, Poland and the UK
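Since the total rate in Table 1 should equal the sum of the female and male rates, a quick consistency check can be run before the data is used. This is a minimal Python sketch; the names `TABLE_1` and `inconsistent_rows` are illustrative and not part of the original investigation.

```python
# Lung cancer incident rates per 100,000 people, copied from Table 1.
# Each entry maps a country to (total, female, male).
TABLE_1 = {
    "China": (93, 27, 66),
    "Japan": (60, 13, 47),
    "Thailand": (87, 37, 50),
    "Sweden": (35, 13, 22),
    "Poland": (85, 21, 64),
    "UK": (73, 22, 51),
}


def inconsistent_rows(table):
    """Return the countries whose total rate differs from female + male."""
    return [
        country
        for country, (total, female, male) in table.items()
        if total != female + male
    ]


print(inconsistent_rows(TABLE_1))  # [] : every total equals female + male
```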
Table 2: Adult smokers in total
Country     Adult smokers (total)   Adult smokers per 100,000 people   Adult smokers (%)
China       462,800,000             35,600                             35.6
Japan       42,132,466              33,100                             33.1
UK          12,721,380              21,000                             21.0
Sweden      1,939,183               19,000                             19.0
Poland      13,149,988              34,500                             34.5
Thailand    15,019,407              23,400                             23.4
Chart 2: Comparison of the percentage of adult smokers by country (China, Japan, UK, Sweden, Poland, Thailand)
Table 3: Average number of cigarettes smoked per year
Country     Cigarettes smoked per person / year   Cigarettes smoked per 100,000 people / year
China       1,791                                 179,100,000
Japan       3,023                                 302,300,000
UK          2,232                                 223,200,000
Sweden      1,202                                 120,200,000
Poland      2,061                                 206,100,000
Thailand    1,067                                 106,700,000
Chart 3: Share (in %) of the average number of cigarettes smoked per person / year, by country (China, Japan, UK, Sweden, Poland, Thailand)
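The per-100,000 columns in Tables 2 and 3 are plain rescalings: a percentage becomes a count per 100,000 people when multiplied by 1,000, and a yearly per-person amount is multiplied by 100,000. A small Python sketch of both conversions; the helper names are hypothetical.

```python
PER = 100_000  # reference population used in Tables 2 and 3


def percent_to_per_100k(percent):
    """Convert a percentage of adults into a count per 100,000 people."""
    return round(percent * PER / 100)


def per_person_to_per_100k(per_person):
    """Scale a yearly per-person amount up to 100,000 people."""
    return per_person * PER


print(percent_to_per_100k(35.6))     # 35600 (China, Table 2)
print(per_person_to_per_100k(1791))  # 179100000 (China, Table 3)
```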
Calculations
The Pearson Correlation coefficient
The Pearson correlation coefficient is used to identify whether there is a correlation between lung cancer incidents and the number of cigarettes smoked. Table 4 lists the 6 countries so that they can be compared. First, the data for lung cancer incidents (x) and the number of cigarettes smoked per 100,000 people per year (y) are entered into the table. Next, x and y are multiplied to obtain xy, and x and y are squared to obtain the last two columns (x² and y²). Finally, each column is summed.
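The tabulation steps described above (multiply, square, then sum each column) can be reproduced with a short Python sketch. It uses the per-100,000 values from Table 3 as Y, which are the numbers that appear in Table 4; the dictionary and function names are illustrative. Python integers are exact, so even the very large Y² total is not rounded.

```python
# X: lung cancer incidents per 100,000 people (Table 1).
# Y: cigarettes smoked per 100,000 people per year (Table 3).
X = {"China": 93, "Japan": 60, "UK": 73,
     "Sweden": 35, "Poland": 85, "Thailand": 87}
Y = {"China": 179_100_000, "Japan": 302_300_000, "UK": 223_200_000,
     "Sweden": 120_200_000, "Poland": 206_100_000, "Thailand": 106_700_000}


def column_totals(x, y):
    """Return n and the sums of X, Y, XY, X^2 and Y^2 over all countries."""
    countries = list(x)
    return {
        "n": len(countries),
        "sum_x": sum(x[c] for c in countries),
        "sum_y": sum(y[c] for c in countries),
        "sum_xy": sum(x[c] * y[c] for c in countries),
        "sum_x2": sum(x[c] ** 2 for c in countries),
        "sum_y2": sum(y[c] ** 2 for c in countries),
    }


print(column_totals(X, Y))
```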
Table 4
No.  Country    X (lung cancer incidents per 100,000 people)   Y (cigarettes smoked per 100,000 people / year)   XY               X²       Y²
1    China      93                                             179,100,000                                       16,656,300,000   8,649    32,076,810,000,000,000
2    Japan      60                                             302,300,000                                       18,138,000,000   3,600    91,385,290,000,000,000
3    UK         73                                             223,200,000                                       16,293,600,000   5,329    49,818,240,000,000,000
4    Sweden     35                                             120,200,000                                       4,207,000,000    1,225    14,448,040,000,000,000
5    Poland     85                                             206,100,000                                       17,518,500,000   7,225    42,477,210,000,000,000
6    Thailand   87                                             106,700,000                                       9,282,900,000    7,569    11,384,890,000,000,000
Total (n = 6)   433                                            1,137,600,000                                     82,096,300,000   33,597   241,590,480,000,000,000
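The last step is to insert the totals from Table 4 into the Pearson correlation coefficient formula and solve it. With n = 6, the means are $\bar{x} = 433 \div 6 \approx 72.17$ and $\bar{y} = 1\,137\,600\,000 \div 6 = 189\,600\,000$:
$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\sum x^2 - n\bar{x}^2}\,\sqrt{\sum y^2 - n\bar{y}^2}}$$
$$r = \frac{82\,096\,300\,000 - 6 \times \frac{433}{6} \times 189\,600\,000}{\sqrt{33\,597 - 6 \times \left(\frac{433}{6}\right)^2}\,\sqrt{241\,590\,480\,000\,000\,000 - 6 \times 189\,600\,000^2}}$$
$$r = \frac{-500\,000}{\sqrt{2348.83}\,\sqrt{25\,901\,520\,000\,000\,000}} \approx \frac{-500\,000}{48.46 \times 1.6094 \times 10^{8}} \approx -6.4 \times 10^{-5}$$
Since r is practically 0, there is no linear correlation between lung cancer incidents and the number of cigarettes smoked in these six countries.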
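The numerator of r is a small difference between two numbers of roughly 8.2 × 10^10, so rounding $\bar{x} = 72.1\overline{6}$ to 72.16 before subtracting changes it from -500,000 to +7,084,000. The Python sketch below evaluates the same formula from the Table 4 totals, keeping the means exact with fractions until the final square roots; `pearson_r` is an illustrative name, not a library function. Because |r| can never exceed 1, any result outside [-1, 1] signals an arithmetic slip.

```python
import math
from fractions import Fraction


def pearson_r(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from column totals:
    r = (Sxy - n*xbar*ybar) / (sqrt(Sx2 - n*xbar^2) * sqrt(Sy2 - n*ybar^2)).
    Fractions keep the means exact, so nothing is lost to early rounding.
    """
    x_bar = Fraction(sum_x, n)
    y_bar = Fraction(sum_y, n)
    numerator = sum_xy - n * x_bar * y_bar
    s_xx = sum_x2 - n * x_bar ** 2
    s_yy = sum_y2 - n * y_bar ** 2
    return float(numerator) / (math.sqrt(s_xx) * math.sqrt(s_yy))


# Totals from Table 4 (Y measured per 100,000 people).
r = pearson_r(6, 433, 1_137_600_000, 82_096_300_000,
              33_597, 241_590_480_000_000_000)
print(f"r = {r:.2e}")  # r = -6.41e-05

# The same numerator with x_bar rounded to 72.16 first:
rounded_numerator = 82_096_300_000 - 6 * Fraction("72.16") * 189_600_000
print(rounded_numerator)  # 7084000
```

Measuring Y per person instead of per 100,000 people only rescales it, so r comes out the same either way.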
The χ² Test
$$\chi^2_{calc} = \sum \frac{(f_o - f_e)^2}{f_e}$$
Where:
$f_o$ is an observed frequency
$f_e$ is an expected frequency
Observed Value Table ($f_o$): taken from Table 1. Each value is the average female or male lung cancer incident rate of the European countries (Sweden, Poland, UK) or the Asian countries (China, Japan, Thailand).
          Female lung cancer incidents   Male lung cancer incidents   Sum
Europe    18.67                          45.67                        64.34
Asia      25.67                          54.33                        80.00
Sum       44.34                          100.00                       144.34
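The observed values are simple averages of the three European and the three Asian countries in Table 1. A Python sketch of that grouping step, with illustrative names, using exact fractions before rounding to two decimals as in the table:

```python
from fractions import Fraction

# Female and male lung cancer incidents per 100,000 people (Table 1).
FEMALE = {"China": 27, "Japan": 13, "Thailand": 37,
          "Sweden": 13, "Poland": 21, "UK": 22}
MALE = {"China": 66, "Japan": 47, "Thailand": 50,
        "Sweden": 22, "Poland": 64, "UK": 51}
CONTINENTS = {"Europe": ["Sweden", "Poland", "UK"],
              "Asia": ["China", "Japan", "Thailand"]}


def continent_means(rates, continents):
    """Average the per-country rates within each continent."""
    return {
        name: Fraction(sum(rates[c] for c in members), len(members))
        for name, members in continents.items()
    }


female = continent_means(FEMALE, CONTINENTS)
male = continent_means(MALE, CONTINENTS)
for name in CONTINENTS:
    print(name, round(float(female[name]), 2), round(float(male[name]), 2))
# Europe 18.67 45.67
# Asia 25.67 54.33
```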
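Calculation Table: this table shows how the observed values are converted into the expected values needed for the χ² test. Each expected value is the row total multiplied by the column total, divided by the grand total n:
        S1        S2        Sum
R1      wy ÷ n    wz ÷ n    w
R2      xy ÷ n    xz ÷ n    x
Sum     y         z         n
Here w and x are the row totals, y and z are the column totals and n is the grand total.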
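The rule in the calculation table (row total × column total ÷ n) can be applied to every cell in a few lines. A Python sketch starting from the rounded observed values; `expected_table` is an illustrative name:

```python
# Observed values from the observed value table:
# rows = Europe, Asia; columns = female, male.
OBSERVED = [[18.67, 45.67],
            [25.67, 54.33]]


def expected_table(observed):
    """Expected frequency of each cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


for row in expected_table(OBSERVED):
    print([round(value, 2) for value in row])
# [19.76, 44.58]
# [24.58, 55.42]
```

The expected table keeps the same row and column totals as the observed one; only the split inside the table changes.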
Expected Value Table ($f_e$): This table gives the expected lung cancer incidents for Europe and Asia, divided into female and male groups. It is based on my data for the six countries (China, Japan, Thailand, Sweden, Poland and the United Kingdom), grouped by continent.
          Female lung cancer incidents   Male lung cancer incidents   Sum
Europe    19.76                          44.58                        64.34
Asia      24.58                          55.42                        80.00
Sum       44.34                          100.00                       144.34
Now I am going to calculate χ² in order to test whether the observed values differ from the expected values for female and male lung cancer incidents in Europe and Asia.
Calculations:
Cell             f_o     f_e     f_o − f_e   (f_o − f_e)²   (f_o − f_e)² ÷ f_e
Europe, female   18.67   19.76   -1.09       1.1881         0.0601
Europe, male     45.67   44.58   1.09        1.1881         0.0267
Asia, female     25.67   24.58   1.09        1.1881         0.0483
Asia, male       54.33   55.42   -1.09       1.1881         0.0214
Total                                                       0.1565
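The same sum can be computed directly. In this Python sketch (illustrative names, values copied from the observed and expected tables) the unrounded terms add up to about 0.1566, while adding the terms already rounded to four decimals, as in the table, gives 0.1565; the difference is only rounding.

```python
# Observed and expected values, in the order of the calculation table:
# Europe female, Europe male, Asia female, Asia male.
OBSERVED = [18.67, 45.67, 25.67, 54.33]
EXPECTED = [19.76, 44.58, 24.58, 55.42]


def chi_squared_terms(observed, expected):
    """Return the (fo - fe)^2 / fe contribution of each cell."""
    return [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]


terms = chi_squared_terms(OBSERVED, EXPECTED)
chi2_exact = sum(terms)
chi2_table = sum(round(t, 4) for t in terms)  # terms rounded first, as in the table
print(round(chi2_exact, 4), round(chi2_table, 4))  # 0.1566 0.1565
```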
So, $\chi^2_{calc} = 0.1565$.
This value is small, which means that the observed values are very close to the expected values.
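Degrees of freedom
The next step is to find the number of degrees of freedom (df) and then use a χ² table to interpret the value of χ² just obtained. The χ² distribution depends on df, where
df = (r - 1)(c - 1)
and r and c are the numbers of rows and columns in the observed value table (not counting the sums). For my 2 × 2 table:
df = (2 - 1)(2 - 1)
df = 1 × 1 = 1
For df = 1, the critical value of χ² at the 5% significance level is 3.841. Since $\chi^2_{calc} = 0.1565$ is far below 3.841, the difference between the observed and expected values is not significant.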
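With df = 1, $\chi^2_{calc}$ can also be turned into a probability without a printed table: a χ² variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(√(χ²/2)). A Python sketch using only the standard library (helper names are illustrative); the familiar 5% critical value 3.841 corresponds to a probability of about 0.05.

```python
import math


def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (rows - 1) * (cols - 1)


def chi2_upper_tail_df1(chi2):
    """P(X >= chi2) for a chi-squared variable X with one degree of freedom.

    With df = 1, X = Z^2 for a standard normal Z, so
    P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


df = degrees_of_freedom(2, 2)
p = chi2_upper_tail_df1(0.1565)
print(df, round(p, 2))  # 1 0.69
```

A probability of about 0.69 is far above 0.05, which matches the comparison with the critical value 3.841.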
Conclusion and Evaluation
Based on the results I obtained in my research, it can be concluded that there is no direct correlation between the number of cigarettes smoked and lung cancer incidents in these six countries, so my hypothesis is not supported. Lung cancer can have various other causes, such as second-hand smoke, car exhaust and ionising radiation (alpha, beta and gamma rays). Because these factors also increase the risk of lung cancer, my data cannot isolate the effect of smoking. Thus lung cancer incidents do not depend purely on the number of cigarettes consumed, even though it is a known fact that excessive cigarette consumption may cause lung cancer. For these reasons, the investigation could be improved by including external factors such as those mentioned above.