# Statistics and Difference

BIO 2003 SUMMATIVE ASSIGNMENT 2

## Introduction

This report analyses the results of a study of workers from the brick and tile industries conducted by the Health and Safety Laboratory (HSL). HSL set two criteria for the workers included: no worker could have worked in both the brick and the tile industry, and no worker could be a smoker. These criteria were intended to make the results more reliable.

The aim of the study is to detect any difference in the health of the workers in these industries (as identified by cell damage) and to determine whether any relationship exists between length of service and the recorded health effect. The Null Hypothesis (H0) states that there is no difference in the median percentage of damaged cells between workers from the brick and tile industries. The Null Hypothesis for the correlation study states that there is no correlation between the health effects of the workers and the length of time they have worked in the industries.

The Alternative Hypothesis (H1) states that the median percentage of damaged cells of workers in the brick industry differs from the median percentage of damaged cells of workers in the tile industry. H1 for the correlation study states that a correlation exists between the length of time the workers have worked in the industry and their health effects. The analysis is carried out using the following five variables:

* Worker ID
* Age
* Department
* Length of service
* Percentage of cell damage

The observations are independent both within and between the two groups of workers.

To obtain an accurate analysis of the data, the assumptions of normality, a straight-line relationship and independence will be checked, and a box plot will be examined. The Null Hypothesis will be rejected or retained on the basis of a statistical test comparing the median percentage of damaged cells in the brick and tile operations.

Table 1: Descriptive statistics of the percentage of damaged cells of brick and tile operation workers

| Variable | N | N* | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| % damaged cells, tile operation | 27 | 0 | 1.337 | 0.210 | 1.090 | 0.200 | 0.600 | 1.100 | 1.500 | 4.700 |
| % damaged cells, brick operation | 38 | 0 | 1.532 | 0.179 | 1.106 | 0.200 | 0.536 | 1.370 | 2.189 | 4.562 |

Table 1 gives descriptive statistics for the workers of the two industries. The mean % of damaged cells is higher for workers in the brick industry (1.532) than for the tile operation workers (1.337). The median for the brick industry workers is 1.370, which is higher than the median for the tile operation workers, 1.100. The inter-quartile range, which is the difference between Q3 and Q1, is also larger for the brick operation (1.653) than for the tile operation (0.900).
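The spread figures in Table 1 can be checked by hand. The following is a minimal sketch that uses only the summary values from Table 1, since the raw data are not given here. The helper name `spread_summary` is hypothetical, and the 1.5 × IQR outlier fence is the usual box plot convention, assumed here rather than stated in the report.

```python
import math

# Summary values copied from Table 1 (no raw data are available here).
TABLE_1 = {
    "tile": {"n": 27, "sd": 1.090, "q1": 0.600, "q3": 1.500, "max": 4.700},
    "brick": {"n": 38, "sd": 1.106, "q1": 0.536, "q3": 2.189, "max": 4.562},
}


def spread_summary(row):
    """Return IQR, upper outlier fence and standard error for one Table 1 row.

    The fence uses the common 1.5 * IQR box plot rule, which is an assumption.
    """
    iqr = row["q3"] - row["q1"]
    upper_fence = row["q3"] + 1.5 * iqr
    se_mean = row["sd"] / math.sqrt(row["n"])
    return {
        "iqr": round(iqr, 3),
        "upper_fence": round(upper_fence, 3),
        "se_mean": round(se_mean, 3),
        "max_beyond_fence": row["max"] > upper_fence,
    }


for name, row in TABLE_1.items():
    print(name, spread_summary(row))
```

Under these assumptions the tile maximum of 4.700 lies beyond its upper fence, which is consistent with high outliers among the tile workers, while the brick maximum does not. The standard errors computed this way agree with the SE Mean column of Table 1.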

Figure 1: Box plot displaying the % of damaged cells in workers from both the tile and brick industries.

The figure shows that the percentage of damaged cells for tile operators is lower than for the brick operators, suggesting a possible difference in the mean and median, and therefore in the health hazard faced by tile and brick workers. There is evidence of skewness in the distribution for the brick operators, as the median line has shifted away from the centre of the box, whereas the tile distribution is symmetric.

Apart from two extreme outliers, the % cell damage in workers of the tile operation is closely grouped, whereas the % cell damage of the brick workers is spread quite widely. Further analysis is needed, because the hypothesis can be neither accepted nor rejected from the box plot alone: it only displays summary measures (mean, median, Q1, Q3, maximum and minimum values), which are not sufficient to establish a difference between the two sites.

Figure 2: Histogram of the tile and brick operation data

The % of damaged cells of the brick operation is higher than that of the tile operation.

This is concluded from the histogram, in which the bars for the brick operation extend to higher values of % damaged cells than the bars for the tile operation. A histogram was used because it shows the shape and spread of each distribution.

Figure 3: The test for equal variance.

The test for equal variance shows no evidence of a difference in the variability of % cell damage between workers from the brick and tile operations. The P-value obtained from Levene's test is 0.200, which is higher than 0.05, so the null hypothesis of equal variances cannot be rejected.

The P-value of the F-test is 0.952, which is also higher than 0.05, so again there is no evidence that the null hypothesis (H0) of equal variances should be rejected. These tests compare only the spread of the two groups; they do not test for normality or for a difference in the median % damaged cells of the workers, so further analysis is needed.

Figure 4: Normal distribution graph for brick and tile operation.
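As a rough check on the equal-variance result, the ratio of the two sample variances can be computed from the standard deviations in Table 1. This is only a sketch: it does not reproduce the reported P-values, which require the F distribution and, for Levene's test, the raw data.

```python
# Standard deviations copied from Table 1.
sd_tile = 1.090
sd_brick = 1.106

# Ratio of sample variances (larger over smaller); 1.0 means identical spread.
variance_ratio = (sd_brick / sd_tile) ** 2
print(round(variance_ratio, 3))
```

A ratio this close to 1 is in line with the high P-values reported for the test for equal variance.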

Figure 4 shows normal probability plots for the tile and brick operations. The % damaged cells of the brick and tile operations are not normally distributed, as the points are not scattered about a straight line. There is evidence that the residuals follow a skewed distribution, and the graph does not follow any clear trend or pattern. Because the P-value in Figure 4 is lower than 0.05, the null hypothesis (H0) of normality is rejected. From these facts it may be concluded that the residuals do not follow a normal distribution.

A Mann-Whitney test will be used to analyse the data, because the % damaged cells of workers in the tile operation are not normally distributed: the P-value is lower than 0.05 and the points on the graph do not follow any clear trend.

Mann-Whitney Test Results and CI of Tile and Brick Manufacturing Operations

Table 2: Number of samples used in the Mann-Whitney test and the median obtained for the brick and tile manufacturing operations

| Sample type | Number of samples | Median |
| --- | --- | --- |
| Tile | 27 | 1.100 |
| Brick | 38 | 1.370 |

* Point estimate for ETA1-ETA2 is 0.200
* 95.0% CI for ETA1-ETA2 is (-0.323, 0.800)
* W = 1319.0
* Test of ETA1 = ETA2 vs. ETA1 not = ETA2 is significant at 0.3905
* The test is significant at 0.3903 (adjusted for ties)

The results give a 95% confidence interval of -0.323 to 0.800 for the difference in median % damaged cells between workers in the brick and tile operations. The estimated difference in the medians is 0.200, which means that workers in the brick operation have approximately 0.200 percentage points more damaged cells than those in the tile operation. Because the interval includes zero, a difference of zero is also compatible with the data.
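The reported P-value can be approximated from W and the two sample sizes alone. The sketch below uses the standard normal approximation to the rank-sum statistic with a continuity correction and no tie adjustment. It assumes that W = 1319.0 is the rank sum of the brick sample (n = 38); the report does not say which sample is ETA1, so this is an assumption, and the function name `rank_sum_p_value` is hypothetical.

```python
import math
from statistics import NormalDist


def rank_sum_p_value(w, n1, n2):
    """Two-sided P-value for a rank sum W via the normal approximation.

    Uses a continuity correction and ignores ties.
    """
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(w - mean_w) - 0.5, 0.0) / sd_w
    return 2 * (1 - NormalDist().cdf(z))


# Assumption: W = 1319.0 is the rank sum of the brick sample (n = 38).
p = rank_sum_p_value(w=1319.0, n1=38, n2=27)
print(round(p, 4))
```

Under that assumption the approximation gives a P-value close to the reported 0.3905, which supports reading ETA1 as the brick median and the point estimate of 0.200 as brick minus tile.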

The confidence interval is set at the 95% level, so the analysis cannot give complete certainty; more data would narrow the interval and give a more precise estimate. The P-value obtained from the Mann-Whitney test was 0.3905. Since this P-value is higher than 0.05, there is no evidence to reject the null hypothesis of no difference. It can therefore be concluded that there is no convincing evidence of a difference in the median % damaged cells of workers in the two operations.

## Conclusion

Various graphs and descriptive statistics were used to decide whether there was any difference in the health of the workers of the two operations. The Mann-Whitney U test was used to test for a difference in the % damaged cells of the tile and brick operation workers. From these analyses it may be concluded that there is little evidence of a noteworthy difference in the % damaged cells of workers in the tile and brick operations.

## Question 2

Table 3: Paired t-test and 95% CI for % damaged cells and length of service of workers in the two operations

| | N | Mean | StDev | SE Mean |
| --- | --- | --- | --- | --- |
| % damaged cells | 65 | 1.451 | 1.095 | 0.136 |
| Length of service (years) | 65 | 8.995 | 7.349 | 0.912 |
| Difference | 65 | -7.544 | 6.964 | 0.864 |

* 95% CI for mean difference: (-9.270, -5.819)
* T-Test of mean difference = 0 (vs. not = 0): T-Value = -8.73, P-Value = 0.000

The P-value from the paired t-test is reported as 0.000, which is lower than 0.05, so the null hypothesis of a mean difference of zero is rejected. The data are paired in the sense that both measurements come from the same worker, but because the two variables are measured in different units, this result says little about how they are related. A scatter plot is therefore used to examine the relationship between the two variables.
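The T-value in Table 3 follows directly from the summary statistics of the difference row. A minimal sketch of that arithmetic, using only the values reported in the table:

```python
import math

# Summary values for the "Difference" row of Table 3.
n = 65
mean_diff = -7.544
sd_diff = 6.964

se_mean = sd_diff / math.sqrt(n)   # standard error of the mean difference
t_value = mean_diff / se_mean      # paired t statistic against a mean of 0

print(round(se_mean, 3), round(t_value, 2))
```

The computed standard error and t statistic agree with the SE Mean of 0.864 and the T-Value of -8.73 reported in the table.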

Figure 5: A scatter plot, with a regression line, showing the correlation between the % of cells damaged and the length of service in years.

The R-sq value for the regression is 17.4%, which means that 17.4% of the variability in the data is explained by the regression model. Because this value is very low, the model cannot be relied on to predict future values. The scatter plot shows a minor positive association between the % damaged cells and the length of service, so Pearson's correlation is calculated to quantify it, but future cell damage cannot be predicted from it.

Pearson's correlation results: Pearson correlation of length of service (years) and % damaged cells = 0.417, P-Value = 0.001.

A correlation of 0.417 indicates a moderate positive association between the length of service and the % damaged cells of workers in the tile and brick operations. A fitted regression line is used to describe this trend, although with an R-sq of only 17.4% any forecast from it would be imprecise.
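The correlation and the regression R-sq describe the same relationship, which can be seen by squaring the reported correlation. The sketch below also computes the usual t statistic for testing a zero correlation, assuming n = 65 workers as in Table 3. The function name `correlation_summary` is hypothetical.

```python
import math


def correlation_summary(r, n):
    """Return r squared and the t statistic for testing zero correlation."""
    r_squared = r ** 2
    t_value = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
    return r_squared, t_value


# Values reported for length of service vs. % damaged cells (n = 65 workers).
r_squared, t_value = correlation_summary(r=0.417, n=65)
print(f"R-sq = {r_squared:.1%}, t = {t_value:.2f}")
```

Squaring 0.417 recovers the 17.4% quoted for the regression. With 63 degrees of freedom the two-sided 5% critical value of t is close to 2.00, so a t statistic of this size is consistent with the small P-value reported. The same function can be applied to any other correlation in the report.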

The P-value is 0.001, which is less than 0.05, so there is convincing evidence to reject the null hypothesis (H0) of no correlation. It may therefore be concluded that there is a correlation between the length of service and the % damaged cells of workers from both operations. A fitted regression line plot is used to describe the relationship.

## Further Analysis

Figure 6: Scatter plot of the % damaged cells against the age of workers, with the regression line.

The scatter plot shows a weak positive correlation between age and the % damaged cells.

Therefore Pearson's correlation is calculated. Pearson correlation of age (years) and % damaged cells = 0.251, P-Value = 0.044. The P-value is 0.044, which is less than 0.05, so the null hypothesis of no correlation is rejected and the alternative hypothesis is accepted: there is evidence of a weak positive correlation between age and % damaged cells.

## Conclusion

The data were analysed using descriptive statistics, various graphs, Pearson's correlation and a fitted regression line plot to look for an association between the % damaged cells and length of service in the tile and brick operations.

The results showed a moderate positive association between the % of damaged cells and length of service (r = 0.417, P = 0.001), although length of service explained only 17.4% of the variability. A weaker positive correlation was observed between the % of damaged cells and the age of workers in both operations (r = 0.251, P = 0.044). Correlation alone does not establish cause, so these results cannot show whether age or the dust is responsible for the damage. The first test concluded that there is no convincing evidence of a difference between the health hazard of workers at the tile and brick operations.

The second test concluded that there is only a modest relationship between the workers' health and their length of service. Since the R-sq value was only 17.4%, the extent of damage cannot be reliably predicted from the length of employment.

## Overall conclusion

It can be concluded that there is no significant difference in the percentage of damaged cells between workers of the tile and brick operations. Both the age of workers and their length of exposure to the dust in brick or tile operations show positive correlations with the % damaged cells, but the data do not show which of the two, if either, causes the increase.