Statistics

From HaFrWiki42
Jump to navigation Jump to search


ANOVA

ANalysis Of VAriance (ANOVA) [1] is a collection of statistical models in which observed variance is partitioned into components due to different explanatory variables. So a simply form ANOVA gives a statistical test of weather means of several groups are all equal.

Logic

Partitioning of the sum of squares

ANOVA is based on the following formula of the Sum of Squares [2]

  SSTotal = SSError + SSTreatments  

So, the number of degrees of freedom (abbreviated df) can be partitioned in a similar way and specifies the chi-square distribution which describes the associated sums of squares:

  dƒTotal = dƒError + dƒTreatments  

the F-Test

The F-test is used for comparisons of the components of the total deviation. In a one-way or single-factor ANOVA, statistical significance is tested for by comparing the F test statistic:

          variance of the group means         MSTR 
    F = ----------------------------------  = ----  
        mean of the within-group variances    MSE

where:      SSTR
    MSTR = ------ . l = number of treatments
            I - 1

and:        SSE
    MSE  = ----- . nt = total number of cases
           nt - I
            

to the F-distribution with I-1,nT-I degrees of freedom. Using the F-distribution is a natural candidate because the test statistic is the quotient of two mean sums of squares which have a chi-square distribution.

Example

Students [3]are getting 3 types of sounds during the their study for an exam. They are divided into 3 groups of 8 students. Here are there results:

# Group description Test scores
1 Constant sound 7 4 6 8 6 6 2 9
2 Random sound 5 5 3 5 5 7 2
3 No sound 2 4 7 1 2 1 5 5
x1 x12 x2 x22 x3 x32
7 49 5 25 2 4
4 16 5 25 4 16
6 36 3 9 7 49
8 64 4 16 1 1
6 36 4 16 2 4
6 36 7 49 1 1
2 4 2 4 5 25
9 81 2 4 5 25
Sx1 = 48 Sx12 = 322 Sx2 = 32 Sx22 = 148 Sx3 = 27 Sx32 = 125
(Sx1)2 = 2304 (Sx2)2 = 1024 (Sx3)2 = 729
M1 = 6 M2 = 4 M3 = 3.375
SStotal = (322 + 148 + 125) - ( ( 48 + 32 + 27 )2 / 24 ) = 595 - 477.04 = 117.96
SSamong = ( (2304/8) + (1024/8) + (792/8) ) - 477.04 = 507.13 - 477.04 = 30.08
SSwithin = 117.96 - 30.08 = 87.88

So:

Source SS dƒ MS F
Among 30.08 2 15.04 3.59
Within 87.88 21 4.18  

F (dƒ= 2, 21), F must be at least 3.4668 to reach p < 0.05, so F score is statistically significant.

  • H0: Students learn more with constant music.
  • H1: Random or No-sound learn better.

The group with constant music (x1 has the highest score. However, the signficant F only indicates that at least two means are signficantly different from one another, but unkown is which specific mean pairs significantly differ. To find that a post-hoc analysis (e.g., Tukey's HSD) is needed.

Tukey's HSD

top

Logic

                      M1 - M2  
  Tukey's HSD = -------------------
                SQR( MSw ( 1 / n) )
Where:
M : Treatment/group mean
n : Number of treatment/group
  1. Calculate an analysis of variance (e.g., One-way between-subjects ANOVA).
  2. Select two means and note the relevant variables (Means, Mean Square Within, and number per condition/group)
  3. Calculate Tukey's test for each mean comparison
  4. Check to see if Tukey's score is statistically significant with Tukey's probability/critical value table taking into account appropriate dƒwithin and number of treatments.

Example

From the example above:

                 6 - 4                    2  
M1 vs M2 = -------------------- = ----------------- = 2.767
           SQR( 4.18 ( 1 / 8) )   SQR(4.18 x 0.125) 

           6 - 3.375
M1 vs M3 = --------- =  3.6315 [*]
              0.72

           4 - 3.375
M2 vs M3 = --------- =  0.864648
             0.72

[*] According to the Tukey's sig/probability table, taking into account (dƒwithin = 21 and treatments = 3, p < 0.05) = 3.58, the mean comparison between means 1 and 3 is statistically signficant, but not the other comparisons. (Use the Tukey Calculator: Psychology World Tukeys Calculator).

Trend Analysis

top Mann-Kendall test is applicable is cases when the data values Xi of a time series can be assumed to obey the model:

  xi = ƒ(ti) + εi  
  • ƒ(t): continuous monotonic increasing or decreasing function of time. 
  • εi: Residuals with the same distribution and a mean of zero. 

And is calculated using:

       n-1   n
   S = Σ     Σ sgn(Xj - Xk)
       k=1   j=k+1

Where:
               = +1 ... if Xj-Xk > 0
  sgn(Xj - Xk) =  0 ... if Xj-Xk = 0
               = -1 ... if Xj-Xk < 0

See also

top

  • Psychology World Virtual Statistician of Richard Hall.
  • Mann Kendall Test, Detecting Trends of Annual Values of Atmospheric Polllutants by the Man-Kendall Test and Sen's Slope Estimates (Excel Template Application MakeSens).

References

top

  1. ANOVA, wikipedia: Definition of Anova.
  2. Sum of Squares, wikipedia: Sum of squares is a concept that permeates much of inferential statistics and descriptive statistics. More properly, it is "the sum of the squared deviations".
  3. ANOVA Example, Psychology World,Richard Hall Between One-Way ANOVA.