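Basic Statistics - Part II, Tests About the Mean of a Normal Distribution (t-test)

Sometimes the question is asked whether two samples have a similar mean. Just looking at a listing of the data does not give a definitive answer; some sort of test is required. To do a test it is important to formulate what statisticians call a "hypothesis". A statement about the population, called the null hypothesis and denoted by H0 (for example, "the mean equals 355"), is tested against an alternative hypothesis, denoted by H1 (for example, "the mean does not equal 355"). If the data give enough evidence against H0, then H0 is rejected and H1 is accepted; otherwise H0 is accepted. Tests exist for the case where the population standard deviation is known (the z-test), but this very rarely occurs in practice, so this article will jump straight to the situation where the population standard deviation is unknown. This is where the t-test is used to test the hypothesis.

SINGLE SAMPLE

The first part of this paper looks at the procedure where there is a single sample and the sample mean is used to test a hypothesis about the population mean μ. The process for the test is summarized below:

1. State the null hypothesis H0: μ = μ0 and the alternative, either H1: μ ≠ μ0 (two-sided), H1: μ > μ0 or H1: μ < μ0 (one-sided).
2. Choose the significance level α.
3. Calculate $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $\bar{x}$ is the sample mean, $s$ the sample standard deviation and $n$ the sample size; t has n - 1 degrees of freedom.
4. For H1: μ ≠ μ0, reject H0 and accept H1 if $|t| \ge t_{\alpha/2,\,n-1}$; for H1: μ > μ0, if $t \ge t_{\alpha,\,n-1}$; for H1: μ < μ0, if $t \le -t_{\alpha,\,n-1}$. Otherwise accept H0.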
The following examples demonstrate the procedure used for the tests.

Example 1

A sample of eight bottles of a certain product was taken and the liquid content of each bottle measured, with the following results:

369, 357, 356, 364, 348, 361, 345, 364

The researcher wants to test the null hypothesis that the mean equals 355 against the alternative that it does not. Let α = 0.01.
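For reference, the manual calculation for Example 1 runs as follows. This is a sketch using the formula for t given above; the sum of the observations (2864) and the corrected sum of squares (476) agree with the UNIVARIATE output shown later.

```latex
\begin{aligned}
H_0 &: \mu = 355, \qquad H_1: \mu \neq 355, \qquad \alpha = 0.01 \\
\bar{x} &= \frac{2864}{8} = 358 \\
s &= \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{476}{7}} = \sqrt{68} \approx 8.2462 \\
t &= \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{358 - 355}{8.2462/\sqrt{8}} = \frac{3}{2.9155} \approx 1.029 \\
t_{\alpha/2,\,n-1} &= t_{0.005,\,7} \approx 3.499 \\
|t| &= 1.029 < 3.499 \;\Rightarrow\; \text{accept } H_0
\end{aligned}
```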
Now, how is the same test done in SAS? The procedure PROC TTEST (part of SAS/STAT) will do the calculations, but its default output and the way it is interpreted are very different from the manual method. Using the same data as in Example 1 (loaded into a dataset called PRODA with a variable VOLUME) and running the following code
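/* H0= sets the hypothesised mean (the default is 0).         */
/* ALPHA= sets only the confidence limits (99.5% here), not   */
/* the p-value; 0.005 reproduces the limits shown below.      */
proc ttest data=proda h0=355 alpha=0.005;
  var volume;
run;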
will produce the following output
The TTEST Procedure
Statistics
| Variable | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err |
|---|---|---|---|---|---|---|---|---|
| volume | 8 | 346.25 | 358 | 369.75 | 4.6472 | 8.2462 | 24.477 | 2.9155 |
T-Tests
Variable DF t Value Pr > |t|
volume 7 1.03 0.3377
From the output, how is a decision made as to whether to accept or reject the hypothesis? The number to look for is under the label "Pr > |t|" (the p-value), and it is compared against the significance level: if the number is greater than the significance level then accept H0, otherwise reject H0 and accept H1. "Pr > |t|" is already a two-sided p-value, so for this two-sided test it is compared directly with α = 0.01; as 0.3377 > 0.01, accept H0.

Is there a way to check that the p-value from the TTEST procedure is correct, or that the correct hypothesis was chosen? After all, the procedure for deciding between the hypotheses is well known and appears in countless textbooks. The BASE SAS function PROBT, which returns the cumulative t distribution, gives the p-value from the t value computed in Example 1, as the following data step and its result show:
data _null_;
  x=1.029;                      /* t value from the hand calculation */
  df=7;                         /* degrees of freedom, n-1           */
  p=(1-probt(abs(x),df))*2;     /* p-value of a two-tailed t test    */
  put p=;
run;
p=0.3377168268
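PROBT also gives one-sided p-values, and its inverse, TINV, gives the critical values used in the manual method. A minimal sketch using the Example 1 values (t = 1.029 with 7 degrees of freedom); the data set name PVALS is hypothetical:

```sas
/* Sketch: two- and one-sided p-values and critical values for   */
/* t = 1.029 with 7 df. PVALS is a hypothetical data set name.   */
data pvals;
  t     = 1.029;   /* t statistic from the hand calculation      */
  df    = 7;       /* degrees of freedom = n - 1                 */
  alpha = 0.01;    /* significance level                         */

  p_two   = 2*(1 - probt(abs(t), df)); /* H1: mu ne mu0          */
  p_upper = 1 - probt(t, df);          /* H1: mu > mu0           */
  p_lower = probt(t, df);              /* H1: mu < mu0           */

  crit_two = tinv(1 - alpha/2, df);    /* two-sided critical value */
  crit_one = tinv(1 - alpha, df);      /* one-sided critical value */

  put p_two= p_upper= p_lower= crit_two= crit_one=;
run;
```

For a positive t the upper one-sided p-value is exactly half of the two-sided one, which is why "Pr > |t|" has to be halved when the alternative hypothesis is one-sided.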
The t-test for the single sample case can also be done with the UNIVARIATE procedure (part of BASE SAS) using the MU0= option, as the following SAS code and output show (look at the Tests for Location section, Student's t result):
proc univariate data=proda mu0=355;
  var volume;
run;
The UNIVARIATE Procedure
Variable: volume
Moments
N 8 Sum Weights 8
Mean 358 Sum Observations 2864
Std Deviation 8.24621125 Variance 68
Skewness -0.480995 Kurtosis -0.7557588
Uncorrected SS 1025788 Corrected SS 476
Coeff Variation 2.30341096 Std Error Mean 2.91547595
Basic Statistical Measures
Location Variability
Mean 358.0000 Std Deviation 8.24621
Median 359.0000 Variance 68.00000
Mode 364.0000 Range 24.00000
Interquartile Range 12.00000
Tests for Location: Mu0=355
Test -Statistic- -----p Value------
Student's t t 1.028992 Pr > |t| 0.3377
Sign M 2 Pr >= |M| 0.2891
Signed Rank S 7 Pr >= |S| 0.3672
It is also possible to do the calculations with data step code in BASE SAS, as outlined below (only the key statements are shown), and get output like the following:
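data _null_;
...SAS Statements...
/* alpha must hold the per-tail level here (0.005 for a     */
/* two-sided test at 0.01) to give the +/-3.4995 shown below */
tsigl=-abs(tinv(alpha,df));     /* lower critical value      */
tsigh=abs(tinv(alpha,df));      /* upper critical value      */
tval=(mean-mju)/(std/sqrt(n));  /* t statistic; mju = mu0    */
p=(1-probt(abs(tval),df))*2;    /* two-sided p-value         */
...more SAS Statements...
run;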
--- Output ---
T-TEST
Dataset = PRODA
Variable = volume
H0 = 355 , H1 ^= 355
alpha = 0.01 (2-sided test: alpha/2=0.005)
N= 8
Mean=358
S= 8.2462112512
DF= 7
-3.499483297 < 1.0289915109 < 3.4994832974 : accept H0, reject H1
Pr > |t| = 0.3377205477
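As a rough, hypothetical stand-in for the macro (not the author's code), the whole single-sample calculation can be written in BASE SAS with PROC MEANS and one data step. PRODA and VOLUME follow the text; the data set names STATS and RESULT and the variable MU0 are assumptions:

```sas
/* Hypothetical BASE SAS sketch of the manual single-sample t-test. */
/* Not the author's macro: STATS, RESULT and MU0 are assumed names. */
data proda;
  input volume @@;            /* several values per data line */
  datalines;
369 357 356 364 348 361 345 364
;
run;

/* Sample size, mean and standard deviation of VOLUME */
proc means data=proda noprint;
  var volume;
  output out=stats n=n mean=mean std=std;
run;

data result;
  length decision $24;
  set stats;
  mu0   = 355;                          /* hypothesised mean under H0   */
  alpha = 0.01;                         /* two-sided significance level */
  df    = n - 1;                        /* degrees of freedom           */
  tcrit = tinv(1 - alpha/2, df);        /* upper critical value         */
  tval  = (mean - mu0)/(std/sqrt(n));   /* t statistic                  */
  p     = 2*(1 - probt(abs(tval), df)); /* two-sided p-value            */
  if -tcrit < tval < tcrit then decision = 'accept H0, reject H1';
  else decision = 'reject H0, accept H1';
  put n= mean= std= df= / tcrit= tval= p= decision=;
run;
```

Running it should reproduce t ≈ 1.029, the critical values ±3.4995 and p ≈ 0.3377 shown above. Computing the critical value as TINV(1 - alpha/2, df) keeps ALPHA at its nominal two-sided value of 0.01 rather than storing the halved level.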
I keep the data step method as a macro in my macro collection; every programmer should carry around something that contains their useful or frequently used code.

Example 2

A company took a random sample of ten components and timed how long (in seconds) a machine took to recondition and inspect each component. The times were:

5.7, 4.8, 5.9, 4.9, 6.1, 4.2, 6.5, 6.4, 5.8, 5.7

The goal is to have an average time of 5 seconds. Using a significance level of 0.01, was the goal met?
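A sketch of the manual calculation for Example 2, assuming a one-sided alternative H1: μ > 5 (the alternative implied by the one-tailed critical value 1.833 quoted for α = 0.05). The sum 56 and corrected sum of squares 4.94 agree with the UNIVARIATE output below.

```latex
\begin{aligned}
H_0 &: \mu = 5, \qquad H_1: \mu > 5, \qquad \alpha = 0.01 \\
\bar{x} &= \frac{56}{10} = 5.6, \qquad s = \sqrt{\frac{4.94}{9}} \approx 0.7409 \\
t &= \frac{5.6 - 5}{0.7409/\sqrt{10}} \approx 2.561 \quad (\approx 2.564 \text{ if } s \text{ is rounded to } 0.74) \\
t_{\alpha,\,n-1} &= t_{0.01,\,9} \approx 2.821 \\
t &\approx 2.56 < 2.821 \;\Rightarrow\; \text{accept } H_0
\end{aligned}
```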
In the previous example, had the significance level been 0.05 ($t_{0.05} = 1.833$, the one-tailed critical value for 9 degrees of freedom), the result would have been quite different: since 2.564 ≥ 1.833, reject H0 and accept H1. Using SAS and the UNIVARIATE procedure (MU0=5), the output is:
The UNIVARIATE Procedure
Variable: time
Moments
N 10 Sum Weights 10
Mean 5.6 Sum Observations 56
Std Deviation 0.74087036 Variance 0.54888889
Skewness -0.7500206 Kurtosis -0.2612091
Uncorrected SS 318.54 Corrected SS 4.94
Coeff Variation 13.2298278 Std Error Mean 0.23428378
Basic Statistical Measures
Location Variability
Mean 5.600000 Std Deviation 0.74087
Median 5.750000 Variance 0.54889
Mode 5.700000 Range 2.30000
Interquartile Range 1.20000
Tests for Location: Mu0=5
Test -Statistic- -----p Value------
Student's t t 2.560997 Pr > |t| 0.0306
Sign M 2 Pr >= |M| 0.3438
Signed Rank S 19 Pr >= |S| 0.0488
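The SAS code for Example 2 is not listed. A minimal sketch, assuming a data set named TIMES (hypothetical) holding the variable TIME, runs the UNIVARIATE test and then uses PROBT to turn the Student's t value into a one-sided p-value for H1: μ > 5; the data set name ONESIDED is also hypothetical:

```sas
/* Hypothetical sketch for Example 2: TIMES and ONESIDED are assumed names. */
data times;
  input time @@;
  datalines;
5.7 4.8 5.9 4.9 6.1 4.2 6.5 6.4 5.8 5.7
;
run;

/* Student's t test of H0: mu = 5 appears under Tests for Location */
proc univariate data=times mu0=5;
  var time;
run;

/* One-sided p-value for H1: mu > 5 from the t value reported above */
data onesided;
  t       = 2.560997;                  /* Student's t from UNIVARIATE */
  df      = 9;                         /* n - 1                       */
  p_two   = 2*(1 - probt(abs(t), df)); /* same as Pr > |t|            */
  p_upper = 1 - probt(t, df);          /* upper-tail p-value          */
  put p_two= p_upper=;
run;
```

The one-sided p-value is half of the 0.0306 reported by UNIVARIATE, about 0.0153: above 0.01 but below 0.05, matching both manual decisions.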
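A decision is made as to accept H0 or H1 by comparing the "Pr > |t|" value against the significance level. Note that "Pr > |t|" is a two-sided p-value. Because the critical value 1.833 used above is the one-tailed value for 9 degrees of freedom, the test here is one-sided (H1: μ > 5), and since t is positive the one-sided p-value is 0.0306/2 = 0.0153. As 0.0153 > 0.01, accept H0, which agrees with the manual calculation; at a significance level of 0.05, H0 would be rejected.

TWO SAMPLES

The second part of this paper discusses using the t-test to compare two means when the population variances are unknown but assumed to be equal. The hypothesis that is usually tested is that the means are equal, denoted by H0: μ1=μ2. It can also be rewritten as H0: μ1-μ2=0; this form is useful because it makes it easy to test whether one mean is larger or smaller than the other by a specified value, sometimes denoted in textbooks by ∆ (H0: μ1-μ2=∆). The test is very much the same as for the single sample, except that t is now calculated as:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

with n1 + n2 - 2 degrees of freedom, where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

is the pooled variance, and $\bar{x}_1, \bar{x}_2$, $s_1, s_2$ and $n_1, n_2$ are the means, standard deviations and sizes of the two samples.

Summarizing, the process is:

1. State H0: μ1-μ2=∆ and the alternative, either H1: μ1-μ2≠∆ (two-sided), H1: μ1-μ2>∆ or H1: μ1-μ2<∆ (one-sided); for a test of equal means ∆ = 0.
2. Choose the significance level α.
3. Calculate t as above, with n1 + n2 - 2 degrees of freedom.
4. For the two-sided alternative, reject H0 and accept H1 if $|t| \ge t_{\alpha/2,\,n_1+n_2-2}$; for H1: μ1-μ2>∆, if $t \ge t_{\alpha,\,n_1+n_2-2}$; for H1: μ1-μ2<∆, if $t \le -t_{\alpha,\,n_1+n_2-2}$. Otherwise accept H0.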
The following examples demonstrate the procedure used for the tests.

Example 3

Samples of free range eggs were taken from two farms, and a test was requested to determine whether the mean weights (ounces) of the eggs from the two farms are the same, using a significance level of 0.01:
Farm A: 20, 28, 24, 20, 24, 21, 17, 28, 25, 19
Farm B: 29, 16, 25, 27, 27, 18, 22, 27
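A sketch of the manual calculation for Example 3 with the pooled formula above (subscript 1 is Farm A, subscript 2 is Farm B); the intermediate values agree with the Diff (1-2) row of the TTEST output below.

```latex
\begin{aligned}
H_0 &: \mu_1 - \mu_2 = 0, \qquad H_1: \mu_1 - \mu_2 \neq 0, \qquad \alpha = 0.01 \\
\bar{x}_1 &= 22.6, \quad s_1^2 = \frac{128.4}{9} \approx 14.267, \quad n_1 = 10 \\
\bar{x}_2 &= 23.875, \quad s_2^2 = \frac{156.875}{7} \approx 22.411, \quad n_2 = 8 \\
s_p^2 &= \frac{128.4 + 156.875}{16} \approx 17.830, \qquad s_p \approx 4.2225 \\
t &= \frac{22.6 - 23.875}{4.2225\sqrt{1/10 + 1/8}} = \frac{-1.275}{2.0029} \approx -0.637 \\
t_{\alpha/2,\,16} &= t_{0.005,\,16} \approx 2.921 \\
|t| &= 0.637 < 2.921 \;\Rightarrow\; \text{accept } H_0
\end{aligned}
```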
The two-sample t-test cannot be calculated with the BASE SAS procedures; a procedure from another module is needed instead, for example PROC TTEST from SAS/STAT. Using the data above, with the variable FARM indicating where each sample came from (1 = Farm A, 2 = Farm B, as the sample sizes in the output show), WTOZ as the weight in ounces, and the following code:
proc ttest data=eggs0;
class farm;
var wtoz;
run;
the following output appears:
The TTEST Procedure
Statistics
| Variable | farm | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err |
|---|---|---|---|---|---|---|---|---|---|
| wtoz | 1 | 10 | 19.898 | 22.6 | 25.302 | 2.598 | 3.7771 | 6.8956 | 1.1944 |
| wtoz | 2 | 8 | 19.917 | 23.875 | 27.833 | 3.13 | 4.734 | 9.635 | 1.6737 |
| wtoz | Diff (1-2) | | -5.521 | -1.275 | 2.971 | 3.1448 | 4.2225 | 6.4264 | 2.0029 |
T-Tests
Variable Method Variances DF t Value Pr > |t|
wtoz Pooled Equal 16 -0.64 0.5334
wtoz Satterthwaite Unequal 13.3 -0.62 0.5457
Equality of Variances
Variable Method Num DF Den DF F Value Pr > F
wtoz Folded F 7 9 1.57 0.5173
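The data step that loads Example 3 into EGGS0 is not listed. A minimal sketch, assuming the coding FARM = 1 for Farm A and FARM = 2 for Farm B (consistent with the sample sizes of 10 and 8 in the output above):

```sas
/* Hypothetical sketch: load the Example 3 data into EGGS0.  */
/* Each pair on a data line is FARM (1 = A, 2 = B) and WTOZ. */
data eggs0;
  input farm wtoz @@;
  datalines;
1 20 1 28 1 24 1 20 1 24 1 21 1 17 1 28 1 25 1 19
2 29 2 16 2 25 2 27 2 27 2 18 2 22 2 27
;
run;
```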
As with the UNIVARIATE procedure above, the result to look at is the "Pr > |t|" value, here in the row for the Pooled method. If the two populations are assumed to have the same variance, the Pooled value is used; otherwise the Satterthwaite value is used. The distinction is too advanced for this paper, and interested readers should refer to the SAS documentation; note, though, that the Folded F test under Equality of Variances (Pr > F = 0.5173) gives no evidence that the variances differ. As with the single sample, the decision to accept H0 or H1 is made by comparing the "Pr > |t|" value against the significance level. "Pr > |t|" is a two-sided p-value, so for this two-sided test it is compared directly with α = 0.01: as 0.5334 > 0.01, accept H0. To check that the p-value calculated by the TTEST procedure agrees with the t value calculated by hand, the following data step code is used:
data _null_;
  x=-0.637;                     /* t value from the hand calculation */
  df=16;                        /* degrees of freedom, n1+n2-2       */
  p=(1-probt(abs(x),df))*2;     /* p-value of a two-tailed t test    */
  put p=;
run;
p=0.533133971
The value 0.5331 is close to the 0.5334 from the TTEST procedure; the difference is due to rounding t to -0.637. A good programmer will carry around a piece of BASE SAS code that performs this test by the manual method and also calculates the p-value.

Example 4

Samples of free range eggs were taken from two farms, and a test was requested to determine whether the mean weight (ounces) of the eggs from Farm A is greater than that from Farm B by three ounces, using a significance level of 0.01:
Farm A: 26, 26, 28, 33, 32, 27, 24, 24
Farm B: 19, 16, 26, 18, 28, 20, 18, 23, 18, 27
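A sketch of the manual calculation for Example 4, with ∆ = 3 and the one-sided alternative that Farm A (subscript 1) exceeds Farm B (subscript 2) by more than three ounces; the values agree with the Diff (1-2) row of the TTEST output below.

```latex
\begin{aligned}
H_0 &: \mu_1 - \mu_2 = 3, \qquad H_1: \mu_1 - \mu_2 > 3, \qquad \alpha = 0.01 \\
\bar{x}_1 &= 27.5, \quad s_1^2 = \frac{80}{7} \approx 11.429, \quad n_1 = 8 \\
\bar{x}_2 &= 21.3, \quad s_2^2 = \frac{170.1}{9} = 18.9, \quad n_2 = 10 \\
s_p^2 &= \frac{80 + 170.1}{16} \approx 15.631, \qquad s_p \approx 3.9536 \\
t &= \frac{27.5 - 21.3 - 3}{3.9536\sqrt{1/8 + 1/10}} = \frac{3.2}{1.8754} \approx 1.706 \\
t_{\alpha,\,16} &= t_{0.01,\,16} \approx 2.583 \\
t &= 1.706 < 2.583 \;\Rightarrow\; \text{accept } H_0
\end{aligned}
```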
Using the TTEST procedure with the following SAS code
proc ttest data=eggs0 H0=3;
class farm;
var wtoz;
run;
the following output is generated:
The TTEST Procedure
Statistics
| Variable | farm | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err |
|---|---|---|---|---|---|---|---|---|---|
| wtoz | 1 | 8 | 23.317 | 27.5 | 31.683 | 1.9863 | 3.3806 | 8.9927 | 1.1952 |
| wtoz | 2 | 10 | 16.832 | 21.3 | 25.768 | 2.6853 | 4.3474 | 9.9017 | 1.3748 |
| wtoz | Diff (1-2) | | 0.7224 | 6.2 | 11.678 | 2.7016 | 3.9536 | 6.974 | 1.8754 |
T-Tests
Variable Method Variances DF t Value Pr > |t|
wtoz Pooled Equal 16 1.71 0.1073
wtoz Satterthwaite Unequal 16 1.76 0.0981
Equality of Variances
Variable Method Num DF Den DF F Value Pr > F
wtoz Folded F 9 7 1.65 0.5198
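A decision is made as to accept H0 or H1 by comparing the "Pr > |t|" value against the significance level. This test is one-sided (H1: μ1-μ2>3, with farm 1 being Farm A), while "Pr > |t|" is two-sided; since t is positive, the one-sided p-value is 0.1073/2 ≈ 0.054. As 0.054 > 0.01, accept H0. (In later SAS releases, the SIDES=U option of PROC TTEST reports the upper one-sided p-value directly.)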
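The two-sample test can also be carried out entirely in BASE SAS. A minimal sketch for Example 4, under stated assumptions: the data set names EGGS4, STATS2 and RESULT2 are hypothetical, FARM 1 is Farm A and FARM 2 is Farm B as in the TTEST output, and DELTA holds the hypothesised difference of 3:

```sas
/* Hypothetical BASE SAS sketch of the pooled two-sample t-test. */
/* EGGS4, STATS2 and RESULT2 are assumed names.                  */
data eggs4;
  input farm wtoz @@;         /* FARM: 1 = Farm A, 2 = Farm B */
  datalines;
1 26 1 26 1 28 1 33 1 32 1 27 1 24 1 24
2 19 2 16 2 26 2 18 2 28 2 20 2 18 2 23 2 18 2 27
;
run;

/* One summary row per farm: size, mean and variance */
proc means data=eggs4 noprint nway;
  class farm;
  var wtoz;
  output out=stats2 n=n mean=xbar var=s2;
run;

data result2;
  /* read the two summary rows side by side into one observation */
  set stats2(where=(farm=1) rename=(n=n1 xbar=m1 s2=v1));
  set stats2(where=(farm=2) rename=(n=n2 xbar=m2 s2=v2));
  delta = 3;                                   /* H0: mu1 - mu2 = 3     */
  alpha = 0.01;                                /* significance level    */
  df    = n1 + n2 - 2;                         /* degrees of freedom    */
  sp    = sqrt(((n1-1)*v1 + (n2-1)*v2)/df);    /* pooled std deviation  */
  se    = sp*sqrt(1/n1 + 1/n2);                /* std error of the diff */
  tval  = (m1 - m2 - delta)/se;                /* t statistic           */
  p_two   = 2*(1 - probt(abs(tval), df));      /* as Pr > |t|           */
  p_upper = 1 - probt(tval, df);               /* H1: mu1 - mu2 > delta */
  tcrit   = tinv(1 - alpha, df);               /* one-sided critical    */
  put sp= se= tval= df= / p_two= p_upper= tcrit=;
run;
```

P_TWO should match the 0.1073 reported for the Pooled method, and P_UPPER, the p-value for H1: μ1-μ2>3, is half of it (about 0.054). Setting DELTA = 0 and reading P_TWO gives the equal-means test of Example 3.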