
[STATA] Hypothesis Testing Basics

by e-money2580 2023. 1. 10.


use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data2_1.dta", clear

tab abany
tab abany, nolabel

recode abany (2=0), gen(abany1) // create abany1 from the dummy variable abany, recoding the value 2 to 0

tab abany1 
tab abany1 sex

** prtest [variable] == #            - one-sample test of a population proportion (test of proportion)

prtest abany1==0.5

/*
One-sample test of proportion                   Number of obs      =       900

------------------------------------------------------------------------------
    Variable |       Mean   Std. err.                     [95% conf. interval]
-------------+----------------------------------------------------------------
      abany1 |        .43   .0165025                      .3976556    .4623444
------------------------------------------------------------------------------
    p = proportion(abany1)                                        z =  -4.2000
H0: p = 0.5

     Ha: p < 0.5                 Ha: p != 0.5                   Ha: p > 0.5
 Pr(Z < z) = 0.0000         Pr(|Z| > |z|) = 0.0000          Pr(Z > z) = 1.0000
*/
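// Null hypothesis: the population proportion is 0.5
// Left-tailed test: the p-value is below 5%, so reject the null in favor of the alternative (the share approving of abortion is below 0.5)
// Two-sided test: the p-value is below 5%, so reject the null (the share approving of abortion is not 0.5)
// Right-tailed test: the p-value is about 1, so the null cannot be rejected in favor of p > 0.5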
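
// Sketch (not part of the original post): the same one-sample test can be reproduced
// without the dataset through the immediate form prtesti, assuming the counts implied
// by the output above (900 observations, 387 of them coded 1, i.e. 0.43 * 900).
prtesti 900 387 0.5, count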



** bys [dummy variable] : prtest [test variable] == #    - one-sample test of proportion within each category

bys sex: prtest abany1 == 0.5

/*
-> sex = male

One-sample test of proportion                   Number of obs      =       484

------------------------------------------------------------------------------
    Variable |       Mean   Std. err.                     [95% conf. interval]
-------------+----------------------------------------------------------------
      abany1 |   .4442149   .0225854                      .3999484    .4884814
------------------------------------------------------------------------------
    p = proportion(abany1)                                        z =  -2.4545
H0: p = 0.5

     Ha: p < 0.5                 Ha: p != 0.5                   Ha: p > 0.5
 Pr(Z < z) = 0.0071         Pr(|Z| > |z|) = 0.0141          Pr(Z > z) = 0.9929

----------------------------------------------------------------------------------------------------------------
-> sex = female

One-sample test of proportion                   Number of obs      =       416

------------------------------------------------------------------------------
    Variable |       Mean   Std. err.                     [95% conf. interval]
-------------+----------------------------------------------------------------
      abany1 |   .4134615   .0241446                      .3661391     .460784
------------------------------------------------------------------------------
    p = proportion(abany1)                                        z =  -3.5301
H0: p = 0.5

     Ha: p < 0.5                 Ha: p != 0.5                   Ha: p > 0.5
 Pr(Z < z) = 0.0002         Pr(|Z| > |z|) = 0.0004          Pr(Z > z) = 0.9998
*/
// For both men and women, the null that the approval share is 0.5 is rejected at the 5% level; the share is below 0.5 in each group
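
// Sketch (an extension, not in the original post): the by-group runs above test each sex
// against 0.5 separately; they do not compare men with women. If that comparison is of
// interest, a two-sample test on the same variable is one option, assuming sex has
// exactly two categories.
prtest abany1, by(sex)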


** prtest [group-1 variable] == [group-2 variable]  : two-sample test of proportions (wide format)

use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data2_2.dta", clear

prtest treat==control

/*
Two-sample test of proportions                 treat: Number of obs =       30
                                             control: Number of obs =       30
------------------------------------------------------------------------------
    Variable |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       treat |         .6   .0894427                      .4246955    .7753045
     control |   .3333333   .0860663                      .1646465    .5020202
-------------+----------------------------------------------------------------
        diff |   .2666667   .1241266                       .023383    .5099503
             |  under H0:   .1288122     2.07   0.038
------------------------------------------------------------------------------
        diff = prop(treat) - prop(control)                        z =   2.0702
    H0: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.9808         Pr(|Z| > |z|) = 0.0384          Pr(Z > z) = 0.0192

*/
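// Two-sided test: the p-value is below 0.05, so reject the null at the 5% significance level
// Right-tailed test: the p-value is below 0.05, so reject the null at the 5% significance level
// = the difference between the groups is greater than 0 = the success rate of the treat group is higher than that of the control group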
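
// Sketch (not part of the original post): the immediate form reproduces this two-sample
// test from summary counts alone, assuming 18 of 30 successes in treat (0.6 * 30) and
// 10 of 30 in control (0.3333 * 30), as implied by the output above.
prtesti 30 18 30 10, count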


** prtest [new variable], by(_stack)   - two-sample test of proportions (after reshaping wide format to long format)

stack treat control, into(cure)
// reshape wide format to long format: stack [variable1] [variable2], into(new variable)
// treat becomes _stack == 1 and control becomes _stack == 2

tab _stack cure

prtest cure, by(_stack) 

/*
Two-sample test of proportions                     1: Number of obs =       30
                                                   2: Number of obs =       30
------------------------------------------------------------------------------
       Group |       Mean   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           1 |         .6   .0894427                      .4246955    .7753045
           2 |   .3333333   .0860663                      .1646465    .5020202
-------------+----------------------------------------------------------------
        diff |   .2666667   .1241266                       .023383    .5099503
             |  under H0:   .1288122     2.07   0.038
------------------------------------------------------------------------------
        diff = prop(1) - prop(2)                                  z =   2.0702
    H0: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.9808         Pr(|Z| > |z|) = 0.0384          Pr(Z > z) = 0.0192
*/
// Right-tailed test: the p-value is below 0.05, so reject the null at the 5% significance level
// = treat (_stack == 1) has a higher success rate than control (_stack == 2)


** bitest : test of a population proportion based on the binomial distribution

use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data2_3.dta", clear

bitest quick == 0.3

/*
Binomial probability test

    Variable |            N   Observed k   Expected k   Assumed p   Observed p
-------------+----------------------------------------------------------------
       quick |           15            7          4.5     0.30000      0.46667

  Pr(k >= 7)           = 0.131143  (one-sided test)
  Pr(k <= 7)           = 0.949987  (one-sided test)
  Pr(k <= 1 or k >= 7) = 0.166410  (two-sided test)
*/
// The left-tailed, two-sided, and right-tailed p-values all exceed 0.05, so the null that the population proportion is 0.3 cannot be rejected
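
// Sketch (not part of the original post): the immediate form bitesti takes the number of
// trials, the number of successes, and the hypothesized proportion, so the test above
// can be rerun from the table values alone (N = 15, k = 7, p = 0.3).
bitesti 15 7 0.3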


** sdtest [variable] == # : one-sample test of a population standard deviation

use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data2_1.dta",clear

sdtest hrs1==14

/*
One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
    hrs1 |   1,729    41.77675    .3516739    14.62304      41.087     42.4665
------------------------------------------------------------------------------
    sd = sd(hrs1)                                          c = chi2 =  1.9e+03
H0: sd = 14                                      Degrees of freedom =     1728

     Ha: sd < 14                 Ha: sd != 14                   Ha: sd > 14
  Pr(C < c) = 0.9955         2*Pr(C > c) = 0.0091           Pr(C > c) = 0.0045
*/
// Two-sided test: the p-value is below 0.05, so reject the null
// = the population standard deviation cannot be said to be 14
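
// Sketch (not part of the original post): sdtesti runs the same chi-squared test from
// summary statistics. The arguments are the sample size, the mean (a dot, because the
// mean is not needed), the sample standard deviation, and the hypothesized value.
sdtesti 1729 . 14.62304 14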


** by [grouping variable], sort : sdtest [test variable] == #    - test each of the two groups within one sample separately
by sex, sort : sdtest hrs1 == 14

/*
-> sex = male

One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
    hrs1 |     843    45.08897    .5155687    14.96926    44.07702    46.10092
------------------------------------------------------------------------------
    sd = sd(hrs1)                                          c = chi2 = 962.6241
H0: sd = 14                                      Degrees of freedom =      842

     Ha: sd < 14                 Ha: sd != 14                   Ha: sd > 14
  Pr(C < c) = 0.9976         2*Pr(C > c) = 0.0047           Pr(C > c) = 0.0024

----------------------------------------------------------------------------------------------------------------
-> sex = female

One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
    hrs1 |     886    38.62528    .4556319    13.56223    37.73104    39.51953
------------------------------------------------------------------------------
    sd = sd(hrs1)                                          c = chi2 = 830.5183
H0: sd = 14                                      Degrees of freedom =      885

     Ha: sd < 14                 Ha: sd != 14                   Ha: sd > 14
  Pr(C < c) = 0.0957         2*Pr(C < c) = 0.1913           Pr(C > c) = 0.9043

*/ 
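// Male group: at the 5% significance level the standard deviation is greater than 14 (right-tailed p = 0.0024)
// Female group: at the 5% significance level the null that the standard deviation is 14 cannot be rejected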
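
// Sketch (an extension, not in the original post): the by-group runs above compare each
// sex with the fixed value 14. To compare the two groups with each other instead, a
// variance-ratio test on the same variable is one option, assuming sex has exactly two
// categories.
sdtest hrs1, by(sex)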


** sdtest [group-1 variable] == [group-2 variable]  : two-sample variance-ratio test (wide format)

use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data2_4.dta",clear

sdtest inc_male==inc_female

/*
Variance ratio test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
inc_male |      19        2728    508.9954    2218.659     1658.64     3797.36
inc_fe~e |      14    2272.714    674.6932    2524.471    815.1283      3730.3
---------+--------------------------------------------------------------------
Combined |      33    2534.848    404.8982    2325.963    1710.098    3359.599
------------------------------------------------------------------------------
    ratio = sd(inc_male) / sd(inc_female)                         f =   0.7724
H0: ratio = 1                                    Degrees of freedom =   18, 13

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.3000         2*Pr(F < f) = 0.6001           Pr(F > f) = 0.7000
*/
// The null cannot be rejected = there is no evidence that the population variances of the two groups (men/women) differ
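
// Sketch (not part of the original post): the two-sample immediate form takes, for each
// group, the sample size, the mean (a dot when not needed), and the standard deviation.
// The values below are copied from the output above.
sdtesti 19 . 2218.659 14 . 2524.471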


** sdtest [new variable], by(_stack)       - the same test after reshaping to long format

stack inc_male inc_female, into(income)

sdtest income, by(_stack) // variance-ratio test of income between the two groups defined by _stack

/*
Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       1 |      19        2728    508.9954    2218.659     1658.64     3797.36
       2 |      14    2272.714    674.6932    2524.471    815.1283      3730.3
---------+--------------------------------------------------------------------
Combined |      33    2534.848    404.8982    2325.963    1710.098    3359.599
------------------------------------------------------------------------------
    ratio = sd(1) / sd(2)                                         f =   0.7724
H0: ratio = 1                                    Degrees of freedom =   18, 13

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.3000         2*Pr(F < f) = 0.6001           Pr(F > f) = 0.7000
*/
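// At the 5% significance level the null cannot be rejected = there is no evidence that the population standard deviations of the two groups (men/women) differ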


** robvar [new variable], by(_stack)  - test of equal population variances when the normality assumption does not hold (Levene)

robvar income, by(_stack)

/*
            |          Summary of income
     _stack |        Mean   Std. dev.       Freq.
------------+------------------------------------
          1 |        2728   2218.6595          19
          2 |   2272.7143   2524.4707          14
------------+------------------------------------
      Total |   2534.8485    2325.963          33

W0  =  0.41287742   df(1, 31)     Pr > F = 0.52523681

W50 =  0.05588585   df(1, 31)     Pr > F = 0.81467592

W10 =  0.12462038   df(1, 31)     Pr > F = 0.72646558
*/

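// W0: centered at each group's mean   W50: centered at each group's median   W10: centered at each group's 10% trimmed mean (top and bottom 5% removed)
// All three p-values exceed 0.05, so at the 5% significance level the null of equal population variances cannot be rejected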
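
// Sketch (not part of the original post), assuming robvar leaves its p-values in r()
// under the names r(p_w0) and r(p_w50); "return list" shows the names actually stored
// in your Stata version, so check it before relying on them.
quietly robvar income, by(_stack)
display "W0 (mean) p = " %6.4f r(p_w0) "   W50 (median) p = " %6.4f r(p_w50)
return list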


** robmean [variable], trim(0.05)   - compute the 10% trimmed mean (5% trimmed from each tail)

use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data2_4.dta", clear

findit robmean // robmean is not a built-in command; search for it so it can be installed

stack inc_male inc_female, into(income)

robmean income, trim(0.05)

/*
.05 highest and lowest cases trimmed or winsorized

      income |       Obs       Estimate     Std. Dev.       Min         Max
---------------------------------------------------------------------------
        mean |        33     2534.8485     2325.9630           0       9200
      median |        33     1920.0000
     trimmed |        31     2401.6129     2014.3098           0       8160
  winsorized |        33     2503.3333     2238.2236           0       8160
huber 1-step |        24     2006.6639                       360       3480
  mod 1-step |        28     1708.9286     1161.3141           0       4700
  multi-step |        31     1796.2085                         0       6800
---------------------------------------------------------------------------
*/
// The trimmed row (31 observations, estimate 2401.61) is the 10% trimmed mean
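
// Sketch (an assumption about how robmean trims, not taken from its documentation):
// with 33 nonmissing values and trim(0.05), dropping floor(0.05 * 33) = 1 observation
// from each tail leaves 31, which matches the Obs column of the trimmed row above.
// A trimmed mean of that kind can be computed by hand with built-in commands.
sort income                                   // missing values sort to the end
quietly count if !missing(income)
local n = r(N)                                // number of nonmissing incomes
local k = floor(0.05 * `n')                   // observations to drop per tail
local first = `k' + 1
local last  = `n' - `k'
summarize income in `first'/`last'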

 

[Source]  STATA 기초통계와 회귀분석 (STATA Basic Statistics and Regression Analysis; Insik Min and Pilsun Choi, 2012), Korean Stata Society website (http://kastata.org/html/sub02-04.asp)
