
[STATA] Nonparametric Tests

by e-money2580 2023. 1. 10.


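** Kolmogorov-Smirnov test
// Tests whether the distribution of a continuous variable matches a theoretical distribution
// It is a nonparametric test because the hypothesis does not involve a specific population parameter such as the population mean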

use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data7_1.dta",clear

su wage
/*
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      2,246    7.766949    5.755523   1.004952   40.74659
*/

ksmirnov wage = normal((wage-r(mean))/r(sd)) 
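// Null hypothesis: the sample data follow the given theoretical distribution
// normal((wage-r(mean))/r(sd)): standardizes wage and evaluates the standard normal CDF at that value
// r(mean) and r(sd) are the results stored by the preceding "su wage", so that command must be run first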

/*
One-sample Kolmogorov–Smirnov test against theoretical distribution
      normal((wage-r(mean))/r(sd))

Smaller group             D     p-value  
---------------------------------------
wage                 0.1464       0.000
Cumulative          -0.1579       0.000
Combined K-S         0.1579       0.000

Note: Ties exist in dataset; 
      there are 967 unique values out of 2246 observations.
*/
// The null hypothesis is rejected at the 5% significance level (p-value 0.000), so wage cannot be said to follow a normal distribution
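
// A possible follow-up, not part of the original post: the mean and sd passed to
// ksmirnov are estimated from the same sample, so the p-value above is only approximate.
// Assuming wage is still in memory, these built-in commands offer a cross-check.
sktest wage    // skewness and kurtosis test for normality
qnorm wage     // quantile-normal plot for a visual check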



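** Wilcoxon rank-sum test
// Nonparametric test of whether the samples from two independent groups come from the same distribution (that is, whether the variable has the same distribution in both groups)
// The data record gender and age; the test asks whether the age distribution is the same for men and women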
use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data7_2.dta", clear

ranksum age, by(gender)
/*

Two-sample Wilcoxon rank-sum (Mann–Whitney) test

      gender |      Obs    Rank sum    Expected
-------------+---------------------------------
           0 |       54      3063.5        2997
           1 |       56      3041.5        3108
-------------+---------------------------------
    Combined |      110        6105        6105

Unadjusted variance    27972.00
Adjustment for ties     -164.95
                     ----------
Adjusted variance      27807.05

H0: age(gender==0) = age(gender==1)
         z =  0.399
Prob > |z| = 0.6900
Exact prob = 0.6923
*/
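// The p-value (0.6900) is greater than 0.05, so at the 5% significance level the null hypothesis (the age distribution is the same for both genders) cannot be rejected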
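
// Sketch, not part of the original post: the z statistic can be rebuilt by hand from the
// table above as (rank sum - expected) / sqrt(adjusted variance), using the gender==0 row.
// All numbers are copied from the ranksum output; nothing here needs the dataset.
display 54 * (54 + 56 + 1) / 2               // expected rank sum for gender==0: n1*(N+1)/2
display 54 * 56 * (54 + 56 + 1) / 12         // unadjusted variance: n1*n2*(N+1)/12
display (3063.5 - 2997) / sqrt(27807.05)     // z, should be close to the reported 0.399
display 2 * (1 - normal(abs((3063.5 - 2997) / sqrt(27807.05))))   // two-sided p, close to 0.6900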



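** Wilcoxon signed-rank test
// Tests whether two variables have the same distribution when the two samples are not independent (paired data)
// The fuel economy of 12 cars was measured twice, once on a rainy day and once on a clear day. The two measurements are paired on the same car, so they cannot be treated as independent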

use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data7_3.dta",clear

signrank mpg1=mpg2
/*
Wilcoxon signed-rank test

        Sign |      Obs   Sum ranks    Expected
-------------+---------------------------------
    Positive |        3        13.5        38.5
    Negative |        8        63.5        38.5
        Zero |        1           1           1
-------------+---------------------------------
         All |       12          78          78

Unadjusted variance      162.50
Adjustment for ties       -1.63
Adjustment for zeros      -0.25
                     ----------
Adjusted variance        160.63

H0: mpg1 = mpg2
         z = -1.973
Prob > |z| = 0.0485
Exact prob = 0.0479
*/
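// The p-value (0.0485) is below 0.05, so at the 5% significance level the null hypothesis (fuel economy is the same on rainy and clear days) is rejected: fuel economy differs between the two conditions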
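
// Sketch, not part of the original post: the z statistic above can be checked by hand as
// (positive rank sum - expected) / sqrt(adjusted variance). All numbers are copied from
// the signrank output, so the dataset is not needed.
display 12 * 13 * 25 / 24                // unadjusted variance n(n+1)(2n+1)/24 with n = 12
display (13.5 - 38.5) / sqrt(160.63)     // z, should be close to the reported -1.973
display 2 * normal((13.5 - 38.5) / sqrt(160.63))   // two-sided p for a negative z, close to 0.0485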


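** Kruskal-Wallis rank test
// Used when three or more groups are compared; tests whether the variable has the same distribution in every group
// Demographic data for the 50 US states: region classifies the states into 4 census regions, and medage is the median age of each state's population
// Tests whether the ages in each region come from the same population (whether the age distribution is the same across regions)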

use "D:\STATA연습데이터\STATA기초통계와회귀분석\R_data7_4.dta",clear

kwallis medage, by(region)
/*
Kruskal–Wallis equality-of-populations rank test

  +--------------------------+
  | region  | Obs | Rank sum |
  |---------+-----+----------|
  | NE      |   9 |   376.50 |
  | N Cntrl |  12 |   294.00 |
  | South   |  16 |   398.00 |
  | West    |  13 |   206.50 |
  +--------------------------+

  chi2(3) = 17.041
     Prob = 0.0007

  chi2(3) with ties = 17.062
               Prob = 0.0007
*/
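// The null hypothesis (the age distribution is the same in all 4 regions) is rejected at the 5% significance level (p = 0.0007), so the age distribution differs in at least one region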
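
// Sketch, not part of the original post: the chi2(3) statistic without the ties correction
// follows H = 12/(N(N+1)) * sum(R_j^2/n_j) - 3(N+1), with N = 50 and the rank sums R_j
// and group sizes n_j copied from the table above.
display 12/(50*51) * (376.5^2/9 + 294^2/12 + 398^2/16 + 206.5^2/13) - 3*51   // close to 17.041
display chi2tail(3, 17.041)   // upper-tail chi-squared probability, close to the reported 0.0007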



** Median test

median medage, by(region) medianties(below)
// medianties(below): observations equal to the overall median are assigned to the below-median group

/*
Median test

   Greater |
  than the |                Census region
    median |        NE    N Cntrl      South       West |     Total
-----------+--------------------------------------------+----------
        no |         1          7          8          9 |        25 
       yes |         8          5          8          4 |        25 
-----------+--------------------------------------------+----------
     Total |         9         12         16         13 |        50 

          Pearson chi2(3) =   7.7009   Pr = 0.053
*/
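// The p-value (0.053) is greater than 0.05, so at the 5% significance level the null hypothesis (the 4 regional samples have the same median) cannot be rejected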
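
// Sketch, not part of the original post: the Pearson chi2 above can be rebuilt from the
// 2x4 table. Both rows total 25, so each expected count is half the column total
// (4.5, 6, 8, 6.5), and the "yes" row contributes the same amount as the "no" row.
display 2 * ((1 - 4.5)^2/4.5 + (7 - 6)^2/6 + (8 - 8)^2/8 + (9 - 6.5)^2/6.5)   // close to 7.7009
display chi2tail(3, 7.7009)   // upper-tail probability, close to the reported 0.053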

[Source] 기초통계와 회귀분석 (Basic Statistics and Regression Analysis; 민인식, 최필선, 2012); 한국STATA학회 (Korean STATA Society) website (http://kastata.org/html/sub02-04.asp)
