First edition, 2023.8.30
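Everything in the world has a cause and an effect; in other words, nothing comes into being without a cause.
If we know the cause, we can influence the effect. This way of thinking about causality is why the inquiry into causal relationships has been a central theme of religion, philosophy, and science throughout recorded history. Causal analysis is the effort to move that inquiry from the level of discourse to the level of scientific analysis.
Techniques for causal analysis have been studied for a very long time. With the recent rapid progress in machine learning, there is a growing effort to bring machine learning methods into causal analysis.
However, causal analysis that uses machine learning does not yet have a rigorous, well-established framework.
The purpose of this book is to give a brief introduction to approaches to causal analysis that use machine learning. Rather than going into deep theoretical discussion, the book focuses on how these methods can be applied in practice. In fact, the material in each chapter is broad enough to fill a course of its own, so what you learn from this book alone will not be enough. Readers who intend to carry out research with the methods presented here should therefore study each of the relevant fields further.
This book is based on material discussed in graduate courses and study groups over roughly the past two years. In publishing it, we would like to thank the many graduate students who studied with us. We hope this book goes some way toward satisfying readers' interest in causal analysis with machine learning.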
김양석, 노미진, 한무명초
Chapter 00 Prologue
  Purpose of This Book
  Operating System
  Development Environment
Chapter 01 Hypothesis Testing
  Introduction
  Hypotheses and Hypothesis Testing
    Hypotheses
    Hypothesis Testing
  Steps of Testing
    Step 1: Stating the Null and Alternative Hypotheses
    Step 2: Collecting Data
    Step 3: Performing a Statistical Test
    Step 4: Deciding Whether to Reject the Null Hypothesis
    Step 5: Presenting the Results
    Testing Errors
  Hypothesis Testing Examples
    Loading the Data
    Hypothesis Tests for Normality
    Hypothesis Tests for Correlation
    Parametric Hypothesis Tests
    Nonparametric Hypothesis Tests
  Conclusion
Chapter 02 Linear Regression Modeling
  Introduction
  Models and Modeling
  Dataset
  Simple Regression Analysis
    Setting Hypotheses
    Modeling
    Modeling Results
    AIC
  Multiple Regression Analysis
    Modeling
    Modeling Results
    Testing Regression Model Assumptions
  Conclusion
Chapter 03 Discrete Regression Modeling
  Introduction
  Modeling Techniques
    Logit Model
    Probit Model
    Differences Between the Logit and Probit Models
  Data Analysis Example
    Step 1: Importing Libraries
    Step 2: Loading and Understanding the Data
    Step 3: Setting Hypotheses
    Step 4: Preparing the Data
    Step 5: Logit Modeling
    Step 6: Probit Modeling
  Conclusion
Chapter 04 Causal Inference Analysis
  Introduction
  The Four Steps of Causal Inference
    Identifying the Target Estimand from the Model
    Causal Inference Based on the Identified Estimand
    Refuting the Obtained Estimate
  Features of DoWhy Causal Inference
    Explicit Identification
    Separation of Identification and Estimation
    Automated Robustness Checks
    Extensibility
  Causal Inference Example: Hotel Booking Cancellations
    Step 1: Importing Libraries
    Step 2: Loading and Understanding the Data
    Step 3: Preparing the Data
    Step 4: Estimating Causal Effects with DoWhy
  Conclusion
Chapter 05 Causal Discovery Analysis
  Introduction
  Installing Packages
  Understanding the Analysis Method
    Step 1: Importing Libraries
    Step 2: Generating Test Data
    Step 3: Discovering Causal Relationships
    Step 4: Testing Independence Between Error Variables
  Causal Discovery for Classification Problems
    Step 1: Importing Libraries
    Step 2: Creating Custom Functions
    Step 3: Loading the Data
    Step 4: Modeling
    Step 5: Testing Independence Between Variable Errors
    Step 6: Building a Prediction Model and Analyzing Prediction Impact
  Causal Discovery for Numerical Prediction Problems
    Step 1: Importing Libraries
    Step 2: Creating Custom Functions
    Step 3: Loading the Data
    Step 4: Modeling
    Step 5: Testing Independence Between Variable Errors
    Step 6: Building a Prediction Model and Analyzing Prediction Impact
    Step 7: Estimating the Optimal Intervention
  Conclusion
Chapter 06 Causal Impact Analysis
  Introduction
  Causal Impact
    Understanding How the Model Works
  Causal Impact Analysis Example: Volkswagen
    Step 1: Loading Libraries
    Step 2: Loading and Understanding the Data
    Step 3: Analyzing the Default Model
    Step 4: Decomposing the Time Series
    Step 5: Custom Model
  Conclusion
Chapter 07 Counterfactual Analysis
  Introduction
  Counterfactual Analysis for Income Classification
    Step 1: Importing Libraries
    Step 2: Loading and Understanding the Dataset
    Step 3: Generating Counterfactuals with DiCE
    Step 4: Feature Importance Based on Counterfactual Examples
  Counterfactual Analysis Example: House Price Prediction
    Step 1: Loading Libraries
    Step 2: Loading and Understanding the Data
    Step 3: Generating Counterfactuals with DiCE
    Step 4: Counterfactual-Based Feature Importance
  Conclusion
References
Index