First edition: August 30, 2023

Everything that happens in the world has a cause and an effect. In other words, nothing arises without a cause.

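If we know the cause, we can influence the effect. Because of this idea of causality, the study of causal relationships has been a central theme of religion, philosophy, and science throughout recorded history. Causal analysis is the effort to move that study from the level of discourse to the level of scientific analysis.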

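Research on techniques for causal analysis has a very long history. With the recent rapid advance of machine learning, efforts to bring these techniques into causal analysis have been increasing.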

However, causal analysis based on machine learning does not yet have a rigorous framework.

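The purpose of this book is to give a brief introduction to machine learning approaches to causal analysis. It leaves out deep theoretical discussion and focuses on how the methods can be used in practice. Each chapter covers enough material to fill a course of its own, so what is learned from this book alone will not be sufficient. Readers who intend to conduct research with the methods presented here will need further study in each relevant area.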

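This book is based on material discussed in graduate lectures and study groups over the past two years or so. On its publication, we would like to thank the many students who studied with us in graduate school during that time. We hope that this book goes some way toward satisfying readers' interest in machine learning-based causal analysis.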

김양석, 노미진, 한무명초

Chapter 00 Prologue

Purpose of the Book

Operating System

Development Environment

Chapter 01 Hypothesis Testing

Introduction

Hypotheses and Hypothesis Testing

Hypotheses

Hypothesis Testing

Steps of Testing

Step 1: State the Null and Alternative Hypotheses

Step 2: Collect Data

Step 3: Perform a Statistical Test

Step 4: Decide Whether to Reject the Null Hypothesis

Step 5: Present the Findings

Testing Errors

Hypothesis Testing Examples

Loading the Data

Hypothesis Test for Normality

Hypothesis Test for Correlation

Parametric Hypothesis Tests

Nonparametric Hypothesis Tests

Conclusion

Chapter 02 Linear Regression Modeling

Introduction

Models and Modeling

Dataset

Simple Regression Analysis

Hypothesis Formulation

Modeling

Modeling Results

AIC

Multiple Regression Analysis

Modeling

Modeling Results

Testing Regression Model Assumptions

Conclusion

Chapter 03 Discrete Regression Modeling

Introduction

Modeling Techniques

Logit Model

Probit Model

Differences Between the Logit and Probit Models

Data Analysis Case Study

Step 1: Importing Libraries

Step 2: Loading and Understanding the Data

Step 3: Hypothesis Formulation

Step 4: Data Preparation

Step 5: Logit Modeling

Step 6: Probit Modeling

Conclusion

Chapter 04 Causal Inference Analysis

Introduction

The Four Steps of Causal Inference

Identifying the Target Estimand from the Model

Causal Inference Based on the Identified Estimand

Refuting the Obtained Estimate

Features of DoWhy Causal Inference

Explicit Identification

Separation of Identification and Estimation

Automated Robustness Checks

Extensibility

Causal Inference Case Study: Hotel Booking Cancellations

Step 1: Importing Libraries

Step 2: Loading and Understanding the Data

Step 3: Data Preparation

Step 4: Estimating Causal Relationships with DoWhy

Conclusion

Chapter 05 Causal Discovery Analysis

Introduction

Package Installation

Understanding the Analysis Method

Step 1: Importing Libraries

Step 2: Generating Test Data

Step 3: Discovering Causal Relationships

Step 4: Testing Independence Between Error Variables

Causal Discovery for Classification Problems

Step 1: Importing Libraries

Step 2: Creating Custom Functions

Step 3: Loading the Data

Step 4: Modeling

Step 5: Testing Independence Between Variable Errors

Step 6: Building a Prediction Model and Analyzing Prediction Influence

Causal Discovery for Numeric Prediction Problems

Step 1: Importing Libraries

Step 2: Creating Custom Functions

Step 3: Loading the Data

Step 4: Modeling

Step 5: Testing Independence Between Variable Errors

Step 6: Building a Prediction Model and Analyzing Prediction Influence

Step 7: Estimating the Optimal Intervention

Conclusion

Chapter 06 Causal Impact Analysis

Introduction

Causal Impact

Understanding How the Model Works

Volkswagen Causal Impact Case Study

Step 1: Loading Libraries

Step 2: Loading and Understanding the Data

Step 3: Default Model Analysis

Step 4: Time Series Decomposition

Step 5: Custom Model

Conclusion

Chapter 07 Counterfactual Analysis

Introduction

Counterfactual Analysis for Income Classification

Step 1: Importing Libraries

Step 2: Loading and Understanding the Dataset

Step 3: Generating Counterfactuals with DiCE

Step 4: Feature Importance Based on Counterfactual Examples

Counterfactual Analysis Case Study for House Price Prediction

Step 1: Loading Libraries

Step 2: Loading and Understanding the Data

Step 3: Generating Counterfactuals with DiCE

Step 4: Feature Importance Based on Counterfactuals

Conclusion

References

Index