22-30.102 Causal Inference in Natural Language Processing

Course offering details

Instructors: Dr. Thi Thanh Huyen Nguyen

Event type: Interactive class

Displayed in timetable as: 22-3.e86

Hours per week: 3

Credits: 6.0

Language of instruction: English

Min. | Max. participants: - | 45

Comments/contents:
This course introduces causal inference techniques for social scientists through the lens of applied microeconomics, with a specific focus on using text as data in the social sciences. The ability to determine causal pathways is instrumental for social science research and holds the key to understanding the effects of policy. The ever-increasing availability of large-scale text corpora and computing power, along with methodological advances, has boosted researchers’ ability not just to estimate causal relationships between policy variables and demographic information, but also to uncover novel insights hidden in texts about relevant institutional and human patterns.


To conclude that a correlation between two variables is causal, economists typically describe the need to rule out two main alternatives: reverse causality and omitted variable bias (OVB). Reverse causality occurs when people infer that one factor causes a second when, in reality, the second causes the first. For example, on seeing smoke and firetrucks in the same place, concluding that the firetrucks caused the smoke would be an error of reverse causality. Related is simultaneous causation, which occurs when causation runs in both directions at the same time. OVB, in turn, arises when a third factor that is left out of the analysis drives both variables.
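
As a rough illustration of the omitted variable bias (OVB) named above, the following Python sketch simulates data in which a confounder drives both the regressor and the outcome. It is not taken from the course materials: the function name `simulate_ovb`, the variable names, and all coefficients are assumptions chosen for the example. It compares a naive least-squares slope with one that also controls for the confounder.

```python
import numpy as np


def simulate_ovb(n=20_000, true_effect=2.0, seed=0):
    """Return (naive, adjusted) OLS estimates of the effect of x on y.

    Hypothetical data-generating process (illustrative values only):
        z ~ N(0, 1)                      # confounder
        x = 1.5 * z + noise              # regressor depends on z
        y = true_effect * x + 3.0 * z + noise
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = 1.5 * z + rng.normal(size=n)
    y = true_effect * x + 3.0 * z + rng.normal(size=n)

    # Naive regression: y on a constant and x, with z omitted.
    # In expectation the slope is biased by 3.0 * cov(x, z) / var(x)
    # = 3.0 * 1.5 / 3.25, roughly 1.38 under this setup.
    X_naive = np.column_stack([np.ones(n), x])
    naive = np.linalg.lstsq(X_naive, y, rcond=None)[0][1]

    # Adjusted regression: y on a constant, x, and the confounder z.
    X_adj = np.column_stack([np.ones(n), x, z])
    adjusted = np.linalg.lstsq(X_adj, y, rcond=None)[0][1]

    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate_ovb()
    print(f"naive slope (z omitted):       {naive:.3f}")
    print(f"adjusted slope (z controlled): {adjusted:.3f}")
```

Under these assumptions the naive slope lands well above the true effect of 2.0, while the adjusted slope is close to it. In real applications the confounder is often unobserved, which is why controlling for it directly is not always an option.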

Learning objectives:
Comprehensive overview of contemporary causal inference methods in social science policy questions, especially in the context of text as data.

Familiarity with statistical and practical issues around text data in policy impact evaluation.

Ability to critically present relevant scientific articles/applications to peers and discuss presentations of other participants.

Ability to fit, interpret and apply causal inference analysis techniques in chosen policy contexts to independent research projects.

Didactic concept:
The first 3-4 lectures will cover the key fundamentals of contemporary causal inference models and their applications in economics. Examples include, but are not limited to: how the text of job ads affects different types of applicants; how the market reacts to FOMC releases; and how deliberation changes opinions on a controversial topic.

The subsequent lectures will explore various applications of Text as Outcome, as Treatment, and as Mediator, in the form of flipped classrooms. These sessions consist of a short lecture, followed by either: (1) a presentation by groups of 2-3 students on the required readings of that week, with critical discussion by the remaining class members; or (2) an interactive coding exercise session in groups.

Participants are required to read and prepare necessary materials before class, to maximize the interaction across groups.

Literature:
This course requires a basic understanding of Python, statistics, probability theory, and applied econometric techniques used in the social sciences.

Participants are also expected to be familiar with the basics of text as data (text data acquisition, cleaning, feature reduction, and the basics of relevant supervised and unsupervised ML methods) and with the syntax of Python and Stata (at the level typically obtained after a BSc econometrics course).

Appointments
No.  Date                From   To     Room            Instructors
1    Tue, 5. Apr. 2022   08:00  11:00  WiWi 2091/2201  Dr. Thi Thanh Huyen Nguyen
2    Tue, 12. Apr. 2022  08:00  11:00  WiWi 2091/2201  Dr. Thi Thanh Huyen Nguyen
3    Tue, 19. Apr. 2022  08:00  11:00  WiWi 2091/2201  Dr. Thi Thanh Huyen Nguyen
4    Tue, 26. Apr. 2022  08:00  11:00  WiWi 2091/2201  Dr. Thi Thanh Huyen Nguyen
5    Tue, 3. May 2022    08:00  11:00  WiWi 2091/2201  Dr. Thi Thanh Huyen Nguyen
6    Tue, 10. May 2022   08:00  11:00  WiWi 2091/2201  Dr. Thi Thanh Huyen Nguyen
7    Tue, 17. May 2022   08:00  11:00  WiWi 2101/2105  Dr. Thi Thanh Huyen Nguyen
8    Tue, 31. May 2022   08:00  11:00  WiWi 2101/2105  Dr. Thi Thanh Huyen Nguyen
9    Tue, 7. Jun. 2022   08:00  11:00  WiWi 2101/2105  Dr. Thi Thanh Huyen Nguyen
10   Tue, 14. Jun. 2022  08:00  11:00  WiWi 2101/2105  Dr. Thi Thanh Huyen Nguyen
11   Tue, 21. Jun. 2022  08:00  11:00  WiWi 2101/2105  Dr. Thi Thanh Huyen Nguyen
12   Tue, 28. Jun. 2022  08:00  11:00  WiWi 2101/2105  Dr. Thi Thanh Huyen Nguyen
13   Tue, 5. Jul. 2022   08:00  11:00  WiWi 2101/2105  Dr. Thi Thanh Huyen Nguyen
14   Tue, 12. Jul. 2022  08:00  11:00  WiWi 2101/2105  Dr. Thi Thanh Huyen Nguyen

Exams in context of modules
Module (start semester) / Course: 22-3.E86 Causal Inference in Natural Language Processing (SuSe 22) / 22-3.e86 Causal Inference in Natural Language Processing
Exam: 1 Paper and presentation of project findings
Date: Time tbd
Instructors: Dr. Thi Thanh Huyen Nguyen
Compulsory pass: Yes

Course specific exams
Description: 1. Paper and project
Date: Time tbd
Instructors: -
Mandatory: Yes