Columbia University Narrative Intelligence Lab

Lab Memo
Author: Dennis Yi Tenen
May 10, 2025
On Study Design in Computational Humanities

Reading Thad Dunning’s Natural Experiments in the Social Sciences (Cambridge, 2012), I am particularly struck by his discussion of study design. “How can causal inference be improved?” he asks, and answers: “In seeking to answer such questions, I place central emphasis on natural experiments as a ‘design-based’ method of research — one in which control over confounding variables comes primarily from research-design choices, rather than ex post adjustment using parametric statistical models” (4).

This approach seems particularly well-suited for computational study in the humanities, where research often rests on “the veracity of causal and statistical assumptions that are often difficult to explicate and defend — let alone validate.” The natural-experiment approach seeks to shift reasoning about such assumptions from the statistical modeling stage of the research process, expressed mathematically, to the design stage, expressed in the logic of the world observed: “With natural experiments, it is the research design, rather than the statistical modeling, that compels conviction.”

For this reason, Dunning writes, “substantive and contextual knowledge plays an important role at every stage of natural-experimental research — from discovery to analysis to evaluation.” The emphasis on context necessitates thinking about statistical concepts such as “effect” in more specified, historical terms. The influence of one author on another, for example, depends crucially on contingent facts about their biography, their publication history, ideology, genre conventions, and numerous other factors worthy of consideration. The design approach asks us to ground abstract statistical relationships firmly within concrete historical contexts and detailed interpretive frameworks.

As a consequence of reasoning about complicated contexts, the quantitative analysis of natural experiments tends to be simple. Dunning writes: “Often, a minimum of mathematical manipulation is involved. For example, straightforward contrasts between the treatment and control groups — such as the difference in average outcomes in these two groups — often suffices to provide evidence of causal effects” (105). The potential simplicity of the quantitative analysis, Dunning notes, makes the statistical results easier to convey and interpret. “Rather than presenting the estimated coefficients from multivariate models in long tables of regression results,” he concludes, “analysts may have more space in articles to discuss the research design and substantive import of the results.” I would add that this also makes the results easier to peer-review.
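The “straightforward contrast” Dunning describes can be made concrete in a few lines. The sketch below computes the difference in average outcomes between a treatment and a control group; the numbers are invented for illustration and do not come from Dunning’s data.

```python
# A minimal sketch of the simple analysis Dunning describes: a plain
# difference in average outcomes between two groups. All numbers here
# are hypothetical, standing in for any measured outcome per text.
treatment = [6.1, 5.4, 5.9, 6.3, 5.7]  # hypothetical outcomes, treated group
control = [5.0, 4.8, 5.3, 5.1, 4.9]    # hypothetical outcomes, control group

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

# The estimated average treatment effect is just the difference in means —
# no regression model, no coefficient tables.
effect = mean(treatment) - mean(control)
print(round(effect, 2))  # → 0.86
```

The entire analysis fits in a sentence of prose, which is precisely the point: the burden of the argument falls on whether the two groups were comparable by design, not on the arithmetic.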

Simplicity ultimately breeds transparency. Again, Dunning: “Analyzing data from strong research designs — including true and natural experiments — requires analysts to invoke assumptions about the process that gives rise to observed data” (106). Here, for me, lies the subtle but crucial point of his argument: all of the above holds not just for natural experiments, but for strong research design in computational humanities and social sciences more generally. Christopher H. Achen makes a similar point in his wonderful paper on “garbage-can regressions,” arguing for “sophisticated simplicity” in study design and for engaging more “creatively” with the data.

The study-design mindset fits well with my organic inclinations as a humanist. I don’t normally reason by data manipulation. Reasoning by data manipulation alone risks “cooking the books” and losing sight of the underlying social or linguistic dynamics. The vagaries of culture force me to think contextually: in terms of processes, timelines, customs, genres, relationships, narratives, and so on. And I would like to remain firmly grounded in that realm when doing computational research.