AI, Ideas, and the Future of Research

Lecture Outline
Today: the talk in six parts.
Ideas, old gates, automation, agents, research workflows, and one practical pipeline.
01
Ideas Matter, Always
History, capital, and the central place of ideas.
02
The Old Gates
Knowledge, access, faster answers, and the new scarcity.
03
From Assistance to Automation
ANI, AI-generated papers, and the Nature case.
04
From Chatbox to Agents
Skills, agents, orchestration, and pipelines.
05
From Questions to Workflows
API-scale research and two concrete examples.
06
Practice
One question, one pipeline, one live research workflow.
Fatih Kansoy
University of Oxford
Section 01
Ideas Matter, Always.
History · Capital · Ideas
Section 01
Ideas Matter, Always
A long time ago in a galaxy far, far away....
GDP per capita, 1 to 2018
Section 01
Ideas Matter, Always
No Industry Without Order
Leviathan
“Whatsoever therefore is consequent to a time of Warre, where every man is Enemy to every man... [there is] no place for Industry... no Society; and which is worst of all, continuall feare, and danger of violent death; And the life of man, solitary, poore, nasty, brutish, and short.”
Hobbes’s point: before growth, trade, or innovation, society needs order. Security is the precondition for industry.
Portrait of Thomas Hobbes
Thomas Hobbes
Leviathan · 1651
Section 01
Ideas Matter, Always
A Growth Story: Solow, Romer, and Jones
Solow
Focus
Physical Capital
Output rises through capital deepening, but each extra unit of capital adds less than the one before.
Romer
Focus
Ideas
Ideas are produced inside the economy and can be used by many people without being used up.
Jones
Focus
Ideas Getting Harder to Find
As the frontier moves out, more researchers are needed to sustain the same pace of idea creation.
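Two of the three mechanisms can be put into textbook functional forms. The sketch below covers Solow and Jones under stated assumptions: a Cobb-Douglas technology Y = K^α L^(1−α) with α = 0.3, and a Jones-style idea production function dA/dt = δ·L_A·A^φ with φ = 0.5 < 1. Every parameter value is an illustrative assumption, not an estimate.

```python
# Illustrative parameters (assumptions, not estimates).
ALPHA = 0.3   # capital share in Cobb-Douglas output
DELTA = 0.02  # research productivity in the idea production function
PHI = 0.5     # returns to the existing idea stock; PHI < 1 means ideas get harder


def marginal_product_of_capital(k: float, labour: float = 1.0) -> float:
    """dY/dK for Y = K**ALPHA * L**(1 - ALPHA)."""
    return ALPHA * k ** (ALPHA - 1) * labour ** (1 - ALPHA)


def researchers_needed(ideas_stock: float, growth_rate: float) -> float:
    """Researchers L_A that keep idea growth (dA/dt)/A at growth_rate.

    From dA/dt = DELTA * L_A * A**PHI, growth = DELTA * L_A * A**(PHI - 1),
    so L_A = growth_rate * A**(1 - PHI) / DELTA.
    """
    return growth_rate * ideas_stock ** (1 - PHI) / DELTA


if __name__ == "__main__":
    # Solow: each extra unit of capital adds less than the one before.
    for k in (1, 2, 4, 8):
        print(f"K={k}: MPK={marginal_product_of_capital(k):.3f}")
    # Jones: holding idea growth at 2%, a larger frontier needs more researchers.
    for a in (1, 4, 16, 64):
        print(f"A={a}: researchers={researchers_needed(a, 0.02):.1f}")
```

With these values the marginal product of capital falls each time K doubles, while the research headcount needed to hold idea growth at 2% doubles each time A quadruples, because L_A scales with A^(1−φ) = A^0.5.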
Section 01
Ideas Matter, Always
“Everything has been invented”
Portrait of Charles H. Duell
Charles H. Duell
U.S. Commissioner of Patents
Commonly attributed, 1899
“Everything that can be invented has been invented.”
The deeper lesson is not the quote itself. The mistake is perennial: each era keeps treating the limits of current imagination as the limits of invention.
Section 02
The Old Gates & The New Scarcity
Chained books · The broken monopoly · Faster answers · The right question
Section 02
The Old Gates
Chained books in a medieval library
The Old Scarcity
Knowledge Was Gatekept
Until print and mass education, most people were excluded before they could even begin to read, think, or write.
01
Physical Access
Books were scarce, expensive, and often chained inside elite libraries.
02
Language
Scholarship lived in learned languages that most people could not access.
03
Institution
Universities, monasteries, and courts decided who could join the conversation at all.
Section 02
The Old Gates
The Wall Starts To Crack
Breaking the Monopoly
As copying and distribution got cheaper, more people could read, learn, and contribute.
1450
Print
Books stop being aristocratic objects.
19th C.
Public Libraries
Reading moves beyond universities, courts, and monasteries.
1990s
Open Web
Papers and courses spread beyond the campus gate.
Now
AI Tools
AI lowers the cost of doing, not just reading.
A chained medieval book with a loose chain
Section 02
The Old Gates
What changed
AI Makes Hard Answers Faster
For a long time, hard questions stalled because the search itself was too expensive. AI lowers that cost.
Before
Decades
Hard questions often just waited for enough search, enough data, or enough compute.
With AI
Search
Models sharply lower the cost of exploring structures, proofs, code, simulations, and candidate answers.
Result
Faster
Questions that once looked immovable start to become tractable much sooner than before.
Case: AlphaFold
50 years
Protein-structure prediction remained a stubborn problem for half a century.
A long-stalled problem started to move.
AlphaFold showed what happens when models, data, and compute line up: a hard scientific search problem becomes dramatically more tractable.
Models · Data · Compute
Section 02
The Old Gates
The next break in scarcity
Answers Come Faster
Search, drafting, coding, translation, and analysis are suddenly available to far more people than before.
Search
Draft
Code
Translate
Explain
Test
The old gate falls again: capabilities once reserved for expert teams and institutions now sit on ordinary machines.
What remains scarce
Can it ask the right question?
If AI helps us answer faster and better, the frontier may move towards judgment: which problem is worth pursuing, framing, and insisting on.
Section 03
From Assistance to Automation
AI-assisted vs AI-generated · The AI Scientist · Milestones and limits
Section 03
From Assistance to Automation
ANI, AGI, and ASI
Present
ANI
Artificial Narrow Intelligence
Built for bounded tasks. Fast, useful, and narrow.
Examples: ChatGPT, Siri, search, translation, recommendation.
Today’s systems are already powerful, but they still operate as specialised tools, not general minds.
Goal
AGI
Artificial General Intelligence
A system that could reason, learn, plan, and transfer knowledge across many domains.
Claim: not one workflow, but broad competence across tasks.
The jump from ANI to AGI is about breadth of competence, not just better output in one narrow workflow.
Hypothetical
ASI
Artificial Superintelligence
A theoretical system that would exceed the best human minds across most or all relevant domains.
Status: speculative, debated, and not something that exists today.
ASI is not the current reality. It is a speculative end-state in a very different debate.
Section 03
From Assistance to Automation
AI-Assisted vs AI-Generated Papers
AI-Assisted
Human leads. AI helps.
A mature, human-led workflow where AI stays in a bounded supporting role.
The scholar still owns the argument, judgment, and final decisions.
vs
AI-Generated Papers (AIGPs)
AI works. Human directs.
An experimental workflow where the system takes on substantial research labor.
The human behaves more like an advisor who critiques and redirects.
Main point: who is driving the project?
Section 03
From Assistance to Automation
Can a robot do research?
Nature · 25 March 2026
A Nature paper reports that end-to-end automated AI research is now possible.
The paper presents The AI Scientist: a system that can generate ideas, write code, run experiments, draft the manuscript, and review its own output.
This is not a smarter chatbot. It is an agentic research pipeline.
Screenshot of the Nature article on end-to-end automation of AI research
Lu et al. (2026), Towards end-to-end automation of AI research, Nature
nature.com/articles/s41586-026-10265-5
Section 03
From Assistance to Automation
Automated research is a loop, not a prompt.
01
Idea Search
Find a question and check whether it is new.
02
Experiment
Edit code, run trials, and compare results.
03
Paper
Turn outputs into figures and write the manuscript.
04
Review
Score the draft and decide whether it is worth pushing.
The AI Scientist: one system cycles through all four stages instead of answering a single prompt.
The point
The novelty is not better writing. It is a machine that can keep moving around the research loop.
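The four-stage loop can be sketched as plain control flow. The stage functions below are hypothetical stand-ins (in a real system each would call a model, run code, or compile a draft), and the score scale, threshold, and round limit are illustrative assumptions rather than details of The AI Scientist itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Draft:
    idea: str
    results: dict = field(default_factory=dict)
    text: str = ""
    score: float = 0.0


def idea_search(round_no: int) -> str:
    # Hypothetical: propose a candidate question (a real system would also check novelty).
    return f"candidate idea #{round_no}"


def run_experiment(idea: str, round_no: int) -> dict:
    # Hypothetical: edit code, run trials, and compare results.
    return {"metric": 5 + round_no}


def write_paper(idea: str, results: dict) -> str:
    # Hypothetical: turn outputs into figures and manuscript text.
    return f"{idea}: metric={results['metric']}"


def review(draft: Draft) -> float:
    # Hypothetical reviewer: score on a 1-10 scale, capped at 10.
    return float(min(10, draft.results["metric"]))


def research_loop(threshold: float = 7.0, max_rounds: int = 5) -> Optional[Draft]:
    """Cycle idea -> experiment -> paper -> review until a draft clears the bar."""
    for round_no in range(1, max_rounds + 1):
        idea = idea_search(round_no)
        results = run_experiment(idea, round_no)
        draft = Draft(idea=idea, results=results, text=write_paper(idea, results))
        draft.score = review(draft)
        if draft.score >= threshold:
            return draft  # worth pushing; a human still makes the final call
    return None  # no draft cleared review within the budget
```

The structural point is the loop: review output feeds the decision to stop or go round again, rather than ending after one answer.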
Section 03
From Assistance to Automation
Who does what?
What the AI does
Search. Build. Write.
Search
Generate candidate ideas and screen the literature for overlap.
Build
Edit the code, run trials, and convert outputs into figures.
Write
Draft the paper, add citations, and review the draft.
What the human still does
Choose. Filter. Approve.
Choose
Pick the domain template, starting scaffold, model, and constraints.
Filter
For the workshop test, humans manually filtered the most promising AI outputs at each stage.
Approve
Take responsibility for disclosure, withdrawal protocol, and submission.
Advisor logic, not chatbot logic.
Section 03
From Assistance to Automation
What happened in the real test?
3 AI papers entered workshop review.
Blind human review: reviewers knew some submissions were AI-generated, but not which ones.
1 paper cleared the workshop bar.
Withdrawn after review: the likely-accepted paper was pulled under the study protocol.
Scores: 6, 7, 6 (Reviewer 1: 6 · Reviewer 2: 7 · Reviewer 3: 6 · Average: 6.33)
Top 45% of the 43 papers reviewed for the workshop.
Venue and result
At the I Can’t Believe It’s Not Better workshop at ICLR 2025, organisers said the strongest paper would likely have been accepted; it was withdrawn under the study protocol because it was AI-generated. That paper reported a negative result.
43 papers reviewed · 70% workshop acceptance rate · 32% ICLR main-track acceptance rate
Section 03
From Assistance to Automation
Why this matters beyond computer science
Research becomes a pipeline
Work can be delegated.
Search, coding, drafting, and review can now be bundled and repeated inside one workflow.
Research turns into a process
Evaluation becomes scarce
Attention is the bottleneck.
If output gets cheaper, filtering, reading, and judgment become more valuable than before.
Review matters more
Institutions need labels
One norm is not enough.
AI-assisted work and AI-generated work cannot live under the same disclosure rules.
Journals need distinctions
For social scientists
Automation can change how research is organized long before it replaces theory.
Section 04
From Chatbox to Agents
The chatbox ceiling · Skills, agents, pipelines · The agentic research workflow
Section 04
From Chatbox to Agents
The Fundamental Shift
A Chatbot Answers.
An Agent Acts.
Figure: a chatbot (2022–2025) runs a one-shot human prompt → LLM → response, with no memory, tools, or files; an agent (2026 →) combines an LLM with skills and tools in a plan → act → verify → reflect loop over files, code, search, and data.
Section 04
From Chatbox to Agents
The Fundamental Shift · 1 of 2
A Chatbot Answers.
Chatbot (2022-2025)
01
Human
The human writes the prompt and manually decides the next step.
02
LLM
The model turns text into text.
03
Response
Useful output, but still a one-shot exchange.
No memory · No tools · No files · One-shot
Section 04
From Chatbox to Agents
The Fundamental Shift · 2 of 2
An Agent Acts.
Agent (2026 →)
01
Plan
Sets the next task.
02
Act
Uses tools and files.
03
Verify
Checks whether the output works.
04
Reflect
Loops instead of stopping at one answer.
files · code · search · data
The change is not just better answers. It is memory, tools, multi-step execution, verification, and looped work.
Section 04
From Chatbox to Agents
What Is a Skill?
A Skill is a plain-text instruction file that turns a general-purpose model into a specialist. It is not software. It is a repeatable protocol written in Markdown.
01
Trigger
When should the model activate this workflow? Which requests call it?
02
Modules
What ordered steps should it follow, and where should it pause to check the work?
03
References
Which checklists, style guides, examples, or files must be read before acting?
04
Constraints
What hard rules can never be broken?
Anatomy of a SKILL.md
name: academic-paper-review
trigger:
  - "review my paper"
  - "write a referee report"
modules:
  - Referee report
  - British English audit
  - LLM cliche detection
  - Style and repetition audit
reads_before_acting:
  - british-english-guide.md
  - llm-cliche-list.md
tools: [pdf_reader, semantic_search]
checkpoints: after module 1, after module 3
constraints: never fabricate citations; British English only
Plain Markdown. One reusable protocol. The model reads it before it acts.
A Skill turns "I have a smart chatbot" into "I have a specialist that follows a reproducible workflow."
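How a runtime decides to activate a skill varies by tool; many let the model itself judge relevance from the skill's name and description. As a deliberately simple illustration, the sketch below uses case-insensitive phrase matching against the `trigger` list from the anatomy example. The `Skill` class and `select_skill` function are hypothetical, not part of any agent framework's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: tuple[str, ...]
    modules: tuple[str, ...]  # ordered steps, as in the SKILL.md anatomy


REVIEW_SKILL = Skill(
    name="academic-paper-review",
    triggers=("review my paper", "write a referee report"),
    modules=(
        "Referee report",
        "British English audit",
        "LLM cliche detection",
        "Style and repetition audit",
    ),
)


def select_skill(request: str, skills: list[Skill]) -> Optional[Skill]:
    """Return the first skill whose trigger phrase appears in the request."""
    text = request.lower()
    for skill in skills:
        if any(trigger in text for trigger in skill.triggers):
            return skill
    return None  # no match: fall back to ordinary chat behaviour
```

Keyword matching is brittle (it misses paraphrases), which is one reason real systems tend to delegate this decision to the model.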
Section 04
Skill File
The Skill Definition
SKILL.md
academic-paper-review · 4 modules · 2 checkpoints
---
name: academic-paper-review
description: >
  Review economics and finance papers to publication-ready standard.
  Covers referee reports, British English, LLM cliche detection, and style auditing.

trigger:
  - "review my paper"
  - "full review"
  - "referee report"
  - "check British English"
  - "check for LLM cliches"
  - "style audit"
  - "proofread"

modules:
  - Referee Report Simulation
  - British English Audit
  - LLM Cliche Detection
  - Style & Repetition Audit

reads_before_acting:
  - references/british-english-guide.md
  - references/llm-cliche-list.md

tools:
  - semantic_scholar_api
  - pdf_reader

inputs:
  - manuscript (.tex or .pdf)

output: single consolidated report (LaTeX or Markdown)

checkpoints:
  - after Module 1 — confirm major comments before continuing
  - after Module 3 — review severity before style audit

constraints:
  - never fabricate citations, issues, or sources
  - british_english_only in all output text
  - no_em_dashes in suggested rewrites
  - verify_all_citations before including
---

# Academic Paper Review

A reusable, multi-module skill for reviewing economics and finance
research papers. Calibrated to top-tier journals.

## Module 1: Referee Report Simulation
Voice: Professor of economics serving as referee for a top journal.
CHECKPOINT: Pause. Present major comments for human review.

## Module 2: British English Audit
Read references/british-english-guide.md first.

## Module 3: LLM Cliche and AI-Voice Detection
Read references/llm-cliche-list.md first.
CHECKPOINT: Pause. Review severity before proceeding.

## Module 4: Vocabulary, Style and Repetition Audit
- Overused words
- Weak language
- Passive voice
- Repetition
- Register

## Output
Deliver one consolidated review document.
Section 04
From Chatbox to Agents
Academic Paper Review Skill
One instruction launches the same review workflow every time. The model does not improvise from zero. It follows a fixed checking sequence.
01
Referee Report
Summarise the contribution, flag major weaknesses, and write the core review.
02
British English
Check spelling, punctuation, and consistency before the paper leaves your desk.
03
LLM Fingerprints
Catch generic phrasing, cliches, and stylistic tells that weaken the draft.
04
Style Audit
Check repetition, passive voice, and tonal drift across the manuscript.
One command
“Review my paper.”
A single request activates the full protocol.
The model loads references and starts the checks
The run
Read → Review → Audit → Merge
Four passes, one sequence, one structure.
One package comes back instead of scattered chat replies
The output
A structured review bundle
Referee report, language fixes, style warnings, and revision notes together.
The gain is not one clever review. It is consistent quality control you can run on every manuscript.
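The fixed sequence and its two checkpoints can be sketched as a small runner. Each module here is a hypothetical placeholder that returns one section of the report, and `approve` stands in for the human pause after Modules 1 and 3; in practice the model would do the checking and a person would answer at each checkpoint.

```python
from typing import Callable, Optional

# Hypothetical module functions: each takes the manuscript text and returns report text.
Module = Callable[[str], str]

MODULES: list[tuple[str, Module]] = [
    ("Referee Report", lambda ms: f"Major comments on {len(ms.split())}-word draft"),
    ("British English", lambda ms: "Americanisms: see table"),
    ("LLM Fingerprints", lambda ms: "Cliche warnings: see list"),
    ("Style Audit", lambda ms: "Repetition and passive-voice notes"),
]
CHECKPOINTS = {"Referee Report", "LLM Fingerprints"}  # pause after modules 1 and 3


def run_review(manuscript: str, approve: Callable[[str, str], bool]) -> Optional[str]:
    """Run the modules in order; stop early if a checkpoint is not approved."""
    sections = []
    for name, module in MODULES:
        output = module(manuscript)
        sections.append(f"## {name}\n{output}")
        if name in CHECKPOINTS and not approve(name, output):
            return None  # the human halted the run at this checkpoint
    return "\n\n".join(sections)  # one consolidated report, not scattered replies
```

Because the order and the pause points live in data rather than in a conversation, every manuscript goes through the same sequence.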
Section 04
Reference File 1
Domain Knowledge File
british-english-guide.md
references/ · loaded before Module 2
# British English Guide for Academic Manuscripts

Flag every American English spelling, punctuation, or usage
in the manuscript.

## Spelling Rules

Pattern          American → British       Examples
─────────────────────────────────────────────────────────────
-ize → -ise      Always convert            analyse, characterise,
                                           standardise
-or → -our       Always convert            behaviour, labour,
                                           colour, rigour
-er → -re        Convert for units         centre, metre, theatre
-ense → -ence    Convert                   defence, licence (noun),
                                           offence
-ing doubles     Double the consonant      modelling, signalling,
                                           labelled, travelled

Other: grey (not gray), programme (not program, except software),
judgement, ageing, sceptic, towards.

## Punctuation

- Full stops and commas go outside quotation marks
  unless part of the quoted text.
- Use single quotation marks; double only for nested quotes.

## Dates

Write: 15 December 2015 (not December 15, 2015).

## Typical conversions

analyze → analyse
behavior → behaviour
labor → labour
organization → organisation
modeling → modelling
center → centre
program → programme

## Output expected from the agent

Return a table:
| Location | American Form | British Correction |

End with:
"X Americanisms found across Y categories."
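A crude version of the guide's spelling pass can be written as a word-list check that emits the requested table and summary line. This is a sketch that assumes a small hand-written mapping drawn from the guide's typical conversions; it leaves out `program → programme` because the guide keeps "program" for software, and a real audit would need context a regex cannot see. The category labels are an assumed grouping based on the guide's spelling-rule table.

```python
import re

# Mapping drawn from the guide's typical conversions ("program" omitted: the
# guide keeps it for software, which a word list cannot detect).
AMERICAN_TO_BRITISH = {
    "analyze": ("analyse", "-ize → -ise"),
    "organization": ("organisation", "-ize → -ise"),
    "behavior": ("behaviour", "-or → -our"),
    "labor": ("labour", "-or → -our"),
    "center": ("centre", "-er → -re"),
    "modeling": ("modelling", "consonant doubling"),
}


def audit(text: str) -> list[tuple[int, str, str, str]]:
    """Return (line number, American form, British correction, category) per hit."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z]+", line):
            hit = AMERICAN_TO_BRITISH.get(word.lower())
            if hit is not None:
                british, category = hit
                findings.append((line_no, word, british, category))
    return findings


def render(findings: list[tuple[int, str, str, str]]) -> str:
    """Format findings as the guide's table followed by its summary line."""
    rows = ["| Location | American Form | British Correction |"]
    rows += [f"| line {n} | {am} | {br} |" for n, am, br, _ in findings]
    categories = len({category for *_, category in findings})
    rows.append(f"{len(findings)} Americanisms found across {categories} categories.")
    return "\n".join(rows)
```

Matching whole words avoids false hits such as "laboratory", but it cannot tell software "program" from a research "programme", which is exactly the judgement the guide leaves to the reader.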
Section 04
From Chatbox to Agents
From Skill to Specialist
What Is an Agent?
An Agent is a model reading a Skill with a named role. Same underlying model, different instructions, different job. It can keep context, use tools, follow steps, and stop at checkpoints.
LLM + SKILL.md + Role = Agent
A chatbot waits for the next prompt. An agent keeps working through a plan.
Same Model, Different Jobs
Reviewer Agent
Reads the paper-review skill and produces a structured referee report.
Data Agent
Downloads, merges, validates, and documents data through a fixed workflow.
Econometrics Agent
Runs models, robustness checks, tables, and figures until diagnostics are satisfied.
Writer Agent
Drafts, revises, integrates citations, and compiles the manuscript in the right style.
A Skill is the protocol. An Agent is the worker that follows it. Multiple agents can share one model family and still do different jobs.
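The formula LLM + SKILL.md + Role = Agent can be read as prompt assembly around a single model. In this sketch, `Model` is a hypothetical interface standing in for whatever model API is used, and the prompt layout is an assumption; the point is that one model object can back every role.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: takes a system prompt and a task, returns text.
Model = Callable[[str, str], str]


@dataclass
class Agent:
    role: str        # e.g. "Reviewer Agent"
    skill_text: str  # contents of a SKILL.md file
    model: Model     # the same underlying model can back every agent

    def system_prompt(self) -> str:
        return f"You are the {self.role}.\nFollow this protocol exactly:\n{self.skill_text}"

    def run(self, task: str) -> str:
        return self.model(self.system_prompt(), task)


def echo_model(system: str, task: str) -> str:
    # Stand-in model that reports which role it was given (illustration only).
    return f"[{system.splitlines()[0]}] {task}"
```

For example, `Agent("Reviewer Agent", review_skill, echo_model)` and `Agent("Data Agent", data_skill, echo_model)` share one model and differ only in the instructions they carry.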
Section 04
From Chatbox to Agents
The Building Blocks
Figure: the four building blocks (foundation model, Skill, agent role, pipeline orchestration), with a human checkpoint after each pipeline stage. LLM + SKILL.md = Agent | Agents + Pipeline = Research
Section 04
From Chatbox to Agents
Building Blocks · 1 of 4
Foundation Model
General-purpose engine
LLM
This is the base model: broad capability, wide knowledge, but no narrow workflow or role by itself.
GPT · Claude · Llama
General-purpose rather than task-specific.
Useful across domains, but not yet organised for one repeatable job.
The model is the foundation, not the full workflow.
Section 04
From Chatbox to Agents
Building Blocks · 2 of 4
Skill File
SKILL.md
A Skill is a plain-text protocol. It tells the model when to activate, what inputs to expect, what sequence to follow, how to verify, and what output to return.
Trigger: what request should call this workflow.
Inputs: which files, folders, or materials must be read.
Steps: the ordered procedure and checkpoints.
Verify: what has to be checked before finishing.
Output: the artifact the run should produce.
Skill = repeatable instructions, not software.
Section 04
From Chatbox to Agents
Building Blocks · 3 of 4
Agent Role
Same LLM, different job
An agent is a model reading a Skill with a named role. The role changes what it pays attention to and how it behaves.
01
Methodology Reviewer
Checks design, statistics, and reproducibility.
02
Devil's Advocate
Attacks assumptions and finds weak logic.
03
Socratic Mentor
Challenges framing and forces clearer thinking.
04
Code Writer
Implements, debugs, runs, and iterates.
Section 04
From Chatbox to Agents
Building Blocks · 4 of 4
Pipeline Orchestration
From agents to research
01
Research
Search and rank the literature.
02
Experiment
Write code and explore branches.
03
Write
Draft the manuscript and figures.
04
Verify
Check data, claims, and outputs.
05
Review
Score the draft and decide next actions.
06
Revise
Loop until the work clears review.
The human checkpoint remains after each stage. The pipeline lowers labour cost; it does not remove judgment.
LLM + SKILL.md = Agent
Agents + Pipeline = Research
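The six stages and the human checkpoint between them can be sketched as a pipeline that threads one shared state through each stage. The stage functions and the `checkpoint` callback are hypothetical; the sketch shows only the control flow described above, in which a rejected checkpoint stops the run rather than letting the agents carry on. The revise-until-cleared loop is left out for brevity.

```python
from typing import Callable

State = dict
Stage = Callable[[State], State]


def make_stage(name: str) -> Stage:
    """Hypothetical stage: records that it ran and returns the updated state."""
    def stage(state: State) -> State:
        return {**state, "log": state.get("log", []) + [name]}
    return stage


STAGES = [(n, make_stage(n)) for n in
          ("research", "experiment", "write", "verify", "review", "revise")]


def run_pipeline(checkpoint: Callable[[str, State], bool]) -> tuple[State, bool]:
    """Run stages in order; a human checkpoint after each one can halt the run."""
    state: State = {}
    for name, stage in STAGES:
        state = stage(state)
        if not checkpoint(name, state):
            return state, False  # judgment stays with the human
    return state, True
```

Returning the partial state on a halt keeps the artifacts produced so far, so a rejected stage can be inspected rather than lost.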
Section 04
From Chatbox to Agents
How It Works
Orchestration: One LLM Conducts Many
Figure: an orchestrator (program.md) assigns the next task and checks every handoff between five specialist agents: research, experiment, review, integrity, and writing.
Section 04
From Chatbox to Agents
Orchestration · 1 of 2
One LLM Conducts
Orchestrator
`program.md`
The orchestrator does not do every job itself. It assigns the next task, checks each handoff, and decides whether the workflow should continue, branch, or stop.
01
Assign
Send the next job to the right specialist.
02
Check
Read what came back at each handoff.
03
Loop
Keep the cycle moving until review clears the work.
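The assign, check, loop cycle can be sketched as a dispatcher over named specialists. The specialist functions, the review score, and the threshold are all hypothetical, and `program.md` is represented only as an ordered list of task names; that is an assumption about its role, not its actual format.

```python
from typing import Callable

Specialist = Callable[[dict], dict]

# Hypothetical specialists; each returns an artifact dict for the orchestrator to check.
SPECIALISTS: dict[str, Specialist] = {
    "research": lambda ctx: {"papers": 12},
    "experiment": lambda ctx: {"runs": ctx.get("round", 0) + 1},
    "writing": lambda ctx: {"draft": f"draft v{ctx.get('round', 0) + 1}"},
    "integrity": lambda ctx: {"claims_checked": True},
    "review": lambda ctx: {"score": 5 + 2 * ctx.get("round", 0)},
}
PLAN = ["research", "experiment", "writing", "integrity", "review"]  # program.md stand-in


def orchestrate(threshold: int = 8, max_rounds: int = 4) -> dict:
    """Assign each task, check every handoff, and loop until review clears."""
    ctx: dict = {"round": 0}
    while ctx["round"] < max_rounds:
        for task in PLAN:
            artifact = SPECIALISTS[task](ctx)   # assign
            if not artifact:                    # check the handoff
                raise RuntimeError(f"{task} returned nothing")
            ctx.update(artifact)
        if ctx["score"] >= threshold:           # stop, or loop again
            return {**ctx, "status": "ready for human checkpoint"}
        ctx["round"] += 1
    return {**ctx, "status": "stopped: review never cleared"}
```

Even when review clears, the sketch hands back to a human checkpoint instead of declaring the work finished.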
Section 04
From Chatbox to Agents
Orchestration · 2 of 2
The Specialist Agents
Who does the work
01
Research Agent
Searches papers and tests novelty.
02
Experiment Agent
Writes code and explores branches.
03
Review Agent
Scores the draft and recommends the next decision.
04
Integrity Agent
Checks claims, evidence, and consistency.
05
Writing Agent
Turns outputs into figures, prose, and cited text.
The orchestrator sits in the middle. Each worker returns artifacts, and the loop continues until the human checkpoint says the work is credible enough to proceed.
Section 04
The Research Pipeline
The Human's New Role
8 Stages, End-to-End
A workflow can now run across the full research cycle. What stays scarce is not execution alone. It is judgment at each handoff.
At every stage
AI executes
Collect, clean, code, draft, rerun, and package the work.
You judge
Choose the question, the design, the interpretation, and the claim.
| Stage | AI executes | You judge |
|---|---|---|
| 1 · Ideation: getting an idea | Maps the literature and proposes candidate questions. | Decide which question is worth pursuing. |
| 2 · Design & Feasibility: empirical strategy, identification | Checks data availability and sketches feasible designs. | Choose the identification strategy and credibility standard. |
| 3 · Data Assembly: collecting, cleaning, merging | Collects, cleans, merges, translates, and validates the corpus. | Set inclusion rules and measurement choices. |
| 4 · Core Analysis: main specifications, models | Runs the main specifications, models, figures, and tables. | Judge which result is central and which is noise. |
| 5 · Robustness & Extensions: alternative specifications, heterogeneity | Runs alternative specifications, heterogeneity cuts, and stress tests. | Decide which checks actually test the claim. |
| 6 · Writing: drafting, tables and figures | Drafts prose, tables, figures, appendices, and code notes. | Shape the argument and strength of the claim. |
| 7 · Submission & Review: R&R, revising, seminars | Prepares response memos, revisions, seminar notes, and new checks. | Choose which criticisms to concede or resist. |
| 8 · Publication: final acceptance, dissemination | Formats the final package, archive, replication files, and dissemination assets. | Stand behind the published claim and its meaning. |
Section 05
From Questions to Workflows
Old questions become startable · API-scale research · Two examples
Section 05
Question Space
What changes in practice
The question existed. The workflow did not.
The real shift
Some questions were never impossible.
They were just too expensive to start.
The bottleneck was labour: collecting, cleaning, translating, reading, coding, checking, and repeating.
Too large
Whole archives can now be studied instead of tiny samples chosen only because humans could read them.
Too multilingual
Cross-country corpora stop being blocked by translation, formatting, and document heterogeneity.
Too repetitive
Classification, extraction, robustness checks, and draft revision can run in repeated loops, not one heroic pass.
Too costly to begin
Projects that sat for years because they demanded months of RA time can finally become startable.
AI does not supply the question. It lowers the fixed cost of asking it seriously.
Section 05
Research Infrastructure
The practical shift
Leave the chat window.
Build a workflow.
2022 → 2026
Answering was the surprise.
Running the process is the real change.
Once the model sits behind an API, it can be called repeatedly, attached to files, and logged like any other research service. The important gain is repeatability.
| | Chatbox | API workflow |
|---|---|---|
| Unit of work | One exchange: a person asks, waits, and manually moves to the next step. | Whole corpus: the model can be called across thousands of documents or tasks. |
| Context | Short session: context is fragile and easily lost across tasks and files. | Files and logs: folders, prompts, outputs, and budgets become part of the record. |
| Labour | Copy-paste: the researcher manually bridges collection, analysis, and drafting. | Repeated jobs: collection, reading, coding, and checking can be scripted and rerun. |
| Output | A response: useful text, but weak as a reusable research artifact. | Research objects: datasets, code, tables, figures, notes, and draft sections. |
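The move from chat window to API is mostly about looping and logging. A minimal sketch, assuming a caller-supplied `call_model(prompt) -> str` function in place of a real provider SDK: one prompt template runs over a whole corpus, and every call is appended to a JSON Lines log so that prompts and outputs become part of the record. `fake_model` is a hypothetical keyword rule used only to keep the example runnable.

```python
import io
import json
import time
from typing import Callable, Iterable, TextIO


def run_corpus(
    docs: Iterable[tuple[str, str]],
    template: str,
    call_model: Callable[[str], str],
    log: TextIO,
) -> dict[str, str]:
    """Apply one prompt template to every (doc_id, text) pair, logging each call."""
    outputs: dict[str, str] = {}
    for doc_id, text in docs:
        prompt = template.format(text=text)
        answer = call_model(prompt)
        outputs[doc_id] = answer
        record = {"doc_id": doc_id, "prompt": prompt, "output": answer, "ts": time.time()}
        log.write(json.dumps(record) + "\n")  # one JSON object per line (JSON Lines)
    return outputs


def fake_model(prompt: str) -> str:
    """Stand-in for a real API call: a keyword rule, for illustration only."""
    return "climate" if "climate" in prompt.lower() else "other"


def demo() -> tuple[dict[str, str], list[dict]]:
    # io.StringIO keeps the example self-contained; pass open("calls.jsonl", "a")
    # instead to keep a persistent log on disk.
    buffer = io.StringIO()
    docs = [("speech-001", "Climate risk and financial stability"),
            ("speech-002", "The inflation outlook")]
    outputs = run_corpus(docs, "Label this text: {text}", fake_model, buffer)
    records = [json.loads(line) for line in buffer.getvalue().splitlines()]
    return outputs, records
```

Logging the exact prompt next to each output is what makes a run rerunnable and auditable later.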
Section 05
Research Workflow
General pattern
A serious project becomes a loop.
One prompt is not the unit of work. Research-grade use means building a corpus, making it comparable, analysing it, and then forcing another pass.
01
Build corpus
Pull speeches, tweets, PDFs, minutes, and metadata from archives, APIs, and websites.
Output: raw archive
02
Make comparable
Clean, deduplicate, translate, and structure the material into something one design can actually use.
Output: research dataset
03
Read at scale
Classify, extract, label, and compare patterns across the full corpus rather than a hand-coded slice.
Output: machine-readable evidence
04
Estimate
Run code, event studies, regressions, robustness checks, and produce tables and figures.
Output: results
05
Review
Challenge claims, inspect code, rewrite weak passages, and make the argument survive another round.
Output: revised draft
AI handles execution: collect, clean, translate, classify, code, plot, and draft inside the loop.
The researcher stays central: choose the question, the identification strategy, the interpretation, and the standard of proof.
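Step 02, making the material comparable, often begins with something as plain as dropping duplicates that arrive from different archives. A sketch, assuming near-identical copies differ only in case, whitespace, and punctuation: normalise the text, hash it, and keep the first copy. Translation and richer structuring are out of scope here, and the `text` field name is a hypothetical record layout.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(docs: list[dict]) -> list[dict]:
    """Keep the first document for each normalised-text fingerprint."""
    seen: set[str] = set()
    kept = []
    for doc in docs:
        fingerprint = hashlib.sha256(normalise(doc["text"]).encode("utf-8")).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(doc)
    return kept
```

Exact hashing after normalisation catches reposted copies cheaply; catching paraphrased or partially edited duplicates would need a fuzzier comparison.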
Section 05
Climate & Central Banking
Example 1
Do central banks take climate seriously?
A long-standing question became tractable.
The hard part was not the idea. It was assembling speeches and minutes across languages, verifying each mention, and building one comparable corpus. The execution barrier finally fell.
Speeches: 37,790 from 132 central banks
Minutes: 3,602 from 26 institutions
Languages: 14, translated and compared
Workflow
Collect archives, translate 14 languages, verify real climate mentions, then estimate the gap between public speeches and private deliberation.
Climate mention rates in central bank speeches and minutes over time
Grouped climate commitment levels in central bank speeches and minutes
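The headline comparison, public speeches against private minutes, reduces to mention rates by year and source. A sketch, assuming each document has already been translated and its climate mention verified upstream; the `year`, `source`, and `verified_climate` fields are a hypothetical record layout, not the project's actual schema.

```python
from collections import defaultdict


def mention_rates(docs: list[dict]) -> dict[tuple[int, str], float]:
    """Share of documents with a verified climate mention, by (year, source)."""
    totals: dict[tuple[int, str], int] = defaultdict(int)
    hits: dict[tuple[int, str], int] = defaultdict(int)
    for doc in docs:
        key = (doc["year"], doc["source"])
        totals[key] += 1
        hits[key] += int(doc["verified_climate"])
    return {key: hits[key] / totals[key] for key in totals}


def speech_minutes_gap(rates: dict[tuple[int, str], float], year: int) -> float:
    """Public-speech mention rate minus private-minutes mention rate for one year."""
    return rates[(year, "speech")] - rates[(year, "minutes")]
```

A positive gap means climate features more in what central banks say publicly than in what they record deliberating privately.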
Section 05
Bank of England & Twitter
Example 2
One Twitter archive became several papers.
The archive was there. The lab was not.
With LLM reading and faster coding loops, the same corpus could support questions about attention, format, timing, and communication design rather than simple description.
Official tweets: 9,810 Bank of England posts
Public tweets: 3.1m in the wider conversation archive
Window: 2011–2022, one reusable data build
What the workflow changed
01
Build the archive of official tweets, public tweets, metadata, and event windows.
02
Read the full record with LLMs instead of manually coding a small slice.
03
Move to empirical design with event studies, Poisson models, and communication-architecture tests.
Quarterly Twitter attention around Bank of England MPC communication design reform
Format: +181% for photo tweets; videos add +159%
Timing: +150% attention premium on MPC days
Design: shifted; bundling raised event-day salience but changed cumulative attention across the quarter
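If the percentage effects come from log-link count models such as the Poisson regressions mentioned in the workflow (an assumption; the slide does not give the specification), each percentage is 100·(exp(β) − 1) for the coefficient β on an indicator variable. The conversion in both directions:

```python
import math


def coef_to_percent(beta: float) -> float:
    """Percentage change in the expected count implied by a log-link coefficient."""
    return 100 * (math.exp(beta) - 1)


def percent_to_coef(percent: float) -> float:
    """Log-link coefficient implied by a reported percentage effect."""
    return math.log(1 + percent / 100)
```

Under that assumption, +181% for photo tweets corresponds to β = ln(2.81) ≈ 1.03, and the +150% MPC-day premium to β = ln(2.5) ≈ 0.92; reading such effects as simple coefficient × 100 would understate them badly.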
Section 06
Practice: From Question to Paper
One research question. Three ways to do it.
Section 06
Research Design
Case Paper
Does wealth composition shape how you live, work, and vote?
Rising house prices pushed young adults into portfolios dominated by illiquid housing wealth. The question is whether this compositional shift changes behaviour even when total wealth stays similar.
Profile A: house rich, cash poor
Trapped by illiquidity. Harder to move, retrain, or absorb shocks.
$500k equity · $10k liquid
Profile B: balanced, strong liquidity
Same wealth class. Very different room to act and adjust.
$250k equity · $260k liquid
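The contrast between the two profiles is one of composition, not level, which a short calculation makes explicit. This uses the slide's illustrative figures and treats "equity" as housing equity, an assumption consistent with the framing of the question.

```python
PROFILES = {
    "A: house rich, cash poor": {"housing_equity": 500_000, "liquid": 10_000},
    "B: balanced": {"housing_equity": 250_000, "liquid": 260_000},
}


def composition(profile: dict[str, int]) -> tuple[int, float]:
    """Return total wealth and the liquid share of that total."""
    total = profile["housing_equity"] + profile["liquid"]
    return total, profile["liquid"] / total


for name, profile in PROFILES.items():
    total, liquid_share = composition(profile)
    print(f"{name}: total ${total:,}, liquid share {liquid_share:.1%}")
```

Both profiles total $510,000, yet the liquid share is about 2% for A against about 51% for B: the same wealth level with very different capacity to absorb a shock.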
Data, all public
PSID: wealth decomposition (psidonline.isr.umich.edu)
CPS: labour supply, income (census.gov/cps)
CEX: consumption, savings (bls.gov/cex)
ANES: political preferences (electionstudies.org)
ACS: housing, demographics (data.census.gov)
FRED: FHFA HPI + mortgage rates (fred.stlouisfed.org)
Six datasets, all free. The bottleneck was not access. It was the time and skill required to merge, model, and interpret them.
Section 06
Three Approaches
Same Question. Three Workflows.
1 · Traditional
The Old Way
Formulate the question with colleagues
Check the data and learn the method
Build the Stata or R skills the design needs
Merge six datasets and code the design
Interpret, write, and submit
Bottleneck: researcher time
6–18 months
2 · Chatbot
LLM at Each Step
Ask ChatGPT or Claude at each stage
Copy-paste code, run it, paste errors back
Context resets mid-analysis
Dozens of chats, no artifact chain
Hallucinated variables and fragile memory
Bottleneck: orchestration by hand
2–4 months
3 · Agentic
End-to-End System
Write a design file with question and strategy
Agent downloads, cleans, merges, and models
Runs robustness, figures, tables, and draft
Pauses after each stage for your review
Revises, reruns, and compiles into artifacts
Bottleneck: judgment, not execution
1–3 weeks
The real shift: the agentic workflow preserves context, files, tools, checkpoints, and outputs across the entire research cycle.
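The checkpointing idea can be sketched in a few lines. This is a minimal illustration, not the tooling used in the talk: the stage labels loosely follow the NEXT.md messages in the master prompt, while run_pipeline, the approve callback, and the use of a plain-text NEXT.md as the only state are hypothetical simplifications.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Labels loosely follow the NEXT.md messages in the master prompt.
STAGES = [
    "Stage 0: Project setup",
    "Stage 1: Data collection",
    "Stage 2: Data cleaning and merge",
    "Stage 3: Descriptive analysis",
    "Stage 4: Estimation",
    "Stage 5: Paper writing",
    "Stage 6: Compile, verify, and package",
]
DONE = "Complete"


def run_pipeline(state_file: Path, approve: Callable[[str], bool]) -> list[str]:
    """Run stages from the one recorded in state_file, pausing after each.

    The state file, not chat memory, decides where a new session resumes.
    A stage is only marked finished once the reviewer approves it.
    """
    current = state_file.read_text().strip() if state_file.exists() else STAGES[0]
    if current == DONE:
        return []
    ran: list[str] = []
    for i in range(STAGES.index(current), len(STAGES)):
        stage = STAGES[i]
        # Placeholder: the agent would write and run this stage's scripts here.
        ran.append(stage)
        if not approve(stage):
            # PAUSE: keep pointing at this stage so it is redone next session.
            state_file.write_text(stage)
            break
        state_file.write_text(STAGES[i + 1] if i + 1 < len(STAGES) else DONE)
    return ran


with tempfile.TemporaryDirectory() as tmp:
    next_md = Path(tmp) / "NEXT.md"
    # Session 1: the reviewer asks for changes at Stage 2.
    first = run_pipeline(next_md, approve=lambda s: not s.startswith("Stage 2"))
    # Session 2: resumes at Stage 2 and is approved through to the end.
    second = run_pipeline(next_md, approve=lambda s: True)
    final_state = next_md.read_text()

print(first[-1], "->", second[0], "->", final_state)
```

Because progress lives in a file rather than a conversation, a context reset costs nothing: the next session reads the state and resumes at the stage the reviewer has not yet approved.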
Section 06
Orchestration
The Pipeline for This Paper
01
Design Doc
Research question
Identification strategy
Data sources and URLs
Variable definitions
You write this
02
Data + Analysis
Download PSID, CPS, ACS, and FRED
Clean, merge, and structure the panel
Build the instrument, run the first stage, check its strength
Robustness checks and falsification tests
Agent iterates
03
Manuscript
Figures and tables
Draft each section
Citations and literature scaffolding
LaTeX and replication package
Draft, not final
04
Review
Is the instrument credible?
Does the exclusion restriction hold?
Is the effect economically meaningful?
This part cannot be automated
You judge this
Phases 2 and 3 are cheap to automate. Phases 1 and 4 are where training, taste, and scientific judgment still matter most.
Section 06
The Actual Prompt
End-to-End Agentic Research
The Master Prompt
361 lines · 7 stages · Claude Code / Codex
════════════════════════════════════════════════════════════════════
MASTER RESEARCH PROMPT — WEALTH COMPOSITION AND BEHAVIOUR
════════════════════════════════════════════════════════════════════
Use this prompt with Claude Code, OpenAI Codex, or another agentic coding tool.
Pause after each numbered stage for human review before proceeding.

You are a research agent building an economics paper from scratch. The project
produces a complete, submission-ready manuscript with replication package.

CRITICAL RULES (apply to every stage):
- British English throughout. No American spellings.
- No LLM cliches: no "delve", "crucial", "notably", "it is worth noting",
  "sheds light", "in the realm of", "a growing body of literature".
- No long em dashes. Use commas, semicolons, or full stops.
- Every citation must be real, verified, and correctly attributed.
- Every number in the paper must be traceable to a specific data operation.
- Figures: seaborn, colours = brickred (#9c302b), black (#171411),
  grey (#9f907b), navy (#2d5a8e). No annotations, no gridlines, no chartjunk.
- Tables: booktabs style, no vertical rules, minimal horizontal rules.
- Writing: argumentative, paragraph-based, no bullet points in the paper.
- Paper length: 5,000-7,000 words (excluding references and appendix).
- LaTeX: use natbib with aer.bst or similar author-year style.

════════════════════════════════════════════════════════════════════
STAGE 0 — PROJECT SETUP
════════════════════════════════════════════════════════════════════

Create the folder structure:
  wealth_composition/
  ├── paper/
  ├── replication/
  ├── scripts/
  ├── data/
  │   ├── raw/
  │   └── derived/
  ├── analysis/
  ├── notes/
  ├── README.md
  └── NEXT.md

Write a one-paragraph README and set NEXT.md to "Stage 1: Data collection."

════════════════════════════════════════════════════════════════════
STAGE 1 — DATA COLLECTION
════════════════════════════════════════════════════════════════════

Download public datasets and log every URL, access date, and file hash:
1. PSID — https://psidonline.isr.umich.edu
2. CPS ASEC — https://www.census.gov/programs-surveys/cps.html
3. CEX — https://www.bls.gov/cex/
4. ANES / CCES — https://electionstudies.org
5. ACS — https://data.census.gov
6. FRED — https://fred.stlouisfed.org

Track household wealth, home equity, labour supply, expenditure, politics,
home value, mortgage status, FHFA HPI, mortgage rates, and CPI deflators.

After downloading, update NEXT.md: "Stage 2: Data cleaning and merge."

PAUSE. Wait for human review of downloaded data before proceeding.

════════════════════════════════════════════════════════════════════
STAGE 2 — DATA CLEANING AND VARIABLE CONSTRUCTION
════════════════════════════════════════════════════════════════════

Write separate cleaning scripts:
  01_clean_psid.py
  02_clean_cps.py
  03_clean_fred.py
  04_clean_acs.py
  05_clean_cex.py
  06_clean_anes.py

Each script:
- Reads from data/raw/
- Writes cleaned parquet to data/derived/
- Prints N, means, and missingness
- Deflates all money variables to 2020 dollars

Construct key variables:
- total_wealth
- home_equity
- liquid_wealth
- housing_share = home_equity / total_wealth

Build the FRED exposure instrument:
- FHFA HPI by state and year
- annual mortgage rate
- entry_exposure = HPI(s,t*) × rate(t*)

Write 07_merge_panels.py:
- Merge PSID with the instrument on state × cohort
- Create the analysis sample: ages 25-40, positive total wealth
- Export analysis_panel.parquet

Also merge CPS, CEX, and ANES panels with the same state × cohort logic.
Print final sample sizes and update NEXT.md: "Stage 3: Descriptive analysis."

PAUSE. Wait for human review of cleaning scripts and sample sizes.

════════════════════════════════════════════════════════════════════
STAGE 3 — DESCRIPTIVE ANALYSIS AND FIGURES
════════════════════════════════════════════════════════════════════

Write 08_descriptives.py and save all outputs to paper/figures/ and paper/tables/.

Produce:
- Figure 1: housing share of wealth by birth cohort
- Figure 2: distribution of housing share
- Table 1: summary statistics by tercile
- Figure 3: first stage, entry exposure predicts housing share

Update NEXT.md: "Stage 4: Estimation."

PAUSE. Wait for human review of figures and descriptives.

════════════════════════════════════════════════════════════════════
STAGE 4 — ESTIMATION
════════════════════════════════════════════════════════════════════

Write 09_estimation.py. Use linearmodels or statsmodels for IV and cluster by state.

Main tables:
- Table 2: housing share and labour supply
- Table 3: first-stage diagnostics
- Table 4: housing share and consumption
- Table 5: housing share and political preferences
- Table 6: robustness and placebo tests

Run OLS and 2SLS, report F-statistics, and save all tables as .tex files.
Update NEXT.md: "Stage 5: Paper writing."

PAUSE. Wait for human review of all estimation results.

════════════════════════════════════════════════════════════════════
STAGE 5 — PAPER WRITING
════════════════════════════════════════════════════════════════════

Write paper/main.tex using natbib, booktabs, graphicx, amsmath, geometry,
setspace, and hyperref.

Title:
"Portfolio Composition and Behaviour: Housing Equity, Labour Supply,
Consumption, and Political Preferences Among Young Adults"

Sections:
1. Introduction
2. Related Literature
3. Data and Measurement
4. Empirical Strategy
5. Results
6. Discussion and Conclusion

WRITING CONSTRAINTS:
- British English only
- No bullet points in the paper
- No fake citations
- Reference every table and figure before it appears
- 40-60 real references

PAUSE. Wait for human review of the full draft.

════════════════════════════════════════════════════════════════════
STAGE 6 — COMPILE, VERIFY, AND PACKAGE
════════════════════════════════════════════════════════════════════

Compile the paper and fix all LaTeX errors.

Verification checklist:
- Trace every number to a table or script line
- Check every citation in text appears in bibliography.bib
- Confirm every figure and table file exists
- Confirm the word count is between 5,000 and 7,000

Build the replication package:
- copy scripts to replication/code/
- copy datasets to replication/data/
- copy figures and tables to replication/output/
- write replication/README.md

Update NEXT.md: "Complete. Ready for human review."

PAUSE. Present compiled PDF and verification report for final review.

════════════════════════════════════════════════════════════════════
END OF PROMPT
════════════════════════════════════════════════════════════════════
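Stage 2's variable construction can be sketched with pandas. This is a hedged illustration, not the agent's actual 07_merge_panels.py: the column names (state, birth_year, year, age, entry_year, hpi), the toy numbers, and the reading of t* as the cohort's housing-market entry year, taken here as birth year + 25, are all assumptions; the prompt does not define t*.

```python
import pandas as pd

ENTRY_AGE = 25     # hypothetical: t* = birth year + 25
BASE_YEAR = 2020   # the prompt deflates to 2020 dollars
MONEY_COLS = ["home_equity", "liquid_wealth"]

# Toy inputs: every value below is made up for illustration.
households = pd.DataFrame({
    "state": ["CA", "CA", "OH", "OH"],
    "birth_year": [1985, 1990, 1985, 1975],
    "year": [2019, 2021, 2019, 2020],
    "age": [34, 31, 34, 45],
    "home_equity": [300_000.0, 50_000.0, 80_000.0, 90_000.0],
    "liquid_wealth": [20_000.0, 40_000.0, 30_000.0, 10_000.0],
})
cpi = {2019: 98.0, 2020: 100.0, 2021: 104.0}   # toy price index, 2020 = 100
mortgage_rate = {2010: 4.7, 2015: 3.9}          # toy annual rate, per cent
hpi = pd.DataFrame({                            # toy house price index by state-year
    "state": ["CA", "CA", "OH", "OH"],
    "entry_year": [2010, 2015, 2010, 2015],
    "hpi": [400.0, 500.0, 300.0, 320.0],
})


def build_panel(hh, hpi, cpi, mortgage_rate, base_year=BASE_YEAR):
    df = hh.copy()
    # Deflate money variables: real = nominal * CPI(base) / CPI(year).
    deflator = df["year"].map(cpi) / cpi[base_year]
    df[MONEY_COLS] = df[MONEY_COLS].div(deflator, axis=0)
    df["total_wealth"] = df["home_equity"] + df["liquid_wealth"]
    # Analysis sample from Stage 2: ages 25-40, positive total wealth.
    df = df[df["age"].between(25, 40) & (df["total_wealth"] > 0)].copy()
    df["housing_share"] = df["home_equity"] / df["total_wealth"]
    # Instrument: entry_exposure = HPI(s, t*) x rate(t*).
    df["entry_year"] = df["birth_year"] + ENTRY_AGE
    inst = hpi.assign(entry_exposure=hpi["hpi"] * hpi["entry_year"].map(mortgage_rate))
    # Merge on state x cohort (cohort identified by its entry year t*).
    return df.merge(
        inst[["state", "entry_year", "entry_exposure"]],
        on=["state", "entry_year"], how="left", validate="many_to_one",
    )


panel = build_panel(households, hpi, cpi, mortgage_rate)
print(panel[["state", "birth_year", "housing_share", "entry_exposure"]])
```

Deflating both components by the same index leaves housing_share unchanged, so the share can be computed before or after deflation; the deflation matters for levels such as total_wealth. The validate argument makes the merge fail loudly if the instrument table has duplicate state × cohort rows.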
Section 06
Live Demo Result
From One Prompt
What the Agent
Actually Built
6
Datasets merged
28.5M
Rows cleaned
26,936
Analysis sample
21
Python scripts
7 + 6
Figures + tables
5,297
Words in paper
The Full Output
Data Pipeline
Downloaded PSID, CPS, ACS, FRED, and related files. Cleaned derived tables, built the state-by-cohort instrument, and merged the analysis panels.
Econometrics
Ran OLS and 2SLS, checked first-stage strength, produced labour, consumption, political, and robustness results.
Manuscript
Drafted the LaTeX paper, generated sections, figures, tables, and tied the numbers back to scripts.
Verification
Tracked quantitative claims, checked citations, and verified that reported outputs matched the actual artifacts.
One research prompt produced a full artifact chain. No manual data cleaning. No copy-paste coding. No manual formatting.
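One verification step from Stage 6, checking that every citation in the text appears in bibliography.bib, is easy to make mechanical. A minimal sketch using regular expressions only; it assumes natbib-style \cite, \citet and \citep commands and standard BibTeX entry headers, and will miss unusual citation macros. The function name and citation keys are hypothetical.

```python
import re

# natbib-style commands with optional arguments, e.g. \citet{a}, \citep[see][ch. 2]{a, b}
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\]){0,2}\{([^}]*)\}")
# BibTeX entry headers, e.g. "@article{park2023,"
BIB_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def missing_citations(tex: str, bib: str) -> set[str]:
    """Keys cited in the LaTeX source that have no entry in the .bib text."""
    cited = {
        key.strip()
        for keys in CITE_RE.findall(tex)
        for key in keys.split(",")
        if key.strip()
    }
    defined = set(BIB_RE.findall(bib))
    return cited - defined


tex = r"As \citet{park2023} show \citep[see][ch. 2]{jones1995, romer1990}."
bib = """
@article{park2023,
  title = {...}
}
@article{romer1990,
  title = {...}
}
"""
print(sorted(missing_citations(tex, bib)))
```

A key that exists in the .bib is not proof that the reference is real or correctly attributed; this check only catches dangling keys, which is why the prompt still asks for human review.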
Section 06
The Lesson
What the Paper Found
Strong Description.
Weak Identification.
The descriptive patterns are strong: balance-sheet composition clearly separates households that look similar in total wealth.
The paper documents large differences in liquidity, labour supply, and downstream outcomes across composition profiles.
But the instrument is weak. The first stage does not justify a strong causal claim.
Robustness checks show the design is fragile. The execution is solid, but the identification still fails.
The substantive lesson is not that the agent failed. It is that good execution cannot rescue a weak design.
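The weak-instrument verdict rests on the first-stage F-statistic. A minimal sketch with one excluded instrument and no controls, on simulated data that does not reproduce the paper's estimates: with a single instrument, the first-stage F equals the squared t-statistic on that instrument, and a common rule of thumb treats F below about 10 as a warning. The sketch assumes homoskedastic errors; the prompt asks for state-clustered inference, which changes the F but not the logic. Variable names are illustrative stand-ins.

```python
import numpy as np


def first_stage_f(z: np.ndarray, x: np.ndarray) -> float:
    """First-stage F for a single excluded instrument z (no controls).

    Regress x on a constant and z; with one instrument, F equals the squared
    t-statistic on z. Homoskedastic errors are assumed here.
    """
    n = len(z)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)
    var_slope = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]
    return float(coef[1] ** 2 / var_slope)


rng = np.random.default_rng(0)
n = 2_000
z = rng.normal(size=n)  # hypothetical stand-in for entry_exposure
results = {}
for strength in (0.5, 0.02):
    x = strength * z + rng.normal(size=n)  # hypothetical stand-in for housing_share
    results[strength] = first_stage_f(z, x)
    label = "weak" if results[strength] < 10 else "passes the rule of thumb"
    print(f"strength {strength}: first-stage F = {results[strength]:.1f} ({label})")
```

An agent can compute this number flawlessly; deciding whether a passing F also comes with a credible exclusion restriction is the part it cannot do.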
The Teaching Point
The Agent Built
the Telescope.
What the agent did
Collected, cleaned, merged, estimated, wrote, and verified. It executed the research design with speed and consistency.
What the agent could not do
Know in advance whether the instrument would be persuasive. That requires theory, taste, and judgment about the data-generating process.
Why this is the real lesson
A positive result would show that the agent can execute. A negative result shows something deeper: judgment remains the binding constraint.
Execution depreciates. The harder and more valuable part is deciding whether the result deserves belief.
Section 06
The Lesson
What Depreciates.
What Appreciates.
Depreciates ↓
Writing Stata, R, or Python line by line
Cleaning and merging datasets manually
Implementing a known estimator
Formatting the draft for journal style
Producing standard tables and figures
Appreciates ↑
Knowing which questions matter
Taste for a credible research design
Recognising when data is misleading you
Judging whether results are meaningful
Deep institutional and substantive knowledge
The takeaway: execution gets cheaper. Judgment does not.
Closing
The Research Crisis
Nature 2023
Papers and patents are becoming less disruptive.
Park, Leahey, and Funk study 45 million papers and 3.9 million patents. Across fields, the disruption index falls sharply over time: science is producing more, but breaking less.
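The disruption measure here is the CD index, introduced by Funk and Owen-Smith; Park, Leahey and Funk compute it over citations in the five years after publication (CD5). A minimal sketch of the formula on a toy citation graph, with the time window omitted: every later paper that cites the focal paper or any of its references scores +1 if it cites only the focal paper, -1 if it cites both, and 0 if it cites only the references. The index is the mean score, between -1 (consolidating) and 1 (disruptive). Paper IDs and data structures are hypothetical.

```python
def cd_index(focal: str, refs: set[str], citing: dict[str, set[str]]) -> float:
    """CD index of `focal`: mean of (-2 * f_i * b_i + f_i) over later papers i.

    refs   : papers the focal paper cites (its predecessors)
    citing : for each later paper i, the set of papers i cites
    f_i = 1 if i cites the focal paper; b_i = 1 if i cites any of its refs.
    Papers with f_i = b_i = 0 are outside the citation network and skipped.
    """
    scores = []
    for cited in citing.values():
        f = int(focal in cited)
        b = int(bool(cited & refs))
        if f or b:
            scores.append(-2 * f * b + f)
    if not scores:
        raise ValueError("no later paper cites the focal paper or its references")
    return sum(scores) / len(scores)


refs = {"R1", "R2"}
citing = {
    "A": {"P"},         # cites the focal paper only: +1 (disruptive)
    "B": {"P"},         # +1
    "C": {"P", "R1"},   # cites focal paper and a reference: -1 (consolidating)
    "D": {"R2"},        # cites a reference only: 0
    "E": {"X"},         # unrelated: not counted
}
print(cd_index("P", refs, citing))
```

A falling average CD index means later work increasingly cites new papers together with the work they build on, which is the sense in which science is "breaking less".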
45M
Papers
analysed across the sciences
3.9M
Patents
tracked over time
92-100%
Decline
in paper disruption across fields
This is the backdrop for everything else in the talk. AI arrives in a world where execution scales quickly, but truly novel ideas are already getting harder to produce.
Park, Leahey & Funk (2023) · Nature
Papers and patents are becoming less disruptive over time
Execution is getting easier. Novel ideas are getting harder.
Closing
The Scarce Input
Information became abundant.
Execution is becoming abundant.
The novel question
remains scarce.
AI lowers the cost of doing. It does not decide what is worth doing.