Oxford Certificate Programmes · Worcester College

AI and Machine Learning

Machine learning lets computers find patterns in data and turn them into predictions and decisions. This course is a rigorous, hands-on introduction organised around the full workflow: framing a decision, preparing data, fitting models, evaluating them honestly, and using them responsibly.

Instructor: Dr Fatih Kansoy
Sessions: Summer I, II & III
Dates: 19 Jul – 29 Aug 2026
Course week: Week One
Location: Worcester College, Oxford
Format: Lectures, seminars & Python labs
Length: Two-week programme
Assessment: Friday assessment

Course overview

Students will see how supervised models — linear and logistic regression, decision trees, random forests, and boosting — are built and judged; how unsupervised methods such as clustering and principal component analysis reveal structure without labels; and how modern AI systems — embeddings, retrieval, and generative tools — sit alongside classical methods.

Throughout, the course emphasises honest evaluation, the danger of data leakage, the difference between prediction and causation, and the fairness, interpretability, and governance questions that responsible deployment requires. Through worked examples, Python labs, and the discussion of real decisions, students develop the analytical tools to choose appropriate models, evaluate them with the right metrics, audit them for leakage and bias, and communicate both their conclusions and their limitations clearly.

Learning outcomes

Teaching & assessment

Teaching method. Students are taught according to the Oxford Socratic model, where class participation is central. Teaching combines lectures, guided discussion, hands-on Python labs, and group work in and outside class. No prior programming experience is assumed, though it is welcome.

Assessment. Assessment takes place on Friday at the end of the course.

Weekly schedule

Day | Topic | Focus
Monday | Foundations: data, prediction, and trust | What machine learning is and is not; features and targets; loss; the train/test split; metrics; leakage; and why prediction is not causation.
Tuesday | Regression and classification: from models to decisions | Linear and logistic regression; coefficients and uncertainty; turning predicted probabilities into actions with cost-based thresholds.
Wednesday | Flexible models and honest evaluation | Decision trees, random forests, and boosting; cross-validation; precision, recall, ROC and PR curves; and auditing a model for data leakage.
Thursday | Unsupervised learning, modern AI, and responsible use | K-means clustering and PCA; embeddings and retrieval; and fairness, interpretability, monitoring, and governance.
Friday | Assessment | End-of-course assessment.

Session overview

Session 1

Foundations of Machine Learning

This session sets up the workflow we reuse all week: turning a decision into a prediction problem, separating signal from noise, splitting data honestly, and choosing a loss. We stress what is known before a decision is made, and why data leakage and the prediction–causation gap matter from the very start.
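
As an illustration of splitting data honestly, the sketch below holds out a test set and learns preprocessing statistics from the training rows only, which is one common way to avoid a simple form of leakage. It is not taken from the course materials: the helper names (`train_test_split`, `fit_scaler`, `apply_scaler`) and the toy data are hypothetical, and it uses only the Python standard library.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Learn mean and standard deviation from the given values."""
    return statistics.mean(values), statistics.pstdev(values)


def apply_scaler(values, mean, std):
    """Standardise values with statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


# Hypothetical toy data: (feature, target) pairs.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(data, test_fraction=0.25, seed=42)

train_x = [x for x, _ in train]
test_x = [x for x, _ in test]

# Fit the scaler on training data only, then reuse it on the test set.
# Fitting on all rows would let test-set information leak into training.
mean, std = fit_scaler(train_x)
train_scaled = apply_scaler(train_x, mean, std)
test_scaled = apply_scaler(test_x, mean, std)
```

The key design choice is the order of operations: split first, then fit anything that learns from data (scalers, encoders, models) on the training portion alone.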

Session 2

Regression and Classification

This session covers the baseline supervised models — linear regression for numbers and logistic regression for probabilities. We discuss coefficients, uncertainty, and how a predicted probability becomes an action through a cost-based threshold rather than a default cut-off.
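
One way to make the cost-based threshold concrete is shown in the sketch below. It assumes a binary decision where correct decisions cost nothing, a false positive costs `cost_fp`, and a false negative costs `cost_fn`. Acting on a case with predicted probability p has expected cost (1 - p) * cost_fp, and not acting has expected cost p * cost_fn, so acting is cheaper when p > cost_fp / (cost_fp + cost_fn). The function names and cost values are hypothetical.

```python
def cost_based_threshold(cost_fp, cost_fn):
    """Probability above which acting has lower expected cost.

    Assumes correct decisions cost zero and both error costs are positive.
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("both costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def decide(probability, threshold):
    """Return True when the predicted probability justifies acting."""
    return probability > threshold


# Hypothetical costs: missing a positive case is nine times worse
# than raising a false alarm.
threshold = cost_based_threshold(cost_fp=1.0, cost_fn=9.0)
actions = [decide(p, threshold) for p in (0.05, 0.2, 0.6)]
```

With these costs the threshold is 0.1 rather than the default 0.5, so a case with a predicted probability of 0.2 is acted on.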

Session 3

Flexible Models and Honest Evaluation

This session introduces decision trees, random forests, and boosting, together with the bias–variance trade-off that controls overfitting. The added flexibility is paired with stricter evaluation: cross-validation, the right metric for imbalanced problems, and a disciplined leakage audit.
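
The sketch below illustrates two of the evaluation ideas named above under simplifying assumptions: a plain k-fold index generator (contiguous folds, no shuffling, so the data would need shuffling first if it is ordered), and precision and recall computed by hand for an imbalanced toy problem. The function names and data are hypothetical, and precision is reported as 0.0 by convention when no positives are predicted.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Every sample lands in exactly one validation fold.
folds = list(k_fold_indices(n_samples=10, k=3))

# Imbalanced toy problem: 5 positives in 100 cases, and a "model"
# that always predicts the majority class.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
acc = accuracy(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
```

Here the always-negative model scores 95% accuracy while finding none of the positive cases, which is why accuracy alone is a poor metric when classes are imbalanced.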

Session 4

Unsupervised Learning and Responsible AI

This session moves from prediction to structure discovery with K-means clustering and PCA, connects classical tools to modern AI through embeddings and retrieval, and closes with the fairness, interpretability, monitoring, and governance questions that responsible deployment demands.
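
To give a flavour of embeddings and retrieval, the sketch below ranks a tiny corpus by cosine similarity to a query vector. The three-dimensional vectors are made up for illustration; real embeddings come from a trained model and have hundreds or thousands of dimensions. The names `cosine_similarity` and `retrieve` are hypothetical helpers, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, corpus, top_k=2):
    """Return the names of the top_k corpus items most similar to query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in corpus.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical hand-written "embeddings" for three short documents.
corpus = {
    "loan default": [0.9, 0.1, 0.0],
    "credit risk": [0.8, 0.2, 0.1],
    "image clustering": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
results = retrieve(query, corpus, top_k=2)
```

The two finance-related items rank first because their vectors point in nearly the same direction as the query, while the unrelated item is orthogonal to it.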

Core bibliography & reading list

All items below are freely and publicly available online.

  1. James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor. An Introduction to Statistical Learning with Applications in Python. Springer, 2023. statlearning.com
  2. Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. 2nd ed. Springer, 2009. hastie.su.domains
  3. Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. Mathematics for Machine Learning. Cambridge University Press, 2020. mml-book.github.io
  4. scikit-learn developers. scikit-learn User Guide. scikit-learn.org
  5. VanderPlas, Jake. Python Data Science Handbook. 2nd ed. O'Reilly, 2022. jakevdp.github.io
  6. Google. Machine Learning Crash Course. developers.google.com
  7. Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. deeplearningbook.org
  8. Sanderson, Grant (3Blue1Brown). Neural Networks (visual video series). 3blue1brown.com
  9. Molnar, Christoph. Interpretable Machine Learning. 2nd ed., 2022. christophm.github.io
  10. National Institute of Standards and Technology (NIST). Artificial Intelligence Risk Management Framework (AI RMF 1.0). 2023. nist.gov