Oxford Certificate Programmes · Worcester College
Machine learning lets computers find patterns in data and turn them into predictions and decisions. This course is a rigorous, hands-on introduction organised around the full workflow: framing a decision, preparing data, fitting models, evaluating them honestly, and using them responsibly.
Students will see how supervised models — linear and logistic regression, decision trees, random forests, and boosting — are built and judged; how unsupervised methods such as clustering and principal component analysis reveal structure without labels; and how modern AI systems — embeddings, retrieval, and generative tools — sit alongside classical methods.
Throughout, the course emphasises honest evaluation, the danger of data leakage, the difference between prediction and causation, and the fairness, interpretability, and governance questions that responsible deployment requires. Through worked examples, Python labs, and the discussion of real decisions, students develop the analytical tools to choose appropriate models, evaluate them with the right metrics, audit them for leakage and bias, and communicate both their conclusions and their limitations clearly.
Teaching method. Students are taught according to the Oxford Socratic model, where class participation is central. Teaching combines lectures, guided discussion, hands-on Python labs, and group work in and outside class. No prior programming experience is assumed, though it is welcome.
Assessment. Assessment takes place on Friday at the end of the course.
| Day | Topic | Focus |
|---|---|---|
| Monday | Foundations: data, prediction, and trust | What machine learning is and is not; features and targets; loss; the train/test split; metrics; leakage; and why prediction is not causation. |
| Tuesday | Regression and classification: from models to decisions | Linear and logistic regression; coefficients and uncertainty; turning predicted probabilities into actions with cost-based thresholds. |
| Wednesday | Flexible models and honest evaluation | Decision trees, random forests, and boosting; cross-validation; precision, recall, ROC and PR curves; and auditing a model for data leakage. |
| Thursday | Unsupervised learning, modern AI, and responsible use | K-means clustering and PCA; embeddings and retrieval; and fairness, interpretability, monitoring, and governance. |
| Friday | Assessment | End-of-course assessment. |
Monday's session sets up the workflow we reuse all week: turning a decision into a prediction problem, separating signal from noise, splitting data honestly, and choosing a loss. We stress which information is actually known before a decision is made, and why data leakage and the prediction–causation gap matter from the very start.
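To make "splitting data honestly" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not course lab material: the helper names `train_test_split`, `fit_scaler`, and `apply_scaler` and the toy feature values are all hypothetical. The point it shows is that any statistic used for preprocessing (here a mean and standard deviation) is learned from the training rows only and then reused unchanged on the test rows.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Learn the mean and standard deviation from the given values."""
    return statistics.mean(values), statistics.pstdev(values)


def apply_scaler(values, mean, std):
    """Standardise values using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


# Hypothetical toy feature: twenty made-up measurements.
feature = [float(x) for x in range(1, 21)]

train, test = train_test_split(feature)

# Leakage-free: the scaler only ever sees the training rows.
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)

# The leaky alternative would be fit_scaler(feature), which lets
# information from the held-out rows influence the preprocessing.
```

Fitting the scaler on the full dataset would look harmless here, but it is the same mistake that produces over-optimistic test scores on real problems.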
Tuesday's session covers the baseline supervised models: linear regression for numeric outcomes and logistic regression for probabilities. We discuss coefficients, uncertainty, and how a predicted probability becomes an action through a cost-based threshold rather than a default cut-off.
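One common way to derive a cost-based threshold can be sketched as follows. The assumptions are ours, not the syllabus's: correct decisions cost nothing, a false positive costs `cost_false_positive`, and a false negative costs `cost_false_negative`. Acting has expected cost (1 - p) × cost_false_positive and not acting has expected cost p × cost_false_negative, so acting is cheaper once p exceeds cost_false_positive / (cost_false_positive + cost_false_negative). The function names and the example probabilities are hypothetical.

```python
def cost_based_threshold(cost_false_positive, cost_false_negative):
    """Probability above which acting has lower expected cost than not acting.

    Assumes correct decisions cost nothing (an illustrative simplification).
    """
    if cost_false_positive < 0 or cost_false_negative < 0:
        raise ValueError("costs must be non-negative")
    total = cost_false_positive + cost_false_negative
    if total == 0:
        raise ValueError("at least one cost must be positive")
    return cost_false_positive / total


def decide(probabilities, threshold):
    """Turn predicted probabilities into act / do-not-act decisions."""
    return [p >= threshold for p in probabilities]


# Hypothetical costs: missing a positive case is four times as costly
# as acting on a negative one, so the threshold falls well below 0.5.
threshold = cost_based_threshold(cost_false_positive=1.0, cost_false_negative=4.0)

# Hypothetical predicted probabilities from some fitted classifier.
actions = decide([0.05, 0.15, 0.25, 0.60], threshold)
```

With these made-up costs the threshold is 0.2, so a case with predicted probability 0.25 triggers action even though a default 0.5 cut-off would ignore it.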
Wednesday's session introduces decision trees, random forests, and boosting, together with the bias–variance trade-off that controls overfitting. The added flexibility is paired with stricter evaluation: cross-validation, the right metric for imbalanced problems, and a disciplined leakage audit.
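The sketch below illustrates why the choice of metric matters when classes are imbalanced. The labels and both sets of predictions are made up for illustration, and the helper names are hypothetical. It computes precision, recall, and accuracy from confusion-matrix counts, and treats precision or recall as 0.0 when the denominator is zero, which is a convention rather than a rule.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred):
    """Return (precision, recall, accuracy); undefined ratios become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Hypothetical imbalanced problem: 5 positives among 100 cases.
y_true = [1] * 5 + [0] * 95

# Model A never flags anything.
always_negative = [0] * 100

# Model B finds 3 of the 5 positives at the price of 5 false alarms.
flags_some = [1, 1, 1, 0, 0] + [1] * 5 + [0] * 90

scores_a = precision_recall_accuracy(y_true, always_negative)
scores_b = precision_recall_accuracy(y_true, flags_some)
```

On this toy data the model that never flags anything has the higher accuracy (0.95 against 0.93) but a recall of zero, which is why accuracy alone is a poor guide for imbalanced problems.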
Thursday's session moves from prediction to structure discovery with K-means clustering and PCA, connects classical tools to modern AI through embeddings and retrieval, and closes with the fairness, interpretability, monitoring, and governance questions that responsible deployment demands.
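As a rough illustration of how embeddings support retrieval, the sketch below ranks a few documents by cosine similarity to a query vector. Everything in it is an assumption made for illustration: the three-dimensional vectors are hand-written stand-ins for embeddings that a real model would produce, and `cosine_similarity` and `retrieve` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return the names of the top_k documents most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in documents.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical toy embeddings; a real system would obtain these from a model.
documents = {
    "loan default": [0.9, 0.1, 0.0],
    "credit risk": [0.8, 0.2, 0.1],
    "image clustering": [0.0, 0.2, 0.9],
}

query = [1.0, 0.0, 0.0]
results = retrieve(query, documents)
```

Here the two finance-related entries point in nearly the same direction as the query and are returned first, while the unrelated entry scores zero.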
All items below are freely and publicly available online.