The Machine Learning Summer School (MLSS) series, established in 2002, aims to advance knowledge in modern statistical machine learning and inference. Recognizing the growing demand for machine learning expertise, MLSS meets this need by offering students and researchers intensive courses on topics ranging from the fundamentals to state-of-the-art methods.
We are delighted to bring this globally renowned series to West Africa for the first time, with Senegal as our host country. MLSS Senegal 2025 offers PhD students and advanced learners a unique opportunity to engage with top experts and explore cutting-edge techniques in machine learning.
Adji Bousso Dieng
Princeton
Vendi Scoring For Science and Machine Learning
In three lectures, we'll explore Vendi Scoring, a new computational framework for addressing challenges in machine learning and the sciences. We'll introduce the Vendi scores, a flexible family of diversity metrics. You'll learn how to leverage these scores to evaluate generative models in terms of diversity, quality, duplication, and memorization. We'll also study algorithms for analyzing large data collections, along with new techniques for Bayesian experimental design, sampling, and retrieval-augmented generation with large language models. Throughout the lectures, we'll not only cover practical techniques but also address foundational scientific questions, such as how to discover drugs and materials with desired properties and how to accelerate simulations.
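For orientation, here is a minimal sketch of the basic Vendi Score computation as commonly described: the exponential of the Shannon entropy of the eigenvalues of a scaled similarity matrix. The RBF kernel and toy data below are illustrative assumptions; the lectures cover the full family of scores and their applications.

```python
# Minimal sketch of the basic Vendi Score: exp of the Shannon entropy of the
# eigenvalues of K/n, where K is a similarity matrix with unit diagonal.
# The RBF kernel choice here is an illustrative assumption.
import numpy as np

def vendi_score(X, gamma=1.0):
    """Diversity of the rows of X under an RBF similarity kernel."""
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)          # K_ii = 1 by construction
    eigvals = np.linalg.eigvalsh(K / n)    # eigenvalues sum to 1
    eigvals = eigvals[eigvals > 1e-12]     # drop numerical zeros
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))          # ranges from 1 (all identical) to n

# Duplicated points lower the score; spread-out points raise it.
print(vendi_score(np.zeros((5, 2))))                               # ~1.0
print(vendi_score(np.random.default_rng(0).normal(size=(5, 2))))   # closer to 5
```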
Ashia Wilson
MIT
Optimization for Machine Learning
Overview of optimization for ML: first-order methods, stochastic approximation, convergence and generalization perspectives, and practical considerations for modern large-scale training.
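As a warm-up for the first-order and stochastic-approximation themes, here is a minimal sketch of mini-batch SGD on a least-squares problem; the data, step size, and batch size are illustrative placeholders.

```python
# Minimal sketch of mini-batch SGD on least squares:
# theta <- theta - eta * grad of (1/|B|) * sum_{i in B} (x_i^T theta - y_i)^2 / 2
import numpy as np

rng = np.random.default_rng(0)
n, d = 1_000, 10
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

theta = np.zeros(d)
eta, batch = 0.05, 32                      # illustrative hyperparameters
for step in range(2_000):
    idx = rng.integers(0, n, size=batch)   # sample a mini-batch
    grad = X[idx].T @ (X[idx] @ theta - y[idx]) / batch
    theta -= eta * grad

print(np.linalg.norm(theta - theta_true))  # small residual error
```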
Elvis Dohmatob
Concordia University / Meta / Mila
Mathematical Foundations of Neural Scaling Laws
Neural scaling laws describe how the irreducible test error of large language models decays with power-law trends in model size, dataset size, and compute budget. First outlined empirically by Kaplan et al. in 2020, these relationships now guide the design of systems such as ChatGPT, Llama, and Gemini. Yet the underlying reasons for the emergence of this power-law behavior are still not generally understood.
In this series of lectures, we derive neural scaling laws in prototypical settings: associative memory models, regression in high dimensions, the multitask sparse parity problem, etc. We apply standard tools such as mean-field analysis, random matrix analysis, and large-deviation methods to expose the mathematical mechanisms behind power-law scaling and to establish precise error estimates. The tools you acquire will generalize to a wide range of theoretical problems in machine learning.
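For a concrete feel for the object of study, here is an illustrative sketch of a commonly used parametric scaling form, L(N, D) = E + A·N^(−α) + B·D^(−β), where N is parameter count and D is dataset size; the constants below are placeholders, not fitted values, and the lectures derive when and why such forms arise.

```python
# Illustrative sketch (not the lecturers' derivation) of a parametric
# scaling-law form; all constants are placeholders, not fitted values.
def scaling_law(N, D, E=1.7, A=400.0, alpha=0.34, B=410.0, beta=0.28):
    return E + A * N ** (-alpha) + B * D ** (-beta)

# Growing the model only helps until the data term dominates, and vice versa.
for N, D in [(1e8, 1e10), (1e9, 1e10), (1e9, 1e11)]:
    print(f"N={N:.0e}, D={D:.0e} -> predicted loss {scaling_law(N, D):.3f}")
```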
Franca Hoffmann
Caltech
Foundations of Measure Transport & Applications
Measure transport is a rich mathematical topic at the intersection of analysis, probability and optimization. The core idea behind this theory is to rearrange the mass of a reference measure to match a target measure. In particular, optimal transport seeks a rearrangement that transports mass with minimal cost. In recent years, measure transport has become an indispensable tool for representing probability distributions and for defining measures of similarity between distributions. Characterizing probability distributions is essential for describing uncertainty and lies at the heart of machine learning and decision-making. Generative models provide a flexible toolbox of methods for approximating complex probability distributions from data. These models describe distributions by learning transformations of simple reference distributions that are easy to sample from, such as a standard Gaussian. Recent years have seen an explosion in generative modeling techniques to create realistic-looking images and to tackle complex scientific problems such as drug discovery and numerical weather prediction.
This course will first introduce the foundations of measure transport, building on the MLSS course “Regularized and Neural Approaches to Solve Optimal Transport” by Marco Cuturi, and then present its connections and applications in various fields, including probabilistic sampling and generative modeling. Students will have an opportunity to implement these approaches and to observe their advantages and limitations using numerical experiments. Lastly, we will provide a brief overview of active research topics with the goal of motivating additional research and applications of this rapidly growing field.
Gergely Neu
Universitat Pompeu Fabra
Reinforcement Learning
Reinforcement learning is one of the most important frameworks for formalizing and solving sequential decision-making problems under uncertainty. This tutorial will give an introduction to the subject, reviewing the key fundamental concepts necessary for formulating the RL problem (i.e., Markov decision processes and dynamic programming), classic algorithms based on stochastic approximation (e.g., TD methods, Q-learning), approximate dynamic programming methods (e.g., least-squares TD, deep Q networks), and policy optimization methods (e.g., policy gradients, trust-region policy optimization). The focus will be on the foundations underlying these methods, providing a principled justification for each of them, and connections with the practice of modern RL will be highlighted.
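To make the stochastic-approximation flavor concrete, here is a minimal sketch of tabular Q-learning on a toy chain MDP; the environment and hyperparameters are illustrative assumptions, not course material.

```python
# Minimal sketch of tabular Q-learning on a toy 5-state chain MDP
# (the environment here is an illustrative assumption).
import numpy as np

n_states, n_actions = 5, 2      # actions: 0 = left, 1 = right
gamma, alpha, eps = 0.95, 0.1, 0.1

def step(s, a):
    """Move along the chain; reaching the last state gives reward 1 and resets."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    if s_next == n_states - 1:
        return 0, 1.0           # reset to start, reward 1
    return s_next, 0.0

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(20_000):
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s_next, r = step(s, a)
    # Q-learning update: bootstrap with the greedy value of the next state.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next

print(np.argmax(Q, axis=1))     # learned greedy policy: mostly "right" (1)
```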
Jean-Philippe Vert
Owkin
Foundation Models for Biology
Foundation models were initially developed to capture the essence of data that humans understand, such as human language and images, but they are also a remarkable tool to capture the essence of data that humans barely understand, such as natural data in many scientific domains. In this course I will introduce the audience to the techniques and applications of foundation models in biology, including how they shed new light on how proteins fold, how cells work and how tissues are organized in normal and disease states.
Marco Cuturi
Apple
Regularized and Neural Approaches to Solve Optimal Transport
This lecture series explores Optimal Transport (OT) with a focus on regularized and neural methods. It is divided into four parts that take you from basic ideas to modern computational techniques.
Part 1: Warm-up: starting with optimal matchings, and OT problems as extensions/generalizations of CS-101 optimal matchings
Part 2: Kantorovich formulation of OT, discrete computations
Part 3: Monge formulation, duality, Gangbo-McCann-Brenier theorems, ICNNs
Part 4: Dynamic formulation, Benamou-Brenier, flow matching
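To give a feel for the discrete computations of Part 2 and the regularized methods the series emphasizes, here is a minimal sketch of entropy-regularized OT solved with Sinkhorn iterations; the point clouds and regularization strength are illustrative.

```python
# Minimal sketch of entropy-regularized discrete OT solved with Sinkhorn
# iterations; the point clouds and epsilon are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(6, 2)), rng.normal(loc=2.0, size=(8, 2))
a, b = np.full(6, 1 / 6), np.full(8, 1 / 8)          # uniform marginals
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean cost

eps = 0.1
K = np.exp(-C / eps)                                 # Gibbs kernel
u, v = np.ones(6), np.ones(8)
for _ in range(500):                                 # alternate marginal scalings
    u = a / (K @ v)
    v = b / (K.T @ u)

P = u[:, None] * K * v[None, :]                      # approximate OT coupling
print(P.sum(axis=1) - a)                             # ~0: row marginals match
print((P * C).sum())                                 # regularized transport cost
```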
Marieme Ngom
Argonne National Laboratory
Parallel and Distributed Deep Learning
As machine learning (ML) models and datasets continue to increase in size and complexity, effective parallelization strategies become essential for modern and production-level AI development.
This two-part tutorial aims to give a comprehensive, hands-on overview of current parallel/distributed deep learning methods and strategies.
Part I: The first session introduces the fundamental concepts and techniques of parallel/distributed deep learning. We will cover key scaling strategies including data, model, and pipeline parallelism, with a focus on practical applications for large-scale models like large language models (LLMs). We will also briefly present current supercomputers and AI testbeds which are essential to running models at scale. Participants will gain a clear understanding of when and how to apply different parallelization approaches based on model architecture, hardware constraints, and performance requirements.
Part II: The second session provides a hands-on exploration of parallel training implementation. Participants will learn how to scale deep learning workloads across multiple GPUs. Through guided PyTorch-based examples, we'll demonstrate practical considerations for distributed training, including configuration, optimization, and troubleshooting.
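As a taste of the hands-on session, here is a minimal sketch of data-parallel training with PyTorch's DistributedDataParallel, assuming the script is launched with torchrun; the toy model and data are placeholders, and the tutorial's actual exercises may differ.

```python
# Minimal sketch of data parallelism with PyTorch DistributedDataParallel.
# Assumes launch via: torchrun --nproc_per_node=N train.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
    rank = dist.get_rank()
    local = rank % max(torch.cuda.device_count(), 1)
    dev = torch.device(f"cuda:{local}" if torch.cuda.is_available() else "cpu")
    if torch.cuda.is_available():
        torch.cuda.set_device(local)

    model = torch.nn.Linear(32, 1).to(dev)
    # Gradients are all-reduced across ranks on backward().
    model = DDP(model, device_ids=[local] if torch.cuda.is_available() else None)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    sampler = DistributedSampler(data)      # each rank sees a distinct shard
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)            # reshuffle shards each epoch
        for xb, yb in loader:
            xb, yb = xb.to(dev), yb.to(dev)
            loss = torch.nn.functional.mse_loss(model(xb), yb)
            opt.zero_grad(); loss.backward(); opt.step()
    if rank == 0:
        print("final loss:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```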
Mathieu Blondel
Google DeepMind
Differentiable Programming
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible.
In this lecture, I will review the fundamentals (differentiation, probabilistic learning), differentiable programs (parameterized programs, control flows), differentiating through programs (finite differences, automatic differentiation, differentiating through optimization, differentiating through integration) and smoothing programs (by optimization and by integration).
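To illustrate the core chain-rule bookkeeping behind end-to-end differentiation, here is a minimal sketch of forward-mode automatic differentiation with dual numbers; real differentiable-programming systems covered in the lecture are far more general, but the idea is the same.

```python
# Minimal sketch of forward-mode automatic differentiation with dual numbers.
import math
from dataclasses import dataclass

@dataclass
class Dual:
    val: float   # primal value
    dot: float   # derivative carried along with it

    def __add__(self, o):  return Dual(self.val + o.val, self.dot + o.dot)
    def __mul__(self, o):  return Dual(self.val * o.val,
                                       self.val * o.dot + self.dot * o.val)

def sin(x: Dual) -> Dual:
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def f(x: Dual) -> Dual:
    return sin(x * x) + x          # f(x) = sin(x^2) + x

x = Dual(1.5, 1.0)                 # seed derivative dx/dx = 1
y = f(x)
print(y.val, y.dot)                # f(1.5) and f'(1.5) = 2*1.5*cos(2.25) + 1
```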
Minhyuk Sung
KAIST
Diffusion Models
Recent breakthroughs in image and video generative models have amazed people with their unprecedented quality, as demonstrated by models such as Stable Diffusion, Midjourney, and FLUX for images, and Sora and Veo 2 for videos. These advancements are powered by diffusion and flow models, which have become the standard techniques for generative modeling. Diffusion and flow models offer numerous advantages, including superior output quality and strong capabilities in conditional generation, personalization, zero-shot manipulation, and knowledge distillation.
In these lectures, we will explore both the theoretical foundations and practical applications of diffusion and flow models. You will engage in hands-on mathematical problem solving to deepen your understanding of the underlying theory and learn about real-world applications. Bring your pen and paper!
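As a small taste of the theory-meets-practice flavor, here is a minimal sketch of the standard DDPM-style noise-prediction objective on 1-D toy data; the tiny network, schedule, and data are illustrative placeholders, not lecture material.

```python
# Minimal sketch of the DDPM-style training objective on 1-D toy data:
# corrupt x0 with Gaussian noise at a random timestep and regress the noise.
import torch

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)       # \bar{alpha}_t

net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))   # predicts the noise eps
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x0 = torch.randn(256, 1) * 0.5 + 2.0            # toy data distribution
    t = torch.randint(0, T, (256,))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].unsqueeze(1)
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps     # sample from q(x_t | x_0)
    t_feat = (t.float() / T).unsqueeze(1)           # crude time embedding
    loss = ((net(torch.cat([xt, t_feat], dim=1)) - eps) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final denoising loss:", loss.item())
```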
Natalie Schluter
Apple
Natural Language Processing
In this course, we take a survey of topics in NLP. We consider a handful of central tracks of the field's main conference venues (e.g., ACL and EMNLP), and explore a selection of the technical foundations behind them (across computer science, mathematics, and linguistics). In the final lecture, we take a deep dive into a recent best-paper award winner, to develop an understanding of the work itself as well as of the effort that goes into producing such a paper.
Peter Richtarik
KAUST
Federated Learning
Principles and algorithms for federated learning: communication-efficient optimization, personalization, privacy considerations, and convergence trade-offs.
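To fix ideas, here is a minimal sketch of federated averaging (FedAvg) on per-client least-squares problems; the clients, local step counts, and learning rate are illustrative assumptions.

```python
# Minimal sketch of federated averaging (FedAvg) on quadratic objectives:
# each client runs a few local gradient steps, then the server averages.
import numpy as np

rng = np.random.default_rng(0)
d, n_clients = 5, 4
# Each client k has its own least-squares problem (A_k, b_k).
A = [rng.normal(size=(20, d)) for _ in range(n_clients)]
b = [A_k @ rng.normal(size=d) + 0.1 * rng.normal(size=20) for A_k in A]

def local_update(w, A_k, b_k, lr=0.01, local_steps=10):
    for _ in range(local_steps):
        w = w - lr * A_k.T @ (A_k @ w - b_k) / len(b_k)
    return w

w = np.zeros(d)                       # global model held by the server
for rnd in range(50):                 # communication rounds
    updates = [local_update(w.copy(), A[k], b[k]) for k in range(n_clients)]
    w = np.mean(updates, axis=0)      # server aggregates by simple averaging

global_loss = sum(np.mean((A[k] @ w - b[k]) ** 2) for k in range(n_clients))
print("average global loss:", global_loss / n_clients)
```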
Samory Kpotufe
Columbia
Vignettes in Learning Theory: Performance Guarantees in i.i.d. and non-i.i.d. Settings
We will start with the basics of statistical learning theory: that is, classical abstractions of learning procedures for classification and regression in i.i.d. settings and the main theoretical insights on the performance of such procedures. In particular, while certain details may be omitted for the sake of time, we hope to drive home the main intuition behind complexity measures such as VC dimension, covering numbers, noise conditions, etc., i.e., quantities that drive the hardness/feasibility of learning problems.
We will then discuss guarantees for modern non-i.i.d. settings such as transfer learning, meta learning, etc., where the training data is drawn from multiple distributions. Guarantees in these latter settings naturally build on those from classical i.i.d. settings, but with additional measures of hardness/feasibility.
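Schematically, the kind of guarantee the classical i.i.d. part builds toward looks like the following; constants are omitted here, and the lectures make the assumptions and constants precise.

```latex
% Schematic only: for a hypothesis class H with VC dimension d, with
% probability at least 1 - \delta over an i.i.d. sample of size n,
% uniformly over h in H,
\[
  R(h) \;\le\; \widehat{R}_n(h)
        \;+\; O\!\left(\sqrt{\frac{d\,\log(n/d) + \log(1/\delta)}{n}}\right),
\]
% where R is the population risk and \widehat{R}_n the empirical risk.
```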
Prerequisites are basic knowledge of probability theory, some multi-dimensional calculus, and linear algebra (e.g., symmetric matrices and their spectral decompositions).
Sanmi Koyejo
Stanford
Machine Learning from Human Preferences
Machine learning from human preferences investigates mechanisms for capturing human and societal preferences and values in artificial intelligence (AI) systems and applications, e.g., in socio-technical settings such as algorithmic fairness, and in many language and robotics tasks where reward functions are otherwise challenging to specify quantitatively. While learning from human preferences has emerged as an increasingly important component of modern AI, credited for example with advancing the state of the art in language modeling and reinforcement learning, existing approaches are largely reinvented independently in each subfield, with limited connections drawn among them.
This course will cover the foundations of learning from human preferences from first principles and outline connections to the growing literature on the topic. This includes but is not limited to:
Inverse reinforcement learning, which uses human preferences to specify the reinforcement learning reward function
Metric elicitation, which uses human preferences to specify tradeoffs for cost-sensitive classification
Reinforcement learning from human feedback, where human preferences are used to align a pre-trained language model
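As a small concrete illustration of the third item above, here is a minimal sketch of fitting a reward model from pairwise preferences with a Bradley-Terry-style loss; the linear reward model and toy features are illustrative stand-ins for a language-model-based reward model.

```python
# Minimal sketch of a Bradley-Terry-style reward-modeling loss from pairwise
# preferences: maximize sigma(r(chosen) - r(rejected)). Toy data throughout.
import torch

d = 16
reward = torch.nn.Linear(d, 1)                   # stand-in for a reward model
opt = torch.optim.Adam(reward.parameters(), lr=1e-2)

# Toy preference data: the "chosen" response has larger weight on feature 0.
torch.manual_seed(0)
chosen = torch.randn(512, d) + torch.eye(d)[0] * 2.0
rejected = torch.randn(512, d)

for step in range(300):
    margin = reward(chosen) - reward(rejected)   # r(x, y_w) - r(x, y_l)
    loss = -torch.nn.functional.logsigmoid(margin).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# After training, chosen responses should receive higher predicted reward.
print((reward(chosen) > reward(rejected)).float().mean().item())
```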
| ⏰ | June 23 | June 24 | June 25 | June 26 | June 27 |
|---|---|---|---|---|---|
| 09:00–10:30 | Opening & Mixer | Natural Language Processing (Natalie Schluter, 2/3) | Mathematical Foundations of Neural Scaling Laws (Elvis Dohmatob, 1/3) | Mathematical Foundations of Neural Scaling Laws (Elvis Dohmatob, 2/3) | Reinforcement Learning (Gergely Neu, 3/3) |
| | Coffee Break | | | | |
| 11:00–12:30 | Vendi Scoring For Science and Machine Learning (Adji Bousso Dieng, 1/3) | Optimization for Machine Learning (Ashia Wilson, 2/3) | Natural Language Processing (Natalie Schluter, 3/3) | Reinforcement Learning (Gergely Neu, 1/3) | Mathematical Foundations of Neural Scaling Laws (Elvis Dohmatob, 3/3) |
| | Lunch Break | | | | |
| 14:00–15:30 | Optimization for Machine Learning (Ashia Wilson, 1/3) | Vendi Scoring For Science and Machine Learning (Adji Bousso Dieng, 2/3) | Optimization for Machine Learning (Ashia Wilson, 3/3) | Reinforcement Learning (Gergely Neu, 2/3) | Differentiable Programming (Mathieu Blondel, 2/3) |
| | Coffee Break | | | | |
| 16:00–17:30 | Natural Language Processing (Natalie Schluter, 1/3) | Parallel and Distributed Deep Learning (Marieme Ngom, 1/2) | Parallel and Distributed Deep Learning (Marieme Ngom, 2/2) | Differentiable Programming (Mathieu Blondel, 1/3) | Differentiable Programming (Mathieu Blondel, 3/3) |
| 18:00–19:30 | Apéro Poster | Vendi Scoring For Science and Machine Learning (Adji Bousso Dieng, 3/3) | | | |
| ⏰ | Saturday (June 28) | Sunday (June 29) |
|---|---|---|
| 09:00–17:00 | Free time for Leisure Activities | Free time for Leisure Activities |
| 18:00–late | Dakar by Night: Sunset + Dinner + Lost in Town | Free time |
| ⏰ | June 30 | July 1 | July 2 | July 3 | July 4 |
|---|---|---|---|---|---|
| 09:00–10:30 | Regularized and Neural Approaches to Solve Optimal Transport (Marco Cuturi, 1/3) | Foundation Models for Biology (Jean-Philippe Vert, 3/3) | Vignettes in Learning Theory: Performance Guarantees in i.i.d. and non-i.i.d. Settings (Samory Kpotufe, 3/3) | Diffusion Models (Minhyuk Sung, 1/2) | Diffusion Models (Minhyuk Sung, 2/2) |
| | Coffee Break | | | | |
| 11:00–12:30 | Regularized and Neural Approaches to Solve Optimal Transport (Marco Cuturi, 2/3) | Vignettes in Learning Theory: Performance Guarantees in i.i.d. and non-i.i.d. Settings (Samory Kpotufe, 1/3) | Federated Learning (Peter Richtárik, 1/3) | Machine Learning from Human Preferences (Sanmi Koyejo, 2/3) | Foundations of Measure Transport & Applications (Franca Hoffmann, Ricardo Baptista, 2/3) |
| | Lunch Break | | | | |
| 14:00–15:30 | Foundation Models for Biology (Jean-Philippe Vert, 1/3) | Vignettes in Learning Theory: Performance Guarantees in i.i.d. and non-i.i.d. Settings (Samory Kpotufe, 2/3) | Federated Learning (Peter Richtárik, 2/3) | Machine Learning from Human Preferences (Sanmi Koyejo, 3/3) | Foundations of Measure Transport & Applications (Franca Hoffmann, Ricardo Baptista, 3/3) |
| | Coffee Break | | | | |
| 16:00–17:30 | Foundation Models for Biology (Jean-Philippe Vert, 2/3) | Regularized and Neural Approaches to Solve Optimal Transport (Marco Cuturi, 3/3) | Machine Learning from Human Preferences (Sanmi Koyejo, 1/3) | Foundations of Measure Transport & Applications (Franca Hoffmann, Ricardo Baptista, 1/3) | Federated Learning (Peter Richtárik, 3/3) |
| 18:00–19:30 | Apéro Poster | | | | Closing Party |
Application Period:
December 15 – February 28 →
New deadline: March 14, 2025 (AoE)
Notification of Acceptance:
March 31 →
New date: April 4, 2025 (AoE)
📬 Please check your spam folder. If you haven’t received an email, feel free to contact us directly.
Fees: $450 due on May 9 (AoE)
The application process is open to PhD students, post-docs, and other advanced learners. Applicants are required to submit a one-page self-recommendation letter explaining how the summer school will benefit them, along with a one-page CV summarizing their contributions in machine learning.
Applications must be submitted through the Application Form.
Dates: June 23 - July 4
Location: The African Institute for Mathematical Sciences (AIMS-Senegal)
AIMS Mbour is a scenic location situated along the Atlantic coast of Senegal. The venue is approximately a 1.5-hour drive from Dakar city center and around 30 kilometers from Blaise Diagne International Airport, offering convenient access for international participants.
Hepatitis A: A single dose should be administered at least 15 days before departure, with a booster dose recommended 1 to 3 (or up to 5) years later. For children, vaccination is available from the age of 1 year.
Yellow Fever: A single dose must be administered at least 10 days before travel. For children, vaccination is available from 9 months of age (or in specific circumstances between 6 and 9 months).
For more information, visit: Road to Senegal
If you’re interested in contributing to MLSS Senegal 2025, we welcome your support and ideas. Together, we aim to create a transformative learning experience for all participants. We are also deeply grateful for the invaluable support of local contributors, international advisors, and volunteers whose efforts ensure the success of this event. A full list of contributors and collaborators will be shared as the event approaches.
Awa Dieng
DeepMind
Bamba Diouf
Amazon
Derguene Mbaye
Cheikh Anta Diop University
Mouhamadane Mboup
Galsen AI
Ismaila Seck
Lengo AI
Solym Manou-Abi
University of Poitiers
Ndeye Aissatou Dacosta
AIMS Senegal
Ibrahima Saliou Samba
AIMS Senegal
Mohamed Lamine Diallo
AIMS Senegal
Jeanne Madelaine Seck
AIMS Senegal
Boubacar Fall
AIMS Senegal
Serge Pacome Goudiaby
AIMS Senegal
Coura Balde
AIMS Senegal
Adil Salim
Microsoft
Eugene Ndiaye
Apple