CDS DS 595

AI Methods for Science

Boston University · Spring 2026

About the Course

AI methods are increasingly central to how science gets done, spanning simulation, experiment, theory, and observation. This course aims to equip students with the methods to understand and carry out research at the intersection of AI and the natural sciences. Topics include probabilistic inference, neural networks that encode physical symmetries and domain knowledge, generative models for scientific data, and simulation-based inference. While framed in terms of scientific applications, the methods discussed extend well beyond scientific research, with broad applicability across industry and general AI R&D.

A major focus of the course is on large language models and their emerging role in science. As LLMs become more capable of scientific reasoning and autonomous operation, understanding how to evaluate, adapt, and collaborate with these systems is becoming essential. We explore what it means to work alongside AI scientists and how to critically assess both their capabilities and their limitations.

Applications are drawn from domains including physics, materials science, and biology. The course involves three assignments emphasizing method design and critical analysis in collaboration with AI tools, plus a final project finetuning an LLM to elicit a scientific capability.

Learning Objectives

By the end of this course, students will be able to:

Apply probabilistic inference and sampling methods (e.g., MCMC) to scientific problems
Design neural networks that encode scientific domain knowledge
Train generative models (e.g., diffusion models) to emulate scientific data distributions
Use simulation-based inference to connect simulators with observations
Evaluate the boundaries and limitations of LLM capabilities for scientific reasoning
Develop intuitions for how to collaborate effectively with AI systems on research tasks
Read and understand AI-for-science research papers

Logistics

Lecture: Mon/Wed 12:20–1:35pm, CAS 218
Discussion: Tue 11:15am–12:05pm, MUG 205
Instructor: Siddharth Mishra-Sharma (smishras@bu.edu)
TF: Wanli Cheng (cwl1997@bu.edu)
Office Hours: Tue 3–5pm or by appointment, CDS 1528
TF Office Hours: Mon/Wed 2:15–3:15pm or by appointment, CDS 14th floor Green Corner

Resources

Syllabus: PDF
Discussion: Ed Discussion
Assignment/lab submission: GitHub Classroom
Computing: GPU access through the Shared Computing Cluster (SCC) and LLM API/finetuning credits will be provided for project work
Coding agents: Getting free subscriptions

There is no required textbook. Many readings reference Understanding Deep Learning by Simon J.D. Prince (MIT Press, 2023); the PDF is available on the book's website. Other readings are drawn from research papers and online resources.

Schedule and Optional Reading

This schedule will change as the course progresses.

Week 1

Wed Jan 21

Science in the Era of Computation

Course intro and logistics; Historical and philosophical perspectives

Slides

Week 2

Mon Jan 26

Reasoning Under Uncertainty

Bayesian inference; Fitting a model to data; Model selection

Wed Jan 28

Framing Scientific Problems as ML Tasks

Classification, regression, inference, generation, compression, anomaly detection, ...

Tue Jan 27

Lab 1: JAX and Bayesian Inference

Starter

Week 3

Mon Feb 2

Learning by Sampling and Optimization

MCMC; Monte Carlo methods; Variational inference

Wed Feb 4

Building Blocks of Learned Representations

Neural networks primer

Assignment 1 out (Starter)

Tue Feb 3

Lab 2: Hamiltonian Monte Carlo

Starter

Week 4

Mon Feb 9

Encoding Scientific Structure in Neural Networks I

How data and theory inform NN architectures; CNNs

Wed Feb 11

Encoding Scientific Structure in Neural Networks II

Graphs and locality; GNNs; Sequence and time-series models

Tue Feb 10

Lab 3: Training Neural Networks

Starter

Week 5

—

Mon Feb 16

No class — Presidents' Day

Tue Feb 17

Encoding Scientific Structure in Neural Networks III

Symmetry-preserving neural networks

Wed Feb 18

Learning Distributions from Data I

Density estimation; Latent variable models; VAEs

Assignment 1 due; Assignment 2 out (Starter)

Week 6

L10

Mon Feb 23

Cancelled — Snowstorm

L11

Wed Feb 25

Learning Distributions from Data II

Diffusion

Tue Feb 24

Lab 4: Variational Autoencoders

Starter

Week 7

L12

Mon Mar 2

Guest Lecture: Ameya Daigavane

AI + bio

L13

Wed Mar 4

Learning Distributions from Data III

Flow matching; applications

Roelants, Flow Matching Intro

Assignment 2 due; Assignment 3 out

Tue Mar 3

Lab 5: Diffusion

Spring Recess

—

Mar 7–15

Spring Recess — No classes

Week 8

L14

Mon Mar 16

Differentiating Through Scientific Simulators

Differentiable programming

L15

Wed Mar 18

Learning Through Exploration

Reinforcement learning and search

Tue Mar 17

Lab 6: Differentiable Programming

Week 9

L16

Mon Mar 23

Inverting Simulators I

Simulation-based inference

Cranmer+, Frontier of SBI

L17

Wed Mar 25

Inverting Simulators II

Simulation-based inference; applications to physics and cosmology

Cranmer+, Frontier of SBI

Assignment 3 due

Tue Mar 24

Lab 7: Reinforcement Learning

Week 10

L18

Mon Mar 30

From Specialized to General Intelligence; Scaling

From task-specific, to domain-specific, to general scientific agents

Final project out

L19

Wed Apr 1

Guest Lecture: Gaia Grosso

AI + particle collider physics

Tue Mar 31

Lab 8: Simulation-Based Inference

Week 11

L20

Mon Apr 6

Quantifying and Predicting LLM Scientific Capabilities

Evaluations and forecasting for scientific R&D tasks

L21

Wed Apr 8

LLM Building Blocks

Attention; Transformers; Compute

Tue Apr 7

Final project work

Week 12

L22

Mon Apr 13

Teaching LLMs to Science

Training and eliciting scientific capabilities

L23

Wed Apr 15

Learning Unified Representations Across Scientific Modalities

Foundation models for science

Choi+, Defining FMs for Computational Science

Final proposal due Fri Apr 17

D10

Tue Apr 14

Final project work

Week 13

—

Mon Apr 20

No class — Patriots' Day

L24

Wed Apr 22

Frontiers

Cutting-edge topic chosen by class

D11

Tue Apr 21

Final project work

Week 14

L25

Mon Apr 27

Being a Human Scientist

Research in an AI-driven scientific landscape

L26

Wed Apr 29

Final Project Presentations

D12

Tue Apr 28

Final project work

Finals

—

Mon May 4

Finals — No exam

Final project due

Discussion Sections

Tuesdays 11:15am–12:05pm in MUG 205.

Week	Date	Topic	Notes
—	Tue Jan 20	No discussion	First day of classes
2	Tue Jan 27	Lab 1: JAX and Bayesian Inference	Starter; due Wed Jan 28
3	Tue Feb 3	Lab 2: Hamiltonian Monte Carlo	Starter; due Wed Feb 4
4	Tue Feb 10	Lab 3: Training Neural Networks	Starter; due Wed Feb 11
—	Tue Feb 17	No discussion	Substitute Monday schedule
6	Tue Feb 24	Lab 4: Variational Autoencoders	Starter; due Wed Feb 25
7	Tue Mar 3	Lab 5: Diffusion	Due Wed Mar 4
—	Mar 7–15	No discussion	Spring Recess
8	Tue Mar 17	Lab 6: Differentiable Programming	Due Wed Mar 18
9	Tue Mar 24	Lab 7: Reinforcement Learning	Due Wed Mar 25
10	Tue Mar 31	Lab 8: Simulation-Based Inference	Due Wed Apr 1
11–14	Apr	Final project work	Proposal due Apr 17, Report due May 4

Topics Not Covered

Due to time constraints, this course does not cover several areas in AI for science, including neural operators, physics-informed learning, surrogate modeling, symbolic regression, causal inference, interpretability methods, experimental design, active learning, and recent AI-for-math developments (e.g., LLM-guided theorem proving). Some of these may be covered in later weeks as the course evolves.

Assessment

Discussion Labs 20%

Assignment 1 15%

Assignment 2 15%

Assignment 3 15%

Final Project 35%

Total 100%
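
As a rough illustration of how these weights combine, here is a minimal Python sketch. The weights are copied from the list above; the function name `weighted_total`, the 0 to 100 score scale, and the example scores are hypothetical. It assumes a plain weighted sum, which the syllabus does not spell out, so rounding, curving, or dropped scores may work differently in practice.

```python
# Weights copied from the Assessment list; everything else is illustrative.
WEIGHTS = {
    "Discussion Labs": 0.20,
    "Assignment 1": 0.15,
    "Assignment 2": 0.15,
    "Assignment 3": 0.15,
    "Final Project": 0.35,
}


def weighted_total(scores: dict[str, float]) -> float:
    """Combine per-component percentages (0-100) into a course percentage.

    Assumes a plain weighted sum; this aggregation rule is an assumption,
    not something stated in the syllabus.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, not real data.
example_scores = {
    "Discussion Labs": 100.0,
    "Assignment 1": 90.0,
    "Assignment 2": 80.0,
    "Assignment 3": 85.0,
    "Final Project": 92.0,
}
print(round(weighted_total(example_scores), 2))
```

Because the final project carries 35% of the total, a ten-point change in its score moves the course percentage by 3.5 points under this assumption, compared with 1.5 points for any single assignment.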

Discussion Labs

Weekly in-class labs reinforce lecture material through hands-on programming. Students work through a notebook during discussion, exploring implementations and comparing results. Labs are graded on participation and completion, and are due by the end of the day on the Wednesday following the lab.

Assignments

Three assignments develop skills in method design and critical analysis. AI tools may be used freely, but the analysis and interpretation require critically engaging with what was produced. The discussion labs build foundational skills for these assignments.

Assignment 1: Sampler Synthesis (Starter). Design and stress-test a novel sampling or variational inference method
Assignment 2: Ablation Archaeology (Starter). Systematically ablate components of a geometric neural network to understand what each design choice contributes and why
Assignment 3: TBA

Final Project

Teams of 2–3 identify a scientific capability that current large language models struggle with, then finetune a language model to improve that capability. This is a two-stage project:

Proposal: Demonstrate that LLMs struggle at a specific scientific capability by curating a dataset or benchmark
Final report (~6 pages, NeurIPS format): Finetuned model and evaluation showing improvement, including artifacts used for finetuning (datasets, code, reinforcement learning environments)

Timeline

Deliverable	Out	Due
Discussion Labs	Tuesdays	Wednesday following lab
Assignment 1	Wed Feb 4	Wed Feb 18
Assignment 2	Wed Feb 18	Wed Mar 4
Assignment 3	Wed Mar 4	Wed Mar 25
Final Project	Mon Mar 30	Proposal: Fri Apr 17; Report: Mon May 4

Policies

Attendance

Regular attendance in lectures is expected. Please notify the instructor of planned absences.

Late Work

Late submissions are not accepted without prior arrangement. Extensions may be granted for documented emergencies.

Collaboration

Discussion of concepts and approaches is encouraged. However, all submitted code and written work must be your own. When collaborating, you must acknowledge your collaborators.

AI Tools

Learning to work effectively with AI is itself a course objective. Use AI tools freely to explore ideas, debug code, and deepen understanding. Focus on building genuine competence—understanding why something works, not just that it works. Disclose AI assistance in submissions, including its form and extent. See also the CDS GAIA policy.

Academic Conduct

All students are expected to read and abide by the BU Academic Code of Conduct. Plagiarism includes copying or restating work or ideas of another person or AI software without citing the source. In computing coursework, this includes sharing code, reusing code across courses without permission, and uploading assignments to external sites. Please review the examples of plagiarism provided by the BU Computer Science department. All suspected cases of plagiarism will be reported to the Academic Dean.

Accommodations

Boston University is committed to providing reasonable accommodations to students with documented disabilities. Students seeking accommodations should contact Disability & Access Services (25 Buick Street, Suite 300; 617-353-3658) as early as possible in the semester. A new Faculty Accommodation Letter (FAL) must be requested each semester; DAS will send this directly to instructors.

Religious Observance

Students observing a religious holiday during regularly scheduled class time are entitled to an excused absence. Please notify the instructor in advance to make arrangements for any missed work.

Recordings

Recording of lectures requires instructor permission. Students approved for recording as an accommodation must limit use to personal study and may not share recordings.