CMPS 4660/6660: Reinforcement Learning - Fall 2020

Instructor: Zizhan Zheng (zzheng3@tulane.edu), Stanley Thomas 307B
Class Time & Place: TR 12:25PM-01:35PM,  Dinwiddie Hall 103
Office Hours: Wed 10-11AM and by appointment

Course Description (Syllabus)

Reinforcement learning (RL) has found successful applications in various domains, including recommender systems, health care, energy, finance, robotics, transportation, and computer systems. Many people believe that RL is a step toward Artificial General Intelligence (AGI). This course introduces both the classic results and state-of-the-art research in RL at the graduate level. We will cover both the theoretical foundation of RL and its applications through case studies. Topics to be covered include:

Course Materials

Class Meeting Plan

We will meet both in person and online (about 40% of lectures will be online). A detailed class schedule can be found on the course webpage. To compensate for the shortened class time, extra reading and discussion material will be assigned on Canvas.

Homework Assignments  

There will be both written problem assignments and labs (programming assignments). Graduate students will be given extra questions that require advanced algorithmic/analytic techniques. Specific instructions will be given in each assignment. All the assignments will be posted on the course webpage.

Midterm Exam

The midterm will be closed-book and closed-notes, but you will be allowed to bring a cheat sheet (one single-sided letter-size page). Undergraduate and graduate students will be given different sets of questions.

Final Project

Students will work in groups of up to two members on a final project. The project should center on a well-defined problem related to reinforcement learning and (ideally) your specific research area. You will develop the project through close interaction with the instructor and your peers and write a paper that has all the sections of a typical research paper, including some preliminary results.

A couple of milestone presentations will be scheduled during the semester and the final presentation will be in the final exam week (Nov. 30 – Dec. 5). The final paper is due after the final presentation. A tentative schedule for the final project can be found on the course website.

Late Policy

Each student has a total of 6 grace days that may be applied to the homework assignments. No more than 2 grace days may be used on any single assignment. Any assignment submitted more than 2 days past the deadline, or after the student has run out of grace days, will receive zero credit. No grace days may be used for the final presentation or report.
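As an informal illustration only, the grace-day rules above can be sketched in Python. The class name `GraceLedger` and its `submit` method are hypothetical, and two details are assumptions not stated in the policy: lateness is counted in whole days, and a submission that receives zero credit does not consume any grace days. The policy text and the instructor's decision govern in all cases.

```python
from dataclasses import dataclass

MAX_TOTAL_GRACE_DAYS = 6           # per student, across all homework (from the policy)
MAX_GRACE_DAYS_PER_ASSIGNMENT = 2  # cap on any single assignment (from the policy)


@dataclass
class GraceLedger:
    """Hypothetical helper that tracks one student's remaining grace days."""

    remaining: int = MAX_TOTAL_GRACE_DAYS

    def submit(self, days_late: int) -> bool:
        """Return True if the homework earns credit, False if it gets zero credit.

        Assumption: days_late is a whole number of days past the deadline.
        """
        if days_late < 0:
            raise ValueError("days_late must be non-negative")
        if days_late == 0:
            return True  # on time, no grace days needed
        too_late = days_late > MAX_GRACE_DAYS_PER_ASSIGNMENT
        out_of_credit = days_late > self.remaining
        if too_late or out_of_credit:
            # Assumption: a zero-credit submission does not consume grace days.
            return False
        self.remaining -= days_late
        return True
```

For example, a student who submits one homework 2 days late is left with 4 grace days, while a submission 3 days late gets zero credit regardless of the remaining balance.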

Attendance Policy

Faculty and students must comply with University policies on COVID-19 testing and isolation, which are available at https://tulane.edu/covid-19/health-strategies. Faculty and students must wear face coverings in all common areas, including classrooms, and follow social distancing rules. Failure to comply is a violation of the Code of Student Conduct and students will be subject to University discipline, which can include suspension or permanent dismissal.

If a student cannot attend class for any reason, the student is responsible for communicating with their instructor to make up any work they may miss. Faculty will provide online options for class participation, outlined in this document, and unless a student is seriously ill, they are expected to use this option. The University Health Center will provide documentation verifying a student is ill, as well as verification that a student may return to class. With the approval of the Newcomb-Tulane College dean, an instructor may have a student who has excessive absences involuntarily withdrawn from a course with a WF grade after written warning at any time during the semester.

Grading Policy

The weighted average will determine your letter grade roughly as follows:
A  >= 90%; B  >= 80%; C  >= 70%; D  >= 60%; F  < 60%
+/- grades will be given for borderline cases.
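As a rough sketch, the cutoffs above can be written as a lookup in Python. The function name `base_letter_grade` is hypothetical. The sketch returns only the base letter: the syllabus describes the mapping as approximate and does not say how +/- grades are decided, so borderline adjustments are deliberately left out.

```python
# Cutoffs taken from the grading policy, checked from highest to lowest.
GRADE_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def base_letter_grade(weighted_average: float) -> str:
    """Map a weighted average (in percent) to a base letter grade.

    Plus/minus adjustments for borderline cases are not modeled here.
    """
    for cutoff, letter in GRADE_CUTOFFS:
        if weighted_average >= cutoff:
            return letter
    return "F"
```

Under these cutoffs, a weighted average of exactly 90 maps to A, and 89.9 maps to B before any borderline adjustment.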

All grades will be posted on Canvas.

Class Schedule & Handouts

Acknowledgment: many slides are adapted from Richard Sutton's RL slides, David Silver's RL course, and Berkeley CS 285.

| Lecture | Date | Topic | Lecture Topic | Reading | Assignments |
|---|---|---|---|---|---|
| 1 | Aug 20 (R) | Introduction | Logistics; Intro to RL [pdf] | SB 1.1-1.5; Probability review; Linear algebra review | Forming groups (due Sep 1) |
| 2 | Aug 25 (T) | Markov decision processes and dynamic programming | Markov Reward Processes; Episodic and continuing tasks | SB 3.1-3.5, DB 4.1 | |
| 3 | Aug 27 (R) | | Finite MDP | SB 3.1-3.5, CS 2.1-2.2, DB 4.2 | |
| 4 | Sep 1 (T) | | Bellman equations | SB 3.6, CS 2.3, DB 4.3 | Homework 1 (due Sep 10) |
| 5 | Sep 3 (R) | | Bellman optimality equation [pdf] | SB 3.6, CS 2.4 | |
| 6 | Sep 8 (T) | | Contractions and fixed point theorem; DP for prediction | CS 2.4, A.1-A.2; SB 4.1-4.2 | |
| 7 | Sep 10 (R) | | Value iteration | SB 4.3-4.7 | Homework 2 (due Sep 22) |
| | Sep 15 (T) | | Class canceled | | |
| 8 | Sep 17 (R) | | Policy iteration | SB 4.3-4.7 | |
| 9 | Sep 22 (T) | Model-free prediction and control | POMDP, LP approach for MDP [pdf], Monte Carlo prediction [pdf] | SB 17.3, 5.1-5.2, CS 3.1 | |
| 10 | Sep 24 (R) | | Student presentation: project proposal | | |
| 11 | Sep 29 (T) | | TD(0), n-step TD, TD(λ), Monte Carlo control | SB 6.1-6.3, 7.1, 12.1-12.2, 5.3-5.7 | |
| 12 | Oct 1 (R) | | Sarsa, Q-learning | SB 6.4-6.8 | |
| 13 | Oct 6 (T) | Approximation solution methods | Midterm: Tuesday, Oct 6 | | |
| 14 | Oct 8 (R) | | Value function approximation | SB 9-10, CS 3.2 | |
| 15 | Oct 11 (U) | | Policy gradients | SB 13 | |
| 16 | Oct 13 (T) | | Policy gradients | SB 13 | |
| 17 | Oct 15 (R) | | Actor-Critic methods | SB 13, CS 4.4 | |
| 18 | Oct 20 (T) | Planning and learning | Dyna | SB 8.1-8.2 | |
| 19 | Oct 22 (R) | | Rollout, Monte Carlo tree search | SB 8.3-8.11 | |
| 20 | Oct 27 (T) | | Student presentation: project update | | |
| 21 | Oct 29 (R) | Exploration and exploitation | Multi-armed bandit, UCB | SB 2, AS 1 | |
| 22 | Nov 3 (T) | | Q-learning with UCB exploration | | |
| 23 | Nov 5 (R) | | Contextual bandit | AS 8 | |
| 24 | Nov 10 (T) | | Student presentation: mini-lecture | | |
| 25 | Nov 12 (R) | | Student presentation: mini-lecture | | |
| 26 | Nov 17 (T) | Advanced topics | Imitation learning, Inverse RL | | |
| 27 | Nov 19 (R) | | Transfer learning; Meta-learning | | |
| 28 | Nov 24 (T) | | Multi-agent RL | | |
| | Nov 30 - Dec 5 | | Final presentation and final report | | |

ADA/Accessibility Statement

Tulane University strives to make all learning experiences as accessible as possible. If you anticipate or experience academic barriers based on your disability, please let me know immediately so that we can privately discuss options. I will never ask for medical documentation from you to support potential accommodation needs. Instead, to establish reasonable accommodations, I may request that you register with the Goldman Center for Student Accessibility. After registration, make arrangements with me as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion. Goldman Center contact information: goldman@tulane.edu; (504) 862-8433; accessibility.tulane.edu.

Code of Academic Conduct

The Code of Academic Conduct applies to all students, full-time and part-time, in Tulane University. Tulane University expects and requires behavior compatible with its high standards of scholarship. By accepting admission to the university, a student accepts its regulations (i.e., Code of Academic Conduct and Code of Student Conduct) and acknowledges the right of the university to take disciplinary action, including suspension or expulsion, for conduct judged unsatisfactory or disruptive.

Religious Accommodation Policy

Per Tulane’s religious accommodation policy, I will make every reasonable effort to ensure that students are able to observe religious holidays without jeopardizing their ability to fulfill their academic obligations. Excused absences do not relieve the student from the responsibility for any course work required during the period of absence. Students should notify me within the first two weeks of the semester about their intent to observe any holidays that fall on a class day or on the day of the final exam.

Title IX

Tulane University recognizes the inherent dignity of all individuals and promotes respect for all people. As such, Tulane is committed to providing an environment free of all forms of discrimination including sexual and gender-based discrimination, harassment, and violence like sexual assault, intimate partner violence, and stalking. If you (or someone you know) has experienced or is experiencing these types of behaviors, know that you are not alone. Resources and support are available: you can learn more at allin.tulane.edu.  Any and all of your communications on these matters will be treated as either “Confidential” or “Private” as explained in the chart below. Please know that if you choose to confide in me I am mandated by the university to report to the Title IX Coordinator, as Tulane and I want to be sure you are connected with all the support the university can offer. You do not need to respond to outreach from the university if you do not want. You can also make a report yourself, including an anonymous report, through the form at tulane.edu/concerns.

Confidential: Except in extreme circumstances, involving imminent danger to one’s self or others, nothing will be shared without your explicit permission.
Counseling & Psychological Services (CAPS) | (504) 314-2277 or The Line (24/7) | (504) 264-6074
Student Health Center | (504) 865-5255
Sexual Aggression Peer Hotline and Education (SAPHE) | (504) 654-9543

Private: Conversations are kept as confidential as possible, but information is shared with key staff members so the University can offer resources and accommodations and take action if necessary for safety reasons.
Case Management & Victim Support Services | (504) 314-2160 or srss@tulane.edu
Tulane University Police (TUPD) | Uptown - (504) 865-5911. Downtown - (504) 988-5531
Title IX Coordinator | (504) 865-5615 or msmith76@tulane.edu

Emergency Preparedness & Response