CMPS 4660/6660: Reinforcement Learning - Fall 2020

Instructor:	Zizhan Zheng (zzheng3@tulane.edu), Stanley Thomas 307B
Class Time & Place:	TR 12:25PM-01:35PM, Dinwiddie Hall 103
Office Hours:	Wed 10-11AM and by appointment

Course Description (Syllabus)

Reinforcement learning (RL) has found successful applications in various domains, including recommender systems, health care, energy, finance, robotics, transportation, and computer systems. Many people believe that RL is a step toward Artificial General Intelligence (AGI). This course introduces both the classic results and state-of-the-art research in RL at the graduate level. We will cover both the theoretical foundation of RL and its applications through case studies. Topics to be covered include:

Markov Decision Processes
Dynamic Programming
Model-Free Prediction and Control
Value Function Approximation
Policy Gradient Methods
Planning and Learning
Exploration and Exploitation
Deep Reinforcement Learning
Multi-Agent Reinforcement Learning
...

Course Materials

Textbook: [SB] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction (2nd edition), A Bradford Book, 2018.
References

[DB] Dimitri Bertsekas, Reinforcement Learning and Optimal Control, Athena Scientific, 2019.
[CS] Csaba Szepesvári, Algorithms for Reinforcement Learning, 2010.
[AS] Aleksandrs Slivkins, Introduction to Multi-Armed Bandits, 2019.

We will also use some material from book chapters available online and papers from journals and conferences. Information on these will be provided on the course webpage.

Class Meeting Plan

We will meet both in person and online (about 40% of lectures will be online). A detailed class schedule can be found on the course webpage. To compensate for the shortened class time, extra reading and discussion material will be assigned on Canvas.

Homework Assignments

There will be both written problem assignments and labs (programming assignments). Graduate students will be given extra questions that require advanced algorithmic/analytic techniques. Specific instructions will be given in each assignment. All the assignments will be posted on the course webpage.

Midterm Exam

The midterm will be closed-book and closed-notes, but you will be allowed to bring a cheat sheet to the exam (one single-sided letter-size page). Undergraduate and graduate students will receive different sets of questions.

Final Project

Students will work in groups on a final project. Each group should include up to two members. The project should center on a well-defined problem related to reinforcement learning and (ideally) your specific research area. You will develop the project through close interactions with the instructor and your peers and write a paper that has all the sections of a typical research paper including some preliminary results.

A couple of milestone presentations will be scheduled during the semester and the final presentation will be in the final exam week (Nov. 30 – Dec. 5). The final paper is due after the final presentation. A tentative schedule for the final project can be found on the course website.

Late Policy

Each student has a total of 6 grace days that may be applied to the homework assignments. No more than 2 grace days may be used on any single assignment. Any assignment submitted more than 2 days past the deadline, or submitted late after the student's grace days are exhausted, will receive zero credit. No grace days may be used for the final presentation and report.

Attendance Policy

Faculty and students must comply with University policies on COVID-19 testing and isolation, which are available at https://tulane.edu/covid-19/health-strategies. Faculty and students must wear face coverings in all common areas, including classrooms, and follow social distancing rules. Failure to comply is a violation of the Code of Student Conduct and students will be subject to University discipline, which can include suspension or permanent dismissal.

If a student cannot attend class for any reason, the student is responsible for communicating with their instructor to make up any work they may miss. Faculty will provide online options for class participation, outlined in this document, and unless a student is seriously ill, they are expected to use this option. The University Health Center will provide documentation verifying a student is ill, as well as verification that a student may return to class. With the approval of the Newcomb-Tulane College dean, an instructor may have a student who has excessive absences involuntarily withdrawn from a course with a WF grade after written warning at any time during the semester.

Grading Policy

Problem Sets - 20%
Labs - 15%
Midterm - 20%
Final Project - 35%
Class Participation - 10%

The weighted average will determine your letter grade roughly as follows:
A >= 90%; B >= 80%; C >= 70%; D >= 60%; F < 60%
+/- grades will be given for borderline cases.
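
As a rough illustration of how the weighted average above is computed, here is a minimal Python sketch. The weights and cutoffs come from the grading policy; the function names and the example scores are hypothetical, and the +/- adjustments for borderline cases are not modeled.

```python
# Weights taken from the grading policy above (they sum to 100%).
WEIGHTS = {
    "problem_sets": 0.20,
    "labs": 0.15,
    "midterm": 0.20,
    "final_project": 0.35,
    "participation": 0.10,
}

# Approximate cutoffs from the syllabus; +/- modifiers are not modeled.
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def weighted_average(scores):
    """Return the weighted average of component scores given in percent (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(average):
    """Map a weighted average to a rough letter grade, ignoring +/- modifiers."""
    for cutoff, letter in CUTOFFS:
        if average >= cutoff:
            return letter
    return "F"


# Hypothetical scores, for illustration only.
example = {
    "problem_sets": 85,
    "labs": 90,
    "midterm": 78,
    "final_project": 92,
    "participation": 100,
}
avg = weighted_average(example)
print(round(avg, 2), letter_grade(avg))
```

Since the cutoffs are only approximate, the letter returned by this sketch is an estimate and not the final grade.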

All grades will be posted on Canvas.

Class Schedule & Handouts

Acknowledgment: many slides are adapted from Richard Sutton's RL slides, David Silver's RL course, and Berkeley CS 285.

Lecture	Date	Unit	Lecture Topic	Reading	Assignments
1	Aug 20 (R)	Introduction	Logistics; Intro to RL [pdf]	SB 1.1-1.5; Probability review; Linear algebra review	Forming groups (due Sep 1)
2	Aug 25 (T)	Markov decision processes and dynamic programming	Markov Reward Processes; Episodic and continuing tasks	SB 3.1-3.5, DB 4.1
3	Aug 27 (R)		Finite MDP	SB 3.1-3.5, CS 2.1-2.2, DB 4.2
4	Sep 1 (T)		Bellman equations	SB 3.6, CS 2.3, DB 4.3	Homework 1 (due Sep 10)
5	Sep 3 (R)		Bellman optimality equation [pdf]	SB 3.6, CS 2.4
6	Sep 8 (T)		Contractions and fixed-point theorem; DP for prediction	CS 2.4, A.1-A.2; SB 4.1-4.2
7	Sep 10 (R)		Value iteration	SB 4.3-4.7	Homework 2 (due Sep 22)
	Sep 15 (T)		Class cancelled
8	Sep 17 (R)		Policy iteration	SB 4.3-4.7
9	Sep 22 (T)	Model-free prediction and control	LP approach for MDP, POMDP [pdf]; Monte Carlo prediction	SB 17.3, 5.1-5.2, CS 3.1
10	Sep 24 (R)		Student presentations: project proposal		Lab 1 (due Oct 6)
11	Sep 29 (T)		Stochastic approximation, TD(0)	SB 6.1-6.3, CS 3.1
12	Oct 1 (R)		TD(0)	SB 6.1-6.3, CS 3.1
13	Oct 6 (T)		n-step TD, TD(λ)	SB 7.1, 12.1-12.2, CS 3.1	Homework 3 (due Oct 13)
14	Oct 8 (R)		TD(λ) [pdf]; Monte Carlo control	SB 5.3-5.7
15	Oct 11 (U)		Sarsa; Midterm review [pdf]	SB 6.4
16	Oct 13 (T)		Q-learning [pdf]	SB 6.5-6.7
17	Oct 15 (R)		Midterm: Thursday, Oct 15
18	Oct 20 (T)	Approximate solution methods	On-policy prediction	SB 9.1-9.2, CS 3.2
19	Oct 22 (R)		On-policy prediction	SB 9.3-9.4, CS 3.2
20	Oct 27 (T)		On-policy control; Off-policy methods	SB 9.5, 9.8, 11.1-11.3
	Oct 29 (R)		Class cancelled
21	Nov 3 (T)		Batch methods, DQN [pdf]	SB 16.5, 13; Mnih et al., “Human-level control through deep reinforcement learning”, Nature, 2015	Lab 2
22	Nov 5 (R)		Student presentations: project update
23	Nov 7 (S)		Policy gradients	SB 13
24	Nov 10 (T)		Policy gradients	SB 13
25	Nov 12 (R)		Mini-lectures	Arie, Eli, and Sri: Deep Q-Networks; Farzad and Tianyi: Multi-armed bandits for wireless network
26	Nov 17 (T)	Planning	DDPG [pdf], model-based RL	Lillicrap, et al., “Continuous control with deep reinforcement learning”, ICLR, 2016; SB 8
27	Nov 19 (R)		Mini-lectures	Ningxiao and Xiaolin: Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations; Henger: Convergence of Q-learning
28	Nov 24 (T)		Dyna, Rollout, Monte Carlo tree search [pdf]	SB 8, 16.6
			Final presentations: Wednesday, Dec 2, 4:00-6:00pm; Final report: Friday, Dec 4, 11:59pm

ADA/Accessibility Statement

Tulane University strives to make all learning experiences as accessible as possible. If you anticipate or experience academic barriers based on your disability, please let me know immediately so that we can privately discuss options. I will never ask for medical documentation from you to support potential accommodation needs. Instead, to establish reasonable accommodations, I may request that you register with the Goldman Center for Student Accessibility. After registration, make arrangements with me as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion. Goldman Center contact information: goldman@tulane.edu; (504) 862-8433; accessibility.tulane.edu.

Code of Academic Conduct

The Code of Academic Conduct applies to all students, full-time and part-time, in Tulane University. Tulane University expects and requires behavior compatible with its high standards of scholarship. By accepting admission to the university, a student accepts its regulations (i.e., Code of Academic Conduct and Code of Student Conduct) and acknowledges the right of the university to take disciplinary action, including suspension or expulsion, for conduct judged unsatisfactory or disruptive.

Religious Accommodation Policy

Per Tulane’s religious accommodation policy, I will make every reasonable effort to ensure that students are able to observe religious holidays without jeopardizing their ability to fulfill their academic obligations. Excused absences do not relieve the student from the responsibility for any course work required during the period of absence. Students should notify me within the first two weeks of the semester about their intent to observe any holidays that fall on a class day or on the day of the final exam.

Title IX

Tulane University recognizes the inherent dignity of all individuals and promotes respect for all people. As such, Tulane is committed to providing an environment free of all forms of discrimination including sexual and gender-based discrimination, harassment, and violence like sexual assault, intimate partner violence, and stalking. If you (or someone you know) has experienced or is experiencing these types of behaviors, know that you are not alone. Resources and support are available: you can learn more at allin.tulane.edu. Any and all of your communications on these matters will be treated as either “Confidential” or “Private” as explained in the chart below. Please know that if you choose to confide in me I am mandated by the university to report to the Title IX Coordinator, as Tulane and I want to be sure you are connected with all the support the university can offer. You do not need to respond to outreach from the university if you do not want. You can also make a report yourself, including an anonymous report, through the form at tulane.edu/concerns.

Confidential	Private
Except in extreme circumstances, involving imminent danger to one’s self or others, nothing will be shared without your explicit permission.	Conversations are kept as confidential as possible, but information is shared with key staff members so the University can offer resources and accommodations and take action if necessary for safety reasons.
Counseling & Psychological Services (CAPS) | (504) 314-2277 or The Line (24/7) | (504) 264-6074	Case Management & Victim Support Services | (504) 314-2160 or srss@tulane.edu
Student Health Center | (504) 865-5255	Tulane University Police (TUPD) | Uptown: (504) 865-5911; Downtown: (504) 988-5531
Sexual Aggression Peer Hotline and Education (SAPHE) | (504) 654-9543	Title IX Coordinator | (504) 865-5615 or msmith76@tulane.edu
