ECS 165A: Database Systems

Instructor: Mohammad Sadoghi
E-mail: [email protected]
Office: Remote
Office Hours: Thursdays 4:00-6:00pm (by appointments)

Teaching Fellow: Apratim Shukla
Email: [email protected]
Office: Zoom
Office Hours: Mondays 1:00-3:00 PM

Teaching Fellow: Shaokang Xie
Email: [email protected]
Office: Academic Surge 2392
Office Hours: Wednesdays 3:00-5:00 PM

Lecture
            Time: Tue/Thu 10:30 AM - 11:50 AM
            Location: Giedt 1002

Discussions Sessions
           A1 Wednesdays 8:00 AM - 8:50 AM (Cruess Hall 107)
           A2 Mondays 4:10 PM - 5:00 PM (Hoagland Hall 113)
           A3 Thursdays 3:10 PM - 4:00 PM (Olson Hall 205)


Overview

This is an innovative data management course that provides an introduction to the design and development of fundamental concepts in relational database management systems (DBMS). You will learn the theory and design behind database systems, the issues that affect their functionality and performance, and most importantly, what it takes to effectively utilize modern databases in practice.

The course has completely been redesigned; all students are expected to work in groups of five on an exciting, open-ended, data-oriented, quarter-long project, in a sense, operating and simulating startup environments. Needless to say, this rewarding experience is accompanied by a significant development effort (in Python) that spans hands-on experience on concepts such as memory and disk management, synchronization and concurrency, logging and recovery, and query optimization and evaluation, to name a few. To materialize these objectives, together this quarter, we will be building simplified L-Store [Paper, Slides] from scratch, an Hybrid Transactional and Analytical Processing (HTAP) database.

The coursework is complementary to the classical well-formed, prescriptive model of assignments/projects that are indeed effective and invaluable in practice. Instead by design, the project is intended to be open-ended, namely, minimal instructions and requirements will be provided, as such it rewards and values research & development, taking risks, above all, it is aimed to foster and tap into the creativity of each individual.

The quarter-long project is broken into a set of three milestones. You will have a choice on how your milestone will be graded. Either (1) the entire grade will be determined by the autograder or (2) 50% of the grade will be determined by the autograder and 50% will be determined through oral presentation. Prior to the deadline for each milestone, each group must decide between option (1) or (2); once each group submits their choice, this cannot be changed for that milestone. Each group must vote on option (1) or (2) internally, and the vote of the majority will determine the choice.

For those who choose the presentation option, it must be done by all five group members, and each group member must be ready to answer questions about any aspect of the project; the latter is the utmost importance of ensuring a comprehensive learning experience and fair division of work among all members.

Furthermore, in each milestone, a bonus of up to 10% can be gained to further encourage taking a risk, going the extra mile, and to just be curious & creative. Part of the bonus is reserved for the fastest and the most optimized implementation of L-Store in class, e.g., how many read/write operations per second (adjusted based on the number of cores, CPU clock frequency, amount of memory, cache size, and other hardware metrics to ensure comparable results). The design bonus will be only determined as part of the presentation, while the performance bonus will be determined by the autograder.

A fact of life is that when there is group work, whether at school or in society, there are occasional conflicts; it is crucial to learn how to resolve our differences and be receptive, open, and kind to one another. In kindness and reflection, we shall aim to resolve all conflicts. It is the group responsibility to handle all internal affairs, and only when absolutely necessary involving the instructor. But note, only under very rare exceptional circumstances, a group re-structuring would be granted because once the group is formed, at least for 10 weeks, we must learn how to work with each other in harmony.

For each group, it is recommended that each member lead one aspect of the project while contributing and learning about other parts; roughly, the main components are (1) memory management (e.g., bufferpool), (2) disk management (e.g., persistence and logging), (3) in-memory indexing (e.g., hashing or tree), (4) data access methods (e.g., APIs and query language), (5) multi-threading and synchronization (e.g., data structures latching), (6) transaction and concurrency (e.g., record-level locking), and (7) testing and benchmarking (correctness verification and performance measurements).

As for the lectures, the list of topics covered would include but not limited to:

  • DBMS Concepts and Architecture
  • Storage and Indexing
  • Query Languages (Relational Algebra and SQL)
  • Query Evaluation and Optimization
  • Concurrency Control and Recovery
  • Database Design, the E-R Model, Normalization, and Tuning
  • Database Security, Blockchain

Syllabus

Textbooks:

Required:
  • "Database Management Systems" (referred to as DB), 3rd Edition. Raghu Ramakrishnan and J. Gehrke. McGraw Hill, 2003, ISBN 0-07-246563-8.
  • "Transaction Processing on Modern Hardware" (referred to as TP), Mohammad Sadoghi and Spyros Blanas. Morgan & Claypool Synthesis Lectures on Data Management. 2019. [Free online access when accessed within the UC Davis network].
  • "Fault-tolerant Distributed Transactions on Blockchain". (referred to as BC) Suyash Gupta, Jelle Hellings and Mohammad Sadoghi. Morgan & Claypool Synthesis Lectures on Data Management. 2021. [Free online access when accessed within the UC Davis network].
  • Additionally, a list of research papers will be added later as part of the optional reading for the enthusiastic students.
Optional:

Workload:

The key component of the course is the quarter-long group project that is broken into a set of three main milestones. All milestones will be graded orally (unless specified otherwise), where the progress is presented by all group members, and each group member must be ready to answer questions about any aspect of the project; the latter is of the utmost importance to ensure a comprehensive learning experience and a fair division of work among all members. Therefore, for each milestone (especially when presentation option is chosen), a portion of the grade is devoted to the presented project as a whole on which all members receive the same grade (50% of the grade), but the remaining portion is individualized (50% of the grade), So, not all group members may receive the same grade for each milestone. For each milestone, a bonus of up to 10% can be rewarded.

For each milestone, each group must submit an attribution section in which they specifically state each person's role and the percentage of their contribution. In a group of 5, it is expected that a person will complete 15-20% of the overall project. Of course, determining the percentage is not approximate, but the point is that every member contributes fairly. No contribution means a grade of 0.

The optional presentation for the oral evaluation is limited to at most 15 minutes per team, the time is strictly enforced. All milestone presentations will be done on Zoom. The breakdown of 15 minutes is as follows:
  • The milestone overview: the design and solution, what was accomplished and how? (8 minutes)
  • Q/A: Questions about various aspects of the project (4 minutes)
  • Demo: A live demonstration of the code, which includes adding, modifying, and querying the data (3 minutes)
Additionally, there will be a pre-milestone individual assignment for which each student is expected to think about and explore all aspects of the project. In other words, each student needs to think about what is required to make an HTAP database and document their finding in a few pages. Of course, it is highly recommended to search the web for tutorials and other useful materials. As part this pre-milestone phase, (1) the group must be formed, namely, when each student submits their report, the name of their group members and their team name must be submitted; and (2) each student must decide which aspect of the project they wish to lead-based on the research. If you need help to form your group and look for partners, kindly contact the TAs. Note any group assignment by TAs will be random. Additionally, An online form will be made available for you to submit your team details.

All students are required to attend every lecture and ask insightful questions. Additionally, in order to encourage class participation, there will be short and exciting pop quizzes. We expect to have 3-5 quizzes spread throughout the quarter. The lowest quiz score will be dropped, but there will be no opportunity for makeup quizzes unless there are exceptional circumstances or illness (doctor notes are required). If a quiz is missed for valid reasons approved by the instructor, no zero grade will be assigned but there will be no retake either. There is zero tolerance policy on the use of AI (e.g., ChatGPT) or plagiarism in general during quizzes or tests, any offense will result in a grade of zero and will be reported to Office of Student Support and Judicial Affairs.

Grading: (tentative and subject to change)

The final grade will be based upon the following components (all submissions are due at midnight):
  • Individual Pre-milestone Assignment: 5% (Due on January 18, 2025)
  • Group Project (three milestones): 60%

    • Milestone 1 (20%): Single-threaded, In-memory L-Store (Due on Feb 11); Optional Oral Eval on Feb 14 (8:00am-7:00pm)
    • Milestone 2 (20%): Single-threaded, In-memory & Durable L-Store (Due on Feb 25); Optional Oral Eval on Feb 28 (8:00am-7:00pm)
    • Milestone 3 (20%): Multi-threaded, In-memory & Durable L-Store (Due on Mar 11); Optinal Oral Eval on Mar 14 (8:00am-7:00pm)
  • In-class Popup Quizes: 5% (Expect 3-5 Quizzes)
  • In-class Midterm: 10% (In Claass on February 20, 2025)
  • Final Exam: 20% (In Class on March 13, 2025)

Communication:

For communication with the instructor, please use email ([email protected]) and not Canvas. The instructor will not check/reply to Canvas messages.

For group communication (monitored by TAs) and other students, we will rely on Piazza.

Course Policy:

In this class, we adopt the UC Davis Code of Academic Conduct available here.

In addition, late submission will result in a 10% penalty for each day late; at most, an extension of up to two days may be granted. Students are strongly advised that any act of cheating will result in a score of 0 for the entire milestone (or the course), and offenses will be reported to the Office of the Dean of Students. You are encouraged to discuss problems and ideas, but the final solution or code must be your own. In the event of a major campus emergency, course requirements, deadlines, and grading percentages are subject to changes that may be necessitated by a revised quarter calendar. If such unusual circumstances arise, students may determine any such changes by contacting the instructor.


Contents

List of Topics (tentative):

  1. Introduction to Database Systems
  2. Storage and Indexing
  3. Transaction Management
  4. Database Query Languages
  5. Schema Design (tentative)
  6. Query Evaluation (tentative)
  7. Secure Transactions & Blockchain (tentative)


Tentative Schedule

[Previous Lecture Recordings: Winter 2020, Winter 2021]
January 7, 2025:
  • Overview
January 9, 2025:
  • L-Store Overview
January 14, 2025: [Lecture Recording:] January 16, 2025: [Lecture Recording:]
  • The Relational Model (DB.Chapter 3)
  • (Individual Pre-milestone Assignment is due on January 18)
January 21, 2025: [Lecture Recording] January 23, 2025: [Lecture Recording: Part 1, Part 2]
  • L-Store - Lineage-based Storage Architecture (TP.Chapters 3.1.3, 3.4, 5.2.1) [Paper, Slides]
January 28, 2025: [Lecture Recording: Part 1, Part 2]
  • L-Store - Lineage-based Storage Architecture - Continued (TP.Chapters 3.1.3, 3.4, 5.2.1) [Paper, Slides]
January 30, 2025: [Lecture Recording] February 4, 2025: [Lecture Recording: Part 1, Part 2] February 6, 2025: [Lecture Recording: Part 1, Part 2] February 11, 2025: [Lecture Recording: Part 1, Part 2]
  • Storing Data: Disks and Files - Continued (DB.Chapter 9)
  • (Milestone 1 is due)
February 13, 2025: [Lecture Recording: Part 1, Part 2] February 18, 2025: [Lecture Recording] February 20, 2025:
    In-class Midterm
February 25, 2025: [Lecture Recording] February 27, 2025: [Lecture Recording: Lecture Recording: Part 1, Part 2]
  • Overview of Concurrency in L-Store: 2VCC - Two-version Concurrency Control (TP.Chapters 3.1.3) [Paper, Slides]
March 4, 2025: [Lecture Recording: CC.Part 1] March 6, 2025: [Lecture Recording: CC.Part 2, Lecture Recording: CC.Part 3] March 11, 2025: [Lecture Recording]
  • L-Store Concurrency Controls: QueCC - A Queue-oriented, Control-free Concurrency Architecture (TP.Chapters 5.1.6) [Paper, Slides]
  • (Milestone 3 is due)
March 13, 2025:
    In-class Final Exam
Optional [Lecture Recording: Part 1, Part 2] Optional [Lecture Recording] Optional [Lecture Recording: Part 1, Part 2]

Announcements

January 1, 2025: Welcome to ECS 165A. Kindly note that the lecture starts on January 7, and the discussion starts on the week of January 13. Looking forward to an amazing quarter.

Top Presentations

First Milestone

Second Milestone

Third Milestone


Grades

Grades will be made available on your Canvas account.