Teaching Fellow: Junchao Chen
Teaching Fellow: Dakai Kang
This is an innovative data management course that provides an introduction
to the design and development of fundamental concepts in relational database
management systems (DBMS). You will learn the theory and design behind database
systems, the issues that affect their functionality and performance, and
most importantly, what it takes to effectively utilize modern databases in practice.
The course has completely been redesigned, all students are expected to work in a group of five on an exciting, open-ended, data-oriented, quarter-long project, in a sense, operating and simulating startup environments. Needless to say, this rewarding experience is accompanied by a significant development effort (in Python) that spans hands-on experience on concepts such as memory and disk management, synchronization and concurrency, logging and recovery, and query optimization and evaluation, to name a few. To materialize these objectives, together this quarter, we will be building simplified L-Store [Paper, Slides] from scratch, an Hybrid Transactional and Analytical Processing (HTAP) database.
The course work is complementary to the classical well-formed, prescriptive model of assignments/projects that are indeed effective and invaluable in practice. Instead by design, the project is intended to be open-ended, namely, minimal instructions and requirements will be provided, as such it rewards and values research & development, taking risks, above all, it is aimed to foster and tap into the creativity of each individual.
The quarter-long project is broken into a set of three milestones, all milestones will be graded both orally (60% of the grade) and using autograder (the remaining 40% of the grade). The presentation is done by all five group members, and each group member must be ready to answer questions about any aspect of the project; the latter is the utmost importance to ensure comprehensive learning experience and fair division of work among all members. Furthermore, in each milestone, a bonus of up to 10% can be gained to further encourage taking a risk, going the extra mile, and to just be curious & creative. Part of the bonus is reserved for fastest and the most optimized implementation of L-Store in class, e.g., how many read/write operations per second (adjusted based on the number of cores, CPU clock frequency, amount of memory, cache size, and other hardware metrics to ensure comparable results).
A fact of life, when there is group work, whether at school or in society, there are occasional conflicts; and it is crucial to learn how to resolve our differences and be receptive, open, and kind to one another. In kindness and reflection, we shall aim to resolve all conflicts. It is the group responsibility to handle all internal affairs, and only when absolutely necessary involving the instructor. But note, only under very rare exceptional circumstances, a group re-structuring would be granted because once the group is formed, at least for 10 weeks, we must learn how to work with each other in harmony.
For each group, it is recommended that each member lead one aspect of the project while contributing and learning about other parts; roughly, the main components are (1) memory management (e.g., bufferpool), (2) disk management (e.g., persistence and logging), (3) in-memory indexing (e.g., hashing or tree), (4) data access methods (e.g., APIs and query language), (5) multi-threading and synchronization (e.g., data structures latching), (6) transaction and concurrency (e.g., record-level locking), and (7) testing and benchmarking (correctness verification and performance measurements).
As for the lectures, the list of topics covered would include but not limited to:
- DBMS Concepts and Architecture
- Storage and Indexing
- Query Languages (Relational Algebra and SQL)
- Query Evaluation and Optimization
- Concurrency Control and Recovery
- Database Design, the E-R Model, Normalization, and Tuning
- Database Security, Blockchain
- "Database Management Systems" (referred to as DB), 3rd Edition. Raghu Ramakrishnan and J. Gehrke. McGraw Hill, 2003, ISBN 0-07-246563-8.
- "Transaction Processing on Modern Hardware" (referred to as TP), Mohammad Sadoghi and Spyros Blanas. Morgan & Claypool Synthesis Lectures on Data Management. 2019. [Free online access when accessed within the UC Davis network].
- "Fault-tolerant Distributed Transactions on Blockchain". (referred to as BC) Suyash Gupta, Jelle Hellings and Mohammad Sadoghi. Morgan & Claypool Synthesis Lectures on Data Management. 2021. [Free online access when accessed within the UC Davis network].
- Additionally, a list of research papers will be added later as part of the optional reading for the enthusiastic students.
- "Readings in Database Systems", 4th Edition. Joseph M. Hellerstein, Michael Stonebraker. 2005
- ResilientDB Fabric Resources: Website, Architecture Journey, Vision/Slides, Bitcoin Radio, Lessons Demo, Tutorial Slides: [Theory | System, Video], Code, Hands-on, Wiki, Blog.
- "The Elements of Style", 4th Edition. William Strunk Jr. and E. B. White. 1999
- "Style: Toward Clarity and Grace". Joseph M. Williams. 1995
Workload:The key component of the course is the quarter-long group project that is broken into a set of three main milestones. All milestones will be graded orally (unless specified otherwise), where the progress is presented by all group members, and each group member must be ready to answer questions about any aspect of the project; the latter is the utmost importance to ensure comprehensive learning experience and fair division of work among all members. Therefore, for each milestone, a portion of the grade is devoted to the presented project as a whole on which all members receive the same grade (60% of the grade), but the remaining portion is individualized (40% of the grade), so for each milestone, not all group members may receive the same grade. For each milestone, a bonus of up to 10% can be rewarded.
The presentation for the oral evaluation is limited to at most 15 minutes per team, the time is strictly enforced. The breakdown of 15 minutes is as follows:
- The milestone overview: the design and solution, what was accomplished and how? (8 minutes)
- Q/A: Questions about various aspects of the project (4 minutes)
- Demo: A live demonstration of the code, which includes adding, modifying, and querying the data (3 minutes)
All milestone presentations will be done on Zoom.
Grading: (tentative and subject to change due to COVID)The final grade will be based upon the following components (all submissions are due at midnight):
- Individual Pre-milestone Assignment: 5% (Due on January 24, 2023)
- Group Project (three milestones): 60%
- Milestone 1 (20%): Single-threaded, In-memory L-Store (Due on February 14); Oral Eval on February 17 (8:00am-7:00pm)
- Milestone 2 (20%): Single-threaded, In-memory & Durable L-Store (Due on February 28); Oral Eval on March 3 (8:00am-7:00pm)
- Milestone 3 (20%): Multi-threaded, In-memory & Durable L-Store (Due on March 14); Oral Eval on March 17 (8:00am-7:00pm)
- In-class Midterm: 10% (February 23, 2022)
- Final Exam: 25% (March 20, 2023 at 3:30pm)
Communication:For communication with the instructor, please use email ([email protected]) and not Canvas. The instructor will not check/reply to Canvas messages.
For group communication (monitored by TAs) and other students, we will rely on Piazza.
Course Policy:In this class, we adopt the UC Davis Code of Academic Conduct available here.
In addition, late submission will result in a 10% penalty for each day late; at most an extension of up to two days may be granted. Students are strongly advised that any act of cheating will result in a score of 0 for the entire milestone (or the course) and offenses will be reported to the Office of the Dean of Students. You are encouraged to discuss problems and ideas but the final solution or code must be your own. In the event of a major campus emergency, course requirements, deadlines, and grading percentages are subject to changes that may be necessitated by a revised quarter calendar. If such unusual circumstances arise, students may determine any such changes by contacting the instructor.
List of Topics (tentative):
- Introduction to Database Systems
- Database Management Systems (DB.Chapter 1)
- The Relational Model (DB.Chapter 3, TP.Chapter 2)
- Optional Advanced Topics: Overview of Modern Distributed Data Processing Systems (Hadoop, Spark)
- Disks and Files (DB.Chapter 8)
- File Organization and Indexing (DB.Chapter 9)
- Lineage-based Storage Architecture (L-Store: Paper, Slides)
- Tree-Structured Indexes (TP.Chapter 6, DB.Chapter 9, and Intuitive and Simple Visual)
- Hash-Based Indexes (TP.Chapter 6 & DB.Chapter 10)
- Database Design & Index Selection (DB.Chapter 20a, Chapter 20b)
- Optional Advanced Topics: Latch-free, In-memory Indexes (TP.Chapter 6 & Bw-Tree: Paper, Slides)
- Optional Advanced Topics: Index Maintenance (Indirection: Paper)
- Optional Advanced Topics: Resilient Distributed Datasets (RDDs: Paper, Slides)
- Optional Advanced Topics: In-memory, High-dimensional Indexes
- Transaction Properties (TP.Chapter 2 & DB.Chapter 16)
- Concurrency Control (DB.Chapter 17 & TP.Chapters 3-5)
- Optional Advanced Topics: Crash Recovery (DB.Chapter 20)
- Optional Advanced Topics: In-memory Databases (TP.Chapter 3 & Microsoft Hekaton: Paper)
- Optional Advanced Topics: Multi-version Optimistic Concurrency Control (TP.Chapter 3 & 2VCC: Paper)
- Optional Advanced Topics: Deterministic Concurrency Control (TP.Chapter 5 & QueCC: Paper)
- Optional Advanced Topics: Distributed Non-blocking Transactions (EasyCommit: Paper, Slides)
- Optional Advanced Topics: Distributed Deterministic Transactions (TP.Chapter 4 & Calvin: Paper, Distributed Workflow: Paper)
- Optional Advanced Topics: Avoiding Distributed Coordination Using Partitioning & Replication (TP.Chapter 6 & Schism: Paper)
- Relational Algebra (DB.Chapter 4)
- SQL: Queries, Programming, and Triggers (DB.Chapter 5)
- Optional Advanced Topics: Hive (Paper, Slides by Facebook & Cloudera)
- Optional Advanced Topics: SparkSQL (Paper, Slides by UC, Berkeley, AMPlab & databricks)
- External Sorting (DB.Chapter 11)
- Evaluation of Relational Operators (DB.Chapter 12a [Join], DB.Chapter 12b [Selection, Projection, etc.])
- Relational Query Optimization (DB.Chapters 13 & 14)
- Optional Advanced Topics: Materialized Views (Survey Paper, Slides)
- Optional Advanced Topics: Blockchain Transaction Processing. [Encyclopedia: Preprint]
- Optional Advanced Topics: ResilientDB: Website, Architecture Journey, Vision/Slides, Bitcoin Radio, Lessons Demo, Tutorial Slides: [Theory | System, Video], Code, Hands-on, Wiki, Blog.
- Optional Advanced Topics: Blockchain Landscape and AI Renaissance: The Bright Path Forward. [Tutorial: Part 1, Part 2]
- Introduction to Relational Databases (DB.Chapter 1)
- The Relational Model (DB.Chapter 3)
- Indexing in Modern Databases (Modern Databases)
- L-Store - Lineage-based Storage Architecture (TP.Chapters 3.1.3, 3.4, 5.2.1) [Paper, Slides] (Individual Pre-milestone Assignment is due on January 24)
- L-Store - Lineage-based Storage Architecture - Continued (TP.Chapters 3.1.3, 3.4, 5.2.1) [Paper, Slides]
- L-Store - Lineage-based Storage Architecture - Continued (TP.Chapters 3.1.3, 3.4, 5.2.1) [Paper, Slides]
- Storage and Indexing (DB.Chapter 8)
- Storing Data: Disks and Files (DB.Chapter 9)
- Transaction Management Overview (TP.Chapter 2 & DB.Chapter 16)
- Overview of Concurrency in L-Store: 2VCC - Two-version Concurrency Control (TP.Chapters 3.1.3) (Milestone 1 is due)
- Overview of Concurrency in L-Store: 2VCC - Two-version Concurrency Control - Continued (TP.Chapters 3.1.3)
- L-Store Concurrency Controls: QueCC - A Queue-oriented, Control-free Concurrency Architecture (TP.Chapters 5.1.6)
- Concurrency Control: Theory & Multi-granularity Locking (DB.Chapter 17 & TP.Chapters 3-5) (Milestone 2 is due)
- Concurrency Control: Optimistic and Timestamp (DB.Chapter 17 & TP.Chapters 3-5)
- Concurrency Control: Multi-version CC (DB.Chapter 17 & TP.Chapters 3-5)
- Relational Algebra Query Language (DB.Chapter 4)
- SQL: Queries, Programming, and Triggers (DB.Chapter 5) (Milestone 3 is due)
January 1, 2023: Welcome to ECS 165A. Kindly note that the lecture starts on January 10, and the discussion starts on the week of January 16. Looking forward to an amazing quarter.
Grades will be made available on your Canvas account.