This is an innovative data management course that introduces the fundamental
concepts behind the design and development of relational database
management systems (DBMS). You will learn the theory and design behind database
systems, the issues that affect their functionality and performance, and
most importantly, what it takes to effectively utilize modern databases in practice.
The course has been completely redesigned: all students are expected to work
in a group of five on an exciting, open-ended, data-oriented, quarter-long project
that, in a sense, simulates a startup environment. Needless to say,
this rewarding experience is accompanied by a significant development effort (in Python)
that provides hands-on experience with concepts such as memory and disk management,
synchronization and concurrency, logging and recovery, and query optimization
and evaluation, to name a few. To realize these objectives, together this quarter
we will be building a simplified version of L-Store
[Paper, Slides],
a Hybrid Transactional and Analytical Processing (HTAP) database, from scratch.
The course work is complementary to the classical well-formed, prescriptive
model of assignments/projects, which is indeed effective and invaluable in practice.
By design, however, the project is intended to be open-ended: minimal
instructions and requirements will be provided. As such, it rewards and values
research and development and taking risks; above all, it aims to foster and
tap into the creativity of each individual.
The quarter-long project is divided into three milestones. All milestones
will be graded primarily through oral presentations (we may additionally employ an
autograder), in which all five group members present their progress, and each group
member must be ready to answer questions about any aspect of the project. The latter
is of the utmost importance to ensure a comprehensive learning experience and a fair
division of work among all members. Furthermore, in each milestone, a bonus of up to
20% can be earned to further encourage taking risks, going the extra mile, and simply
being curious and creative. Part of the bonus is reserved for the fastest and
most optimized implementation of L-Store in the class, measured, e.g., by the number
of read/write operations per second (adjusted for the number of cores, CPU clock
frequency, amount of memory, cache size, and other hardware metrics to ensure
comparable results).
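As a rough illustration of the "read/write operations per second" metric mentioned above, the following minimal sketch times a batch of writes and reads. The `ToyStore` class and the `measure_throughput` function are hypothetical names introduced only for this example; they are not the L-Store API or the course's benchmark, and the hardware-based adjustment is not shown because its formula is not specified here.

```python
import time


class ToyStore:
    """Hypothetical in-memory stand-in for a table; not the L-Store API."""

    def __init__(self):
        self._rows = {}

    def write(self, key, value):
        self._rows[key] = value

    def read(self, key):
        # Returns None when the key has never been written.
        return self._rows.get(key)


def measure_throughput(store, num_ops):
    """Run num_ops writes, then num_ops reads, and return operations per second."""
    start = time.perf_counter()
    for key in range(num_ops):
        store.write(key, key * 2)
    for key in range(num_ops):
        store.read(key)
    elapsed = time.perf_counter() - start
    total_ops = 2 * num_ops
    # Guard against a zero reading on very coarse timers.
    return total_ops / max(elapsed, 1e-9)


if __name__ == "__main__":
    ops_per_second = measure_throughput(ToyStore(), 100_000)
    print(f"{ops_per_second:,.0f} read/write operations per second")
```

A raw number like this is only comparable across groups when it is measured on the same machine or normalized for hardware, which is why the bonus is described as adjusted for hardware metrics.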
It is a fact of life that whenever there is group work, whether at school or in
society, there are occasional conflicts, and it is crucial to learn how to
resolve our differences and be receptive, open, and kind to one another. In
kindness and reflection, we shall aim to resolve all conflicts. It is the group's
responsibility to handle all internal affairs, involving the instructor only when
absolutely necessary. Note, however, that a group restructuring will be granted only
under rare, exceptional circumstances, because once a group is formed, we must
learn how to work with each other in harmony for at least 10 weeks.
For each group, it is recommended that each member lead one aspect of the project
while contributing and learning about other parts; roughly, the main components
are
(1) memory management (e.g., bufferpool),
(2) disk management (e.g., persistence and logging),
(3) in-memory indexing (e.g., hashing or tree),
(4) data access methods (e.g., APIs and query language),
(5) multi-threading and synchronization (e.g., data structure latching),
(6) transaction and concurrency (e.g., record-level locking), and
(7) testing and benchmarking (correctness verification and performance measurements).
As for the lectures, the topics covered will include, but are not limited to:
- DBMS Concepts and Architecture
- Storage and Indexing
- Query Languages (Relational Algebra and SQL)
- Query Evaluation and Optimization
- Concurrency Control and Recovery
- Database Design, the E-R Model, Normalization, and Tuning
- Database Security, Blockchain