00 |
Course Overview & Logistics |
• A Relational Model of Data for Large Shared Data Banks • Architecture of a Database System • What Goes Around Comes Around • Reflections on the Future of Database Systems |
01 |
Relational Model & Algebra |
• A Relational Model of Data for Large Shared Data Banks • Relational Algebra and Relational Calculus • Derivability, Redundancy, and Consistency of Relations • The Transaction Concept: Virtues and Limitations • A Critique of the SQL Database Language • The Third Manifesto • Foundations of Databases • Access Path Selection in a Relational Database Management System |
02 |
Modern SQL |
• SQL:1999, Formerly Known as SQL3 • Advanced SQL:1999 • SQL:2011 - Under the Covers • Pivoted Knowledge • A Decomposition Storage Model • SQL:1999, SQL:2003, SQL:2006 • Temporal Support in SQL:2011 • Efficiently Compiling Efficient Query Plans • Spark SQL: Relational Data Processing in Spark • CockroachDB: The Resilient Geo-Distributed SQL Database |
03 |
Database Storage I |
• The Design and Implementation of Modern Column-Oriented Database Systems • C-Store: A Column-oriented DBMS • Design Tradeoffs for SSD Performance • A Five-Minute Rule for Modern Storage Media • Database Architecture Optimized for Memory Access • ARIES: A Transaction Recovery Method • The Design and Implementation of a Log-Structured File System • LevelDB Documentation • LMDB: Memory-Mapped Key-Value Storage |
04 |
Database Storage II |
• ARIES: Fine-Granularity Locking and Partial Rollbacks • The Log-Structured Merge-Tree (LSM-Tree) • WiscKey: Separating Keys from Values in SSD-Conscious Storage • A Comparison of Approaches to Large-Scale Data Analysis • Efficiently Compiling Efficient Query Plans • The 5 Minute Rule for Trading Memory for Disk Accesses • The DBMIN Storage Manager • LeanStore: In-Memory Data Management • The Bw-Tree: A B-tree for New Hardware Platforms |
05 |
Storage Models & Compression |
• Column-Stores vs. Row-Stores • Integrating Compression and Execution in Column-Oriented Databases • Data Blocks: Hybrid OLTP and OLAP • MonetDB/X100: Hyper-Pipelining Query Execution • Database Cracking: Fancy Scan, not Poor Man's Sort! • ZStandard: Fast and Efficient Compression • Amazon Redshift and Simpler Data Warehouses |
06 |
Memory Management |
• Buffer Management in Database Systems • The Five-Minute Rule Ten Years Later • Managing Non-Volatile Memory • Main Memory Hash Join Algorithms • Memory Management for High-Performance Applications • C-ARF: Clock-Based Adaptive Replacement Filter • Optimizing In-Memory Databases for AI Workloads |
07 |
Hash Tables |
• Extendible Hashing—A Fast Access Method • Linear Hashing: A New Tool for File Addressing • Dynamic Hash Tables • Cache-Conscious Collision Resolution • Cuckoo Hashing • Hopscotch Hashing • Robin Hood Hashing |
08 |
Indexes & Filters I |
• Ubiquitous B-Tree • R-Trees: A Dynamic Index Structure • The R*-Tree: Efficient and Robust Access • Bloom Filters in Probabilistic Verification • Cache-Conscious Indexing • Cuckoo Filter: Practically Better Than Bloom |
09 |
Indexes & Filters II |
• The Adaptive Radix Tree (ART) • FAST: Fast Architecture Sensitive Tree Search • Bitmap Index Design and Evaluation • Less is More: Lightweight Filter Design • The Bw-Tree: A B-tree for New Hardware • The Log-Structured Merge-Tree (LSM-Tree) • TiDB: A Raft-Based HTAP Database |
10 |
Index Concurrency Control |
• Optimistic Methods for Concurrency Control • Generalized Isolation Level Definitions • Granularity of Locks and Degrees of Consistency • A Case for Fractured Mirrors • Implementing Data Cubes Efficiently • Efficient Locking for B-Trees • MassTree: A Cache-Conscious Concurrent B+-Tree |
11 |
Sorting & Aggregation Algorithms |
• AlphaSort: A RISC Machine Sort • Efficient External Sorting • Implementing Sorting in Database Systems • Efficient Aggregation for Graph Summarization • Scalable Progressive Analytics • HyperLogLog in Practice |
12 |
Joins Algorithms |
• Hash Joins and Hash Teams in SQL Server • Main-Memory Hash Joins on Multi-Core CPUs • Sort vs. Hash Revisited • A Comparison of Adaptive Radix Trees and Hash Tables • Improving Hash Joins on Intel Xeon Phi • Grace Hash Join: A Hybrid Hash Join • The Radix-Clustered Join Algorithm |
13 |
Query Execution I |
• Volcano - An Extensible Query Evaluation System • Morsel-Driven Parallelism • Vectorwise: Beyond Column Stores • Adaptive Execution of Compiled Queries • Everything About Compiled and Vectorized Queries • Morsel-Driven Parallelism (NUMA-Aware) |
14 |
Query Execution II |
• How to Architect a Query Compiler • Relaxed Operator Fusion • Adaptive Query Processing: Technology in Evolution • Dremel: Interactive Analysis of Web-Scale Datasets • ClickHouse: High-Performance Column-Oriented DB • Vectorized Execution for Query Engines |
15 |
Query Planning & Optimization |
• Access Path Selection in a Relational DBMS • The Volcano Optimizer Generator • The Cascades Framework • An Overview of Query Optimization • Orca: A Modular Query Optimizer • Eddy: Continuously Adaptive Query Processing |
16 |
Concurrency Control Theory |
• A Critique of ANSI SQL Isolation Levels • Concurrency Control in Distributed Databases • A Majority Consensus Approach • Serializable Isolation for Snapshot Databases • Granularity of Locks and Degrees of Consistency |
17 |
Two-Phase Locking Concurrency Control |
• An Algorithm for Concurrency Control in Distributed DB • Distributed Deadlock Detection • Hierarchical Locking in B-tree Indexes • Strict 2PL: A High-Performance Locking Protocol • Locking in Databases: A Survey |
18 |
Timestamp Ordering Concurrency Control |
• Distributed Timestamp Order Concurrency Control • Timestamp-Based Protocols for Distributed DB • Timestamp-Based Concurrency Control for Heterogeneous DB • Timestamp-Based Two-Phase Locking • An Implementation of Causal Ordering • Silo: Speeding Up In-Memory Databases with Hardware Transactions |
19 |
Multi-Version Concurrency Control |
• High-Performance Concurrency Control for Main-Memory DB • Serializable Snapshot Isolation in PostgreSQL • An Empirical Evaluation of MVCC • Cicada: Dependably Fast Multi-Core Transactions • Hyder: A Transactional Record Manager for Shared Flash |
20 |
Database Logging |
• Lightweight Locking for Main Memory DB • Scalable Logging with Non-Volatile Memory • Fast Databases with Fast Durability • The End of a Myth: Distributed Transactions Can Scale • Scalable Atomic Visibility with RAMP Transactions • Kafka: A Distributed Messaging System |
21 |
Database Recovery |
• Recovering With Limited Overlap • Crash Recovery in a Distributed System • Constant Time Recovery in Azure SQL • NVM Write Allocation • Fast Recovery in In-Memory Databases • Non-Volatile Memory DBMS |
22 |
Introduction to Distributed Databases |
• The Case for Determinism in DB Systems • Bigtable: A Distributed Storage System • Dynamo: Amazon’s Highly Available Key-Value Store • CAP Twelve Years Later • Spanner: Google’s Globally-Distributed DB |
23 |
Distributed OLTP Database Systems |
• Calvin: Fast Distributed Transactions • Amazon Aurora: Cloud-Native Relational DB • F1: A Distributed SQL Database • Orleans: Distributed Virtual Actors • SLOG: Serializable, Low-Latency Geo-Replicated Transactions |
24 |
Distributed OLAP Database Systems |
• Presto: SQL on Everything • Spark SQL: Relational Data Processing • SnowFlake: A Query Workspace • Apache Druid: Real-Time Analytics DB |
25 |
Final Review + Systems Potpourri |
• What's Really New with NewSQL? • The Predictive Database • The End of an Architectural Era • Retrospection on DB System Performance • Research for Practice: Prediction-Serving Systems • The Case for Learned Index Structures • Designing Data-Intensive Applications |