Reading Papers

Database Systems Research Paper Reading List

This reading list follows the lecture curriculum of CMU's Intro to Database Systems course taught by Andy Pavlo. You can find more details on the course here.

Lecture #	Lecture Name	Research Papers
00	Course Overview & Logistics	• A Relational Model of Data for Large Shared Data Banks • Architecture of a Database System • What Goes Around Comes Around • Reflections on the Future of Database Systems
01	Relational Model & Algebra	• A Relational Model of Data for Large Shared Data Banks • Relational Algebra and Relational Calculus • Derivability, Redundancy, and Consistency of Relations • The Transaction Concept: Virtues and Limitations • A Critique of the SQL Database Language • The Third Manifesto • Foundations of Databases • Access Path Selection in a Relational Database Management System
02	Modern SQL	• SQL:1999, Formerly Known as SQL3 • Advanced SQL:1999 • SQL:2011 - Under the Covers • Pivoted Knowledge • A Decomposition Storage Model • SQL:1999, SQL:2003, SQL:2006 • Temporal Features in SQL:2011 • Efficiently Compiling Efficient Query Plans for Modern Hardware • Spark SQL: Relational Data Processing in Spark • CockroachDB: The Resilient Geo-Distributed SQL Database
03	Database Storage I	• The Design and Implementation of Modern Column-Oriented Database Systems • C-Store: A Column-oriented DBMS • Design Tradeoffs for SSD Performance • A Five-Minute Rule for Modern Storage Media • Database Architecture Optimized for the New Bottleneck: Memory Access • ARIES: A Transaction Recovery Method • The Design and Implementation of a Log-Structured File System • LevelDB Documentation • LMDB: Memory-Mapped Key-Value Storage
04	Database Storage II	• ARIES: Fine-Granularity Locking and Partial Rollbacks • The Log-Structured Merge-Tree (LSM-Tree) • WiscKey: Separating Keys from Values in SSD-Conscious Storage • A Comparison of Approaches to Large-Scale Data Analysis • Efficiently Compiling Efficient Query Plans • The 5 Minute Rule for Trading Memory for Disk Accesses • The DBMIN Storage Manager • LeanStore: In-Memory Data Management Beyond Main Memory • The Bw-Tree: A B-tree for New Hardware Platforms
05	Storage Models & Compression	• Column-Stores vs. Row-Stores: How Different Are They Really? • Integrating Compression and Execution in Column-Oriented Database Systems • Data Blocks: Hybrid OLTP and OLAP • MonetDB/X100: Hyper-Pipelining Query Execution • Database Cracking: Fancy Scan, not Poor Man's Sort! • ZStandard: Fast and Efficient Compression • Amazon Redshift and the Case for Simpler Data Warehouses
06	Memory Management	• Buffer Management in Database Systems • The Five-Minute Rule Ten Years Later • Managing Non-Volatile Memory • Main Memory Hash Join Algorithms • Memory Management for High-Performance Applications • C-ARF: Clock-Based Adaptive Replacement Filter • Optimizing In-Memory Databases for AI Workloads
07	Hash Tables	• Extendible Hashing: A Fast Access Method for Dynamic Files • Linear Hashing: A New Tool for File and Table Addressing • Dynamic Hash Tables • Cache-Conscious Collision Resolution in String Hash Tables • Cuckoo Hashing • Hopscotch Hashing • Robin Hood Hashing
08	Indexes & Filters I	• The Ubiquitous B-Tree • R-Trees: A Dynamic Index Structure for Spatial Searching • The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles • Bloom Filters in Probabilistic Verification • Cache-Conscious Indexing • Cuckoo Filter: Practically Better Than Bloom
09	Indexes & Filters II	• The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases • FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs • Bitmap Index Design and Evaluation • Less is More: Lightweight Filter Design • The Bw-Tree: A B-tree for New Hardware Platforms • The Log-Structured Merge-Tree (LSM-Tree) • TiDB: A Raft-Based HTAP Database
10	Index Concurrency Control	• Optimistic Methods for Concurrency Control • Generalized Isolation Level Definitions • Granularity of Locks and Degrees of Consistency • A Case for Fractured Mirrors • Implementing Data Cubes Efficiently • Efficient Locking for Concurrent Operations on B-Trees • Cache Craftiness for Fast Multicore Key-Value Storage (Masstree)
11	Sorting & Aggregation Algorithms	• AlphaSort: A RISC Machine Sort • Efficient External Sorting • Implementing Sorting in Database Systems • Efficient Aggregation for Graph Summarization • Scalable Progressive Analytics • HyperLogLog in Practice
12	Join Algorithms	• Hash Joins and Hash Teams in Microsoft SQL Server • Main-Memory Hash Joins on Multi-Core CPUs • Sort vs. Hash Revisited • A Comparison of Adaptive Radix Trees and Hash Tables • Improving Hash Joins on Intel Xeon Phi • Grace Hash Join: A Hybrid Hash Join • The Radix-Clustered Join Algorithm
13	Query Execution I	• Volcano: An Extensible and Parallel Query Evaluation System • Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age • Vectorwise: Beyond Column Stores • Adaptive Execution of Compiled Queries • Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask
14	Query Execution II	• How to Architect a Query Compiler • Relaxed Operator Fusion • Adaptive Query Processing: Technology in Evolution • Dremel: Interactive Analysis of Web-Scale Datasets • ClickHouse: High-Performance Column-Oriented DB • Vectorized Execution for Query Engines
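15	Query Planning & Optimization	• Access Path Selection in a Relational Database Management System • The Volcano Optimizer Generator: Extensibility and Efficient Search • The Cascades Framework for Query Optimization • An Overview of Query Optimization in Relational Systems • Orca: A Modular Query Optimizer Architecture for Big Data • Eddies: Continuously Adaptive Query Processing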
16	Concurrency Control Theory	• A Critique of ANSI SQL Isolation Levels • Concurrency Control in Distributed Databases • A Majority Consensus Approach • Serializable Isolation for Snapshot Databases • Granularity of Locks and Degrees of Consistency
17	Two-Phase Locking Concurrency Control	• An Algorithm for Concurrency Control in Distributed DB • Distributed Deadlock Detection • Hierarchical Locking in B-tree Indexes • Strict 2PL: A High-Performance Locking Protocol • Locking in Databases: A Survey
18	Timestamp Ordering Concurrency Control	• Distributed Timestamp Order Concurrency Control • Timestamp-Based Protocols for Distributed DB • Timestamp-Based Concurrency Control for Heterogeneous DB • Timestamp-Based Two-Phase Locking • An Implementation of Causal Ordering • Silo: Speedy Transactions in Multicore In-Memory Databases
19	Multi-Version Concurrency Control	• High-Performance Concurrency Control Mechanisms for Main-Memory Databases • Serializable Snapshot Isolation in PostgreSQL • An Empirical Evaluation of In-Memory Multi-Version Concurrency Control • Cicada: Dependably Fast Multi-Core In-Memory Transactions • Hyder: A Transactional Record Manager for Shared Flash
20	Database Logging	• Lightweight Locking for Main Memory Database Systems • Scalable Logging through Emerging Non-Volatile Memory • Fast Databases with Fast Durability and Recovery Through Multicore Parallelism • The End of a Myth: Distributed Transactions Can Scale • Scalable Atomic Visibility with RAMP Transactions • Kafka: A Distributed Messaging System for Log Processing
21	Database Recovery	• Recovering With Limited Overlap • Crash Recovery in a Distributed System • Constant Time Recovery in Azure SQL Database • NVM Write Allocation • Fast Recovery in In-Memory Databases • Non-Volatile Memory DBMS
22	Introduction to Distributed Databases	• The Case for Determinism in Database Systems • Bigtable: A Distributed Storage System for Structured Data • Dynamo: Amazon’s Highly Available Key-Value Store • CAP Twelve Years Later: How the "Rules" Have Changed • Spanner: Google’s Globally-Distributed Database
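23	Distributed OLTP Database Systems	• Calvin: Fast Distributed Transactions for Partitioned Database Systems • Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases • F1: A Distributed SQL Database That Scales • Orleans: Distributed Virtual Actors for Programmability and Scalability • SLOG: Serializable, Low-Latency, Geo-Replicated Transactions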
24	Distributed OLAP Database Systems	• Presto: SQL on Everything • Spark SQL: Relational Data Processing in Spark • The Snowflake Elastic Data Warehouse • Druid: A Real-time Analytical Data Store
25	Final Review + Systems Potpourri	• What's Really New with NewSQL? • The Predictive Database • The End of an Architectural Era (It's Time for a Complete Rewrite) • Retrospection on DB System Performance • Research for Practice: Prediction-Serving Systems • The Case for Learned Index Structures • Designing Data-Intensive Applications
