Skip to content

Reading Papers

Database Systems Research Paper Reading List

This lecture curriculum is of CMU DB Intro Course by Andy Pavlo. You can find more details on the course here

Lecture # Lecture Name Research Papers (Unique Links)
00 Course Overview & Logistics A Relational Model of Data for Large Shared Data Banks
Architecture of a Database System
What Goes Around Comes Around
Reflections on the Future of Database Systems
01 Relational Model & Algebra A Relational Model of Data for Large Shared Data Banks
Relational Algebra and Relational Calculus
Derivability, Redundancy, and Consistency of Relations
The Transaction Concept: Virtues and Limitations
A Critique of the SQL Database Language
The Third Manifesto
Foundations of Databases
Access Path Selection in a Relational Database Management System
02 Modern SQL SQL:1999, Formerly Known as SQL3
Advanced SQL:1999
SQL:2011 - Under the Covers
Pivoted Knowledge
A Decomposition Storage Model
SQL:1999, SQL:2003, SQL:2006
Temporal Support in SQL:2011
Efficiently Compiling Efficient Query Plans
Spark SQL: Relational Data Processing in Spark
CockroachDB: The Resilient Geo-Distributed SQL Database
03 Database Storage I The Design and Implementation of Modern Column-Oriented Database Systems
C-Store: A Column-oriented DBMS
Design Tradeoffs for SSD Performance
A Five-Minute Rule for Modern Storage Media
Database Architecture Optimized for Memory Access
ARIES: A Transaction Recovery Method
The Design and Implementation of a Log-Structured File System
LevelDB Documentation
LMDB: Memory-Mapped Key-Value Storage
04 Database Storage II ARIES: Fine-Granularity Locking and Partial Rollbacks
The Log-Structured Merge-Tree (LSM-Tree)
WiscKey: Separating Keys from Values in SSD-Conscious Storage
A Comparison of Approaches to Large-Scale Data Analysis
Efficiently Compiling Efficient Query Plans
The 5 Minute Rule for Trading Memory for Disk Accesses
The DBMIN Storage Manager
LeanStore: In-Memory Data Management
The Bw-Tree: A B-tree for New Hardware Platforms
05 Storage Models & Compression Column-Stores vs. Row-Stores
Integrating Compression and Execution in Column-Oriented Databases
Data Blocks: Hybrid OLTP and OLAP
MonetDB/X100: Hyper-Pipelining Query Execution
Database Cracking: Fancy Scan, not Poor Man's Sort!
ZStandard: Fast and Efficient Compression
Amazon Redshift and Simpler Data Warehouses
06 Memory Management Buffer Management in Database Systems
The Five-Minute Rule Ten Years Later
Managing Non-Volatile Memory
Main Memory Hash Join Algorithms
Memory Management for High-Performance Applications
C-ARF: Clock-Based Adaptive Replacement Filter
Optimizing In-Memory Databases for AI Workloads
07 Hash Tables Extendible Hashing—A Fast Access Method
Linear Hashing: A New Tool for File Addressing
Dynamic Hash Tables
Cache-Conscious Collision Resolution
Cuckoo Hashing
Hopscotch Hashing
Robin Hood Hashing
08 Indexes & Filters I Ubiquitous B-Tree
R-Trees: A Dynamic Index Structure
The R*-Tree: Efficient and Robust Access
Bloom Filters in Probabilistic Verification
Cache-Conscious Indexing
Cuckoo Filter: Practically Better Than Bloom
09 Indexes & Filters II The Adaptive Radix Tree (ART)
FAST: Fast Architecture Sensitive Tree Search
Bitmap Index Design and Evaluation
Less is More: Lightweight Filter Design
The Bw-Tree: A B-tree for New Hardware
The Log-Structured Merge-Tree (LSM-Tree)
TiDB: A Raft-Based HTAP Database
10 Index Concurrency Control Optimistic Methods for Concurrency Control
Generalized Isolation Level Definitions
Granularity of Locks and Degrees of Consistency
A Case for Fractured Mirrors
Implementing Data Cubes Efficiently
Efficient Locking for B-Trees
MassTree: A Cache-Conscious Concurrent B+-Tree
11 Sorting & Aggregation Algorithms AlphaSort: A RISC Machine Sort
Efficient External Sorting
Implementing Sorting in Database Systems
Efficient Aggregation for Graph Summarization
Scalable Progressive Analytics
HyperLogLog in Practice
12 Joins Algorithms Hash Joins and Hash Teams in SQL Server
Main-Memory Hash Joins on Multi-Core CPUs
Sort vs. Hash Revisited
A Comparison of Adaptive Radix Trees and Hash Tables
Improving Hash Joins on Intel Xeon Phi
Grace Hash Join: A Hybrid Hash Join
The Radix-Clustered Join Algorithm
13 Query Execution I Volcano - An Extensible Query Evaluation System
Morsel-Driven Parallelism
Vectorwise: Beyond Column Stores
Adaptive Execution of Compiled Queries
Everything About Compiled and Vectorized Queries
Morsel-Driven Parallelism (NUMA-Aware)
14 Query Execution II How to Architect a Query Compiler
Relaxed Operator Fusion
Adaptive Query Processing: Technology in Evolution
Dremel: Interactive Analysis of Web-Scale Datasets
ClickHouse: High-Performance Column-Oriented DB
Vectorized Execution for Query Engines
15 Query Planning & Optimization Access Path Selection in a Relational DBMS
The Volcano Optimizer Generator
The Cascades Framework
An Overview of Query Optimization
Orca: A Modular Query Optimizer
Eddy: Continuously Adaptive Query Processing
16 Concurrency Control Theory A Critique of ANSI SQL Isolation Levels
Concurrency Control in Distributed Databases
A Majority Consensus Approach
Serializable Isolation for Snapshot Databases
Granularity of Locks and Degrees of Consistency
17 Two-Phase Locking Concurrency Control An Algorithm for Concurrency Control in Distributed DB
Distributed Deadlock Detection
Hierarchical Locking in B-tree Indexes
Strict 2PL: A High-Performance Locking Protocol
Locking in Databases: A Survey
18 Timestamp Ordering Concurrency Control Distributed Timestamp Order Concurrency Control
Timestamp-Based Protocols for Distributed DB
Timestamp-Based Concurrency Control for Heterogeneous DB
Timestamp-Based Two-Phase Locking
An Implementation of Causal Ordering
Silo: Speeding Up In-Memory Databases with Hardware Transactions
19 Multi-Version Concurrency Control High-Performance Concurrency Control for Main-Memory DB
Serializable Snapshot Isolation in PostgreSQL
An Empirical Evaluation of MVCC
Cicada: Dependably Fast Multi-Core Transactions
Hyder: A Transactional Record Manager for Shared Flash
20 Database Logging Lightweight Locking for Main Memory DB
Scalable Logging with Non-Volatile Memory
Fast Databases with Fast Durability
The End of a Myth: Distributed Transactions Can Scale
Scalable Atomic Visibility with RAMP Transactions
Kafka: A Distributed Messaging System
21 Database Recovery Recovering With Limited Overlap
Crash Recovery in a Distributed System
Constant Time Recovery in Azure SQL
NVM Write Allocation
Fast Recovery in In-Memory Databases
Non-Volatile Memory DBMS
22 Introduction to Distributed Databases The Case for Determinism in DB Systems
Bigtable: A Distributed Storage System
Dynamo: Amazon’s Highly Available Key-Value Store
CAP Twelve Years Later
Spanner: Google’s Globally-Distributed DB
23 Distributed OLTP Database Systems Calvin: Fast Distributed Transactions
Amazon Aurora: Cloud-Native Relational DB
F1: A Distributed SQL Database
Orleans: Distributed Virtual Actors
SLOG: Serializable, Low-Latency Geo-Replicated Transactions
24 Distributed OLAP Database Systems Presto: SQL on Everything
Spark SQL: Relational Data Processing
SnowFlake: A Query Workspace
Apache Druid: Real-Time Analytics DB
25 Final Review + Systems Potpourri What's Really New with NewSQL?
The Predictive Database
The End of an Architectural Era
Retrospection on DB System Performance
Research for Practice: Prediction-Serving Systems
The Case for Learned Index Structures
Designing Data-Intensive Applications