Understanding Database Indexing And Concurrent Operations
Introduction
Database indexing is a fundamental concept that significantly impacts application performance. However, creating indexes on production databases presents unique challenges, especially when dealing with concurrent operations. In this comprehensive guide, we’ll explore the intricacies of index creation, the challenges of concurrent modifications, and modern solutions to these challenges.
Table of Contents
- Understanding Basic Index Creation
- The Challenge of Concurrent Operations
- Standard CREATE INDEX vs. CREATE INDEX CONCURRENTLY
- Implementation Deep Dive
- Practical Examples with SQL
- Best Practices and Production Considerations
- Performance Implications
- Advanced Concepts and Edge Cases
1. Understanding Basic Index Creation
What is an Index?
An index in a database is essentially a data structure (typically a B+ tree) that provides quick access to rows in a table based on the values in specific columns. Think of it as the index in a book – instead of reading every page to find a topic, you can quickly jump to the relevant section.
Basic Structure of a B+ Tree Index
A B+ tree index consists of:
- Internal nodes containing keys for navigation
- Leaf nodes containing the indexed values and row pointers
- Balanced structure ensuring consistent performance
The Basic Creation Process
When creating a basic index, the database must:
- Read all relevant data from the table
- Sort the values
- Build the B+ tree structure
- Store row pointers (usually ROWIDs or TIDs)
- Update the system catalog
sequenceDiagram
participant U as User Operations
participant D as Database
participant I as Index Creation Process
participant L as WAL (Write-Ahead Log)
Note over I: Standard CREATE INDEX
I->>D: Acquire Exclusive Lock
Note over D: Blocks all writes
I->>D: Read Table Data
I->>I: Build B+ Tree
I->>D: Update Catalog
I->>D: Release Lock
Note over I: CREATE INDEX CONCURRENTLY
I->>L: Record Initial LSN
I->>D: Read Initial Table Snapshot
par Concurrent Operations
U->>D: Continue Writing
D->>L: Log Changes
and Index Building
I->>I: Build Initial B+ Tree
end
I->>L: Get New LSN
I->>L: Read Change Log (Initial LSN to New LSN)
I->>I: Apply Changes to Index
loop Until No New Changes
I->>L: Check New Changes
alt New Changes Exist
I->>I: Apply Delta Changes
else No Changes
I->>D: Finalize Index
I->>D: Update Catalog
end
end
2. The Challenge of Concurrent Operations
The Fundamental Problem
The primary challenge with index creation comes from the dynamic nature of database operations. While building an index, the underlying data might change due to:
- INSERT operations adding new rows
- UPDATE operations modifying existing values
- DELETE operations removing rows
This creates several potential issues:
- Missing new rows added during index creation
- Indexing modified values that have changed
- Including deleted rows that should no longer exist
Data Consistency Requirements
For an index to be useful, it must maintain these properties:
- Completeness: All existing rows must be indexed
- Accuracy: Index entries must match actual data
- Consistency: No missing or duplicate entries
3. Standard CREATE INDEX vs. CREATE INDEX CONCURRENTLY
Standard CREATE INDEX
The traditional approach uses exclusive locks:
CREATE INDEX idx_employee_salary ON employees(salary);
This operation:
- Blocks all write operations (INSERT, UPDATE, DELETE)
- Allows read operations to continue
- Provides guaranteed consistency
- Completes faster than concurrent creation
CREATE INDEX CONCURRENTLY
The modern approach allows concurrent modifications:
CREATE INDEX CONCURRENTLY idx_employee_salary ON employees(salary);
Key differences:
- Allows both read and write operations
- Takes approximately twice as long
- Uses a more complex algorithm
- Requires additional system resources
4. Implementation Deep Dive
Standard Implementation
The traditional index creation process:
-- Example table
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
salary NUMERIC(10,2),
department VARCHAR(50)
);
-- Standard index creation (blocks writes)
CREATE INDEX idx_salary ON employees(salary);
During this operation:
- The database acquires an exclusive lock
- Reads begin from the table
- The B+ tree is constructed
- The catalog is updated
- The lock is released
Concurrent Implementation
The concurrent index creation process is more complex:
-- Concurrent index creation (allows writes)
CREATE INDEX CONCURRENTLY idx_salary_concurrent ON employees(salary);
The process involves:
- Recording the initial Write-Ahead Log (WAL) position
- Building initial index structure
- Tracking changes during construction
- Applying changes iteratively
- Finalizing when caught up
5. Practical Examples with SQL
Let’s explore this with a complete example:
-- Create a sample table
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
salary NUMERIC(10,2),
department VARCHAR(50)
);
-- Insert sample data
INSERT INTO employees (name, salary, department)
SELECT
'Employee ' || i,
50000 + (random() * 50000)::numeric(10,2),
CASE (i % 4)
WHEN 0 THEN 'Engineering'
WHEN 1 THEN 'Sales'
WHEN 2 THEN 'Marketing'
WHEN 3 THEN 'HR'
END
FROM generate_series(1, 100000) i;
-- Create an index concurrently
CREATE INDEX CONCURRENTLY idx_salary_dept
ON employees(salary, department);
-- Meanwhile, in another session, these operations can run:
INSERT INTO employees (name, salary, department)
VALUES ('New Employee', 75000, 'Engineering');
UPDATE employees
SET salary = salary * 1.1
WHERE department = 'Sales';
DELETE FROM employees
WHERE salary < 55000;
-- Check index usage
EXPLAIN ANALYZE
SELECT * FROM employees
WHERE salary BETWEEN 60000 AND 70000
AND department = 'Engineering';
Expected output and behavior:
-- Before index creation (sequential scan):
Seq Scan on employees
Filter: (salary >= 60000 AND salary <= 70000 AND department = 'Engineering')
Rows Removed by Filter: 95234
Actual time: 15.432..25.678 ms
-- After index creation (index scan):
Index Scan using idx_salary_dept on employees
Index Cond: (salary >= 60000 AND salary <= 70000 AND department = 'Engineering')
Actual time: 0.543..1.234 ms
6. Best Practices and Production Considerations
When to Use Concurrent Index Creation
Prefer CREATE INDEX CONCURRENTLY when:
- Working on production systems
- Dealing with active OLTP workloads
- Long-running index creation is expected
- System downtime isn’t acceptable
Monitoring Progress
Track index creation progress using system views:
SELECT schemaname, tablename, indexname,
pg_size_pretty(pg_relation_size(schemaname|| '.' ||indexname::text)) as index_size,
idx_scan as number_of_scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN pg_indexes i ON c.relname=i.tablename
LEFT OUTER JOIN pg_stat_all_indexes psai ON i.indexname=psai.indexrelname
WHERE schemaname='public' AND indexname = 'idx_salary_dept';
7. Performance Implications
Resource Usage
Concurrent index creation requires:
- Additional disk I/O for WAL reading
- More memory for tracking changes
- Extended CPU usage for multiple passes
- Approximately 2x the time of standard creation
Impact on Other Operations
While allowing concurrent operations, be aware of:
- Slightly reduced write performance
- Increased WAL generation
- Higher overall system load
- Potential for temporary space usage
8. Advanced Concepts and Edge Cases
Handling Failures
If concurrent index creation fails:
-- Check for invalid indexes
SELECT schemaname, tablename, indexname,
pg_get_indexdef(i.indexrelid) as index_definition
FROM pg_indexes i
JOIN pg_class c ON i.indexname = c.relname
WHERE NOT pg_index.indisvalid;
-- Clean up invalid index
DROP INDEX CONCURRENTLY if_exists invalid_index_name;
Partial Indexes with Concurrent Creation
Creating conditional indexes concurrently:
CREATE INDEX CONCURRENTLY idx_high_salary
ON employees(salary)
WHERE salary > 100000;
Conclusion
Understanding the intricacies of database indexing and concurrent operations is crucial for maintaining high-performance database systems. While standard index creation provides simplicity and speed, concurrent index creation offers the flexibility needed in modern, always-on applications. The choice between these approaches depends on your specific requirements for system availability, performance, and resource utilization.
Remember:
- Always test index creation on staging environments first
- Monitor system resources during index creation
- Have a rollback plan ready
- Consider maintenance windows for standard index creation
- Use concurrent index creation for production systems when possible
By following these guidelines and understanding the underlying mechanisms, you can effectively manage database indexes while maintaining system availability and performance.