Introduction to Delta Architecture

Sep 25, 2019

In my previous blog posts I introduced the Kappa and Lambda architectures. These are big data architectures designed to handle massive amounts of data both in real time and at rest. The key difference between the two is the presence of a data lake/data hub that consolidates all data in one place. Lambda architecture seems more practical because it uses cheaper storage media for long-term batch processing.

However, Lambda architecture uses HDFS as the data lake, and the key concept of a data lake is immutability. This adds overhead to data processing in the batch layer: any transformation in the batch layer potentially means recreating the data structures.

What is Delta Architecture?

Delta architecture assumes that new streaming records are processed as delta (incremental) records, that is, as changes to existing data rather than as entirely new records. Conceptually this pattern is similar to Lambda, as it is still built around a speed layer (hot path) and a batch layer (cold path). The one big difference is that delta architecture no longer treats the data lake as immutable: any batch transformation can update the existing data structures in the data lake (that is, process delta records). This capability makes the cold path easier to process.
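To make the contrast concrete, here is a minimal Python sketch. It is a toy in-memory model, not a real storage engine, and the helper names rewrite_immutable and apply_delta are hypothetical. It simply counts how many records each approach has to write when two delta records arrive.

```python
# Toy model: a "table" is a dict keyed by record id.
# Hypothetical helpers for illustration only; no real data lake API is used.

def rewrite_immutable(table, delta):
    """Immutable lake: build a brand-new copy with the changes applied."""
    new_table = dict(table)            # copy every existing record
    new_table.update(delta)            # then apply the changes
    records_written = len(new_table)   # the whole structure is recreated
    return new_table, records_written


def apply_delta(table, delta):
    """Delta approach: update the existing structure in place."""
    table.update(delta)                # only the changed records are touched
    records_written = len(delta)
    return table, records_written


existing = {i: f"value-{i}" for i in range(1000)}
delta = {7: "value-7-updated", 1000: "value-1000"}  # one update, one insert

_, immutable_cost = rewrite_immutable(existing, delta)
_, delta_cost = apply_delta(dict(existing), delta)

print(immutable_cost)  # 1001
print(delta_cost)      # 2
```

Both approaches end with the same 1001 records; the difference is only in how much has to be written to get there.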

The Key Benefit

This bridges the gap between the batch and streaming layers and unifies them for seamless processing with less overhead. Organizations no longer have to look at data processing in silos, and don’t have to treat data differently based on the speed of ingestion and processing.

Delta Processing

Figure: Representative Delta Architecture

In the literal sense, delta denotes incremental change (as in the Δ of mathematics). In data processing, the incremental change is any data created, updated, or streamed since the last processing time. These inserts and updates can be merged with the existing data in the data layer, and the underlying files in the file system can be updated.
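The merge step described above can be sketched in a few lines of Python. This is an illustrative assumption about how such a merge might look, not the behavior of any particular product: the function merge_delta, the "modified" field, and the sample records are all hypothetical. Records changed since the last processing time are upserted by id, and the function returns the new "last processing time" for the next run.

```python
from datetime import datetime


def merge_delta(table, incoming, last_processed):
    """Merge records changed after `last_processed` into `table` (an upsert).

    Returns the new watermark, i.e. the latest modification time seen.
    """
    watermark = last_processed
    for record in incoming:
        if record["modified"] <= last_processed:
            continue                     # already handled in an earlier run
        table[record["id"]] = record     # insert a new id or overwrite an existing one
        watermark = max(watermark, record["modified"])
    return watermark


# Existing data in the data layer (hypothetical sample records).
table = {
    1: {"id": 1, "name": "alice", "modified": datetime(2019, 9, 1)},
    2: {"id": 2, "name": "bob", "modified": datetime(2019, 9, 1)},
}
incoming = [
    {"id": 1, "name": "alice", "modified": datetime(2019, 9, 1)},    # unchanged
    {"id": 2, "name": "robert", "modified": datetime(2019, 9, 20)},  # update
    {"id": 3, "name": "carol", "modified": datetime(2019, 9, 25)},   # insert
]

watermark = merge_delta(table, incoming, last_processed=datetime(2019, 9, 10))
```

After the call, record 2 has been updated, record 3 has been inserted, record 1 is untouched, and the watermark has advanced to the modification time of the newest record.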

With currently available technology, file systems can support CRUD (Create/Read/Update/Delete) operations and can be made ACID compliant. At the time of writing, this delta technology is supported in Databricks Delta.
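As a rough illustration of what "all or nothing" updates on files can mean, the Python sketch below stores a small table as a JSON file and applies each change by writing a new copy and swapping it in with os.replace. This is a toy under stated assumptions: the helpers read_table and atomic_update are hypothetical, it demonstrates only atomicity (not isolation between concurrent writers or durability guarantees), and it is not a description of how Databricks Delta is implemented.

```python
import json
import os
import tempfile


def read_table(path):
    """Read: load the whole table from its JSON file."""
    with open(path) as f:
        return json.load(f)


def atomic_update(path, change):
    """Apply `change` to the table stored at `path`, all or nothing."""
    table = read_table(path)
    change(table)                        # may raise; the file is untouched so far
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(table, f)
        os.replace(tmp_path, path)       # swap in the new version in one step
    except BaseException:
        os.unlink(tmp_path)              # clean up the half-written copy
        raise


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "table.json")
with open(path, "w") as f:
    json.dump({"1": "alice"}, f)


def add_bob(table):
    table["2"] = "bob"                   # Create


def bad_change(table):
    table["1"] = "corrupted"             # Update that fails midway
    raise ValueError("fails midway")


atomic_update(path, add_bob)
try:
    atomic_update(path, bad_change)
except ValueError:
    pass                                 # the failed change leaves no trace on disk

final_table = read_table(path)
print(final_table)  # {'1': 'alice', '2': 'bob'}
```

Readers of the file see either the old version or the new one, never a partially applied change, which is the property the failed update demonstrates.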