Data Engineering

Greek Architectures of Data Processing

The Greeks are famous for many things, but the first things that come to mind when we hear “Greek” are Greek architecture and the Greek alphabet (alpha, beta, gamma…).

Greek Architecture is the flowering of geometry.

What happens when we merge the Greek alphabet with Greek architecture? We …

Processing Real-Time Streams in Databricks – Part 2

The first part of this post can be found at Processing Real-Time Streams in Databricks – Part 1.

This post continues Part 1, so we won’t repeat the architecture and setup. In this section we go into Azure Databricks and create the cluster and notebook to ingest data in real time, …

Processing Real-Time Streams in Databricks – Part 1

Databricks is becoming the new normal among cloud data processing technologies, on both Azure and AWS. This is a step-by-step guide to getting started with real-time (streaming) analytics using Spark Streaming on Databricks.

Architecture

The demo was built to show the speed layer (hot path) of a typical …

Introduction to Delta Architecture

In my previous posts I introduced the Kappa and Lambda architectures. These are big data architectures designed to support massive amounts of data both in real time and at rest. The key difference between these two architectures is the presence of a data lake/data hub to consolidate all data in one …

Graph Databases for Enterprises

We have all heard about SQL databases and have used SQL Server, Oracle, MySQL, etc. for storing or modeling data. The SQL family of databases stores data in relational, tabular structures. With the advent of new channels of data generation and consumption, SQL databases sometimes don’t provide …

Lambda Architecture Using Databricks

For details about what Lambda architecture is, read Introduction to Lambda Architecture.

From a technology point of view, Databricks is becoming the new normal in data processing on both Azure and AWS. This post provides a view of Lambda architecture with Databricks front and center. …

Kappa Architecture – Another Way of Data Processing

Kappa architecture was proposed by Jay Kreps (co-creator of Apache Kafka) as a simplification of the Lambda architecture. The core idea: remove the batch layer entirely and treat everything as a stream.
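
To make the "everything is a stream" idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the post's implementation: `event_log` is a hypothetical in-memory stand-in for an append-only log such as a Kafka topic, and `process_stream` is a made-up name for the single processing path. Recomputing a result is assumed to work by replaying the same log through the same code rather than by running a separate batch job.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for an append-only event log
# (in a real system this could be a Kafka topic).
event_log = [
    {"user": "a", "amount": 10},
    {"user": "b", "amount": 5},
    {"user": "a", "amount": 7},
]


def process_stream(events, view=None):
    """Single processing path: fold each event into a running total per user."""
    view = defaultdict(int) if view is None else view
    for event in events:
        view[event["user"]] += event["amount"]
    return view


# Live processing: consume events in the order they were appended.
live_view = process_stream(event_log)

# Reprocessing: there is no batch layer, so the same log is replayed
# from the beginning through the same stream code.
replayed_view = process_stream(event_log)

print(dict(live_view))
```

The point of the sketch is that there is only one code path to maintain: the live view and the replayed view come from the same function.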

Kappa Architecture Diagram

The Core Concept

In Lambda architecture, you maintain two separate processing paths — batch and …

Introduction to Lambda Architecture

Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods. The architecture was introduced by Nathan Marz and is based on three layers: the Batch Layer, the Speed Layer, and the Serving Layer. …
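
As a rough illustration of how the three layers fit together, the sketch below counts page views in plain Python. All names and data here (`historical_events`, `recent_events`, `batch_layer`, `speed_layer`, `serving_layer`) are hypothetical and chosen only for this example; it assumes the common reading in which the batch layer precomputes a view over historical data, the speed layer covers events the batch view has not absorbed yet, and the serving layer merges both at query time.

```python
from collections import Counter

# Hypothetical data: events already covered by the last batch run,
# and recent events that arrived after it.
historical_events = ["page_a", "page_b", "page_a"]
recent_events = ["page_a", "page_c"]


def batch_layer(events):
    """Batch layer: recompute the view from the full historical dataset."""
    return Counter(events)


def speed_layer(events):
    """Speed layer: update a view incrementally, one event at a time."""
    view = Counter()
    for event in events:
        view[event] += 1
    return view


def serving_layer(batch_view, realtime_view, key):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch_view[key] + realtime_view[key]


batch_view = batch_layer(historical_events)
realtime_view = speed_layer(recent_events)

print(serving_layer(batch_view, realtime_view, "page_a"))
```

A query for a page therefore sees both the precomputed history and the most recent events, at the cost of maintaining two processing paths.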

How to Structure the Data Lake

A data lake is a concept and a set of guidelines on where to place data (Microsoft named its product Azure Data Lake, but the concept is broader). From a technology point of view, it suggests storing all data in object or hierarchical storage. This is the concept of data locality: data …