Big Data

How is Data Governance (DG) Different in the Digital World?

The need for Data Governance has been established; it has become one of the key initiatives organizations focus on when managing data. This post discusses the …

Greek Architectures of Data Processing

Greeks are famous for many things, but the first things that come to mind when we hear “Greek” are Greek architecture and the Greek alphabet (alpha, beta, gamma…). …

Introduction to Delta Architecture

In my previous posts I introduced the Kappa and Lambda architectures. These are big data architectures designed to support massive amounts of data, both in real time and at rest. The …

Lambda Architecture Using Databricks

For details about what Lambda architecture is, read Introduction to Lambda Architecture. From a technology point of view, Databricks is becoming the new normal in data processing …

Kappa Architecture – Another Way of Data Processing

Kappa architecture was proposed by Jay Kreps (co-creator of Apache Kafka) as a simplification of the Lambda architecture. The core idea: remove the batch layer entirely and treat …
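As a rough illustration of the core idea above (no batch layer), here is a toy Python sketch. It assumes an in-memory list standing in for a replayable, append-only log such as a Kafka topic; `EVENT_LOG` and `run_stream_job` are hypothetical names. When the processing logic changes, a Kappa-style system reprocesses history by replaying the log through the new version of the same streaming job instead of rerunning a separate batch pipeline.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical append-only event log standing in for a Kafka topic.
EVENT_LOG = [
    {"user": "a", "action": "click"},
    {"user": "b", "action": "view"},
    {"user": "a", "action": "view"},
]


def run_stream_job(log: Iterable[dict], key_fn: Callable[[dict], str]) -> Counter:
    """Single processing path: consume events in order and keep running counts."""
    view = Counter()
    for event in log:  # in a real system this loop would keep consuming new events
        view[key_fn(event)] += 1
    return view


# Version 1 of the job counts events per user.
v1 = run_stream_job(EVENT_LOG, key_fn=lambda e: e["user"])

# The logic changed: with no batch layer, replay the same log from the start
# through the new version of the streaming job.
v2 = run_stream_job(EVENT_LOG, key_fn=lambda e: e["action"])

print(v1)  # Counter({'a': 2, 'b': 1})
print(v2)  # Counter({'view': 2, 'click': 1})
```

In practice, one common approach is to have the replayed job write to a new output and switch consumers over once it catches up; the sketch only shows the single-code-path idea.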

Introduction to Lambda Architecture

Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods. The …
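A minimal Python sketch of how the batch and stream paths typically fit together, using the layer names commonly associated with Lambda (batch, speed, serving). Page-view counting is an assumed example workload, and the data, class, and function names are hypothetical. The batch layer recomputes a view from the full dataset, the speed layer incrementally covers events the last batch run has not absorbed yet, and queries merge the two.

```python
from collections import Counter

# Hypothetical data: events already covered by the last batch run,
# plus recent events that arrived after it started.
master_dataset = ["page1", "page2", "page1"]
recent_events = ["page2", "page3"]


def batch_layer(all_events):
    """Recompute the batch view from scratch over the full, immutable dataset."""
    return Counter(all_events)


class SpeedLayer:
    """Incrementally maintain a real-time view for events not yet in the batch view."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        self.view[event] += 1


def serving_layer(batch_view, realtime_view, key):
    """Answer a query by merging the batch view with the real-time view."""
    return batch_view[key] + realtime_view[key]  # Counter returns 0 for missing keys


batch_view = batch_layer(master_dataset)
speed = SpeedLayer()
for e in recent_events:
    speed.on_event(e)

print(serving_layer(batch_view, speed.view, "page1"))  # 2
print(serving_layer(batch_view, speed.view, "page2"))  # 2
print(serving_layer(batch_view, speed.view, "page3"))  # 1
```

The cost of this design is visible even in the toy: the same counting logic lives in two places (the batch and speed layers) and must be kept consistent.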

How to Structure the Data Lake

A data lake is a framework, concept, and guidance on where to place data (Microsoft named its product Azure Data Lake, but the concept is broader). From a technology point of …

5 Things to Get the Best Out of Azure Databricks

Most of the customers I talk to are asking, directly or indirectly, to scale their workloads and use Databricks. It has become the new normal for data processing in the cloud. If you are …

Introduction to Distributed Computing

Distributed computing technology enables the compute load to be spread, or distributed, across multiple nodes (computers) connected via a network. The networked machines share the …
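The split, compute, and combine pattern behind spreading load across nodes can be sketched in Python. This is a toy under stated assumptions: local worker processes from `concurrent.futures` stand in for networked machines, and `split` and `distributed_sum_of_squares` are hypothetical helpers. A real cluster would also move data over the network, schedule tasks, and handle node failures.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """Work done independently by one worker ('node') on its slice of the data."""
    return sum(x * x for x in chunk)


def split(data, n_parts):
    """Partition the data into roughly equal chunks, one per worker."""
    size = (len(data) + n_parts - 1) // n_parts  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def distributed_sum_of_squares(data, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Spread the chunks across workers, then combine the partial results."""
    if not data:
        return 0
    chunks = split(data, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


if __name__ == "__main__":
    data = list(range(1, 101))
    print(distributed_sum_of_squares(data))  # 338350
    print(distributed_sum_of_squares(data, executor_cls=ThreadPoolExecutor))  # 338350
```

Passing `ThreadPoolExecutor` runs the same partition-and-combine logic with threads, which is convenient for testing but, for CPU-bound Python code, does not give real parallelism.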