Big Data
How is Data Governance …
The need for Data Governance has been established — it has become one of the key initiatives organizations are focusing on when it comes to managing data. This blog talks about the differences in Data Governance in the digital era compared to traditional Data Governance practices.
DG Process and …
Greek Architectures of …
Greeks are famous for many things. But the first thing that comes to mind when we hear “Greek” is Greek Architecture and Greek Alphabets (alpha, beta, gamma…).
Greek Architecture is the flowering of geometry.
What happens when we merge Greek alphabets with Greek architecture? We …
Lambda Architecture Using …
For details about what Lambda architecture is, read Introduction to Lambda Architecture.
From a technology point of view, Databricks is becoming the new normal in data processing technologies in both Azure and AWS. This post provides a view of lambda architecture with Databricks at front and center. …
Kappa Architecture – …
Kappa architecture was proposed by Jay Kreps (co-creator of Apache Kafka) as a simplification of the Lambda architecture. The core idea: remove the batch layer entirely and treat everything as a stream.

The Core Concept
In Lambda architecture, you maintain two separate processing paths — batch and …
Introduction to Lambda …
Lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods. The architecture was introduced by Nathan Marz and is based on three layers: the Batch Layer, the Speed Layer, and the Serving Layer. …
How to Structure the Data …
A data lake is a framework, concept, and guidance on where to place data (Microsoft named their product Azure Data Lake, but the concept is broader). From a technology point of view, it suggests storing all data in object-oriented or hierarchical storage. This is the concept of data locality — data …
5 Things to Get the Best …
Most of the customers I talk to are directly or indirectly asking to scale their workloads and use Databricks. It has become the new normal in data processing in cloud. If you are using or plan to use Azure Databricks, this post will guide you on some interesting things to investigate as you start. …
Introduction to …
Distributed computing technology enables the compute load to be spread, or distributed, across multiple nodes (computers) connected via a network. The networked machines share the same goal and share the compute load to effectively collaborate and provide the resources to obtain that goal.
Early …








