Databricks

Processing Real-Time Streams in Databricks – Part 2

The first part of this post can be found at Processing Real-Time Streams in Databricks – Part 1.

This post continues Part 1, so we won't repeat the architecture and setup here. In this part we go into Azure Databricks and create the cluster and notebook to ingest data in real time, …

Processing Real-Time Streams in Databricks – Part 1

Databricks is becoming the new normal among data processing technologies in the cloud, on both Azure and AWS. This is a step-by-step guide to getting started with real-time (streaming) analytics using Spark Streaming on Databricks.

Architecture

The demo was built to show the speed layer (hot path) of a typical …

Introduction to Delta Architecture

In my previous posts I introduced the Kappa and Lambda architectures. These are big data architectures designed to support massive amounts of data, both in real time and at rest. The key difference between the two is the presence of a data lake/data hub to consolidate all data in one …

Lambda Architecture Using Databricks

For details on what the Lambda architecture is, read Introduction to Lambda Architecture.

From a technology point of view, Databricks is becoming the new normal among data processing technologies on both Azure and AWS. This post presents the Lambda architecture with Databricks front and center. …

5 Things to Get the Best Out of Azure Databricks

Most of the customers I talk to are, directly or indirectly, asking to scale their workloads and use Databricks. It has become the new normal in cloud data processing. If you are using or plan to use Azure Databricks, this post walks you through some interesting things to investigate as you get started. …

Introduction to Distributed Computing

Distributed computing technology allows the compute load to be spread, or distributed, across multiple nodes (computers) connected via a network. The networked machines work toward a common goal, splitting the compute load between them so they can collaborate effectively and pool the resources needed to reach that goal.

Early …
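The idea of splitting a compute load and combining the partial results can be sketched on a single machine. This is only an analogy, assuming local worker processes stand in for networked nodes; on Databricks, Spark plays this role by spreading data partitions across the executors of a cluster. All function names here (`split`, `partial_sum_of_squares`, `distributed_sum_of_squares`) are hypothetical and chosen for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    # Work done independently by one worker (a local stand-in for one node)
    return sum(x * x for x in chunk)


def split(data, n_parts):
    # Partition the input into roughly equal slices, one per worker;
    # max(1, ...) keeps the slice size positive even for empty input
    size = max(1, (len(data) + n_parts - 1) // n_parts)
    return [data[i:i + size] for i in range(0, len(data), size)]


def distributed_sum_of_squares(data, n_workers=4, executor_cls=ProcessPoolExecutor):
    # Scatter: each chunk is handed to a separate worker
    chunks = split(data, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunks)
        # Gather: combine the partial results into the final answer
        return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000))
    # Worker processes give real CPU parallelism on one machine
    print(distributed_sum_of_squares(data))
    # Worker threads are convenient for quick local checks
    print(distributed_sum_of_squares(data, executor_cls=ThreadPoolExecutor))
```

Each worker only sees its own slice of the data, and only the small partial results travel back to be combined, which is the same scatter/gather shape that a cluster uses at a much larger scale.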