Concurrency

Processing Real-time streams in Databricks – Part 2

11/07/2019 | 05/05/2020
Big Data, Tips

In this section we go to Azure Databricks and create the cluster and notebook to ingest the data in real time, then process and visualize the stream.

Processing Real-Time Streams in Databricks – Part 1

10/23/2019 | 05/09/2020
Big Data, Tips

Databricks is becoming the new normal in cloud data processing technologies, on both Azure and AWS. This is a step-by-step guide to getting started with real-time (streaming) analytics using Spark Streaming on Databricks.

Introduction to Delta Architecture

09/25/2019 | 05/04/2020
Analytics, Big Data

Delta architecture processes new streaming records as delta (incremental) records, and the data lake is no longer an immutable data structure.

Analytics Maturity (Part 2) – Crossing the Chasm

08/28/2019 | 05/04/2020
Analytics

The key steps organizations can take to cross the chasm, get past the roadblock, and prepare the foundation that will enable them to move along the curve.

Kappa Architecture – Another way of Data Processing

07/31/2019 | 05/04/2020
Analytics, Big Data, Tips

Imagine a scenario where we maintain an immutable, persistent stream of data and, instead of processing the data twice, use the stream to replay the data from a different point in time through the code. That is the premise of Kappa architecture.
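
The replay idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the post's implementation: the in-memory `EVENT_LOG`, the `replay` helper, and the two jobs are hypothetical names standing in for a durable log and real stream-processing code.

```python
# Hypothetical append-only event log: (offset, user, amount).
# In a real system this would be a durable log; here it is a plain list.
EVENT_LOG = [
    (0, "alice", 10),
    (1, "bob", 5),
    (2, "alice", 7),
    (3, "bob", 1),
]


def replay(log, from_offset, job):
    """Feed every event at or after from_offset through job, in log order."""
    state = {}
    for offset, user, amount in log:
        if offset >= from_offset:
            job(state, user, amount)
    return state


def total_per_user(state, user, amount):
    """Original job: running total per user."""
    state[user] = state.get(user, 0) + amount


def max_per_user(state, user, amount):
    """Revised job: largest single amount per user."""
    state[user] = max(state.get(user, amount), amount)


# The same log serves both jobs; nothing is stored or processed twice.
totals = replay(EVENT_LOG, 0, total_per_user)
maxima = replay(EVENT_LOG, 0, max_per_user)

# Replaying from a later offset recomputes the view for a different time window.
recent = replay(EVENT_LOG, 2, total_per_user)
```

Because the log is never mutated, changing the logic means writing a new job and replaying, rather than maintaining a separate batch path.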

How to structure the Data Lake

07/19/2019 | 05/04/2020
Analytics, Big Data, Cloud, Tips

The key reasons a good data lake structure is needed are: 1) Security: the lake needs role-based security for read access. 2) Extensibility: it should be easy to extend the lake after the first round so that more systems can be added. 3) Usability: it should be easy to use the lake and find data in it, and users should not get lost. 4) Governance: it should be simple to apply governance practices to the lake in terms of quality, metadata management, and ILM.

Lambda Architecture using Databricks

07/17/2019 | 05/04/2020
Analytics, Big Data, Cloud, Tips

From a technology point of view, Databricks is becoming the new normal in data processing technologies, on both Azure and AWS. This post provides a view of lambda architecture that puts Databricks front and center. Databricks has capabilities that can replace multiple tools, and those are described in some detail in the post.

Introduction to Lambda Architecture

07/17/2019 | 05/04/2020
Analytics, Big Data, Cloud

Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods. This approach attempts to balance latency, throughput, and fault tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data.
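
A minimal sketch of the batch-plus-stream split, in plain Python. Everything here is assumed for illustration: the `events` sample, the `BATCH_CUTOFF` timestamp, and the `batch_view`, `speed_view`, and `serve` helpers are hypothetical and only mimic the roles of the batch, speed, and serving layers.

```python
from collections import Counter

# Hypothetical page-view events: (timestamp, page).
events = [
    (10, "home"),
    (50, "docs"),
    (90, "home"),
    (120, "home"),
    (130, "pricing"),
]

# Assumption: the last batch run covered every event with timestamp < 100.
BATCH_CUTOFF = 100


def batch_view(all_events, cutoff):
    """Batch layer: complete, accurate counts, but only up to the cutoff."""
    return Counter(page for ts, page in all_events if ts < cutoff)


def speed_view(all_events, cutoff):
    """Speed layer: counts for recent events the batch run has not seen yet."""
    return Counter(page for ts, page in all_events if ts >= cutoff)


def serve(batch, speed):
    """Serving layer: answer queries by combining both views."""
    return batch + speed


merged = serve(batch_view(events, BATCH_CUTOFF), speed_view(events, BATCH_CUTOFF))
```

The batch view trades latency for completeness, the speed view covers only what arrived since the cutoff, and a query sees the sum of the two.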

Monitor and Manage Costs on Azure

07/09/2019 | 05/04/2020
Analytics, Cloud, Tips

The Cost Management solution in Azure helps monitor, optimize, and control the costs of Azure resources in the subscription and resource groups. Cost Management shows organizational cost and usage patterns with advanced analytics.

5 Things to get best out of Azure Databricks

06/30/2019 | 05/04/2020
Big Data, Cloud, Tips

Databricks has become the new normal in data processing in the cloud. If you are using or plan to use Azure Databricks, this post will guide you through some interesting things to investigate as you start.