Strategies for skewed Spark datasets: window ordering use case

May 26, 2024 · 5 min read

A software and devops engineer, with a passion for data and audio.

One of the biggest challenges in designing Spark transformations is handling skewed datasets. Depending on your database schema and driver, you might be pulling this data already skewed, and you might also need to order it based on columns that are inherently skewed. In this article, after a brief explanation of how to identify skew, I dive into mitigation strategies for ordering over skewed windows.