Referring to the same table more than once in a query is not a mistake; self-joins are a common example. But an engine does not actually have to scan the same data multiple times: it can combine multiple logical scans into a single physical one. This optimization introduces new super operators that avoid redundant IO. In a TPC-DS-like benchmark, 40% of the queries have redundant IO, and a large fraction of them spend half of their time in stages with redundant IO. For instance, one query shows close to a 3x improvement with this optimization.
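To make the idea of combining several logical scans into one physical scan concrete, here is a toy sketch in plain Python. It is not how Azure Synapse or Apache Spark implements the optimization; the names `CountingTable`, `naive_scans`, and `shared_scan` are hypothetical and exist only for this illustration. The table counts how many times it is read, so the two strategies can be compared on the same self-join-style workload.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


class CountingTable:
    """Hypothetical in-memory table that counts full physical scans."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = list(rows)
        self.scans = 0

    def scan(self) -> Iterator[Row]:
        # Every call models one full read of the underlying data.
        self.scans += 1
        return iter(self.rows)


def naive_scans(table: CountingTable, predicates: List[Predicate]) -> List[List[Row]]:
    """One physical scan per logical scan: the table is read once per predicate."""
    return [[row for row in table.scan() if pred(row)] for pred in predicates]


def shared_scan(table: CountingTable, predicates: List[Predicate]) -> List[List[Row]]:
    """A single physical scan that feeds every logical scan."""
    outputs: List[List[Row]] = [[] for _ in predicates]
    for row in table.scan():
        for out, pred in zip(outputs, predicates):
            if pred(row):
                out.append(row)
    return outputs


# A self-join-style workload: the same table is needed once per year.
sales = [
    {"store": 1, "year": 2019, "amount": 100},
    {"store": 1, "year": 2020, "amount": 130},
    {"store": 2, "year": 2019, "amount": 80},
    {"store": 2, "year": 2020, "amount": 60},
]
logical_scans = [lambda r: r["year"] == 2019, lambda r: r["year"] == 2020]

naive_table = CountingTable(sales)
shared_table = CountingTable(sales)
naive_result = naive_scans(naive_table, logical_scans)
shared_result = shared_scan(shared_table, logical_scans)

print("physical scans, naive: ", naive_table.scans)
print("physical scans, shared:", shared_table.scans)
print("same rows produced:    ", naive_result == shared_result)
```

Both strategies hand the same rows to each side of the join; the only difference is how often the underlying data is read, which is the redundant IO the optimization targets.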
Many queries refer to the same table multiple times.

To compare performance, we derived queries from TPC-DS at the 1 TB scale and ran them on an 8-node Azure E8V3 cluster (15 executors, each with 28 GB of memory and 4 cores). Even though our version running inside Azure Synapse today is a derivative of Apache Spark™ 2.4.4, we compared it with the latest open-source release, Apache Spark™ 3.0.1, and saw that Azure Synapse was 2x faster in total runtime for the Test-DS comparison. We also observed up to 18x query performance improvement on Azure Synapse compared to open-source Apache Spark™ 3.0.1.

Below are some of the techniques and optimizations we implemented to achieve these results:

- Indexing: Optimizing data representation in the data lake for the workload yields significant improvement.
- Intelligent caching: Using local SSDs, native code, and hardware-assisted parsing.
- Cluster optimizations: Picking the right VM types and rapid provisioning of VMs.
- Autoscaling: Automatically scaling clusters up and down with load, even as a job is running.