Intermediate Collocations #engineering #data-engineering #big-data #performance

Spark Job Partition Tuning Language Collocations

Practise the standard verbs for tuning Spark job partitions to avoid skew.

1 / 5

Fill in: 'We ___ the number of shuffle partitions to roughly match the cluster's total core count, rather than trusting Spark's default value for every job regardless of size.'
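For reference, the sizing rule in this sentence can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name `target_shuffle_partitions`, the executor counts, and the `tasks_per_core` multiplier are hypothetical. The one real Spark name used is the `spark.sql.shuffle.partitions` property, whose default is 200.

```python
def target_shuffle_partitions(num_executors: int,
                              cores_per_executor: int,
                              tasks_per_core: int = 1) -> int:
    """Hypothetical helper: derive a partition count from total cores."""
    total_cores = num_executors * cores_per_executor
    # Never return fewer than one partition, even for a degenerate input.
    return max(1, total_cores * tasks_per_core)


# Assumed cluster shape for illustration: 10 executors with 4 cores each.
partitions = target_shuffle_partitions(num_executors=10, cores_per_executor=4)

# The value would typically be passed to Spark as a string-valued property.
settings = {"spark.sql.shuffle.partitions": str(partitions)}
```

Whether one task per core is the right multiplier depends on the job; some teams prefer a small multiple of the core count so that short tasks can backfill idle cores.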

2 / 5

Fill in: 'Leaving shuffle partitions at an unsuitable default on a heavily skewed dataset can ___ one enormous partition still running long after every other task has already finished.'

3 / 5

Fill in: 'We ___ the Spark UI's stage timeline after a slow job, since a single task taking far longer than its peers usually points straight to a skewed partition.'

4 / 5

Fill in: 'We ___ a salting key to a heavily skewed join column, spreading the largest keys across several partitions instead of concentrating them all on one.'
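For reference, the effect of salting can be simulated without a cluster. The sketch below is plain Python rather than Spark code: `partition_of` is a deterministic stand-in for a hash partitioner (it is not Spark's actual hash function), and the key names, row counts, and salt count are made up for illustration.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Stand-in hash partitioner (CRC32, not Spark's real hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts):
    """Salted key = original key plus a suffix in [0, num_salts).

    The suffix is assigned round-robin to keep this sketch deterministic;
    real jobs often pick it at random per row.
    """
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


# Hypothetical skewed column: one hot key dominates 10,000 rows.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

plain = partition_sizes(keys, num_partitions=8)
salted = partition_sizes(salt_keys(keys, num_salts=8), num_partitions=8)

print("largest partition without salt:", max(plain))
print("largest partition with salt:   ", max(salted))
```

In a real salted join the other side of the join also has to be expanded so that every salt value finds a match, which trades extra rows on the small side for a more even spread on the large side.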

5 / 5

Fill in: 'We ___ partition sizes after any tuning change against the previous run, so that a supposed improvement rests on a measured before-and-after comparison rather than an assumption.'
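For reference, a before-and-after check of this kind can be as small as the sketch below. It is an illustration only: the per-partition row counts are invented, and `skew_ratio` and `summarize` are hypothetical helper names, not part of any Spark API.

```python
from statistics import median


def skew_ratio(sizes):
    """Largest partition divided by the median partition size."""
    mid = median(sizes)
    return max(sizes) / mid if mid else float("inf")


def summarize(before, after):
    """Put the two runs side by side so the change is measured, not assumed."""
    return {
        "before_max": max(before),
        "after_max": max(after),
        "before_ratio": skew_ratio(before),
        "after_ratio": skew_ratio(after),
        "improved": skew_ratio(after) < skew_ratio(before),
    }


# Hypothetical rows per partition from the previous run and the tuned run.
before = [120, 130, 110, 4000]
after = [1050, 1100, 1090, 1120]

report = summarize(before, after)
print(report)
```

In PySpark, per-partition row counts like these could be gathered by grouping on `spark_partition_id()` from `pyspark.sql.functions`; the ratio of largest to median is one reasonable skew measure among several.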