Practise the standard verbs for tuning Spark job partitions to avoid skew.
0 / 5 completed
1 / 5
Fill in: 'We ___ the number of shuffle partitions to roughly match the cluster's total core count, rather than trusting Spark's default value for every job regardless of size.'
We 'tune' partition count — the standard, simple collocation for adjusting a job's parallelism setting deliberately. The other options are less idiomatic here.
2 / 5
Fill in: 'Leaving shuffle partitions at an unsuitable default on a heavily skewed dataset can ___ one enormous partition still running long after every other task has already finished.'
We say a poor partition count will 'leave' one huge task straggling — the standard, natural collocation for the resulting skew. The other options aren't idiomatic here.
3 / 5
Fill in: 'We ___ the Spark UI's stage timeline after a slow job, since a single task taking far longer than its peers usually points straight to a skewed partition.'
We 'inspect' a stage timeline — the standard, simple collocation for examining execution details in the Spark UI. The other options are less idiomatic here.
4 / 5
Fill in: 'We ___ a salting key to a heavily skewed join column, spreading the largest keys across several partitions instead of concentrating them all on one.'
We 'add a key' — the standard, simple collocation for introducing a technique that redistributes skewed data. The other options are less idiomatic here.
5 / 5
Fill in: 'We ___ partition sizes after any tuning change against the previous run, so a supposed improvement isn't just an assumption without a measured before-and-after comparison.'
We 'compare' partition sizes — the standard, simple collocation for contrasting a metric across two runs. The other options are less idiomatic here.