Learn to say Avro, Parquet, Protobuf, Arrow, and ORC correctly — the data serialisation and storage formats used in modern data engineering.
1 / 5
How is 'Avro' pronounced?
Avro is pronounced /ˈeɪvroʊ/, 'AY-vroh'. The name is taken from Avro, the British aircraft manufacturer (A.V. Roe and Company). 'Av' = /eɪv/ (diphthong /eɪ/ as in 'rave', 'cave'). 'ro' = /roʊ/ (diphthong /oʊ/). Two syllables: AY-vroh, stress on the first. Apache Avro is a row-based data serialisation format with a schema embedded in or alongside the data, commonly used in Kafka messaging and Hadoop data pipelines.
2 / 5
How is 'Parquet' pronounced?
Parquet is pronounced /pɑːrˈkeɪ/, 'par-KAY'. The name is borrowed from the French word for a type of wooden flooring and keeps its French-style pronunciation. 'Par' = /pɑːr/ (rhotic vowel). '-quet' = /keɪ/ (silent 't', French-style ending as in 'bouquet', 'croquet'). Two syllables: par-KAY, stress on the second. Apache Parquet is a columnar data storage format for the Hadoop ecosystem, designed for efficient compression and encoding. Because a query reads only the columns it needs, Parquet is typically much faster than row-based formats for analytical queries over large datasets.
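To make the row-based versus columnar distinction concrete, here is a minimal sketch in plain Python. It does not use Parquet itself: the lists and the `to_columns` helper are illustrative stand-ins for a storage layout, and the sample records are made up.

```python
# Illustrative only: plain Python structures stand in for on-disk layouts.
rows = [
    {"user_id": 1, "country": "DE", "amount": 12.5},
    {"user_id": 2, "country": "FR", "amount": 7.0},
    {"user_id": 3, "country": "DE", "amount": 3.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate such as SUM(amount) now touches one contiguous column
# instead of every field of every row.
total_amount = sum(columns["amount"])
print(total_amount)
```

Values of the same type also sit next to each other in the columnar form, which is what makes per-column compression and encoding effective.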
3 / 5
How is 'Protobuf' pronounced?
Protobuf is pronounced /ˈproʊtəbʌf/, 'PROH-tuh-buf'. The name is a shortening of 'Protocol Buffers'. 'Proto' = /ˈproʊtə/, 'PROH-tuh' (diphthong /oʊ/ + schwa). 'buf' = /bʌf/ (short /ʌ/ as in 'buff', 'rough'). Three syllables: PROH-tuh-buf, stress on the first. Protocol Buffers (Protobuf) is Google's binary serialisation format, used with gRPC and for compact data storage. Messages are defined in a .proto schema file, from which the protoc compiler generates serialisation code in multiple languages.
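As a rough illustration of what "binary serialisation" means here, the sketch below hand-encodes a single integer field the way the Protobuf wire format does: a tag byte (field number and wire type) followed by a base-128 varint. The function names are hypothetical helpers written for this example, not part of any Protobuf library; real code would use the classes generated by protoc.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)  # last byte: continuation bit clear
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: tag = (field_number << 3) | wire_type."""
    wire_type_varint = 0
    tag = (field_number << 3) | wire_type_varint
    return encode_varint(tag) + encode_varint(value)


# Field number 1 holding the value 150.
encoded = encode_int_field(1, 150)
print(encoded.hex())  # 089601
```

The field name never appears in the output, only its number, which is why both sides need the .proto schema to interpret the bytes.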
4 / 5
How is 'Arrow' pronounced?
Arrow is pronounced /ˈæroʊ/ — 'AR-oh'. It is the familiar English word for a pointed projectile, evoking directional, fast data movement. 'Ar' = /ær/ (short /æ/ as in 'have', 'man'). '-row' = /roʊ/ (diphthong /oʊ/). Two syllables: AR-oh, stress on the first. Apache Arrow is an in-memory columnar data format designed for efficient analytics, enabling zero-copy reads and interoperability between data processing systems like pandas, Spark, and Polars without serialisation overhead.
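The idea behind a zero-copy read can be sketched with the Python standard library alone. This is an analogy, not Arrow: an `array` stands in for a column buffer and a `memoryview` stands in for a second consumer reading it without copying.

```python
from array import array

# A contiguous buffer of 64-bit integers, standing in for one column.
column = array("q", [10, 20, 30, 40])

# A memoryview slice refers to the same memory; no values are copied.
view = memoryview(column)
window = view[1:3]

# Because the memory is shared, a write through the original buffer
# is visible through the slice.
column[1] = 99
print(window.tolist())  # [99, 30]
```

Arrow applies the same principle across libraries and processes by standardising the memory layout, so each system can read the buffers directly.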
5 / 5
How is 'ORC' pronounced in a data engineering context?
ORC (Optimized Row Columnar) is pronounced /ɔːrk/, 'ork', rhyming with 'fork', 'pork', and 'cork'. It is said as a single word, not spelled out letter by letter. The vowel is the long /ɔː/ sound. Single syllable: ork. ORC is a columnar storage file format for Hadoop designed to improve reading, writing, and processing performance. It provides better compression than text formats and supports predicate pushdown, which lets Hive and Spark skip data that cannot match a query's filter.
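Predicate pushdown can be sketched as follows. This is a simplified model rather than the ORC reader: each hypothetical "stripe" carries min/max statistics for one column, and the scan skips any stripe whose statistics prove that no row can satisfy the filter.

```python
# Hypothetical stripes with per-stripe min/max statistics for one column.
stripes = [
    {"min": 1, "max": 100, "values": [1, 40, 100]},
    {"min": 101, "max": 200, "values": [101, 150, 200]},
    {"min": 201, "max": 300, "values": [201, 250, 300]},
]


def scan_greater_than(stripes, threshold):
    """Return values > threshold and the number of stripes actually read."""
    matches = []
    stripes_read = 0
    for stripe in stripes:
        # The statistics alone rule this stripe out, so skip it unread.
        if stripe["max"] <= threshold:
            continue
        stripes_read += 1
        matches.extend(v for v in stripe["values"] if v > threshold)
    return matches, stripes_read


matches, stripes_read = scan_greater_than(stripes, 220)
print(matches, stripes_read)  # [250, 300] 1
```

Only one of the three stripes is read for this filter; on real files the saving comes from the skipped I/O and decompression.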