Avro
- Row-based storage format
- The schema is stored in the file alongside the data
- Robust support for data schemas that change over time, i.e. schema evolution
- Avro provides rich data structures. For example, you can create a record that contains an array, an enumerated type, and a sub-record.
- When to use:
- Landing-zone data that downstream systems usually read in full (whole records rather than selected columns) for further processing
- Sources whose schema changes over time, since such changes are handled through schema evolution
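
As a rough illustration of the points above, the sketch below defines a schema containing an array, an enum, and a sub-record, then mimics how a newer reader schema with a defaulted field can still read older data. It is plain Python rather than an Avro library: the `Order` schema, the `currency` field, and the `resolve` helper are hypothetical names made up for this example, and `resolve` models only the top-level default-filling rule, not full Avro schema resolution.

```python
# Hypothetical writer schema (v1). Avro schemas are JSON documents;
# Python dicts mirror that structure here.
WRITER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "status",
         "type": {"type": "enum", "name": "Status", "symbols": ["NEW", "SHIPPED"]}},
        {"name": "items", "type": {"type": "array", "items": "string"}},
        {"name": "customer",
         "type": {"type": "record", "name": "Customer",
                  "fields": [{"name": "name", "type": "string"}]}},
    ],
}

# Hypothetical reader schema (v2): the same fields plus a new one with a default.
READER_SCHEMA = dict(
    WRITER_SCHEMA,
    fields=WRITER_SCHEMA["fields"]
    + [{"name": "currency", "type": "string", "default": "USD"}],
)


def resolve(record, reader_schema):
    """Toy model of schema resolution for top-level fields only (an assumption,
    not the Avro library): fields missing from the written record take the
    reader's default, and fields unknown to the reader are dropped."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


# A record written with the v1 schema, read through the v2 schema.
old_record = {
    "id": 1,
    "status": "NEW",
    "items": ["book", "pen"],
    "customer": {"name": "Ada"},
}
new_view = resolve(old_record, READER_SCHEMA)
print(new_view["currency"])
```

The new field needs a default precisely so that records written before the change remain readable; without one, the toy resolver raises an error instead of guessing a value.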
Parquet
- Parquet stores the data in a column-oriented way
- The values of each column are stored adjacent to one another, which enables a better compression rate because similar values sit together.
- It is especially well suited to queries that read only a few columns from a “wide” table (one with many columns), since only the needed columns are read and I/O (input/output) is minimized.
- Nested data structures are represented in a flat columnar format.
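
To make the column-oriented idea concrete, here is a minimal sketch in plain Python. It does not use Parquet itself: the sample rows and the `to_columns` and `run_length_encode` helpers are hypothetical, and the run-length encoding is only a toy stand-in for the more elaborate encodings a real columnar file applies.

```python
from itertools import groupby

# Row-oriented view: each record keeps all of its fields together.
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "DE", "amount": 20},
    {"id": 3, "country": "DE", "amount": 5},
    {"id": 4, "country": "FR", "amount": 7},
    {"id": 5, "country": "FR", "amount": 8},
]


def to_columns(rows):
    """Pivot row dicts into one list per column, so that the values of a
    column end up adjacent to one another."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Toy run-length encoding: collapse runs of equal neighbours into
    (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# A query that needs only "amount" touches a single column and never
# looks at "id" or "country".
total = sum(columns["amount"])

# Adjacent, similar values compress well: five country codes become two runs.
encoded_country = run_length_encode(columns["country"])

print(total)
print(encoded_country)
```

In the row layout, the same sum would have to step through every record and skip over the fields it does not need, which is the extra I/O that a columnar layout avoids on wide tables.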