How should the records be stored in Amazon S3 to improve query performance?
CSV files
Parquet files
Compressed JSON
RecordIO
Explanations:
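CSV files are inefficient for query performance in Athena because CSV is a row-based text format: Athena must scan every column of every row even when a query needs only a few of them. CSV also carries no embedded schema or column statistics, so queries on large datasets are slower.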
Parquet files use a columnar storage format that is optimized for querying large datasets efficiently in Athena. Parquet supports compression and predicate pushdown, so Athena reads only the columns and data blocks a query needs, which improves query performance.
Compressed JSON is still a row-based format. Compression reduces the storage size, but Athena must still decompress and parse every record in full, so it does not offer the same query performance improvements as Parquet.
RecordIO is a format typically used for machine learning workloads. It is not designed for high-performance analytics on large datasets, so it is not a good fit for querying in Athena.
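
To make the row-based versus columnar distinction in the explanations above concrete, here is a minimal sketch in plain Python. It is a toy model only: it does not read or write real Parquet files, and the sample records and field names (id, region, amount) are hypothetical. It compares how many bytes a query touching a single column would have to read under each layout.

```python
import csv
import io

# Hypothetical sample records; the field names are illustrative assumptions.
records = [
    {"id": i, "region": "us-east-1" if i % 2 else "eu-west-1", "amount": i * 1.5}
    for i in range(1000)
]


def row_layout_bytes(rows):
    """Serialize rows as CSV text and return the total size in bytes.

    In a row-based layout, a scan has to read all of these bytes,
    even when the query uses only one column.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def column_layout_bytes(rows):
    """Serialize each column separately and return the size of each in bytes.

    This loosely mimics a columnar layout, where each column's values
    are stored together and can be read independently.
    """
    sizes = {}
    for name in rows[0].keys():
        column_text = "\n".join(str(r[name]) for r in rows)
        sizes[name] = len(column_text.encode("utf-8"))
    return sizes


row_bytes = row_layout_bytes(records)
col_bytes = column_layout_bytes(records)

# A query such as SELECT SUM(amount) needs only the "amount" column.
bytes_scanned_row = row_bytes
bytes_scanned_columnar = col_bytes["amount"]

print(f"row-based scan: {bytes_scanned_row} bytes")
print(f"columnar scan:  {bytes_scanned_columnar} bytes")
```

In this sketch the columnar scan reads only the bytes of the one column the query needs, while the row-based scan reads everything. Real Parquet files add per-column compression and statistics on top of this idea, which the toy model leaves out.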