How can the data scientist convert the file format with the LEAST amount of effort?
Use an AWS Glue crawler to convert the file format.
Write a script to convert the file format. Run the script as an AWS Glue job.
Write a script to convert the file format. Run the script on an Amazon EMR cluster.
Write a script to convert the file format. Run the script in an Amazon SageMaker notebook.
Explanations:
AWS Glue crawlers are used for discovering and cataloging data, not for data format conversion. A crawler alone cannot convert the file format from CSV to Parquet.
Writing a script to convert the file format and running it as an AWS Glue job is the simplest scalable approach. AWS Glue is serverless, so there is no cluster to provision or manage. Glue jobs can read CSV and write Parquet natively, and they can process large datasets such as 20 TB efficiently.
While Amazon EMR can process large datasets and perform the format conversion, it requires provisioning, configuring, and managing a cluster. That is more effort than a serverless AWS Glue job for this task.
Amazon SageMaker notebooks are intended primarily for machine learning development. Although they can process data, a notebook instance runs on a single machine and is not suited to large-scale format conversion the way Glue and EMR are.
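The Glue job approach described above can be sketched as follows. This is a minimal illustration, not a definitive implementation. It assumes two hypothetical job parameters, `--source_path` (the S3 prefix holding the CSV files) and `--target_path` (the S3 prefix for the Parquet output), and assumes the CSV files have a header row. None of these details come from the question. The helper `build_s3_options` is also hypothetical; it only separates the read/write settings from the Glue runtime calls. The `awsglue` modules exist only inside the AWS Glue job runtime, so they are imported inside `main()`.

```python
import sys
from typing import Any, Dict, Tuple


def build_s3_options(source_path: str, target_path: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return keyword arguments for reading CSV from S3 and writing Parquet to S3.

    Hypothetical helper: source_path and target_path are assumed job parameters.
    """
    read_options = {
        "connection_type": "s3",
        # Read every object under the prefix, including subfolders.
        "connection_options": {"paths": [source_path], "recurse": True},
        "format": "csv",
        # Assumption: the CSV files have a header row.
        "format_options": {"withHeader": True},
    }
    write_options = {
        "connection_type": "s3",
        "connection_options": {"path": target_path},
        "format": "parquet",
    }
    return read_options, write_options


def main() -> None:
    # These modules are provided by the AWS Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # Assumed job parameters: --source_path and --target_path.
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    read_options, write_options = build_s3_options(args["source_path"], args["target_path"])
    # Load the CSV data as a DynamicFrame, then write it back out as Parquet.
    frame = glue_context.create_dynamic_frame.from_options(**read_options)
    glue_context.write_dynamic_frame.from_options(frame=frame, **write_options)

    job.commit()


# AWS Glue passes --JOB_NAME on the command line; skip main() when run elsewhere.
if __name__ == "__main__" and "--JOB_NAME" in sys.argv:
    main()
```

Glue supplies and scales the Spark environment that runs this script, so the only work is writing the script and setting the job parameters. That is why this option involves less effort than managing an EMR cluster.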