Which solution should the Data Scientist build to satisfy the requirements?

1 Comment

  1. Madison
    Author

    If I’m correct, the answer is:
    Create a schema in the AWS Glue Data Catalog of the incoming data format. Use an Amazon Kinesis Data Firehose delivery stream to stream the data and transform the data to Apache Parquet or ORC format using the AWS Glue Data Catalog before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.
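
A rough sketch of the Firehose part of that answer, in Python with boto3. It assumes JSON input records and a Glue table that already describes them. The bucket, role, database, table, and stream names are hypothetical placeholders, and the helper functions are illustrative, not part of any AWS SDK. The first function only builds the destination settings; the second makes the actual API call and needs boto3 plus valid AWS credentials.

```python
def build_parquet_destination(bucket_arn, role_arn, glue_database, glue_table, region):
    """Build an ExtendedS3DestinationConfiguration that converts JSON to Parquet.

    All argument values are supplied by the caller; nothing here is taken
    from the original question.
    """
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "Prefix": "events/",  # hypothetical S3 prefix
        # Firehose requires a buffer of at least 64 MB when conversion is enabled.
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            # The schema is read from the AWS Glue Data Catalog table.
            "SchemaConfiguration": {
                "RoleARN": role_arn,
                "DatabaseName": glue_database,
                "TableName": glue_table,
                "Region": region,
                "VersionId": "LATEST",
            },
            # Assumption: incoming records are JSON.
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            # Use {"OrcSerDe": {}} here instead to write ORC.
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
        },
    }


def create_stream(stream_name, destination, region):
    """Create the delivery stream (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    firehose = boto3.client("firehose", region_name=region)
    return firehose.create_delivery_stream(
        DeliveryStreamName=stream_name,
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration=destination,
    )


# Example with placeholder ARNs and names (not real resources).
destination = build_parquet_destination(
    bucket_arn="arn:aws:s3:::example-analytics-bucket",
    role_arn="arn:aws:iam::123456789012:role/example-firehose-role",
    glue_database="example_db",
    glue_table="example_events",
    region="us-east-1",
)
# create_stream("example-stream", destination, "us-east-1")  # uncomment to run against AWS
```

Once Parquet files land in the bucket, Athena can query a table pointing at that prefix, and BI tools can connect through the Athena JDBC driver as the answer describes.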
