Which is the MOST cost-effective solution for collecting and storing the data?
Put each record in Amazon Kinesis Data Streams. Use an AWS Lambda function to write each record to an object in Amazon S3 with a prefix that organizes the records by hour and hashes the record’s key. Analyze recent data from Kinesis Data Streams and historical data from Amazon S3.
Put each record in Amazon Kinesis Data Streams. Set up Amazon Kinesis Data Firehose to read records from the stream and group them into objects in Amazon S3. Analyze recent data from Kinesis Data Streams and historical data from Amazon S3.
Put each record into an Amazon DynamoDB table. Analyze the recent data by querying the table. Use an AWS Lambda function connected to a DynamoDB stream to group records together, write them into objects in Amazon S3, and then delete the records from the DynamoDB table. Analyze recent data from the DynamoDB table and historical data from Amazon S3.
Put each record into an object in Amazon S3 with a prefix that organizes the records by hour and hashes the record’s key. Use S3 Lifecycle management to transition objects to S3 Standard-Infrequent Access storage to reduce storage costs. Analyze recent and historical data by accessing the data in Amazon S3.
Explanations:
While using Amazon Kinesis Data Streams with AWS Lambda for writing to S3 allows for real-time data processing, it may introduce unnecessary complexity and costs. Storing every record as an individual object in S3 can lead to increased storage costs and inefficiencies, especially with a high volume of data.
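To make the per-record object layout concrete, here is a minimal sketch of the kind of key-building helper such a Lambda function might use; the function name and key format are illustrative assumptions, not part of the question:

```python
import hashlib
from datetime import datetime, timezone

def object_key(record_key: str, record_id: str, ts: datetime) -> str:
    """Build an S3 object key with an hour-based prefix and a hash
    of the record's key, as this option describes (hypothetical format)."""
    hour_prefix = ts.strftime("%Y/%m/%d/%H")
    key_hash = hashlib.sha256(record_key.encode()).hexdigest()[:8]
    return f"{hour_prefix}/{key_hash}-{record_id}.json"

# One S3 object per record is what drives up PUT-request and storage costs.
print(object_key("user-42", "rec1", datetime(2024, 1, 2, 3, tzinfo=timezone.utc)))
```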
This option effectively uses Amazon Kinesis Data Streams for real-time data collection and processing. Kinesis Data Firehose can efficiently batch and deliver the data to S3, optimizing costs and simplifying the storage structure. This solution meets both the requirement for near real-time availability and indefinite storage for analysis.
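The batching behavior that makes Firehose cost-effective is controlled by its buffering hints. A minimal sketch of the S3 destination configuration one might pass to `firehose.create_delivery_stream` (the bucket, role, and buffer values here are illustrative assumptions):

```python
def firehose_s3_config(bucket_arn: str, role_arn: str) -> dict:
    """Sketch of an ExtendedS3DestinationConfiguration: Firehose buffers
    incoming records and writes one larger S3 object per buffer flush."""
    return {
        "RoleARN": role_arn,      # assumed IAM role ARN with S3 write access
        "BucketARN": bucket_arn,  # assumed destination bucket ARN
        # Dynamic prefix groups delivered objects by hour.
        "Prefix": "records/!{timestamp:yyyy/MM/dd/HH}/",
        # Flush whichever comes first: 64 MB of data or 60 seconds.
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
        "CompressionFormat": "GZIP",  # compression further reduces storage cost
    }
```

Because records are grouped before delivery, S3 receives far fewer, larger objects than the per-record Lambda approach, which is the cost advantage this explanation refers to.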
Storing data in DynamoDB incurs higher costs compared to S3, especially with the volume of data being produced. Although it allows for quick access to recent data, the added complexity of using DynamoDB streams and subsequent processing to transfer data to S3 increases operational overhead and costs.
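To illustrate the extra processing this option requires, here is a minimal sketch of the grouping step such a stream-triggered Lambda would perform before a single `s3.put_object` call; the function name and record shape are assumptions based on the DynamoDB Streams event format:

```python
import json

def batch_stream_records(events: list[dict]) -> bytes:
    """Collapse DynamoDB stream INSERT images into one newline-delimited
    JSON body, ready to be written as a single S3 object (sketch only)."""
    lines = [
        json.dumps(rec["dynamodb"]["NewImage"])
        for rec in events
        if rec.get("eventName") == "INSERT"
    ]
    return ("\n".join(lines) + "\n").encode()
```

Even with this batching, the data is paid for twice (DynamoDB writes plus S3 storage) until the table items are deleted, which is the operational overhead the explanation highlights.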
Directly storing each record in S3 without using a streaming solution does not meet the requirement for immediate availability for analysis. While S3 lifecycle management can help manage storage costs, this option lacks the real-time processing capabilities needed to analyze data within a few seconds of ingestion.
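For reference, the lifecycle transition this option relies on can be expressed as the configuration passed to `s3.put_bucket_lifecycle_configuration`; the rule ID and transition age below are illustrative assumptions (S3 requires objects to be at least 30 days old before transitioning to Standard-IA):

```python
def lifecycle_rule(days_to_ia: int = 30) -> dict:
    """Sketch of a lifecycle configuration moving all objects to
    S3 Standard-Infrequent Access after `days_to_ia` days."""
    return {
        "Rules": [{
            "ID": "to-infrequent-access",  # assumed rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object in the bucket
            "Transitions": [
                {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
            ],
        }]
    }
```

Note that lifecycle rules only address storage cost; they do nothing for the near-real-time analysis requirement, which is why this option falls short.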