Which solution provides near-real-time data querying that is scalable with minimal data loss?
Publish data to Amazon Kinesis Data Streams. Use Kinesis Data Analytics to query the data.
Publish data to Amazon Kinesis Data Firehose with Amazon Redshift as the destination. Use Amazon Redshift to query the data.
Store ingested data in an EC2 instance store. Publish data to Amazon Kinesis Data Firehose with Amazon S3 as the destination. Use Amazon Athena to query the data.
Store ingested data in an Amazon Elastic Block Store (Amazon EBS) volume. Publish data to Amazon ElastiCache for Redis. Subscribe to the Redis channel to query the data.
Explanations:
Amazon Kinesis Data Streams handles high ingestion rates by scaling across shards, and it retains records durably for 24 hours by default (extendable up to 365 days), so consumers can replay data after a failure. Kinesis Data Analytics queries the streaming data in near real time, which makes this option scalable while minimizing data loss.
Amazon Kinesis Data Firehose can deliver data to Amazon Redshift, but it buffers incoming records and loads them in batches (staging them in Amazon S3 and then issuing a COPY command), so latency is higher than querying a Kinesis data stream directly. Redshift is a data warehouse built for analytics over data that has already been loaded, not for querying records as they arrive.
Data in an EC2 instance store is ephemeral: it survives a reboot but is lost when the instance is stopped, hibernated, or terminated, or when the underlying disk fails. Even though Kinesis Data Firehose and Amazon S3 are reliable for delivery and storage, staging ingested data in the instance store contradicts the requirement to minimize data loss.
Storing data on an Amazon EBS volume does provide durability, but ElastiCache for Redis is a poor fit for querying large volumes of ingested data. Redis is primarily an in-memory data store used for caching, and Redis Pub/Sub delivers messages only to subscribers that are connected when a message is published, so a disconnected subscriber misses data.
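To make the first option more concrete, here is a minimal Python sketch of the producer side: batching events for the Kinesis `PutRecords` API and collecting any records the service rejects so they can be retried rather than lost. The helper names (`build_entries`, `chunk`, `publish`), the `device_id` partition key field, and the stream name are assumptions made for illustration; they do not come from the question. The `client` argument is expected to behave like `boto3.client("kinesis")`.

```python
import json
from typing import Any, Dict, Iterable, List

# PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_CALL = 500


def build_entries(events: Iterable[Dict[str, Any]],
                  key_field: str = "device_id") -> List[Dict[str, Any]]:
    """Turn events into PutRecords entries (key_field is a hypothetical field)."""
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            # The partition key determines which shard receives the record.
            "PartitionKey": str(event[key_field]),
        }
        for event in events
    ]


def chunk(entries: List[Dict[str, Any]],
          size: int = MAX_RECORDS_PER_CALL) -> List[List[Dict[str, Any]]]:
    """Split entries into batches no larger than the per-request limit."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def publish(events: Iterable[Dict[str, Any]], stream_name: str,
            client: Any) -> List[Dict[str, Any]]:
    """Send events in batches and return the entries that were rejected.

    PutRecords can partially fail: each result in response["Records"]
    lines up with the request entry at the same index and carries an
    "ErrorCode" when that record was not accepted.
    """
    failed: List[Dict[str, Any]] = []
    for batch in chunk(build_entries(events)):
        response = client.put_records(StreamName=stream_name, Records=batch)
        for entry, result in zip(batch, response["Records"]):
            if "ErrorCode" in result:
                failed.append(entry)
    return failed
```

In practice a caller would pass `boto3.client("kinesis")` as `client` and resend the returned entries, typically with a backoff, until the list is empty. Returning the failures instead of raising is just one possible design; it keeps the retry policy in the caller's hands.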