Which solution provides near-real-time data querying that is scalable with minimal data loss?
Publish data to Amazon Kinesis Data Streams. Use Kinesis Data Analytics to query the data.
Publish data to Amazon Kinesis Data Firehose with Amazon Redshift as the destination. Use Amazon Redshift to query the data.
Store ingested data in an EC2 instance store. Publish data to Amazon Kinesis Data Firehose with Amazon S3 as the destination. Use Amazon Athena to query the data.
Store ingested data in an Amazon Elastic Block Store (Amazon EBS) volume. Publish data to Amazon ElastiCache for Redis. Subscribe to the Redis channel to query the data.
Explanations:
Amazon Kinesis Data Streams ingests and processes data in real time, and Kinesis Data Analytics can query the streaming data in near real time. This solution is highly scalable and minimizes data loss: records stay in the stream for a configurable retention period (24 hours by default, extendable up to 365 days), so consumers can process data as it arrives and re-read it after a failure.
Kinesis Data Firehose can deliver data to Amazon Redshift for querying, but this adds latency: Firehose buffers incoming records, stages them in Amazon S3, and then loads them into Redshift with a COPY command before they can be queried. This makes it less suited to near-real-time querying than Kinesis Data Streams.
An EC2 instance store is ephemeral: its data survives a reboot but is lost when the instance stops, hibernates, or terminates, or when the underlying disk fails. Although Kinesis Data Firehose can deliver data to S3 and Athena can query data in S3, staging ingested data in an instance store first does not protect it from loss in those cases. Furthermore, querying from S3 typically has higher latency than querying a real-time stream.
Staging data on an Amazon EBS volume adds a write step before publishing, and the data can still be lost if the volume fails or is deleted along with the instance and no snapshot exists. Although ElastiCache for Redis is fast, a Redis channel delivers each message only to subscribers connected at that moment and does not retain it, so it does not provide the same durability or the same capacity for high ingestion rates as Kinesis Data Streams.
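As an illustration of the first option, the following is a minimal sketch of publishing events to Kinesis Data Streams with boto3's `put_records` call. The stream name, the `device_id` partition-key field, and the helper names are assumptions made for this example and do not come from the question. The batching helper is pure Python; only `publish` needs boto3 and AWS credentials.

```python
import json
from typing import Any, Dict, Iterable, List

# PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_CALL = 500


def build_put_records_batches(
    stream_name: str,
    events: Iterable[Dict[str, Any]],
    key_field: str = "device_id",  # hypothetical field used as the partition key
) -> List[Dict[str, Any]]:
    """Group events into keyword-argument dicts for successive put_records calls."""
    batches: List[Dict[str, Any]] = []
    records: List[Dict[str, Any]] = []
    for event in events:
        records.append({
            "Data": json.dumps(event).encode("utf-8"),
            # Records with the same partition key go to the same shard.
            "PartitionKey": str(event[key_field]),
        })
        if len(records) == MAX_RECORDS_PER_CALL:
            batches.append({"StreamName": stream_name, "Records": records})
            records = []
    if records:
        batches.append({"StreamName": stream_name, "Records": records})
    return batches


def publish(stream_name: str, events: Iterable[Dict[str, Any]]) -> int:
    """Send events to the stream and return how many records were rejected."""
    import boto3  # imported lazily so the batching helper runs without AWS access

    client = boto3.client("kinesis")
    failed = 0
    for request in build_put_records_batches(stream_name, events):
        response = client.put_records(**request)
        # A non-zero count means some records were throttled or rejected;
        # a real producer would retry those records.
        failed += response["FailedRecordCount"]
    return failed
```

A call such as `publish("example-stream", events)` would send the events in batches of up to 500. Because `put_records` can succeed for some records and fail for others, checking `FailedRecordCount` and retrying is part of keeping data loss low on the producer side.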