Which storage strategy is the MOST cost-effective and meets the design requirements?
Design the application to store each incoming record as a single .csv file in an Amazon S3 bucket to allow for indexed retrieval. Configure a lifecycle policy to delete data older than 120 days.
Design the application to store each incoming record in an Amazon DynamoDB table properly configured for the scale. Configure the DynamoDB Time to Live (TTL) feature to delete records older than 120 days.
Design the application to store each incoming record in a single table in an Amazon RDS MySQL database. Run a nightly cron job that runs a query to delete any records older than 120 days.
Design the application to batch incoming records before writing them to an Amazon S3 bucket. Update the metadata for the object to contain the list of records in the batch and use the Amazon S3 metadata search feature to retrieve the data. Configure a lifecycle policy to delete the data after 120 days.
Explanations:
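Storing each record as its own .csv object in S3 provides neither indexed retrieval nor low cost at this scale. S3 retrieves objects by key and offers no index over their contents. Because S3 bills per request, one PUT per record at millions of records per minute makes request charges a major cost. Managing that many tiny objects also adds operational overhead and slows any retrieval that is not a direct key lookup.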
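Amazon DynamoDB is designed for high ingestion rates and low-latency key-based retrieval, and it handles millions of small records efficiently. With TTL enabled, each item carries an attribute holding its expiry time as a Unix epoch timestamp in seconds, and DynamoDB deletes expired items in the background without consuming write throughput. Deletion does not happen at the exact moment of expiry, which suits ephemeral data that only has to be removed once the 120-day retention window has passed.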
Amazon RDS MySQL is likely to cost more and require more management in this scenario. The instance must be sized for the peak write load, and a single table receiving millions of inserts per minute is likely to become a bottleneck. The nightly cron job is an extra component to operate, removes expired records at most once a day, and runs large DELETE queries that compete with ingestion for the same database resources.
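Batching records before writing to S3 reduces request costs and improves storage efficiency, but the retrieval design is the weak point. User-defined object metadata is limited to 2 KB, so it cannot hold the record list for a large batch, and it cannot be changed in place: updating it means copying the object. Retrieving a single record requires locating the right batch object and then extracting the record from it, and batching delays the point at which a record becomes durable and retrievable, so this option does not provide the low-latency access needed for real-time applications.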
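To make the TTL mechanism in the DynamoDB explanation concrete, the following is a minimal sketch of how an application might stamp each record with its expiry time before writing it. The attribute name `expires_at`, the key name `record_id`, and the helper `build_item` are hypothetical and chosen only for illustration. The one firm requirement is that the TTL attribute is a Number holding Unix epoch time in seconds.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 120          # retention window from the question
TTL_ATTRIBUTE = "expires_at"  # hypothetical name; must match the table's TTL setting


def build_item(record_id: str, payload: str, received_at: datetime) -> dict:
    """Return a DynamoDB-style item that carries its own expiry time."""
    expires_at = received_at + timedelta(days=RETENTION_DAYS)
    return {
        "record_id": record_id,  # hypothetical partition key
        "payload": payload,
        # TTL expects a Number: Unix epoch time in seconds, not milliseconds.
        TTL_ATTRIBUTE: int(expires_at.timestamp()),
    }


# Example: a record received at midnight UTC on 1 January 2024.
received = datetime(2024, 1, 1, tzinfo=timezone.utc)
item = build_item("rec-0001", "example payload", received)
print(item)
```

In a real application, a dictionary like this would be passed to a write call such as boto3's `Table.put_item(Item=item)`, with TTL enabled on the table for the same attribute name. Expiry is computed per item at write time, so no scheduled cleanup job is needed.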