Which solution will meet these requirements in the MOST operationally efficient way?
Create an AWS Lambda function to filter the data that exceeds DynamoDB item size limits. Store the larger data in an Amazon DocumentDB (with MongoDB compatibility) database.
Store the large data as objects in an Amazon S3 bucket. In a DynamoDB table, create an item that has an attribute that points to the S3 URL of the data.
Split all incoming large data into a collection of items that have the same partition key. Write the data to a DynamoDB table in a single operation by using the BatchWriteItem API operation.
Create an AWS Lambda function that uses gzip compression to compress the large objects as they are written to a DynamoDB table.
Explanations:
Using an AWS Lambda function to divert data that exceeds the DynamoDB item size limit into Amazon DocumentDB (with MongoDB compatibility) could work, but it adds complexity: the application must route reads and writes across two different database services, which increases operational overhead. DocumentDB is unnecessary when the large data can be handled by DynamoDB together with S3.
Storing the large data as objects in Amazon S3 and keeping a pointer to them in DynamoDB is the most operationally efficient solution. DynamoDB enforces a hard 400 KB item size limit, while S3 is designed for large objects (up to 5 TB each). The DynamoDB item stays small, holding only metadata and the S3 location, so the solution scales seamlessly, keeps operational complexity minimal, and retrieval of the large data remains a simple two-step read.
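A minimal sketch of this S3 pointer pattern with boto3, assuming a hypothetical bucket my-large-payload-bucket and a table Orders with partition key order_id (names chosen for illustration only):

```python
import json
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("Orders")

BUCKET = "my-large-payload-bucket"


def put_large_item(order_id: str, payload: dict) -> None:
    """Store the large payload in S3 and keep only a pointer in DynamoDB."""
    key = f"payloads/{order_id}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload).encode("utf-8"))
    # The DynamoDB item stays far below the 400 KB limit: it holds only
    # metadata plus the S3 location of the full object.
    table.put_item(Item={
        "order_id": order_id,
        "payload_s3_url": f"s3://{BUCKET}/{key}",
    })


def get_large_item(order_id: str) -> dict:
    """Read the pointer from DynamoDB, then fetch the full object from S3."""
    item = table.get_item(Key={"order_id": order_id})["Item"]
    bucket, _, key = item["payload_s3_url"].removeprefix("s3://").partition("/")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)
```

Both services scale independently, and no size-based branching logic is needed in the application.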
Splitting large data into smaller items can be cumbersome and complicates data management. This approach also risks exceeding the 400 KB limit per item in DynamoDB, as well as complicating the data retrieval process by requiring multiple reads. It adds unnecessary complexity to the application.
While gzip compression can reduce data size, it does not remove DynamoDB's 400 KB item limit: a sufficiently large object can still exceed the cap even after compression. It also adds operational overhead, because every write must compress the data and every read must decompress it, complicating data retrieval and adding latency.
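A minimal sketch of the compression approach, reusing the hypothetical Orders table from the earlier example, to show why it only postpones the size problem and adds a step to every read:

```python
import gzip
import json
import boto3

table = boto3.resource("dynamodb").Table("Orders")


def put_compressed(order_id: str, payload: dict) -> None:
    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
    # If the compressed blob (plus the other attributes) still exceeds
    # 400 KB, this call fails with a ValidationException: compression
    # shrinks the data but does not remove the item size limit.
    table.put_item(Item={"order_id": order_id, "payload_gz": compressed})


def get_compressed(order_id: str) -> dict:
    item = table.get_item(Key={"order_id": order_id})["Item"]
    # Every read pays the extra decompression step noted above.
    return json.loads(gzip.decompress(item["payload_gz"].value))
```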