Which storage option provides the most processing flexibility and will allow access control with IAM?
Use a database, such as Amazon DynamoDB, to store the images, and set the IAM policies to restrict access to only the desired IAM users.
Use an Amazon S3-backed data lake to store the raw images, and set up the permissions using bucket policies.
Set up Amazon EMR with Hadoop Distributed File System (HDFS) to store the files, and restrict access to the EMR instances using IAM policies.
Configure Amazon EFS with IAM policies to make the data available to Amazon EC2 instances owned by the IAM users.
Explanations:
Amazon DynamoDB is a key-value and document database designed for small, structured items. Each item is limited to 400 KB, so most images would have to be split across items or stored elsewhere. Although IAM policies can control access to its tables, DynamoDB is a poor fit for the large volumes of image data generated daily and offers little flexibility for processing them.
Amazon S3 is ideal for storing large volumes of unstructured data like images, and it integrates with IAM for access control. Bucket policies allow fine-grained access management, so the company can restrict data access to specific IAM users. Because many AWS analytics and machine learning services can read objects directly from S3, keeping the raw images in an S3-backed data lake also provides the most processing flexibility.
Amazon EMR with HDFS is better suited to processing data than to storing large amounts of raw image data. HDFS on an EMR cluster is ephemeral: the data is lost when the cluster is terminated. IAM policies can restrict access to the EMR instances, but they do not give the straightforward per-object access control that S3 provides, and managing access to individual files becomes complex.
Amazon EFS is a shared file system that can be mounted on EC2 instances, but it is not optimized for large-scale storage of image data the way S3 is. Although EFS can use IAM for access control, the data is reachable only through mounted file systems, and S3 is more flexible and more cost-effective for high-volume image storage.
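The bucket-policy approach described for the S3 option could be sketched roughly as follows. This is a minimal illustration, not a production policy: the bucket name, account ID, user names, and the helper build_bucket_policy are all hypothetical, and the choice of read-only actions is an assumption about what the users need.

```python
import json


def build_bucket_policy(bucket_name, account_id, user_names):
    """Return a bucket policy (as a dict) granting read access to the listed IAM users.

    All argument values used below are placeholders, not values from the question.
    """
    principals = [f"arn:aws:iam::{account_id}:user/{name}" for name in user_names]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowListForSelectedUsers",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowReadForSelectedUsers",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


policy = build_bucket_policy(
    "example-raw-images", "111122223333", ["analyst-a", "analyst-b"]
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)

# Attaching the policy with boto3 (not executed here; requires AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_policy(Bucket="example-raw-images", Policy=policy_json)
```

The policy names individual IAM users as principals, which matches the requirement to restrict access to specific users. In practice, access could equally be granted through identity-based IAM policies attached to those users, or through a combination of both.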