Which is the most appropriate?
Use S3 with reduced redundancy to store and serve the scanned files, install the commercial search application on EC2 instances, and configure it with auto-scaling and an Elastic Load Balancer.
Model the environment using CloudFormation, use an EC2 instance running Apache webserver and an open source search application, and stripe multiple standard EBS volumes together to store the JPEGs and search index.
Use S3 with standard redundancy to store and serve the scanned files, use CloudSearch for query processing, and use Elastic Beanstalk to host the website across multiple availability zones.
Use a single-AZ RDS MySQL instance to store the search index and the JPEG images, and use an EC2 instance to serve the website and translate user queries into SQL.
Use a CloudFront download distribution to serve the JPEGs to the end users, install the current commercial search product along with a Java container for the website on EC2 instances, and use Route53 with DNS round-robin.
Explanations:
Using S3 with reduced redundancy is not advisable for a large archive like 17TB of scanned images: the reduced redundancy storage class is designed for 99.99% durability, far below the 99.999999999% of standard S3, so it carries a meaningfully higher risk of losing objects. Auto-scaling EC2 instances behind an Elastic Load Balancer can handle the load, but they do nothing to offset that storage risk.
While using CloudFormation and an open-source search application can provide flexibility, striping EBS volumes adds complexity and may not ensure the necessary durability and availability. Moreover, it relies heavily on the EC2 instance and does not leverage S3’s benefits for storage.
Using S3 with standard redundancy ensures high durability (99.999999999%) for the 17TB of JPEGs. CloudSearch provides a managed search service that can scale with demand, and Elastic Beanstalk facilitates easy deployment and management of the web application across multiple availability zones, enhancing availability and durability.
A single-AZ RDS MySQL instance lacks high availability, and relying solely on it for both the search index and JPEG storage is inefficient and risky. If the instance fails, both services would go down. Additionally, storing large image files in RDS is not optimal compared to S3.
While using CloudFront can improve download performance for JPEGs, installing the current commercial search product on EC2 instances can be more costly and harder to manage than using a managed service like CloudSearch. DNS round-robin with Route53 also does not by itself ensure high availability: if one instance fails, a share of requests continues to be routed to it.
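To make the durability gap between the two S3 storage classes in the explanations above more concrete, here is a minimal sketch of the arithmetic. The durability figures are the design targets AWS publishes for each class; the object count is a hypothetical assumption, since the question gives only the total size (17TB) and not the number of scanned files.

```python
# Design durability targets for the two S3 storage classes discussed above.
STANDARD_DURABILITY = 0.99999999999       # 11 nines
REDUCED_REDUNDANCY_DURABILITY = 0.9999    # 4 nines

# Hypothetical assumption: the archive holds ten million JPEG objects.
# The question does not state an object count.
ASSUMED_OBJECT_COUNT = 10_000_000


def expected_annual_loss(object_count, durability):
    """Return the average number of objects expected to be lost per year."""
    return object_count * (1.0 - durability)


if __name__ == "__main__":
    for name, durability in [
        ("standard", STANDARD_DURABILITY),
        ("reduced redundancy", REDUCED_REDUNDANCY_DURABILITY),
    ]:
        loss = expected_annual_loss(ASSUMED_OBJECT_COUNT, durability)
        print(f"{name}: about {loss:.4g} objects lost per year on average")
```

Under these assumptions, reduced redundancy works out to roughly a thousand lost objects per year, while standard storage works out to roughly one object every ten thousand years. The real figures depend on the actual object count.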