Which solution will meet these requirements?
Develop custom libraries to perform optical character recognition (OCR) on the forms. Deploy the libraries to an Amazon Elastic Kubernetes Service (Amazon EKS) cluster as an application tier. Use this tier to process the forms when forms are uploaded. Store the output in Amazon S3. Parse this output by extracting the data into an Amazon DynamoDB table. Submit the data to the target system’s API. Host the new application tier on EC2 instances.
Extend the system with an application tier that uses AWS Step Functions and AWS Lambda. Configure this tier to use artificial intelligence and machine learning (AI/ML) models that are trained and hosted on an EC2 instance to perform optical character recognition (OCR) on the forms when forms are uploaded. Store the output in Amazon S3. Parse this output by extracting the data that is required within the application tier. Submit the data to the target system’s API.
Host a new application tier on EC2 instances. Use this tier to call endpoints that host artificial intelligence and machine learning (AI/ML) models that are trained and hosted in Amazon SageMaker to perform optical character recognition (OCR) on the forms. Store the output in Amazon ElastiCache. Parse this output by extracting the data that is required within the application tier. Submit the data to the target system’s API.
Extend the system with an application tier that uses AWS Step Functions and AWS Lambda. Configure this tier to use Amazon Textract and Amazon Comprehend to perform optical character recognition (OCR) on the forms when forms are uploaded. Store the output in Amazon S3. Parse this output by extracting the data that is required within the application tier. Submit the data to the target system’s API.
Explanations:
Custom OCR libraries could work, but they require significant development and ongoing maintenance, which increases operational overhead. Running them on an Amazon EKS cluster backed by EC2 instances is also more infrastructure than this workload needs.
AWS Step Functions and Lambda can automate the workflow, but training and hosting custom AI/ML models on an EC2 instance adds complexity and operational overhead, and a self-built model may not match the accuracy of a purpose-built managed service such as Amazon Textract.
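Calling SageMaker endpoints from an EC2-hosted tier still means training, hosting, and maintaining custom models, which adds latency and complexity. ElastiCache is also a poor fit for storing the output: it is an in-memory cache, whereas Amazon S3 is designed for durable file storage.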
This option uses AWS Step Functions and Lambda to orchestrate the workflow, Amazon Textract to perform OCR on the forms, and Amazon Comprehend to help extract the required data from the recognized text. Because every component is fully managed, it minimizes operational overhead, ensures accuracy, and accelerates time to market.
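To make the "parse this output" step of the Textract-based option more concrete, here is a minimal sketch of how a Lambda function might turn a Textract form-analysis response into key/value pairs. It assumes the response follows the documented block structure of Textract's `AnalyzeDocument` operation with the `FORMS` feature (`KEY_VALUE_SET` blocks linked to `WORD` blocks through `CHILD` and `VALUE` relationships). The function names and the sample response are hypothetical and exist only for illustration; the actual service call is left as a comment so the block runs without AWS credentials.

```python
from typing import Any, Dict

Block = Dict[str, Any]


def _text_of(block: Block, blocks_by_id: Dict[str, Block]) -> str:
    """Join the text of all WORD blocks that are CHILD relations of `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_form_fields(response: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: map each form key to its value text."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    fields: Dict[str, str] = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        key_text = _text_of(block, blocks_by_id)
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # A KEY block points at its VALUE block(s) by Id.
                value_text = " ".join(
                    _text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]
                )
        fields[key_text] = value_text
    return fields


# In a real Lambda handler the response would come from the service, e.g.:
#   boto3.client("textract").analyze_document(
#       Document={"S3Object": {"Bucket": bucket, "Name": key}},
#       FeatureTypes=["FORMS"],
#   )
# The hand-written sample below is an assumption that mimics that shape.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Full"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
    ]
}

if __name__ == "__main__":
    print(extract_form_fields(SAMPLE_RESPONSE))
```

In a Step Functions workflow, a function like this would typically sit in the state that follows the Textract call, with the resulting dictionary passed on to the state that submits data to the target system’s API.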