How can the company resolve this issue MOST cost-effectively?
Set up a 10 Gbps AWS Direct Connect connection between the production site and the nearest AWS Region. Use the Direct Connect connection to upload the images. Increase the size of the instances and the number of instances that are used by the SageMaker endpoint.
Extend the long-running Lambda function that runs on AWS IoT Greengrass to compress the images and upload the compressed files to Amazon S3. Decompress the files by using a separate Lambda function that invokes the existing Lambda function to run the inference pipeline.
Use auto scaling for SageMaker. Set up an AWS Direct Connect connection between the production site and the nearest AWS Region. Use the Direct Connect connection to upload the images.
Deploy the Lambda function and the ML models onto the AWS IoT Greengrass core that runs on the industrial PCs installed on each machine. Extend the long-running Lambda function that runs on AWS IoT Greengrass to invoke the inference Lambda function with the captured images, run the inference on the edge, and forward the results directly to the web service.
Explanations:
A 10 Gbps AWS Direct Connect connection would be costly to provision and would still require uploading the large images over the network, so it would not fully address the latency issue. Increasing the size and number of instances behind the SageMaker endpoint adds further cost without resolving the underlying network bottleneck.
Compressing the images may reduce bandwidth requirements, but every image still has to be uploaded to Amazon S3, and adding a separate decompression Lambda function that invokes the existing inference pipeline introduces extra complexity and cost. This would not resolve the network issue effectively, especially at the scale of thousands of machines.
While AWS Direct Connect could reduce network latency, the main issue is not just network capacity but the cost and scalability of uploading large volumes of image data to Amazon S3 and invoking the SageMaker endpoint for every image. Even with auto scaling, this option would still incur high data transfer costs and latency.
Running the Lambda function and the ML models directly on the edge with AWS IoT Greengrass removes the dependency on the network by performing inference locally. This minimizes latency, reduces data transfer costs, and scales better across thousands of machines, which makes it the most cost-effective option.
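To make the winning option concrete, here is a minimal sketch of an edge-inference Lambda handler for AWS IoT Greengrass (v1-style). The MODEL_DIR resource path, the RESULTS_URL web-service endpoint, and the run_inference() stub are all assumptions for illustration and are not part of the question; the actual model loading and inference calls depend on the ML framework deployed to the core.

```python
# Minimal sketch of an edge-inference handler for AWS IoT Greengrass (v1-style).
# Assumptions (not from the question): the model is attached as a local ML
# resource mounted at MODEL_DIR, the web service is reachable at RESULTS_URL,
# and run_inference() is a stand-in for the framework-specific model call.
import json
import os
import urllib.request

import greengrasssdk  # available inside the Greengrass core Lambda runtime

MODEL_DIR = os.environ.get("MODEL_DIR", "/ml/model")  # hypothetical resource path
RESULTS_URL = os.environ.get("RESULTS_URL", "http://web-service.local/results")  # hypothetical endpoint

# Client for publishing lightweight status messages over local MQTT (optional).
iot_client = greengrasssdk.client("iot-data")


def run_inference(image_bytes):
    """Stand-in for the real model call; replace with framework-specific
    inference against the model files under MODEL_DIR."""
    return {"defect_detected": False, "confidence": 0.0}


def function_handler(event, context):
    """Invoked by the long-running Greengrass Lambda with the path of a captured image."""
    with open(event["image_path"], "rb") as f:
        image_bytes = f.read()

    result = run_inference(image_bytes)

    # Forward the result directly to the web service instead of routing it
    # through Amazon S3 and a cloud-hosted SageMaker endpoint.
    payload = json.dumps({"machine_id": event.get("machine_id"), "result": result}).encode("utf-8")
    request = urllib.request.Request(
        RESULTS_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request, timeout=5)

    # Publish a small status message so operators can monitor the fleet.
    iot_client.publish(
        topic="factory/inference/status",
        payload=json.dumps({"machine_id": event.get("machine_id"), "ok": True}),
    )

    return {"status": "ok"}
```

In this arrangement the long-running Lambda function on the core invokes the handler locally for each captured image, so the image bytes never leave the industrial PC and only the small inference results traverse the network.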