Which solution will meet these requirements in the MOST cost-effective way?
Real-time inference with auto scaling
Serverless inference with provisioned concurrency
Asynchronous inference
A batch transform job
Explanations:
Real-time inference with auto scaling can handle variable traffic, but it keeps at least one instance running at all times and provisions additional instances during peak hours, so idle capacity is billed even when traffic is low. Given the predictable traffic pattern, this is not the most cost-effective option.
Serverless inference with provisioned concurrency scales automatically with incoming requests and bills per use, so no instance cost accrues during idle periods. Provisioned concurrency keeps warm capacity ready, avoiding cold-start latency during the predictable peak hours. This makes it the most cost-effective option for the anticipated traffic.
Asynchronous inference is better suited for scenarios where requests can be processed in a non-time-sensitive manner. However, since the media company requires quick recommendations for real-time user engagement, this option does not meet the need for immediate response times.
A batch transform job processes data in bulk on a schedule, which does not align with the requirement to deliver recommendations to users as they engage. It is also a poor fit given the low volume of data and the need for immediate delivery.
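As a concrete illustration of the correct option, the sketch below builds a SageMaker endpoint configuration that uses serverless inference with provisioned concurrency. The endpoint, variant, and model names are hypothetical, and the actual `create_endpoint_config` call is shown commented out since it requires AWS credentials; the memory and concurrency values are illustrative, not recommendations.

```python
# Hypothetical sketch: a SageMaker serverless endpoint configuration with
# provisioned concurrency. All names and sizing values are illustrative.

serverless_config = {
    "MemorySizeInMB": 2048,       # memory allocated per concurrent invocation
    "MaxConcurrency": 20,         # upper bound on concurrent invocations
    "ProvisionedConcurrency": 5,  # warm capacity kept ready for peak hours
}

endpoint_config = {
    "EndpointConfigName": "recommender-serverless-config",  # hypothetical
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "recommendation-model",  # hypothetical model name
            "ServerlessConfig": serverless_config,
        }
    ],
}

# With AWS credentials configured, the config could be registered via boto3:
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(**endpoint_config)
#   sm.create_endpoint(EndpointName="recommender",
#                      EndpointConfigName="recommender-serverless-config")

print(endpoint_config["ProductionVariants"][0]["ServerlessConfig"])
```

Because provisioned concurrency reserves only a slice of the maximum concurrency, the endpoint stays warm for the predictable peaks while still scaling to zero billing for the remaining capacity during idle periods.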