What is the LIKELY cause of the failure?
The ECS service was deleted.
The ECS configuration does not contain an Auto Scaling group.
The ECS instance task execution IAM role was modified.
The ECS task role was modified.
Explanations:
If the ECS service had been deleted, no tasks would be running to poll the SQS queue. Messages would still pile up in the queue, but nothing would be sending requests to DynamoDB, so no new 400 errors would appear in the logs. The presence of 400 errors shows that ECS tasks are still running and attempting to update DynamoDB.
The absence of an Auto Scaling group would not cause the 400 errors. Without scaling, the service might lack the capacity to keep up with incoming messages, so the queue could grow, but the requests the running tasks did send to DynamoDB would not be rejected because of a missing Auto Scaling group. A capacity shortfall shows up as a growing backlog and slow processing, not as rejected requests. The 400 errors point to a permissions problem or a malformed request instead.
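To make the contrast between these failure modes concrete, here is a toy Python model, not AWS code. Producers add messages each tick, a fixed number of workers receive messages, and a worker deletes a message only when its DynamoDB update succeeds. The `simulate` function, its parameters, and the delete-on-success behavior are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    backlog: int      # messages still waiting in the queue at the end
    error_log: list   # HTTP status codes logged by workers


def simulate(ticks: int, arrivals_per_tick: int, workers: int,
             msgs_per_worker: int, update_allowed: bool) -> Outcome:
    """Toy model of SQS -> ECS workers -> DynamoDB (hypothetical, not AWS code)."""
    backlog = 0
    errors = []
    for _ in range(ticks):
        backlog += arrivals_per_tick          # producers keep sending messages
        capacity = workers * msgs_per_worker  # messages the workers can receive this tick
        received = min(backlog, capacity)
        if update_allowed:
            # Success: the worker deletes each processed message.
            backlog -= received
        else:
            # Failure: DynamoDB rejects the update, the worker logs a 400 and
            # does not delete the message, so it stays in the backlog.
            errors.extend([400] * received)
    return Outcome(backlog, errors)
```

With 5 arrivals per tick over 10 ticks, `workers=0` (a deleted service) leaves a backlog of 50 and an empty error log. Two workers handling 2 messages each with `update_allowed=True` (an under-scaled service) leave a backlog of 10 and still no errors. Only the rejected-update case, with the same two workers, logs 400 errors (40 of them) while the backlog grows to 50.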
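The task execution IAM role is used by the ECS container agent, not by the application. It grants permissions to pull container images (for example, from Amazon ECR), send container logs to CloudWatch Logs, and retrieve secrets referenced in the task definition. Application code calls DynamoDB with the task role's credentials, so modifying the execution role would not make DynamoDB update requests fail with 400 errors. A broken execution role would more likely prevent new tasks from starting at all.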
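The split between the two roles is visible in an ECS task definition, which carries them in separate fields. The sketch below models that as a Python dict. The account ID, family name, and role names are hypothetical placeholders; only the field names `family`, `executionRoleArn`, and `taskRoleArn` are real task-definition parameters, and the caller mapping is a simplified summary.

```python
# Hypothetical values: the account ID, family, and role names are placeholders.
# "family", "executionRoleArn", and "taskRoleArn" are real task-definition fields.
TASK_DEFINITION = {
    "family": "queue-worker",
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::123456789012:role/queue-worker-task-role",
}

# Simplified mapping of caller -> task-definition field whose role applies.
ROLE_FIELD_BY_CALLER = {
    # ECS agent: pulls the image, ships logs to CloudWatch Logs, fetches secrets
    "ecs-agent": "executionRoleArn",
    # Application code: AWS SDK calls such as DynamoDB UpdateItem
    "application": "taskRoleArn",
}


def role_for(caller: str) -> str:
    """Return the role ARN whose permissions govern the given caller's AWS calls."""
    return TASK_DEFINITION[ROLE_FIELD_BY_CALLER[caller]]
```

Under this model, DynamoDB update permissions belong on the role returned by `role_for("application")`. Editing the role returned by `role_for("ecs-agent")` affects image pulls and logging, not the application's DynamoDB calls.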
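This is the most likely cause. The task role supplies the credentials that the application code in the container uses for AWS API calls, including DynamoDB updates. If the modification removed a permission the updates need, such as dynamodb:UpdateItem, DynamoDB rejects each update with an AccessDeniedException, which it returns with HTTP status 400. A 400 status is a client-side error: the request was rejected because of missing permissions or invalid parameters, not because of a service-side failure. If the worker deletes each message only after a successful update, the failed messages return to the queue, which explains the growing backlog.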
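The authorization step can be approximated in a few lines of Python. This is a deliberately simplified stand-in for IAM policy evaluation (Allow statements only, no explicit Deny, no conditions, wildcards approximated with `fnmatch`). The policy contents, table name, region, account ID, and the `is_allowed` and `call_dynamodb` helpers are hypothetical. Only the AccessDeniedException error code and its HTTP 400 status reflect DynamoDB's actual behavior.

```python
from fnmatch import fnmatchcase

# Hypothetical task-role policy *after* the change: UpdateItem was removed.
TASK_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",
        }
    ],
}


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Very simplified IAM check: Allow statements only, no Deny, no conditions."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        # Action names are case-insensitive in IAM, so compare lowercased.
        action_ok = any(fnmatchcase(action.lower(), a.lower())
                        for a in _as_list(stmt["Action"]))
        resource_ok = any(fnmatchcase(resource, r)
                          for r in _as_list(stmt["Resource"]))
        if action_ok and resource_ok:
            return True
    return False  # nothing matched: implicit deny


def call_dynamodb(policy: dict, action: str, resource: str) -> tuple:
    """Return (HTTP status, error code) as DynamoDB reports an authorization failure."""
    if not is_allowed(policy, action, resource):
        return 400, "AccessDeniedException"
    return 200, ""
```

With this policy, a `dynamodb:UpdateItem` call on the `Orders` table returns `(400, "AccessDeniedException")` while `dynamodb:GetItem` still returns `(200, "")`. That matches the symptom: reads may keep working while every update fails with a 400.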