What is the MOST likely cause of the 5-minute connection outage?
After a database crash, Aurora needed to replay the redo log from the last database checkpoint
The client-side application is caching the DNS data and its TTL is set too high
After failover, the Aurora DB cluster needs time to warm up before accepting client connections
There were no active Aurora Replicas in the Aurora DB cluster
Explanations:
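This scenario describes a failover event, not a database crash. In any case, Aurora does not replay the redo log from the last checkpoint during crash recovery. Its distributed storage layer applies redo records continuously, so the database becomes available almost immediately after a crash. The failover itself took only about 15 seconds, which cannot account for the additional 5-minute connection outage.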
This is the most likely cause. During a failover, Aurora promotes a replica and updates the cluster endpoint's DNS record to point to the new primary instance. If the client application caches DNS data with a high TTL, it keeps resolving the endpoint to the old primary's IP address. Its connection attempts then fail until the cached entry expires and the application picks up the updated record.
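The effect of the client-side TTL can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: `TtlDnsCache`, `first_success_time`, the placeholder hostname and both IP addresses are hypothetical, the client is assumed to retry once per second, and the endpoint's DNS record is assumed to switch to the new primary exactly when failover completes.

```python
from typing import Callable, Dict, Tuple

# Hypothetical addresses of the old primary and the promoted replica.
OLD_IP, NEW_IP = "10.0.0.1", "10.0.0.2"
# Hypothetical placeholder for the Aurora cluster endpoint.
ENDPOINT = "my-cluster.cluster-example.invalid"


class TtlDnsCache:
    """Hypothetical client-side DNS cache that honors a fixed TTL in seconds."""

    def __init__(self, resolver: Callable[[str, float], str], ttl: float) -> None:
        self._resolver = resolver  # returns the record currently published for a host
        self._ttl = ttl
        self._entries: Dict[str, Tuple[str, float]] = {}  # host -> (ip, expires_at)

    def lookup(self, host: str, now: float) -> str:
        cached = self._entries.get(host)
        if cached is not None and now < cached[1]:
            # Entry is still "fresh", so it is reused even if the record changed upstream.
            return cached[0]
        ip = self._resolver(host, now)
        self._entries[host] = (ip, now + self._ttl)
        return ip


def first_success_time(ttl: float, dns_updated_at: float, step: float = 1.0) -> float:
    """Return the first time (s) at which a client retrying every `step` seconds
    resolves the endpoint to the new primary."""

    def published_record(host: str, now: float) -> str:
        # Assumption: the endpoint's record flips to the promoted replica once failover completes.
        return NEW_IP if now >= dns_updated_at else OLD_IP

    cache = TtlDnsCache(published_record, ttl)
    t = 0.0
    while cache.lookup(ENDPOINT, t) != NEW_IP:
        t += step
    return t


if __name__ == "__main__":
    # Failover completes after about 15 s, as in the scenario.
    print(first_success_time(ttl=300, dns_updated_at=15))  # stuck on the old IP until the cache expires
    print(first_success_time(ttl=5, dns_updated_at=15))    # reconnects as soon as DNS is updated
```

With a 300-second client TTL, the sketch keeps returning the old primary's address until the cached entry expires at 300 seconds, which matches a roughly 5-minute outage even though the failover itself finished at 15 seconds. With a short TTL, the client picks up the new record as soon as failover completes. In Java applications, for example, this client-side TTL is governed by the `networkaddress.cache.ttl` security property.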
Aurora does not need to warm up before accepting connections after a failover: the promoted replica accepts connections as soon as the failover completes. A cold buffer cache may slow queries for a while, but it does not block connections. A 5-minute outage therefore points to another issue, most likely stale DNS resolution on the client.
The cluster in this scenario has multiple Aurora Replicas, and the failover completed in about 15 seconds, which means a replica was promoted. If there were no active replicas, Aurora would have to recreate the primary instance instead. That would produce a longer and different kind of outage, not a brief failover followed by a client-side connection failure. This option is therefore not a likely cause of the 5-minute outage.