Which solution will meet these requirements MOST cost-effectively?

A company has a large data workload that runs for 6 hours each day. The company cannot lose any data while the process is running. A solutions architect is designing an Amazon EMR cluster configuration to support this critical data workload.

Which solution will meet these requirements MOST cost-effectively?

A. Configure a long-running cluster that runs the primary node and core nodes on On-Demand Instances and the task nodes on Spot Instances.

B. Configure a transient cluster that runs the primary node and core nodes on On-Demand Instances and the task nodes on Spot Instances.

C. Configure a transient cluster that runs the primary node on an On-Demand Instance and the core nodes and task nodes on Spot Instances.

D. Configure a long-running cluster that runs the primary node on an On-Demand Instance, the core nodes on Spot Instances, and the task nodes on Spot Instances.

Explanations:

Option A: Incorrect. A long-running cluster stays active and keeps incurring charges during the 18 hours each day when the workload is not running, so it is less cost-effective for a job that runs only 6 hours daily. On-Demand Instances for the primary and core nodes protect the data, and Spot Instances for the task nodes lower the compute cost, but the cluster as a whole is still billed outside the workload's operating hours.
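The effect of cluster uptime on cost can be sketched with simple arithmetic. The hourly rates and node counts below are hypothetical placeholders chosen only for illustration; they are not actual AWS prices, which vary by instance type and Region.

```python
# Hypothetical hourly prices in USD (assumptions, not real AWS pricing).
ON_DEMAND_RATE = 0.20
SPOT_RATE = 0.06


def daily_cost(hours_per_day, on_demand_nodes, spot_nodes):
    """Return the daily instance cost for a cluster that is up for hours_per_day."""
    hourly = on_demand_nodes * ON_DEMAND_RATE + spot_nodes * SPOT_RATE
    return hours_per_day * hourly


# Same node mix in both cases: 3 On-Demand nodes (primary + core), 4 Spot task nodes.
long_running = daily_cost(24, on_demand_nodes=3, spot_nodes=4)
transient = daily_cost(6, on_demand_nodes=3, spot_nodes=4)

print(f"long-running: ${long_running:.2f}/day")
print(f"transient:    ${transient:.2f}/day")
```

With an identical node mix, the cluster that runs only for the 6-hour window costs a quarter of the one that runs around the clock, whatever rates are plugged in.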

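Option B: Correct. A transient cluster exists only for the 6-hour workload and terminates afterward, so no charges accrue while the workload is idle. Running the primary and core nodes on On-Demand Instances protects the data, because these nodes cannot be reclaimed the way Spot Instances can. Running the task nodes on Spot Instances reduces cost significantly, and because task nodes do not store HDFS data, an interruption there does not cause data loss. This makes it the most cost-effective option that meets the requirements.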
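A minimal sketch of the configuration described in this option, written as a Python dictionary shaped like the keyword arguments of the EMR `RunJobFlow` request (for example, boto3's `run_job_flow`). The cluster name, release label, instance types, and node counts are assumptions for illustration, and the `Steps` that a real transient cluster would run are omitted. Note that the API uses the role name `MASTER` for the primary node.

```python
def transient_cluster_request(core_count=2, task_count=4):
    """Build request arguments for a transient cluster (illustrative values only)."""
    return {
        "Name": "daily-six-hour-workload",   # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",        # assumed release; use the one you need
        "Instances": {
            "InstanceGroups": [
                # Primary node on On-Demand: losing it would terminate the cluster.
                {"Name": "primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                # Core nodes on On-Demand: they hold HDFS data.
                {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_count},
                # Task nodes on Spot: compute only, so an interruption loses no data.
                {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": "m5.xlarge", "InstanceCount": task_count},
            ],
            # False makes the cluster transient: it terminates once its steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default role names; yours may differ
        "ServiceRole": "EMR_DefaultRole",
    }


request = transient_cluster_request()
markets = {g["InstanceRole"]: g["Market"] for g in request["Instances"]["InstanceGroups"]}
print(markets)
```

The sketch only builds the request and does not call AWS. In practice the dictionary would be passed to an EMR client together with the workload's steps.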

Option C: Incorrect. A transient cluster avoids idle-time charges, and an On-Demand primary node keeps the cluster itself stable, but running the core nodes on Spot Instances puts the data at risk. Core nodes store data in HDFS, and Spot Instances can be interrupted at any time, so an interruption during processing can cause data loss. This option is not reliable for a workload that cannot afford any data loss.

Option D: Incorrect. As with Option A, a long-running cluster incurs ongoing costs, which is inefficient for a workload that runs only 6 hours a day. In addition, as with Option C, running the core nodes on Spot Instances introduces the risk of data loss from interruptions, which contradicts the requirement that no data be lost during the process.
