How should the data scientist meet these requirements?
A. Mount the EFS file system to a SageMaker notebook and run a script that copies the data to an Amazon FSx for Lustre file system. Run the SageMaker training job with the FSx for Lustre file system as the data source.
B. Launch a transient Amazon EMR cluster. Configure steps to mount the EFS file system and copy the data to an Amazon S3 bucket by using S3DistCp. Run the SageMaker training job with Amazon S3 as the data source.
C. Mount the EFS file system to an Amazon EC2 instance and use the AWS CLI to copy the data to an Amazon S3 bucket. Run the SageMaker training job with Amazon S3 as the data source.
D. Run a SageMaker training job with an EFS file system as the data source.
Explanations:
A. This option requires an extra step of copying the data from EFS to FSx for Lustre before the training job can run, which adds unnecessary complexity and time.
B. This option launches a transient EMR cluster solely to copy data from EFS to S3, introducing additional steps and resource-management overhead, so it does not minimize steps and integration work.
C. Similar to option B, this option requires manually copying data from EFS to S3 through an EC2 instance, adding extra steps and operational overhead.
D. Correct. SageMaker can read training data directly from an EFS file system, so no copy step is needed. This minimizes the number of steps and simplifies the integration, which aligns with the stated requirements.
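As a rough illustration of option D, the snippet below sketches the `InputDataConfig` portion of a SageMaker `CreateTrainingJob` request that mounts EFS directly via `FileSystemDataSource`. The file system ID, directory path, and channel name are placeholder values, not from the question.

```python
# Sketch of an InputDataConfig for a SageMaker CreateTrainingJob request
# that reads training data directly from an EFS file system (option D).
# FileSystemId, DirectoryPath, and ChannelName below are placeholders.
input_data_config = [
    {
        "ChannelName": "training",
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": "fs-0123456789abcdef0",  # placeholder EFS ID
                "FileSystemType": "EFS",
                "FileSystemAccessMode": "ro",  # read-only is enough for training
                "DirectoryPath": "/training-data",  # path inside the file system
            }
        },
    }
]

fs = input_data_config[0]["DataSource"]["FileSystemDataSource"]
print(fs["FileSystemType"], fs["DirectoryPath"])
```

Note that the training job must run in a VPC with network access to the EFS mount targets; with that in place, no data copy to S3 or FSx for Lustre is required.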