Which solution will meet these requirements with the LEAST development effort?

A company hosts a machine learning (ML) dataset repository on Amazon S3. A data scientist is preparing the repository to train a model. The data scientist needs to redact personally identifiable information (PII) from the dataset.

Which solution will meet these requirements with the LEAST development effort?

Use Amazon SageMaker Data Wrangler with a custom transformation to identify and redact the PII.

Create a custom AWS Lambda function to read the files, identify the PII, and redact the PII.

Use AWS Glue DataBrew to identify and redact the PII.

Use an AWS Glue development endpoint to implement the PII redaction from within a notebook.

Explanations:

While Amazon SageMaker Data Wrangler offers data transformation capabilities, creating a custom transformation to identify and redact PII would require significant development effort, making it less ideal for this use case compared to other options.

Developing a custom AWS Lambda function involves more coding and setup compared to the other options. This option requires building the logic to read files, identify PII, and perform redaction, which increases development effort unnecessarily.
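To give a sense of the effort involved, here is a minimal sketch of only the identify-and-redact step such a function would need. The two regular expressions and the names `PII_PATTERNS` and `redact_pii` are hypothetical and for illustration only; real PII detection must cover many more entity types, and the S3 read/write and Lambda handler code are not shown.

```python
import re

# Hypothetical patterns for illustration only; they cover just two PII types
# and would miss names, addresses, phone numbers, and many other formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace every match of a known pattern with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact_pii("Contact jane@example.com, SSN 123-45-6789."))
```

Even this small fragment leaves pattern maintenance, file handling, and testing to the developer, which is the extra effort this explanation refers to.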

AWS Glue DataBrew is designed for data preparation tasks and includes built-in transformations for identifying and redacting PII. Because these transformations are configured through a visual interface rather than written as code, this option requires the least development effort.

Using an AWS Glue development endpoint involves writing code in a notebook environment to implement PII redaction. This requires more coding and setup compared to AWS Glue DataBrew, leading to higher development effort.
