Which strategy will allow the data scientist to identify fraudulent accounts?
Execute the built-in FindDuplicates Amazon Athena query.
Create a FindMatches machine learning transform in AWS Glue.
Create an AWS Glue crawler to infer duplicate accounts in the source data.
Search for duplicate accounts in the AWS Glue Data Catalog.
Explanations:
Amazon Athena does not provide a built-in “FindDuplicates” query. Athena runs standard SQL, so a duplicate search would have to be written by hand and would typically catch only exact or rule-based matches. It applies no machine learning to link accounts whose details differ slightly.
The FindMatches transform in AWS Glue uses machine learning to identify duplicate or related records, even when they share no unique identifier and their fields do not match exactly. This lets the data scientist link accounts that likely belong to the same, potentially fraudulent, user, which makes it the suitable method for this scenario.
An AWS Glue crawler scans a data store to infer its schema and records that metadata in the AWS Glue Data Catalog. It does not compare records or perform data cleansing, so it cannot identify duplicate or fraudulent accounts.
Searching the AWS Glue Data Catalog means searching metadata, such as table definitions and schemas, rather than the records themselves. It would not help identify fraudulent accounts in the application logs or other underlying data.
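As an illustration of the FindMatches option, the sketch below shows one way such a transform might be created with boto3's `create_ml_transform` call. The database, table, role ARN, and column names are hypothetical placeholders, and the tradeoff values are arbitrary assumptions rather than recommendations. After creation, a transform like this would typically still need to be taught with labeled example records before it is used in a Glue job.

```python
def build_find_matches_request(database, table, role_arn, primary_key):
    """Build keyword arguments for Glue's create_ml_transform call.

    All argument values are supplied by the caller; nothing here is
    taken from a real account or dataset.
    """
    return {
        "Name": f"find-matches-{table}",
        "Role": role_arn,
        # The Data Catalog table whose records will be compared.
        "InputRecordTables": [
            {"DatabaseName": database, "TableName": table}
        ],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                # Column that uniquely identifies each row in the table.
                "PrimaryKeyColumnName": primary_key,
                # Assumed value: closer to 1.0 favors precision over recall.
                "PrecisionRecallTradeoff": 0.9,
                # Assumed value: closer to 1.0 favors accuracy over cost.
                "AccuracyCostTradeoff": 0.5,
                "EnforceProvidedLabels": False,
            },
        },
    }


def create_find_matches_transform(request):
    """Submit the request to AWS Glue and return the new transform ID.

    Requires boto3 and valid AWS credentials; imported lazily so the
    request builder above can be used without them.
    """
    import boto3

    glue = boto3.client("glue")
    response = glue.create_ml_transform(**request)
    return response["TransformId"]
```

A caller might build the request with `build_find_matches_request("accounts_db", "customer_accounts", role_arn, "account_id")` (all hypothetical names) and pass the result to `create_find_matches_transform`. Keeping the request builder separate from the API call makes the configuration easy to inspect before anything is sent to AWS.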