What should the solutions architect do to prevent AWS Glue from reprocessing old data?
Edit the job to use job bookmarks.
Edit the job to delete data after the data is processed.
Edit the job by setting the NumberOfWorkers field to 1.
Use a FindMatches machine learning (ML) transform.
Explanations:
Job bookmarks let AWS Glue persist state information from previous runs of a job, so it knows which data has already been processed. With job bookmarks enabled, each run processes only the data that has arrived since the last successful run, and old data is not reprocessed.
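Deleting the source data after processing would leave nothing old to reprocess, but it is a destructive workaround rather than a fix: it permanently removes data from S3 that may be needed later, and the job still has no record of what it has processed. Job bookmarks solve the problem without deleting any data.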
Setting the NumberOfWorkers field to 1 only changes how much compute capacity (and therefore parallelism) the job has; it does not change which data the job reads. A single-worker job would still read and process all of the data on every run, just more slowly.
A FindMatches ML transform identifies records that refer to the same entity, which is useful for deduplication and record matching. It does not track which input data earlier runs have already processed, so the job would still read all of the data on every run.
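To make the bookmark idea concrete, here is a minimal Python simulation, assuming a bookmark can be modeled as the set of S3 object keys that earlier runs already processed. This is not the AWS Glue API: `run_job` and the object keys are hypothetical, and Glue stores and manages the real bookmark state itself. In an actual Glue job, bookmarks are enabled with the `--job-bookmark-option job-bookmark-enable` job parameter; the script calls `job.init()` at the start, passes a `transformation_ctx` to its data sources, and calls `job.commit()` at the end so the state is saved.

```python
from typing import Iterable, List, Set


def run_job(object_keys: Iterable[str], bookmark: Set[str]) -> List[str]:
    """Process only keys not yet recorded in the bookmark, then advance it.

    `bookmark` is a hypothetical stand-in for the state AWS Glue persists
    between runs; here it is simply the set of keys already processed.
    """
    # Skip anything an earlier run already handled (the "old data").
    new_keys = sorted(k for k in object_keys if k not in bookmark)

    # The real transform and load steps for new_keys would run here.

    # Record the new keys only after the run finishes, loosely analogous
    # to calling job.commit() at the end of a Glue script.
    bookmark.update(new_keys)
    return new_keys


bookmark: Set[str] = set()
print(run_job(["raw/2024-01-01.csv", "raw/2024-01-02.csv"], bookmark))
print(run_job(["raw/2024-01-01.csv", "raw/2024-01-02.csv",
               "raw/2024-01-03.csv"], bookmark))
```

The second run skips the two files handled by the first run and processes only the new one, which is the behavior that makes job bookmarks the right answer here.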