Which architecture should the Data Scientist use to build this solution?

1 Comment

  1. Walter
    Author

    I believe the answer is:
    Write the raw data to Amazon S3. Create an AWS Glue ETL job to perform the ETL processing against the input data. Write the ETL job in PySpark to leverage the existing logic. Create a new AWS Glue trigger to trigger the ETL job based on the existing schedule. Configure the output target of the ETL job to write to a "processed" location in Amazon S3 that is accessible for downstream use.
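    For illustration, here is a minimal sketch of what such a Glue ETL job script could look like in PySpark. The bucket names, paths, input/output formats, and the placeholder transformation are all assumptions, not part of the original question; the point is the overall shape: read the raw data from S3 as a DynamicFrame, apply the existing PySpark logic, and write to a "processed" S3 location.

    ```python
    # Hypothetical Glue ETL job sketch: bucket names, paths, formats,
    # and the transform below are illustrative assumptions.
    import sys

    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])

    sc = SparkContext()
    glue_context = GlueContext(sc)
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the raw data that was written to S3 (assumed bucket/prefix and format).
    raw = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://my-raw-bucket/raw/"]},
        format="json",
    )

    # Reuse the existing PySpark logic by working on a plain DataFrame.
    df = raw.toDF()
    processed_df = df.dropDuplicates()  # placeholder for the existing ETL logic

    processed = DynamicFrame.fromDF(processed_df, glue_context, "processed")

    # Write to the "processed" location that downstream consumers read from.
    glue_context.write_dynamic_frame.from_options(
        frame=processed,
        connection_type="s3",
        connection_options={"path": "s3://my-raw-bucket/processed/"},
        format="parquet",
    )

    job.commit()
    ```

    The scheduled Glue trigger mentioned in the answer could be created with boto3 along these lines; the trigger name, job name, and cron expression are assumptions standing in for the existing schedule.

    ```python
    # Hypothetical sketch: a scheduled Glue trigger matching the existing schedule.
    import boto3

    glue = boto3.client("glue")
    glue.create_trigger(
        Name="nightly-etl-trigger",           # assumed trigger name
        Type="SCHEDULED",
        Schedule="cron(0 2 * * ? *)",         # assumed: daily at 02:00 UTC
        Actions=[{"JobName": "my-etl-job"}],  # assumed Glue job name
        StartOnCreation=True,
    )
    ```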
