A company is developing a gene reporting device that will collect genomic information to assist researchers with collecting large samples of data from a diverse population. The device will push 8 KB of genomic data every second to a data platform that will need to process and analyze the data and provide information back to researchers.

The data platform must meet the following requirements:
• Provide near-real-time analytics of the inbound genomic data
• Ensure the data is flexible, parallel, and durable
• Deliver results of processing to a data warehouse

Which strategy should a solutions architect use to meet these requirements?

Use Amazon Kinesis Data Firehose to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon RDS instance.

Use Amazon Kinesis Data Streams to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon Redshift cluster using Amazon EMR.

Use Amazon S3 to collect the inbound device data, analyze the data from Amazon SQS with Kinesis, and save the results to an Amazon Redshift cluster.

Use an Amazon API Gateway to put requests into an Amazon SQS queue, analyze the data with an AWS Lambda function, and save the results to an Amazon Redshift cluster using Amazon EMR.

Explanations:

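Amazon Kinesis Data Firehose is a managed delivery service for loading streaming data into data lakes, data stores, and analytics services. It buffers records before delivering them, and Kinesis client applications read from Kinesis Data Streams rather than from a Firehose delivery stream, so this option does not provide the near-real-time analytics the question asks for. Amazon RDS is also a transactional relational database rather than a data warehouse, so it does not satisfy the third requirement and may not scale well for large volumes of genomic data.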
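To make the latency point concrete, the sketch below estimates how long a record could wait in a delivery buffer that flushes on a size or time threshold, whichever is reached first. The 1 MiB and 60 second values are illustrative assumptions, not settings taken from the question, and `seconds_until_flush` is a hypothetical helper rather than an AWS API.

```python
def seconds_until_flush(ingest_bytes_per_sec: float,
                        buffer_bytes: int,
                        buffer_interval_sec: float) -> float:
    """Worst-case wait before a buffer flushes.

    The buffer is flushed when either the size threshold or the time
    threshold is reached, whichever happens first.
    """
    if ingest_bytes_per_sec <= 0:
        # Nothing arrives, so only the timer can trigger a flush.
        return buffer_interval_sec
    time_to_fill = buffer_bytes / ingest_bytes_per_sec
    return min(time_to_fill, buffer_interval_sec)


# One device sending 8 KiB per second into an assumed 1 MiB / 60 s buffer:
# filling 1 MiB would take 128 s, so the 60 s timer fires first.
single_device_wait = seconds_until_flush(8 * 1024, 1024 * 1024, 60)
```

Under these assumed settings, a single device's data could sit in the buffer for up to a minute before delivery, which is the kind of delay a consumer reading directly from a stream avoids.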

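Amazon Kinesis Data Streams ingests large volumes of data with low latency, which allows near-real-time analytics. A stream is divided into shards that multiple consumers can read in parallel, and the shard count can be changed as throughput grows, which covers the flexible and parallel requirements. Records are stored durably in the stream for a retention period (24 hours by default), so consumers can reprocess them if needed. Amazon EMR can then load the processed results into an Amazon Redshift cluster, which serves as the data warehouse. This option meets all three requirements.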
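As a rough illustration of how Kinesis Data Streams scales and parallelizes, the sketch below sizes a stream for a fleet of devices and shows how a partition key selects a shard. It assumes the provisioned-mode write limits of 1 MB/s (treated here as 1 MiB) and 1,000 records/s per shard, and it assumes the shards split the 128-bit MD5 hash space evenly, which holds for a freshly created stream but not necessarily after resharding. The function names and the `device-…` keys are hypothetical.

```python
import hashlib
import math

# Per-shard write limits (provisioned mode); 1 MB/s is treated as 1 MiB here.
SHARD_WRITE_BYTES_PER_SEC = 1024 * 1024
SHARD_WRITE_RECORDS_PER_SEC = 1000


def shards_needed(devices: int,
                  record_bytes: int = 8 * 1024,
                  records_per_sec: float = 1.0) -> int:
    """Minimum shard count that covers both the byte and record write limits."""
    total_bytes = devices * record_bytes * records_per_sec
    total_records = devices * records_per_sec
    by_bytes = math.ceil(total_bytes / SHARD_WRITE_BYTES_PER_SEC)
    by_records = math.ceil(total_records / SHARD_WRITE_RECORDS_PER_SEC)
    return max(1, by_bytes, by_records)


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard a key maps to, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 128-bit integer
    range_size = 2**128 // shard_count
    # Clamp so the top of the hash space lands in the last shard.
    return min(hash_value // range_size, shard_count - 1)


# A single device (8 KiB once per second) fits comfortably in one shard.
one_device = shards_needed(1)
# A hypothetical fleet of 1,000 devices is limited by bytes, not record count.
fleet = shards_needed(1000)
```

Using the device ID as the partition key keeps each device's records on one shard, in order, while different devices spread across shards and can be processed by separate consumers in parallel.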

Amazon S3 is primarily an object storage service, so collecting the data there does not by itself provide the near-real-time analytics required. Analyzing data from Amazon SQS (a message queuing service) with Kinesis is not a standard pairing; it complicates the architecture and does not meet the requirement for real-time processing. Loading the results into an Amazon Redshift cluster would also require additional steps that the option does not describe.

Amazon API Gateway with Amazon SQS and AWS Lambda can process incoming requests, but the extra hops may introduce latency that is unsuitable for near-real-time analytics. A queue is also not a stream: once a consumer processes and deletes an SQS message it is gone, so multiple applications cannot independently read or replay the same data the way Kinesis Data Streams consumers can. Lambda's per-invocation time and memory limits make it a weaker fit for sustained, high-volume analytics. Loading Amazon Redshift through Amazon EMR then adds further complexity and may not be the most efficient route for processing genomic data.
