Which solution will meet these requirements?

A company has 5 TB of datasets. The datasets consist of 1 million user profiles and 10 million connections. The user profiles have connections as many-to-many relationships. The company needs a performance-efficient way to find mutual connections up to five levels.

Which solution will meet these requirements?

Use an Amazon S3 bucket to store the datasets. Use Amazon Athena to perform SQL JOIN queries to find connections.

Use Amazon Neptune to store the datasets with edges and vertices. Query the data to find connections.

Use an Amazon S3 bucket to store the datasets. Use Amazon QuickSight to visualize connections.

Use Amazon RDS to store the datasets with multiple tables. Perform SQL JOIN queries to find connections.

Explanations:

Amazon S3 is suitable for storing large datasets, but using Amazon Athena to run SQL JOIN queries over many-to-many relationships, such as finding mutual connections up to five levels, can lead to performance issues. Athena is not optimized for graph traversal and may not handle deep relationships efficiently.

Amazon Neptune is a fully managed graph database service that is optimized for storing and querying highly connected data. It supports both the property graph and RDF models, making it ideal for modeling many-to-many relationships like user profiles and connections. Querying for mutual connections up to five levels is efficient in Neptune using the Gremlin or SPARQL query languages.
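To make the traversal idea concrete, here is a minimal in-memory sketch in Python. It is not Neptune code: it only shows the kind of hop-limited walk a graph database performs over vertices and edges. It assumes that "mutual connections up to five levels" means the users reachable from both people within the hop limit. The function names and the sample graph are hypothetical.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return every vertex reachable from start in 1..max_hops edges (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)  # the starting user is not their own connection
    return seen


def mutual_connections(graph, a, b, max_hops=5):
    """Users reachable from both a and b within max_hops (assumed meaning)."""
    shared = within_hops(graph, a, max_hops) & within_hops(graph, b, max_hops)
    return shared - {a, b}


# Hypothetical sample data: undirected connections stored as adjacency sets.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("dan", "eve"), ("ann", "fay")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
```

Each vertex is visited at most once per search, so the cost depends on the part of the graph actually reached rather than on the size of the whole dataset, which is the property that makes graph stores a good fit for this workload.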

Amazon QuickSight is a business intelligence service designed for data visualization and analysis. It can visualize data, but it is not built to run multi-level traversal queries over many-to-many relationships, so it cannot efficiently find mutual connections in a graph-like structure.

Amazon RDS can store relational data effectively, but for many-to-many relationships involving millions of records, SQL JOIN queries can become complex and slow. Finding mutual connections up to five levels would likely lead to performance issues due to the overhead of multiple JOIN operations and is not as efficient as using a graph database.
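The JOIN overhead can be illustrated with a small sketch that uses Python's built-in sqlite3 module as a stand-in for a relational engine. The `connections` table, its `src` and `dst` columns, and the helper names are hypothetical. The point is that every extra level adds another self-join on the edge table.

```python
import sqlite3


def n_hop_query(hops):
    """Build a SQL query that follows exactly `hops` edges via self-joins."""
    joins = "".join(
        f" JOIN connections c{i} ON c{i}.src = c{i - 1}.dst"
        for i in range(2, hops + 1)
    )
    return f"SELECT DISTINCT c{hops}.dst FROM connections c1{joins} WHERE c1.src = ?"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (src TEXT, dst TEXT)")
pairs = [("ann", "bob"), ("bob", "cat"), ("cat", "dan")]
# Store each undirected connection in both directions.
conn.executemany(
    "INSERT INTO connections VALUES (?, ?)",
    pairs + [(dst, src) for src, dst in pairs],
)


def reachable_exactly(user, hops):
    """Users at the end of any walk of exactly `hops` edges (walks may backtrack)."""
    return {row[0] for row in conn.execute(n_hop_query(hops), (user,))}
```

In this sketch a five-level lookup needs four self-joins, and covering "up to five levels" means combining the results for depths 1 through 5. On a table with 10 million connection rows, the intermediate results of those joins are what tend to make this approach slow.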
