Which solution will meet these requirements with the LEAST amount of compute resources?
Import the data by using the None option.
Import the data by using the Stratified option.
Import the data by using the First K option. Infer the value of K from domain knowledge.
Import the data by using the Randomized option. Infer the random size from domain knowledge.
Explanations:
The None option turns sampling off and imports the entire dataset. That allows any analysis or modeling, but every record has to be processed, so it uses the most compute of the four options and does not meet the least-compute requirement.
The Stratified option draws a sample that keeps the same proportion of each class as the whole dataset. That is useful when class balance matters, but maintaining those proportions means reading the data and grouping it by the stratification column, which costs more compute than simply taking the first records. It is therefore less efficient for the stated requirement of using the least amount of compute resources.
The First K option imports only the first K records, with K chosen from domain knowledge. Because only those records are read and the rest of the dataset is skipped, this option needs the least compute while still providing enough data for exploratory data analysis (EDA) of the statistical properties.
The Randomized option selects a random subset of the data of a specified size. Drawing records at random generally means reading across the whole dataset instead of stopping early, so it costs more compute than First K, and the result can vary from one sample to the next. It also does not guarantee that the records most relevant to anomaly detection are included, making it less efficient than the First K option.
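The compute argument in the explanations above can be illustrated with a small sketch. It is not how the import tool implements sampling; it is a plain-Python analogy, and the helper names (`counting_reader`, `first_k`, `random_sample`, `rows_read`) are hypothetical. It assumes the data arrives as a stream of rows and compares how many rows each strategy has to read: taking the first K stops early, while a uniform random sample (here, reservoir sampling) needs a full pass.

```python
import random
from itertools import islice


def counting_reader(rows, counter):
    """Yield rows one at a time, recording how many were actually read."""
    for row in rows:
        counter["read"] += 1
        yield row


def first_k(rows, k):
    """Take the first k rows and stop; later rows are never read."""
    return list(islice(rows, k))


def random_sample(rows, k, seed=0):
    """Reservoir sampling: a uniform sample of k rows, at the cost of a full pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


def rows_read(sampler, total_rows, k):
    """Return (sample size, rows read) for a sampler over a synthetic stream."""
    counter = {"read": 0}
    source = counting_reader(range(total_rows), counter)
    sample = sampler(source, k)
    return len(sample), counter["read"]


if __name__ == "__main__":
    print("first K:   ", rows_read(first_k, 100_000, 1_000))
    print("randomized:", rows_read(random_sample, 100_000, 1_000))
```

Both strategies return a sample of the same size, but the first-K reader touches only K rows while the random sampler touches all of them. A stratified sample would likewise need to see every row to learn the class proportions.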