Which combination of feature engineering techniques should the data scientist use to meet these requirements?
(Choose two.)
Named entity recognition
Coreference
Stemming
Term frequency-inverse document frequency (TF-IDF)
Sentiment analysis
Explanations:
Named entity recognition (NER) is used to identify and classify entities such as people, organizations, or locations in text. While it can be useful for specific analysis, it is not required for initial exploratory analysis or creating a word cloud.
Coreference resolution identifies when different expressions refer to the same entity, for example a name and a later pronoun. It is not typically part of feature engineering for exploratory analysis or word cloud generation.
Stemming reduces words to a common stem by stripping suffixes, so that inflected variants such as "connect", "connected", and "connecting" are counted as one term. This consolidates word counts during text preprocessing, which is useful in exploratory analysis and when preparing word clouds and basic chart visualizations.
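TF-IDF weights each word by how often it appears in a document, discounted by how many documents in the corpus contain it, so words that appear in every document score low. It is a common and effective feature engineering technique in text analysis, and the resulting weights help surface distinctive terms for insights and word clouds.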
Sentiment analysis determines whether a text is positive, negative, or neutral. That label is not directly needed for initial exploratory analysis or a word cloud; it is better suited to opinion classification tasks.
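As a rough illustration of how stemming and TF-IDF, as described in the explanations above, can work together, the following sketch stems a few sample sentences and then scores each term. Everything in it is an assumption made for illustration: `naive_stem` is a hypothetical, deliberately simplistic suffix stripper (not the Porter algorithm), the sample documents are made up, and the TF-IDF variant shown (relative term frequency times the natural log of N/df, with no smoothing) is only one of several common formulations.

```python
import math
from collections import Counter

# Hypothetical, deliberately naive suffix stripper for illustration only.
# Real projects usually rely on an established stemmer instead.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation, then stem.
    words = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [naive_stem(w) for w in words if w]


def tf_idf(docs):
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Relative term frequency times unsmoothed inverse document frequency.
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Made-up sample documents.
docs = [
    "The printer jammed and the printers kept jamming.",
    "The delivery arrived late.",
    "The refund was processed quickly.",
]
scores = tf_idf(docs)
```

In this sketch, "printer" and "printers" collapse to one stem, and "the", which occurs in every sample document, receives a score of zero. Per-term scores like these are the kind of weights that could be used to size words in a word cloud.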