What should the ML specialist do to initialize the model for fine-tuning with the custom data?
A. Initialize the model with pretrained weights in all layers except the last fully connected layer.
B. Initialize the model with pretrained weights in all layers. Stack a classifier on top of the first output position. Train the classifier with the labeled data.
C. Initialize the model with random weights in all layers. Replace the last fully connected layer with a classifier. Train the classifier with the labeled data.
D. Initialize the model with pretrained weights in all layers. Replace the last fully connected layer with a classifier. Train the classifier with the labeled data.
Explanations:
A. This option stops at initialization: it loads pretrained weights into every layer except the last fully connected layer, but it never replaces that layer with a task-specific classifier or trains anything on the labeled data, so the model is not adapted to the custom task.
B. Using pretrained weights for all layers is correct, but this option stacks a new classifier on top of the first output position instead of replacing the last fully connected layer, which is not the procedure the question expects for text classification with BERT.
C. Initializing with random weights defeats the purpose of transfer learning. Using pretrained weights in all layers is the standard approach for fine-tuning.
D. Correct. This is the standard procedure: initialize with pretrained weights in all layers, replace the last fully connected layer with a classifier, and train the classifier with the labeled data.
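
For illustration only, the following sketch uses the Hugging Face Transformers library to show the procedure in option D. The checkpoint name ("bert-base-uncased"), the number of labels, and the toy example texts are assumptions rather than details from the question: pretrained weights are loaded into every layer, the original output layer is replaced by a freshly initialized classification head, and only that head is trained on the labeled data.

```python
# Illustrative sketch (assumed checkpoint, label count, and example texts).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_LABELS = 3  # assumed number of classes in the custom labeled dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# from_pretrained() loads pretrained weights into all BERT layers and
# attaches a new, randomly initialized classification head for NUM_LABELS
# classes in place of the original output layer.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_LABELS
)

# Freeze the pretrained encoder so only the new classifier head is
# updated by the labeled data.
for param in model.bert.parameters():
    param.requires_grad = False

# One illustrative training step on a toy labeled batch.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
batch = tokenizer(
    ["great product", "poor quality"], padding=True, return_tensors="pt"
)
labels = torch.tensor([0, 1])

model.train()
outputs = model(**batch, labels=labels)  # outputs.loss is the classification loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Unfreezing the encoder (dropping the requires_grad loop) would fine-tune the whole model end to end; keeping it frozen matches the narrower reading of "train the classifier with the labeled data."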