Which of the following techniques can be used for normalization in text mining?
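For context on this question, two common normalization steps in text mining are lowercasing and punctuation stripping (a real pipeline would typically add stemming or lemmatization, e.g. via NLTK). A minimal stdlib sketch, with an illustrative function name:

```python
import re

def normalize(text):
    # Lowercase and strip punctuation -- two common text-normalization steps.
    text = text.lower()
    return re.sub(r"[^\w\s]", "", text)

tokens = normalize("Text Mining, Normalized!").split()
```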
You run gradient descent for 15 iterations with learning rate α = 0.3 and compute J(θ) after each iteration. You find that the value of J(θ) decreases quickly and then levels off. Based on this, which of the following conclusions seems most plausible?
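The scenario in this question can be reproduced with a toy cost function; the sketch below (illustrative, assuming J(θ) = θ², whose gradient is 2θ) shows the cost dropping quickly and then leveling off near the minimum:

```python
def gradient_descent(theta, alpha, iters):
    # Minimize J(theta) = theta**2; the gradient is 2 * theta.
    history = []
    for _ in range(iters):
        theta -= alpha * 2 * theta
        history.append(theta ** 2)  # record J(theta) after each update
    return history

costs = gradient_descent(theta=5.0, alpha=0.3, iters=15)
```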
What is a sentence parser typically used for?
Which of the following is a good test dataset characteristic?
Which of the following is a reasonable way to select the number of principal components "k"?
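One widely taught rule this question alludes to is choosing the smallest k whose components retain a target fraction (often 99%) of the total variance. A stdlib sketch, assuming the component variances (eigenvalues) are already known:

```python
def choose_k(eigenvalues, retain=0.99):
    # Smallest k whose top components explain >= `retain` of total variance.
    total = sum(eigenvalues)
    cum = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cum += ev
        if cum / total >= retain:
            return k
    return len(eigenvalues)

k = choose_k([9.0, 0.6, 0.3, 0.1])
```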
What does pca.components_ represent in scikit-learn?
How do you handle missing or corrupted data in a dataset?
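One of the standard answers to this question is imputation, e.g. replacing missing entries with the mean of the observed values. A minimal sketch (function name illustrative; `None` stands in for a missing value):

```python
def impute_mean(values):
    # Replace None entries with the mean of the observed values.
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

filled = impute_mean([1.0, None, 3.0])
```

Other common options the answer choices may mention include dropping the affected rows or columns.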
Which of the following is a disadvantage of decision trees?
When performing regression or classification, which of the following is the correct way to preprocess the data?
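A preprocessing step commonly expected here is standardization (z-scoring): subtract the mean and divide by the standard deviation so every feature has mean 0 and unit variance. A stdlib sketch:

```python
import statistics

def standardize(xs):
    # Z-score: subtract the mean, divide by the population std deviation.
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

scaled = standardize([2.0, 4.0, 6.0])
```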
Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?
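The two ingredients of bagging that this question builds on, bootstrap resampling and majority-vote aggregation, can be sketched in plain Python (illustrative only; a real ensemble such as a random forest trains one model per bootstrap sample):

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Draw len(data) points with replacement from the training data.
    return [rng.choice(data) for _ in data]

def bagged_predict(predictions):
    # Aggregate the ensemble members' votes by majority.
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
sample = bootstrap_sample([1, 2, 3, 4, 5], rng)
vote = bagged_predict(["A", "B", "A"])
```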
In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
Why is second order differencing in time series needed?
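To ground this question: first differencing removes a linear trend, and applying it twice (second-order differencing) removes a quadratic trend. A minimal sketch, where a quadratic series becomes constant after two passes:

```python
def difference(series, order=1):
    # Apply first differencing `order` times; order=2 is second-order
    # differencing, which turns a quadratic trend into a constant.
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

d2 = difference([1, 4, 9, 16, 25], order=2)
```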
What is the purpose of performing cross-validation?
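The mechanics behind this question, splitting the data into k folds so each fold serves once as the validation set, can be sketched as an index partition (stdlib only, function name illustrative):

```python
def kfold_indices(n, k):
    # Partition indices 0..n-1 into k near-equal contiguous folds; in
    # cross-validation each fold is held out once for validation.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(10, 3)
```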
Which of the following statements about regularization is not correct?
To find the minimum or the maximum of a function, we set the gradient to zero because:
The most widely used metrics and tools to assess a classification model are:
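The metrics this question points at (accuracy, precision, recall, all derived from the confusion matrix) can be computed directly from label lists. A stdlib sketch for a binary problem with positive class 1:

```python
def classification_metrics(y_true, y_pred):
    # Confusion-matrix counts for the positive class (label 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

acc, prec, rec = classification_metrics([1, 0, 1, 1], [1, 0, 0, 1])
```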
Suppose you have trained a logistic regression classifier and, for a new example x, it outputs the prediction hθ(x) = 0.2. This means:
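As background for this question: logistic regression's output is the sigmoid of a linear score and is read as P(y = 1 | x), so the complementary probability P(y = 0 | x) is 1 minus it. A minimal sketch (the score -1.386 is chosen to make the output land near 0.2):

```python
import math

def sigmoid(z):
    # Logistic function: maps a real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

p_positive = sigmoid(-1.386)   # interpreted as P(y = 1 | x)
p_negative = 1.0 - p_positive  # interpreted as P(y = 0 | x)
```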
How can you prevent a clustering algorithm from getting stuck in bad local optima?
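One standard remedy this question is fishing for, running the algorithm from several random initializations and keeping the lowest-cost solution, can be sketched with a toy 1-D k-means (plain Python, illustrative only):

```python
import random

def kmeans_1d(points, k, iters=20, rng=None):
    # One k-means run on 1-D data from a random initialization.
    rng = rng or random.Random()
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[idx].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, inertia

def kmeans_restarts(points, k, n_init=10, seed=0):
    # Several random restarts; keeping the lowest-inertia solution
    # reduces the risk of settling in a bad local optimum.
    rng = random.Random(seed)
    return min((kmeans_1d(points, k, rng=rng) for _ in range(n_init)),
               key=lambda res: res[1])

centers, inertia = kmeans_restarts([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], k=2)
```

Smarter seeding (e.g. k-means++) is the other common answer choice.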
Which of the following is true about Naive Bayes?
Which of the following is an example of feature extraction?
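One textbook instance of feature extraction, turning raw text into bag-of-words count vectors over a shared vocabulary, can be sketched in stdlib Python (function name illustrative; scikit-learn's CountVectorizer does this for real):

```python
from collections import Counter

def bag_of_words(docs):
    # Map each document to term counts over a shared sorted vocabulary,
    # a classic feature-extraction step for text.
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    features = [[Counter(doc.lower().split())[w] for w in vocab]
                for doc in docs]
    return vocab, features

vocab, features = bag_of_words(["the cat sat", "the cat and the dog"])
```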