# Data Science Questions and Answers – Clustering

This set of Data Science Multiple Choice Questions & Answers (MCQs) focuses on “Clustering”.

1. Hierarchical clustering should be primarily used for exploration.
a) True
b) False

Explanation: Hierarchical clustering is deterministic.

2. Which of the following functions is used for k-means clustering?
a) k-means
b) k-mean
c) heatmap
d) none of the mentioned

Explanation: K-means requires a number of clusters.

3. Which of the following clustering types requires a merging approach?
a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned

Explanation: Hierarchical clustering requires a defined distance as well.

4. K-means is not deterministic, and it consists of a number of iterations.
a) True
b) False

Explanation: K-means clustering produces the final estimate of cluster centroids.
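
To make the iterative nature concrete, here is a minimal sketch of Lloyd's algorithm, a common way to run K-means, in plain Python. The function name `kmeans`, its parameters, and the four sample points are assumptions chosen for illustration and do not come from any particular library. The initial centroids are passed in explicitly, so the sketch shows how two different initial guesses can settle on two different final clusterings of the same data.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:  # nothing moved, so the run has converged
            return centroids, labels, iteration
        centroids = updated
    return centroids, labels, max_iter


# Hypothetical data: the corners of a wide, flat rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two different initial guesses for k = 2 centroids.
left_right = kmeans(points, [(0, 0.5), (4, 0.5)])
top_bottom = kmeans(points, [(2, 0), (2, 1)])

print(left_right[1])  # [0, 0, 1, 1]
print(top_bottom[1])  # [0, 1, 0, 1]
```

Both runs stop because the centroids no longer move, yet they group the points differently. In practice the initial centroids are often chosen at random, which is one reason repeated runs of K-means can disagree.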

5. Which of the following clustering types has the characteristic shown in the figure below?

a) Partitional
b) Hierarchical
c) Naive Bayes
d) None of the mentioned

Explanation: By producing a cluster tree or dendrogram, hierarchical clustering organises data on a number of scales.

6. Point out the correct statement.
a) The choice of an appropriate metric will influence the shape of the clusters
b) Hierarchical clustering is also called HCA
c) In general, the merges and splits are determined in a greedy manner
d) All of the mentioned

Explanation: Some elements may appear to be close to one another at one distance yet be further apart at another.

7. Which of the following is finally produced by Hierarchical Clustering?
a) final estimate of cluster centroids
b) tree showing how close things are to each other
c) assignment of each point to clusters
d) all of the mentioned

Explanation: Hierarchical clustering is an agglomerative approach.
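
As a rough illustration of the agglomerative approach, the sketch below repeatedly merges the two closest clusters and records each merge. It assumes one-dimensional points and single linkage (the distance between two clusters is the distance between their closest members); the function name `single_linkage` and the sample values are hypothetical. The recorded merge history is the information a dendrogram would draw.

```python
def single_linkage(points):
    """Greedy agglomerative clustering on 1-D points; returns the merge history."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merges = []
    while len(clusters) > 1:
        # Greedy choice: find the pair of clusters with the smallest distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        # Replace the two chosen clusters with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


merges = single_linkage([1, 2, 6, 7, 15])
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

Each merge is chosen by looking only at the current closest pair and is never undone, which is the greedy behaviour that question 6 refers to. Cutting the merge history at a chosen distance yields a flat assignment of points to clusters.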

8. Which of the following is required by K-means clustering?
a) defined distance metric
b) number of clusters
c) initial guess as to cluster centroids
d) all of the mentioned

Explanation: K-means clustering follows a partitioning approach.

9. Point out the wrong statement.
a) k-means clustering is a method of vector quantization
b) k-means clustering aims to partition n observations into k clusters
c) k-nearest neighbor is the same as k-means
d) none of the mentioned

Explanation: k-nearest neighbor has nothing to do with k-means.
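
To show the contrast, here is a small sketch of a k-nearest neighbor classifier in plain Python. Unlike k-means, it needs labelled training points and predicts a label for a new point instead of discovering groups. The function name `knn_predict`, the colour labels, and the coordinates are all made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled data: (coordinates, label).
train = [
    ((1.0, 1.0), "red"),
    ((1.5, 2.0), "red"),
    ((2.0, 1.0), "red"),
    ((8.0, 8.0), "blue"),
    ((9.0, 8.5), "blue"),
]

print(knn_predict(train, (1.2, 1.4)))  # red
print(knn_predict(train, (8.5, 9.0)))  # blue
```

Here `k` is the number of neighbors that vote, whereas in k-means `k` is the number of clusters to form.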

10. Which of the following combinations is incorrect?
a) Continuous – Euclidean distance
b) Continuous – correlation similarity
c) Binary – Manhattan distance
d) None of the mentioned

Explanation: You should choose a distance/similarity that makes sense for your problem.
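
The three measures named in the options can be sketched in a few lines of plain Python. The helper names and the sample vectors below are assumptions for illustration only. The sketch shows that which vector counts as "closest" can change with the measure, which is why the choice matters.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two continuous vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def correlation_similarity(a, b):
    """Pearson correlation: compares the shape of two profiles, not their scale."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


# Hypothetical continuous profiles.
profile = [1, 2, 3, 4]
scaled = [10, 20, 30, 40]  # same shape as `profile`, ten times larger
nearby = [1, 2, 4, 3]      # similar magnitude, slightly different shape

# Euclidean distance treats `nearby` as the closer vector...
print(euclidean(profile, nearby) < euclidean(profile, scaled))  # True
# ...while correlation treats `scaled` as the more similar one.
print(
    correlation_similarity(profile, scaled)
    > correlation_similarity(profile, nearby)
)  # True

# For binary vectors, Manhattan distance counts the mismatched positions.
print(manhattan([1, 0, 1, 1], [1, 1, 0, 1]))  # 2
```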

Cluster analysis, often known as clustering, is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis and a common technique for statistical data analysis, used in many domains, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics, and machine learning. Cluster analysis is the general task to be solved, not one specific algorithm.