# Data Science Questions and Answers – Clustering

This set of Data Science Multiple Choice Questions & Answers (MCQs) focuses on “Clustering”.

**1. Hierarchical clustering should be primarily used for exploration.**

**a) True**

b) False

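**Explanation:** Hierarchical clustering is deterministic: for a given dataset, distance metric, and merging rule it always produces the same tree, which makes it a good tool for exploring structure in the data.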

**2. Which of the following functions is used for k-means clustering?**

**a) k-means**

b) k-mean

c) heatmap

d) none of the mentioned

**Explanation:** K-means requires a number of clusters.

**3. Which of the following clustering types requires a merging approach?**

a) Partitional

**b) Hierarchical**

c) Naive Bayes

d) None of the mentioned

**Explanation:** Hierarchical clustering repeatedly merges the closest clusters, so it requires a defined distance as well as a merging approach.

**4. K-means is not deterministic and it also consists of a number of iterations.**

**a) True**

b) False

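**Explanation:** K-means clustering iteratively refines an initial guess until it produces the final estimate of the cluster centroids. Because the result depends on that initial guess, which is usually chosen at random, different runs can give different results.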

**5. Which of the following clustering types has the characteristic shown in the figure below?**

a) Partitional

**b) Hierarchical**

c) Naive Bayes

d) None of the mentioned

**Explanation:** By producing a cluster tree or dendrogram, hierarchical clustering organises data on a number of scales.

**6. Point out the correct statement.**

a) The choice of an appropriate metric will influence the shape of the clusters

b) Hierarchical clustering is also called HCA

c) In general, the merges and splits are determined in a greedy manner

**d) All of the mentioned**

**Explanation:** Some elements may appear to be close to one another at one distance yet be further apart at another.

**7. Which of the following is finally produced by Hierarchical Clustering?**

a) final estimate of cluster centroids

**b) tree showing how close things are to each other**

c) assignment of each point to clusters

d) all of the mentioned

**Explanation:** Hierarchical clustering is an agglomerative approach: it merges the closest clusters step by step and records the merges as a tree.
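
To make the merging idea concrete, here is a minimal sketch of agglomerative clustering in plain Python. It assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance; neither choice comes from the question itself, and the function name `single_linkage` and the sample points are made up for illustration. The list of merges it returns is the information a dendrogram would draw.

```python
import math


def single_linkage(points):
    """Agglomerative clustering with single linkage (illustrative sketch).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of indices into `points`.
    """
    # Start with every point in its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Greedy step: find the pair of clusters that is currently closest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical sample data: two tight pairs that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
merges = single_linkage(points)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 3))
```

Reading the merge history from first to last gives the tree from the leaves up: the two tight pairs are joined first, and the two resulting clusters are joined last at a much larger distance. No number of clusters is needed up front; cutting the tree at a chosen distance yields a flat assignment.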

**8. Which of the following is required by K-means clustering?**

a) defined distance metric

b) number of clusters

c) initial guess as to cluster centroids

**d) all of the mentioned**

**Explanation:** K-means clustering follows a partitioning approach.
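
The three inputs named in the options can be seen in a minimal sketch of the usual iterative k-means procedure, written in plain Python. It assumes squared Euclidean distance and picks the initial centroids by sampling data points at random; the function names, the sample points, and the seed are illustrative assumptions, not part of the question.

```python
import random


def squared_euclidean(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, initial_centroids, max_iter=100):
    """Iterative k-means sketch: returns (centroids, labels)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the iteration has converged.
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# The initial guess is random; the seed is fixed only to make the run repeatable.
rng = random.Random(0)
initial = rng.sample(points, 2)
centroids, labels = kmeans(points, 2, initial)
print(centroids)
print(labels)
```

On data this cleanly separated, every initial guess leads to the same two groups, although which group gets label 0 depends on the guess. On less clearly separated data, different initial guesses can converge to different partitions, which is why k-means is described as non-deterministic.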

**9. Point out the wrong statement.**

a) k-means clustering is a method of vector quantization

b) k-means clustering aims to partition n observations into k clusters

**c) k-nearest neighbor is same as k-means**

d) none of the mentioned

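**Explanation:** k-nearest neighbor has nothing to do with k-means: k-nearest neighbor is a supervised method that predicts from labelled examples, whereas k-means is an unsupervised clustering method.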

**10. Which of the following combinations is incorrect?**

a) Continuous – euclidean distance

b) Continuous – correlation similarity

c) Binary – manhattan distance

**d) None of the mentioned**

**Explanation:** You should choose a distance/similarity that makes sense for your problem.
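
As a rough illustration of the three pairings in the options, the sketch below computes each measure in plain Python. The helper names and the sample vectors are assumptions made up for this example, and Pearson correlation is used as one common choice of correlation similarity.

```python
import math


def euclidean(x, y):
    """Euclidean distance, a common choice for continuous data."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan distance; on 0/1 vectors it counts mismatched positions."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_correlation(x, y):
    """Pearson correlation as a similarity for continuous data.

    Undefined (division by zero) if either vector is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical continuous vectors: v is u scaled by 2.
u = [1.0, 2.0, 3.0, 4.0]
v = [2.0, 4.0, 6.0, 8.0]

# Hypothetical binary vectors that differ in two positions.
a = [1, 0, 1, 1, 0]
b = [1, 1, 0, 1, 0]

print("euclidean(u, v)  =", euclidean(u, v))
print("correlation(u, v) =", pearson_correlation(u, v))
print("manhattan(a, b)  =", manhattan(a, b))
```

The two continuous vectors are far apart under Euclidean distance yet essentially perfectly correlated, so the same pair looks distant under one measure and similar under the other. That is the practical meaning of choosing a distance or similarity that makes sense for the problem.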

Cluster analysis, often known as clustering, is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a common technique for statistical data analysis used in many domains, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics, and machine learning, and is a main task of exploratory data analysis. Cluster analysis is the general problem to be solved, not a specific algorithm.