This handout is designed to help you replicate the statistical analyses covered in the Market Segmentation lecture. Keep it handy when you work on the (Market Segmentation) programming assignment, where you will be asked to apply what you learned in the lecture to a different dataset.
Cluster Analysis refers to a class of techniques used to classify individuals into groups such that:
- individuals within the same group are as similar as possible, and
- individuals in different groups are as different as possible.
This handout shows three different cluster analysis techniques:
- Hierarchical Clustering
- K-Means Clustering
- Latent Class Analysis
In order to run these statistical methods, you need to install these R packages:
# install.packages(c("NbClust", "mclust", "gmodels"))  # run once if the packages are not yet installed
set.seed(1990)
library(NbClust)
library(mclust)
library(gmodels)
The dataset includes data from 73 students (24 MBAs and 49 undergrads). These students were asked to allocate 100 points across six automobile attributes (Trendy, Styling, Reliability, Sportiness, Performance, and Comfort) in a way that reflects the importance of each attribute in their decision of which car to buy. We use this dataset to answer the following questions:
Let us start by reading the data. Remember to first set the appropriate working directory using the following code:
setwd("your_directory")
We can now read the raw data:
seg_data <- read.csv(file = "SegmentationData.csv",row.names=1)
head(seg_data)
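Since each respondent allocated exactly 100 points across the six attributes, a quick sanity check is to verify that the allocations sum to 100. A minimal sketch, assuming the attribute columns are named as above:

# Constant-sum check: each row of the six attribute columns should sum to 100
attrs <- c("Trendy", "Styling", "Reliability", "Sportiness", "Performance", "Comfort")
summary(rowSums(seg_data[, attrs]))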
Hierarchical Clustering Analysis is one of the most popular techniques used for market segmentation. It is a numerical procedure that builds clusters from the bottom up: starting from single individuals, it joins observations sequentially until we obtain one large cluster containing everyone. Hence, this technique does not require pre-specifying the number of clusters, which can instead be assessed from the “dendrogram” (a tree-like representation of the data).
More specifically, the algorithm works as follows:
1. Start with each observation in its own cluster.
2. Compute the distance between every pair of clusters and merge the two closest clusters.
3. Repeat step 2 until all observations belong to a single cluster.
A key aspect of hierarchical clustering is choosing how to compute the distance between two clusters. Is it the maximal distance between two points, one from each cluster? The minimal distance? And how do we measure the distance between two points in the first place? In this handout, we use Ward’s criterion, which aims to minimize the total within-cluster variance. To do so, we use the R function hclust. We start by standardizing the data so that every variable is on the same scale, and we then compute the Euclidean distance between observations.
std_seg_data <- scale(seg_data[,c("Trendy", "Styling", "Reliability", "Sportiness", "Performance", "Comfort")])
dist <- dist(std_seg_data, method = "euclidean")
as.matrix(dist)[1:5,1:5]
1 2 3 4 5
1 0.000000 3.730216 2.802191 1.775616 2.746615
2 3.730216 0.000000 4.218662 3.017462 2.984534
3 2.802191 4.218662 0.000000 1.974683 3.331082
4 1.775616 3.017462 1.974683 0.000000 2.924141
5 2.746615 2.984534 3.331082 2.924141 0.000000
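As a quick check, you can recompute one of these entries by hand. A minimal sketch verifying the distance between observations 1 and 2 (it should match the [1,2] entry above):

# Euclidean distance between the first two (standardized) observations
sqrt(sum((std_seg_data[1, ] - std_seg_data[2, ])^2))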
We now use the function hclust() to apply hierarchical clustering to our data, with the Ward criterion, which aims to minimize the within-cluster variance. We obtain the dendrogram below, which can help us decide on the number of clusters to retain. This number seems to be either 3 or 4. Note: It is important to set the seed to a specific value so that you always get the same labeling of the clusters. Otherwise, cluster 1 in one analysis may correspond to cluster 3 in another.
set.seed(1990)
clust <- hclust(dist, method = "ward.D2")
plot(clust)
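For comparison only (it is not used in the rest of the handout), you can see how the tree changes under the other linkage criteria mentioned earlier. A minimal sketch using the same distance matrix:

# Same data, different linkage criteria (minimal vs. maximal inter-cluster distance)
plot(hclust(dist, method = "single"),   main = "Single linkage")
plot(hclust(dist, method = "complete"), main = "Complete linkage")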
We start with 4 clusters, as shown below:
set.seed(1990)
clust <- hclust(dist, method = "ward.D2")
plot(clust)
h_cluster <- cutree(clust, 4)
rect.hclust(clust, k=4, border="red")
Let us now look at a description of this clustering. The table below shows the number of individuals in each cluster:
table(h_cluster)
h_cluster
1 2 3 4
18 29 17 9
The table below reports the profiles of the four clusters (i.e., the means of the clustering variables by cluster). Looking at this table, we can describe the clusters as follows:
Hence, it seems that Cluster 4 is a combination of Clusters 2 and 3. This suggests that 3 clusters may be better at capturing the heterogeneity of the subjects in this dataset.
hclust_summary <- aggregate(std_seg_data[,c("Trendy", "Styling", "Reliability", "Sportiness", "Performance", "Comfort")],by=list(h_cluster),FUN=mean)
hclust_summary
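If you prefer to read these profiles in the original allocation points rather than in standardized units, you can aggregate the raw data instead. A minimal sketch:

# Cluster profiles on the original 100-point scale
aggregate(seg_data[,c("Trendy", "Styling", "Reliability", "Sportiness", "Performance", "Comfort")],by=list(h_cluster),FUN=mean)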
plot(clust)
h_cluster <- cutree(clust, 3)
rect.hclust(clust, k=3, border="red")
table(h_cluster)
h_cluster
1 2 3
27 29 17
hclust_summary <- aggregate(std_seg_data[,c("Trendy", "Styling", "Reliability", "Sportiness", "Performance", "Comfort")],by=list(h_cluster),FUN=mean)
hclust_summary
This solution seems to have clusters of similar sizes. In addition, we can easily characterize each of them: the first cluster cares about Performance and Reliability, Cluster 2 values Comfort and Sportiness, and the third cluster cares about appearance. Below, we rename the clusters according to these characteristics.
h_cluster <- factor(h_cluster,levels = c(1,2,3),
labels = c("Perf.", "Comfort", "Appearance"))
We can also focus on a given cluster using the following code; here, the first one on the left:
plot(cut(as.dendrogram(clust), h=9)$lower[[3]])
As seen above, one can use the dendrogram to decide on the appropriate number of clusters. The function NbClust examines a large set of indices/criteria used to determine the optimal number of clusters and outputs the optimal number based on the majority rule. Note that since the data come from a constant-sum allocation, we must use only 5 of the 6 variables to avoid collinearity issues.
set.seed(1990)
NbClust(data=std_seg_data[,1:5], min.nc=3, max.nc=15, index="all", method="ward.D2")
*** : The Hubert index is a graphical method of determining the number of clusters.
In the plot of Hubert index, we seek a significant knee that corresponds to a
significant increase of the value of the measure i.e the significant peak in Hubert
index second differences plot.
*** : The D index is a graphical method of determining the number of clusters.
In the plot of D index, we seek a significant knee (the significant peak in Dindex
second differences plot) that corresponds to a significant increase of the value of
the measure.
*******************************************************************
* Among all indices:
* 7 proposed 3 as the best number of clusters
* 2 proposed 4 as the best number of clusters
* 2 proposed 5 as the best number of clusters
* 2 proposed 6 as the best number of clusters
* 4 proposed 9 as the best number of clusters
* 3 proposed 12 as the best number of clusters
* 1 proposed 13 as the best number of clusters
* 2 proposed 15 as the best number of clusters
***** Conclusion *****
* According to the majority rule, the best number of clusters is 3
*******************************************************************
$All.index
KL CH Hartigan CCC Scott Marriot TrCovW TraceW Friedman Rubin Cindex DB Silhouette Duda Pseudot2 Beale
3 4.2356 19.1708 9.1557 -3.8247 125.3102 546888028 3415.0147 232.5975 5.4783 1.5477 0.4026 1.6869 0.1987 0.6935 6.6287 1.2967
4 0.3189 17.2540 8.3534 -5.0953 159.1836 611300402 2593.8512 205.6938 7.0207 1.7502 0.3825 1.5521 0.2111 0.7443 7.2126 1.0261
5 0.5079 16.3550 8.9849 -6.0343 202.7795 525669059 2134.0463 183.4809 8.8405 1.9621 0.3560 1.5191 0.2206 0.5410 6.7863 2.3600
6 1.2239 16.3655 7.4554 -5.4291 240.3721 452300734 1641.4307 162.0669 10.4134 2.2213 0.4427 1.3403 0.2335 0.3851 11.1763 4.3725
7 0.9987 16.1533 6.8924 -5.0795 272.0101 399115542 1309.7081 145.8389 11.7806 2.4685 0.4429 1.1950 0.2414 0.6398 6.7563 1.6266
8 0.8637 16.0296 7.0554 -4.7518 305.5157 329419867 1063.7346 132.0490 13.4009 2.7263 0.4268 1.1583 0.2445 0.7768 8.9079 0.8713
9 1.0446 16.1775 6.6456 -4.2941 344.2663 245198975 894.1088 119.1193 15.5956 3.0222 0.3921 1.1251 0.2550 0.7288 10.7909 1.1258
10 0.9691 16.3520 6.6364 -3.8667 376.7922 193878168 720.2895 107.9138 17.7061 3.3360 0.3630 1.2087 0.2093 0.4678 4.5515 2.8491
11 1.2274 16.6619 5.7269 -3.3850 405.2744 158805977 566.4595 97.6295 19.4921 3.6874 0.4062 1.1481 0.2253 0.3736 5.0300 3.9357
12 1.4727 16.7917 4.4283 -3.0689 430.4837 133803651 464.2880 89.3741 21.0350 4.0280 0.4749 1.0742 0.2312 0.7289 7.8114 1.1113
13 0.9527 16.6021 4.4291 -3.0135 462.2226 101664547 418.1911 83.3250 23.7505 4.3204 0.4593 1.1319 0.2094 0.5973 3.3715 1.7587
14 1.0413 16.5171 4.2289 -2.9133 490.8040 79707962 370.8162 77.5969 25.9989 4.6394 0.4458 1.1454 0.2167 0.5413 5.9328 2.3211
15 0.9571 16.4550 4.2829 -2.8227 517.1570 63774583 329.0855 72.4071 28.2284 4.9719 0.4303 1.1209 0.2158 1.9784 -0.9891 -1.0319
Ratkowsky Ball Ptbiserial Frey McClain Dunn Hubert SDindex Dindex SDbw
3 0.3432 77.5325 0.4553 -0.0754 1.2360 0.1677 0.0074 2.5719 1.6213 0.9774
4 0.3267 51.4234 0.4936 0.1486 1.3198 0.1677 0.0080 2.2785 1.5292 0.9946
5 0.3123 36.6962 0.5216 -0.0562 1.5473 0.1686 0.0088 2.2161 1.4514 0.6734
6 0.3017 27.0112 0.5383 0.0428 1.5724 0.2159 0.0091 1.9374 1.3769 0.5657
7 0.2912 20.8341 0.5458 0.1763 1.6005 0.2200 0.0092 1.7638 1.3091 0.4389
8 0.2811 16.5061 0.5535 0.0382 1.6962 0.2200 0.0094 1.7668 1.2513 0.4271
9 0.2725 13.2355 0.5828 0.9510 1.7837 0.2200 0.0103 1.7508 1.2091 0.4453
10 0.2644 10.7914 0.5028 0.0019 2.7486 0.1690 0.0106 2.1167 1.1366 0.4186
11 0.2573 8.8754 0.5089 0.0231 2.7520 0.1935 0.0109 2.1415 1.0868 0.3715
12 0.2502 7.4478 0.5125 1.7152 2.7587 0.2300 0.0109 2.0783 1.0454 0.3555
13 0.2429 6.4096 0.4248 0.0688 4.2473 0.2300 0.0113 2.4720 1.0019 0.3382
14 0.2365 5.5426 0.4268 0.1162 4.3133 0.2300 0.0119 2.5598 0.9698 0.3191
15 0.2305 4.8271 0.4257 0.0279 4.4504 0.2300 0.0125 2.6614 0.9406 0.3038
$All.CriticalValues
CritValue_Duda CritValue_PseudoT2 Fvalue_Beale
3 0.4234 20.4305 0.2745
4 0.4864 22.1752 0.4062
5 0.2868 19.8896 0.0573
6 0.2552 20.4342 0.0034
7 0.3776 19.7832 0.1668
8 0.5502 25.3445 0.5019
9 0.5399 24.7090 0.3493
10 0.1164 30.3727 0.0422
11 0.0442 64.8953 0.0177
12 0.4864 22.1752 0.3589
13 0.1725 23.9899 0.1581
14 0.2552 20.4342 0.0638
15 -0.0536 -39.3104 1.0000
$Best.nc
KL CH Hartigan CCC Scott Marriot TrCovW TraceW Friedman Rubin Cindex DB Silhouette Duda PseudoT2
Number_clusters 3.0000 3.0000 6.0000 15.0000 5.0000 9 4.0000 6.0000 13.0000 12.0000 5.000 12.0000 9.000 3.0000 3.0000
Value_Index 4.2356 19.1708 1.5295 -2.8227 43.5959 32900086 821.1635 5.1859 2.7154 -0.0482 0.356 1.0742 0.255 0.6935 6.6287
Beale Ratkowsky Ball PtBiserial Frey McClain Dunn Hubert SDindex Dindex SDbw
Number_clusters 3.0000 3.0000 4.0000 9.0000 2 3.000 12.00 0 9.0000 0 15.0000
Value_Index 1.2967 0.3432 26.1091 0.5828 NA 1.236 0.23 0 1.7508 0 0.3038
$Best.partition
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
1 1 1 2 1 3 2 2 1 2 1 2 2 3 1 2 3 2 3 3 1 2 1 1 3 3 2 3 2 2 3 3 2 3 3 2 1 2 2 1 2 1 2 1 3 2 2 2
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
1 2 1 3 2 2 2 2 3 2 2 3 3 3 2 3 3 2 2 3 2 3 2 3 1
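If you store the result of NbClust() in an object, you can also tally the votes programmatically instead of reading the printed summary. A minimal sketch (nb_res is a hypothetical name, not used elsewhere in this handout):

# Tally how many indices vote for each number of clusters
set.seed(1990)
nb_res <- NbClust(data=std_seg_data[,1:5], min.nc=3, max.nc=15, index="all", method="ward.D2")
table(nb_res$Best.nc["Number_clusters", ])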
We can now study our demographics and choice data in light of these cluster assignments using the function CrossTable:
CrossTable(seg_data$MBA,h_cluster,prop.chisq = FALSE, prop.r = T, prop.c = T,
prop.t = F,chisq = T)
Cell Contents
|-------------------------|
| N |
| N / Row Total |
| N / Col Total |
|-------------------------|
Total Observations in Table: 73
| h_cluster
seg_data$MBA | Perf. | Comfort | Appearance | Row Total |
-------------|------------|------------|------------|------------|
MBA | 14 | 6 | 4 | 24 |
| 0.583 | 0.250 | 0.167 | 0.329 |
| 0.519 | 0.207 | 0.235 | |
-------------|------------|------------|------------|------------|
Undergrad | 13 | 23 | 13 | 49 |
| 0.265 | 0.469 | 0.265 | 0.671 |
| 0.481 | 0.793 | 0.765 | |
-------------|------------|------------|------------|------------|
Column Total | 27 | 29 | 17 | 73 |
| 0.370 | 0.397 | 0.233 | |
-------------|------------|------------|------------|------------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 7.03013 d.f. = 2 p = 0.02974588
CrossTable(h_cluster,seg_data$Choice,prop.chisq = FALSE, prop.r = T, prop.c = T,
prop.t = F,chisq = T)
Cell Contents
|-------------------------|
| N |
| N / Row Total |
| N / Col Total |
|-------------------------|
Total Observations in Table: 73
| seg_data$Choice
h_cluster | BMW | Lexus | Mercedes | Row Total |
-------------|-----------|-----------|-----------|-----------|
Perf. | 14 | 9 | 4 | 27 |
| 0.519 | 0.333 | 0.148 | 0.370 |
| 0.438 | 0.409 | 0.211 | |
-------------|-----------|-----------|-----------|-----------|
Comfort | 10 | 8 | 11 | 29 |
| 0.345 | 0.276 | 0.379 | 0.397 |
| 0.312 | 0.364 | 0.579 | |
-------------|-----------|-----------|-----------|-----------|
Appearance | 8 | 5 | 4 | 17 |
| 0.471 | 0.294 | 0.235 | 0.233 |
| 0.250 | 0.227 | 0.211 | |
-------------|-----------|-----------|-----------|-----------|
Column Total | 32 | 22 | 19 | 73 |
| 0.438 | 0.301 | 0.260 | |
-------------|-----------|-----------|-----------|-----------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 4.095664 d.f. = 4 p = 0.393214
See the lecture on how to identify variables for targeting.
We now focus on a different method called K-Means. This method, which requires us to specify the number of clusters in advance, groups observations based on their similarity using an optimization procedure: it aims to minimize the within-cluster variation, defined as the sum of squared Euclidean distances between each data point and the centroid of its cluster. More precisely, the algorithm works as follows (see the sketch after this list):
1. Randomly select K initial centroids.
2. Assign each observation to the cluster with the nearest centroid.
3. Recompute each centroid as the mean of the observations assigned to it.
4. Repeat steps 2 and 3 until the cluster assignments no longer change.
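To make these steps concrete, here is a purely illustrative sketch of a single assignment/update iteration on the standardized data (kmeans() does all of this, and the repetition, for you):

# Illustrative only: one K-means iteration on std_seg_data
set.seed(1990)
K <- 3
centroids <- std_seg_data[sample(nrow(std_seg_data), K), ]   # step 1: random initial centroids
# step 2: assign each observation to the nearest centroid (squared Euclidean distance)
assign_step <- apply(std_seg_data, 1, function(x) which.min(colSums((t(centroids) - x)^2)))
# step 3: recompute each centroid as the mean of its assigned observations
centroids <- apply(std_seg_data, 2, function(col) tapply(col, assign_step, mean))
# steps 2 and 3 would then be repeated until the assignments no longer change
table(assign_step)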
Let us start by observing how the algorithm behaves on our data with 3 segments. We use the function kmeans(). Don’t forget to set the seed to a specific value (e.g., 1990).
set.seed(1990)
car_Cluster3 <-kmeans(std_seg_data, 3, iter.max=100,nstart=100)
car_Cluster3
K-means clustering with 3 clusters of sizes 18, 32, 23
Cluster means:
Trendy Styling Reliability Sportiness Performance Comfort
1 -0.637247817 -0.6837159 1.1781135 -1.0328905 0.7785740 0.08642535
2 -0.003271873 -0.3788069 -0.3496669 0.4977728 -0.0445069 0.53615835
3 0.503267855 1.0621176 -0.4355087 0.1157956 -0.5473961 -0.81359668
Clustering vector:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
1 1 2 1 1 3 2 1 1 3 1 2 2 3 1 2 3 2 2 3 1 2 1 1 2 3 2 3 1 2 3 3 1 3 3 2 1 2 2 3 2 1 3 1 2 1 2 2
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
3 2 2 3 2 2 2 2 3 2 2 2 2 3 2 3 3 2 2 3 3 3 2 3 1
Within cluster sum of squares by cluster:
[1] 81.39207 83.90060 111.49649
(between_SS / total_SS = 35.9 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"
Kmean_Cluster<-factor(car_Cluster3$cluster,levels = c(1,2,3),
labels = c("Perf. KM", "Comfort KM", "Appearance KM"))
A key question when using the K-Means clustering technique is choosing the optimal number of segments. To do so, we can use the function NbClust() as in hierarchical clustering, this time specifying the method kmeans, as below. From the output, we see that the three-cluster solution is best.
set.seed(1990)
NbClust(data=std_seg_data[,1:5], min.nc=3, max.nc=15, index="all", method="kmeans")
*** : The Hubert index is a graphical method of determining the number of clusters.
In the plot of Hubert index, we seek a significant knee that corresponds to a
significant increase of the value of the measure i.e the significant peak in Hubert
index second differences plot.
*** : The D index is a graphical method of determining the number of clusters.
In the plot of D index, we seek a significant knee (the significant peak in Dindex
second differences plot) that corresponds to a significant increase of the value of
the measure.
*******************************************************************
* Among all indices:
* 7 proposed 3 as the best number of clusters
* 2 proposed 4 as the best number of clusters
* 1 proposed 6 as the best number of clusters
* 3 proposed 7 as the best number of clusters
* 5 proposed 8 as the best number of clusters
* 1 proposed 10 as the best number of clusters
* 1 proposed 11 as the best number of clusters
* 1 proposed 12 as the best number of clusters
* 2 proposed 15 as the best number of clusters
***** Conclusion *****
* According to the majority rule, the best number of clusters is 3
*******************************************************************
$All.index
KL CH Hartigan CCC Scott Marriot TrCovW TraceW Friedman Rubin Cindex DB Silhouette Duda Pseudot2 Beale
3 2.6615 23.0027 9.9985 -2.3384 161.6503 332431565 3142.6116 217.2313 6.9183 1.6572 0.4029 1.6368 0.2211 0.7619 6.8769 0.9339
4 2.6211 20.5604 5.9101 -3.4050 206.2171 320952678 2494.7351 190.0809 8.7864 1.8939 0.3875 1.4764 0.2351 1.4900 -4.9326 -0.9006
5 0.3090 17.9546 3.5125 -5.0208 211.9276 463754305 1884.0863 175.0844 9.3054 2.0562 0.3747 1.3972 0.2146 1.6044 -3.7672 -0.9825
6 0.1667 15.5757 15.2717 -5.9661 250.0332 396233308 1841.2783 166.4846 11.9435 2.1624 0.3825 1.3828 0.1678 1.0555 -0.7881 -0.1542
7 2.8226 18.2077 7.6528 -3.7082 297.2619 282403511 1117.8685 135.5809 13.1919 2.6552 0.3826 1.2398 0.2296 2.1023 -5.2433 -1.2308
8 115.6198 18.2290 3.1015 -3.2577 325.0955 251921280 850.0973 121.4935 14.7248 2.9631 0.4170 1.1432 0.2537 1.9135 -9.0706 -1.2807
9 0.0103 16.8361 6.1543 -3.8303 341.9554 253084884 780.5087 115.9604 15.5678 3.1045 0.4631 1.2340 0.1908 2.2632 -7.8141 -1.3975
10 0.7749 16.8213 7.0516 -3.5342 370.8399 210349050 629.2076 105.7877 17.4933 3.4030 0.3716 1.2501 0.2030 0.6031 2.6329 1.7167
11 5.0114 17.2605 3.1097 -2.9648 406.5994 155949665 518.0238 95.1388 19.4733 3.7839 0.4513 1.1235 0.2118 2.0940 -8.3592 -1.3081
12 0.2795 16.4907 5.3651 -3.2843 446.4373 107536543 493.6384 90.5948 24.4889 3.9737 0.4432 1.1876 0.2075 0.9828 0.1929 0.0507
13 1.0804 16.6162 5.0360 -3.0034 473.8984 86637820 423.0730 83.2710 26.1131 4.3232 0.4365 1.1268 0.2093 1.4350 -3.6377 -0.8433
14 1.9907 16.7292 3.2865 -2.7603 487.2320 83705207 340.7177 76.8230 26.1213 4.6861 0.4219 1.1091 0.2127 1.3559 -1.3124 -0.5477
15 1.0396 16.3524 3.1017 -2.8976 510.4032 69956428 321.4077 72.7695 27.6550 4.9471 0.4171 1.0293 0.2340 9.3984 -5.3616 0.0000
Ratkowsky Ball Ptbiserial Frey McClain Dunn Hubert SDindex Dindex SDbw
3 0.3612 72.4104 0.4486 0.1157 1.2726 0.1677 0.0068 2.4006 1.5698 0.9262
4 0.3427 47.5202 0.4837 1.0258 1.5254 0.1342 0.0073 2.1293 1.4750 0.7671
5 0.3201 35.0169 0.4512 -1.0806 2.0027 0.1651 0.0073 2.0377 1.4305 0.6090
6 0.2949 27.7474 0.3805 -0.2084 2.5868 0.0905 0.0073 1.9698 1.3957 0.5129
7 0.2981 19.3687 0.4840 -0.2534 2.2978 0.1520 0.0089 1.8817 1.2548 0.4441
8 0.2876 15.1867 0.5323 -5.0634 2.1224 0.2019 0.0094 1.8598 1.1989 0.4061
9 0.2744 12.8845 0.3980 0.0564 3.8572 0.1324 0.0090 2.0189 1.1776 0.3576
10 0.2657 10.5788 0.4099 -0.3349 4.1242 0.1690 0.0104 2.2013 1.1187 0.3598
11 0.2586 8.6490 0.4576 -23.5430 3.4951 0.1321 0.0109 2.1409 1.0738 0.3560
12 0.2496 7.5496 0.4283 0.5173 4.0171 0.1460 0.0110 2.5475 1.0423 0.3064
13 0.2430 6.4055 0.4021 0.0993 4.7440 0.1487 0.0117 2.8269 0.9999 0.3015
14 0.2370 5.4874 0.4017 -0.2287 4.8915 0.1309 0.0122 2.6940 0.9648 0.2900
15 0.2305 4.8513 0.4087 0.6242 4.7589 0.1309 0.0121 2.6062 0.9323 0.2517
$All.CriticalValues
CritValue_Duda CritValue_PseudoT2 Fvalue_Beale
3 0.4864 23.2312 0.4623
4 0.2552 43.7876 1.0000
5 0.1725 47.9798 1.0000
6 0.4234 20.4305 1.0000
7 0.0442 216.3177 1.0000
8 0.2177 68.2773 1.0000
9 0.1164 106.3046 1.0000
10 0.1725 19.1919 0.1675
11 0.1164 121.4910 1.0000
12 0.3776 18.1346 0.9983
13 0.2868 29.8345 1.0000
14 -0.0536 -98.2760 1.0000
15 -0.4373 -19.7211 NaN
$Best.nc
KL CH Hartigan CCC Scott Marriot TrCovW TraceW Friedman Rubin Cindex DB Silhouette Duda PseudoT2
Number_clusters 8.0000 3.0000 6.0000 3.0000 7.0000 4 7.0000 7.0000 12.0000 11.0000 10.0000 15.0000 8.0000 3.0000 3.0000
Value_Index 115.6198 23.0027 11.7591 -2.3384 47.2286 154280514 723.4098 16.8164 5.0156 -0.1911 0.3716 1.0293 0.2537 0.7619 6.8769
Beale Ratkowsky Ball PtBiserial Frey McClain Dunn Hubert SDindex Dindex SDbw
Number_clusters 3.0000 3.0000 4.0000 8.0000 2 3.0000 8.0000 0 8.0000 0 15.0000
Value_Index 0.9339 0.3612 24.8902 0.5323 NA 1.2726 0.2019 0 1.8598 0 0.2517
$Best.partition
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
2 2 1 2 2 1 3 2 2 3 2 3 3 1 2 3 1 3 3 1 2 3 2 2 3 1 3 1 2 3 1 1 2 1 1 3 2 3 3 3 3 2 3 2 1 2 3 3
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
1 3 3 1 3 3 3 3 1 3 3 1 1 1 3 1 1 3 3 1 3 1 3 1 2
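As a complementary check that is not part of NbClust, you can also look at the classic elbow plot of the total within-cluster sum of squares. A minimal sketch using the same standardized data:

# Elbow plot: total within-cluster sum of squares for k = 2..10
set.seed(1990)
wss <- sapply(2:10, function(k) kmeans(std_seg_data, centers=k, iter.max=100, nstart=100)$tot.withinss)
plot(2:10, wss, type="b", xlab="Number of clusters k", ylab="Total within-cluster SS")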
Looking at the cluster means above, we see that the clusters obtained with the kmeans function have profiles similar to those obtained before. Thus, we relabel them (as done above with Kmean_Cluster) to describe them more accurately. We can now compare this clustering to the demographics and choice data, as well as to the hierarchical clustering.
CrossTable(seg_data$MBA,Kmean_Cluster,prop.chisq = FALSE, prop.r = T, prop.c = T,
prop.t = F,chisq = T)
Cell Contents
|-------------------------|
| N |
| N / Row Total |
| N / Col Total |
|-------------------------|
Total Observations in Table: 73
| Kmean_Cluster
seg_data$MBA | Perf. KM | Comfort KM | Appearance KM | Row Total |
-------------|---------------|---------------|---------------|---------------|
MBA | 11 | 8 | 5 | 24 |
| 0.458 | 0.333 | 0.208 | 0.329 |
| 0.611 | 0.250 | 0.217 | |
-------------|---------------|---------------|---------------|---------------|
Undergrad | 7 | 24 | 18 | 49 |
| 0.143 | 0.490 | 0.367 | 0.671 |
| 0.389 | 0.750 | 0.783 | |
-------------|---------------|---------------|---------------|---------------|
Column Total | 18 | 32 | 23 | 73 |
| 0.247 | 0.438 | 0.315 | |
-------------|---------------|---------------|---------------|---------------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 8.694824 d.f. = 2 p = 0.01294026
CrossTable(Kmean_Cluster,seg_data$Choice,prop.chisq = FALSE, prop.r = T, prop.c = T,prop.t = F,chisq = T)
Cell Contents
|-------------------------|
| N |
| N / Row Total |
| N / Col Total |
|-------------------------|
Total Observations in Table: 73
| seg_data$Choice
Kmean_Cluster | BMW | Lexus | Mercedes | Row Total |
--------------|-----------|-----------|-----------|-----------|
Perf. KM | 7 | 8 | 3 | 18 |
| 0.389 | 0.444 | 0.167 | 0.247 |
| 0.219 | 0.364 | 0.158 | |
--------------|-----------|-----------|-----------|-----------|
Comfort KM | 14 | 8 | 10 | 32 |
| 0.438 | 0.250 | 0.312 | 0.438 |
| 0.438 | 0.364 | 0.526 | |
--------------|-----------|-----------|-----------|-----------|
Appearance KM | 11 | 6 | 6 | 23 |
| 0.478 | 0.261 | 0.261 | 0.315 |
| 0.344 | 0.273 | 0.316 | |
--------------|-----------|-----------|-----------|-----------|
Column Total | 32 | 22 | 19 | 73 |
| 0.438 | 0.301 | 0.260 | |
--------------|-----------|-----------|-----------|-----------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 2.753465 d.f. = 4 p = 0.5998918
CrossTable(h_cluster,Kmean_Cluster,prop.chisq = FALSE, prop.r = T, prop.c = T,
prop.t = F,chisq = T)
Latent Class Analysis is a method to identify the cluster membership of subjects using the observable variables that describe them. The approach consists of estimating, for each individual, the probability of belonging to a “latent class” or cluster. In turn, each cluster is defined, as a cloud of points, in terms of its “geometry” and “orientation”. As such, this technique belongs to the family of Gaussian finite mixture models. It relies on a different optimization procedure, which aims to maximize the likelihood (rather than minimize the distances between each point and its cluster centroid). Hence, the tools to assess the optimal number of classes also differ. We now perform this analysis using the package mclust. We start by determining the optimal model based on the BIC using the function mclustBIC().
set.seed(1990)
mclustBIC(std_seg_data[,1:5],verbose=F)
Hence, the optimal model is VEE with 2 segments. This means that the data can be clustered into two clusters, each modeled by a multivariate Normal distribution with the same shape and orientation (but possibly different volumes). We obtain more details about the optimal model below:
set.seed(1990)
lca_clust <- Mclust(std_seg_data[,1:5],verbose = FALSE)
summary(lca_clust)
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------
Mclust VEE (ellipsoidal, equal shape and orientation) model with 2 components:
log.likelihood n df BIC ICL
-431.8862 73 27 -979.6149 -990.9626
Clustering table:
1 2
47 26
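Because the model assigns each respondent a posterior probability of belonging to each latent class, you can inspect these probabilities directly. A minimal sketch using the fitted object above:

# Posterior class-membership probabilities (one column per class; rows sum to 1)
head(round(lca_clust$z, 3))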
We now interpret each cluster and rename them to describe them accurately:
lca_clusters <- lca_clust$classification
lca_clust_summary <- aggregate(std_seg_data[,c("Trendy", "Styling", "Reliability", "Sportiness", "Performance", "Comfort")],by=list(lca_clusters),FUN=mean)
lca_clust_summary
lca_clusters<-factor(lca_clusters,levels = c(1,2),
labels = c("Reliability LCA", "Comfort LCA"))
Let us compare this solution to the demographics and choice data as well as the hierarchical clustering and K-Means:
CrossTable(seg_data$MBA,lca_clusters,prop.chisq = FALSE, prop.r = T, prop.c = T,
prop.t = F,chisq = T)
CrossTable(lca_clusters,seg_data$Choice,prop.chisq = FALSE, prop.r = T, prop.c = T,
prop.t = F,chisq = T)
CrossTable(h_cluster,lca_clusters,prop.chisq = FALSE, prop.r = T, prop.c = T,
prop.t = F,chisq = T)
CrossTable(Kmean_Cluster,lca_clusters,prop.chisq = FALSE, prop.r = T, prop.c = T,
prop.t = F,chisq = T)