

    Chapter Twenty

    Cluster Analysis


    Chapter Outline

1) Overview

    2) Basic Concept

    3) Statistics Associated with Cluster Analysis

    4) Conducting Cluster Analysis

    i. Formulating the Problem

    ii. Selecting a Distance or Similarity Measure

    iii. Selecting a Clustering Procedure

    iv. Deciding on the Number of Clusters

    v. Interpreting and Profiling the Clusters

    vi. Assessing Reliability and Validity


    Chapter Outline

5) Applications of Nonhierarchical Clustering

    6) Clustering Variables

    7) Summary


    Cluster Analysis

Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. Cluster analysis is also called classification analysis, or numerical taxonomy.

Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership for each object or case included, in order to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.


Fig. 20.1: An Ideal Clustering Situation (scatterplot of Variable 1 against Variable 2)


Fig. 20.2: A Practical Clustering Situation (scatterplot of Variable 1 against Variable 2)


    Statistics Associated with Cluster Analysis

Agglomeration schedule. An agglomeration schedule gives information on the objects or cases being combined at each stage of a hierarchical clustering process.

Cluster centroid. The cluster centroid is the mean values of the variables for all the cases or objects in a particular cluster.

Cluster centers. The cluster centers are the initial starting points in nonhierarchical clustering. Clusters are built around these centers, or seeds.

Cluster membership. Cluster membership indicates the cluster to which each object or case belongs.


    Statistics Associated with Cluster Analysis

Dendrogram. A dendrogram, or tree graph, is a graphical device for displaying clustering results. Vertical lines represent clusters that are joined together. The position of the line on the scale indicates the distances at which clusters were joined. The dendrogram is read from left to right. Figure 20.8 is a dendrogram.

Distances between cluster centers. These distances indicate how separated the individual pairs of clusters are. Clusters that are widely separated are distinct, and therefore desirable.


    Statistics Associated with Cluster Analysis

Icicle plot. An icicle plot is a graphical display of clustering results, so called because it resembles a row of icicles hanging from the eaves of a house. The columns correspond to the objects being clustered, and the rows correspond to the number of clusters. An icicle plot is read from bottom to top. Figure 20.7 is an icicle plot.

Similarity/distance coefficient matrix. A similarity/distance coefficient matrix is a lower-triangle matrix containing pairwise distances between objects or cases.


    Conducting Cluster Analysis

Fig. 20.3: Conducting Cluster Analysis (steps in order)

1. Formulate the Problem
2. Select a Distance Measure
3. Select a Clustering Procedure
4. Decide on the Number of Clusters
5. Interpret and Profile Clusters
6. Assess the Validity of Clustering


    Attitudinal Data For Clustering

Table 20.1

Case No.  V1  V2  V3  V4  V5  V6
    1      6   4   7   3   2   3
    2      2   3   1   4   5   4
    3      7   2   6   4   1   3
    4      4   6   4   5   3   6
    5      1   3   2   2   6   4
    6      6   4   6   3   3   4
    7      5   3   6   3   3   4
    8      7   3   7   4   1   4
    9      2   4   3   3   6   3
   10      3   5   3   6   4   6
   11      1   3   2   3   5   3
   12      5   4   5   4   2   4
   13      2   2   1   5   4   4
   14      4   6   4   6   4   7
   15      6   5   4   2   1   4
   16      3   5   4   6   4   7
   17      4   4   7   2   2   5
   18      3   7   2   6   4   3
   19      4   6   3   7   2   7
   20      2   3   2   4   7   2


Conducting Cluster Analysis: Formulate the Problem

Perhaps the most important part of formulating the clustering problem is selecting the variables on which the clustering is based.

Inclusion of even one or two irrelevant variables may distort an otherwise useful clustering solution.

Basically, the set of variables selected should describe the similarity between objects in terms that are relevant to the marketing research problem.

The variables should be selected based on past research, theory, or a consideration of the hypotheses being tested. In exploratory research, the researcher should exercise judgment and intuition.


Conducting Cluster Analysis: Select a Distance or Similarity Measure

The most commonly used measure of similarity is the Euclidean distance or its square. The Euclidean distance is the square root of the sum of the squared differences in values for each variable. Other distance measures are also available. The city-block or Manhattan distance between two objects is the sum of the absolute differences in values for each variable. The Chebychev distance between two objects is the maximum absolute difference in values for any variable.

If the variables are measured in vastly different units, the clustering solution will be influenced by the units of measurement. In these cases, before clustering respondents, we must standardize the data by rescaling each variable to have a mean of zero and a standard deviation of unity. It is also desirable to eliminate outliers (cases with atypical values).

Use of different distance measures may lead to different clustering results. Hence, it is advisable to use different measures and compare the results.
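As a small illustration, the three measures just named can be computed with SciPy; the two vectors here are cases 1 and 2 from Table 20.1.

```python
# Euclidean, city-block (Manhattan), and Chebychev distances via SciPy.
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, chebyshev

x = np.array([6, 4, 7, 3, 2, 3])   # case 1 from Table 20.1
y = np.array([2, 3, 1, 4, 5, 4])   # case 2

print(euclidean(x, y))    # sqrt(16+1+36+1+9+1) = 8.0
print(cityblock(x, y))    # 4+1+6+1+3+1 = 16
print(chebyshev(x, y))    # largest absolute difference on any variable = 6

# If variables were on very different scales, standardize first, e.g. with
# scipy.stats.zscore(X, axis=0), so each column has mean 0 and unit variance.
```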


    A Classification of Clustering Procedures

Fig. 20.4: A Classification of Clustering Procedures

Clustering Procedures
- Hierarchical
  - Agglomerative
    - Linkage Methods: Single Linkage, Complete Linkage, Average Linkage
    - Variance Methods: Ward's Method
    - Centroid Methods
  - Divisive
- Nonhierarchical
  - Sequential Threshold
  - Parallel Threshold
  - Optimizing Partitioning
- Other
  - Two-Step


Conducting Cluster Analysis: Select a Clustering Procedure (Hierarchical)

Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive.

Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are members of a single cluster.

Divisive clustering starts with all the objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster.

Agglomerative methods are commonly used in marketing research. They consist of linkage methods, error sums of squares or variance methods, and centroid methods.


    Conducting Cluster Analysis:Select a Clustering Procedure Linkage Method

    The single linkage method is based on minimum distance,or the nearest neighbor rule. At every stage, the distancebetween two clusters is the distance between their two closestpoints (see Figure 20.5).

    The complete linkage method is similar to single linkage,except that it is based on the maximum distance or the

    furthest neighbor approach. In complete linkage, the distancebetween two clusters is calculated as the distance betweentheir two furthest points.

    The average linkage method works similarly. However, inthis method, the distance between two clusters is defined asthe average of the distances between all pairs of objects,

    where one member of the pair is from each of the clusters(Figure 20.5).
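A minimal sketch of the three linkage rules with SciPy, on random stand-in data; only the method argument changes.

```python
# Single, complete, and average linkage differ only in the merge rule.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(1).normal(size=(20, 6))  # stand-in data

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                    # merge history under that rule
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree at 3 clusters
    print(method, labels)
```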


    Linkage Methods of Clustering

Fig. 20.5: Linkage Methods of Clustering
- Single Linkage: clusters joined by the minimum distance between Cluster 1 and Cluster 2
- Complete Linkage: clusters joined by the maximum distance between Cluster 1 and Cluster 2
- Average Linkage: clusters joined by the average distance between Cluster 1 and Cluster 2


Conducting Cluster Analysis: Select a Clustering Procedure (Variance Methods)

The variance methods attempt to generate clusters to minimize the within-cluster variance.

A commonly used variance method is Ward's procedure. For each cluster, the means for all the variables are computed. Then, for each object, the squared Euclidean distance to the cluster means is calculated (Figure 20.6). These distances are summed for all the objects. At each stage, the two clusters with the smallest increase in the overall sum of squared within-cluster distances are combined.

In the centroid methods, the distance between two clusters is the distance between their centroids (means for all the variables), as shown in Figure 20.6. Every time objects are grouped, a new centroid is computed.

Of the hierarchical methods, average linkage and Ward's method have been shown to perform better than the other procedures.
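The following sketch runs Ward's procedure (and, for comparison, the centroid method) on the Table 20.1 data with SciPy. Note that SciPy's merge heights follow its own conventions, so they are comparable in shape but not identical to the SPSS coefficients shown later in Table 20.2.

```python
# Ward's procedure and the centroid method on the Table 20.1 data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([
    [6,4,7,3,2,3], [2,3,1,4,5,4], [7,2,6,4,1,3], [4,6,4,5,3,6], [1,3,2,2,6,4],
    [6,4,6,3,3,4], [5,3,6,3,3,4], [7,3,7,4,1,4], [2,4,3,3,6,3], [3,5,3,6,4,6],
    [1,3,2,3,5,3], [5,4,5,4,2,4], [2,2,1,5,4,4], [4,6,4,6,4,7], [6,5,4,2,1,4],
    [3,5,4,6,4,7], [4,4,7,2,2,5], [3,7,2,6,4,3], [4,6,3,7,2,7], [2,3,2,4,7,2],
], dtype=float)

Zw = linkage(X, method="ward")      # variance (Ward's) method
Zc = linkage(X, method="centroid")  # centroid method, for comparison
print(fcluster(Zw, t=3, criterion="maxclust"))  # 3-cluster membership under Ward's
```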


    Other Agglomerative Clustering Methods

Fig. 20.6: Other Agglomerative Clustering Methods (diagrams of Ward's Procedure and the Centroid Method)


Conducting Cluster Analysis: Select a Clustering Procedure (Nonhierarchical)

The nonhierarchical clustering methods are frequently referred to as k-means clustering. These methods include sequential threshold, parallel threshold, and optimizing partitioning.

In the sequential threshold method, a cluster center is selected and all objects within a prespecified threshold value from the center are grouped together. Then a new cluster center or seed is selected, and the process is repeated for the unclustered points. Once an object is clustered with a seed, it is no longer considered for clustering with subsequent seeds.

The parallel threshold method operates similarly, except that several cluster centers are selected simultaneously and objects within the threshold level are grouped with the nearest center.

The optimizing partitioning method differs from the two threshold procedures in that objects can later be reassigned to clusters to optimize an overall criterion, such as average within-cluster distance for a given number of clusters.
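The threshold variants are rare in general-purpose libraries, but the optimizing-partitioning idea is what scikit-learn's KMeans implements; a hedged sketch on random stand-in data:

```python
# Optimizing partitioning via k-means: objects are reassigned until the
# total within-cluster sum of squares stops improving.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(2).normal(size=(20, 6))  # stand-in data

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster membership for each case
print(km.cluster_centers_)  # final cluster centers
print(km.inertia_)          # total within-cluster sum of squares
```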



Conducting Cluster Analysis: Select a Clustering Procedure

It has been suggested that the hierarchical and nonhierarchical methods be used in tandem. First, an initial clustering solution is obtained using a hierarchical procedure, such as average linkage or Ward's. The number of clusters and cluster centroids so obtained are used as inputs to the optimizing partitioning method.

Choice of a clustering method and choice of a distance measure are interrelated. For example, squared Euclidean distances should be used with the Ward's and centroid methods. Several nonhierarchical procedures also use squared Euclidean distances.
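A sketch of this tandem approach, assuming the data sit in a NumPy array X: a Ward's solution supplies the number of clusters and seed centroids, and k-means then refines the partition.

```python
# Tandem clustering: a hierarchical (Ward's) solution seeds k-means.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

X = np.random.default_rng(3).normal(size=(20, 6))  # stand-in data

labels = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")
seeds = np.vstack([X[labels == k].mean(axis=0) for k in (1, 2, 3)])  # centroids

km = KMeans(n_clusters=3, init=seeds, n_init=1).fit(X)  # refine by reassignment
print(km.labels_)
```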


    Results of Hierarchical Clustering

Table 20.2: Agglomeration Schedule Using Ward's Procedure

                 Clusters combined              Stage cluster first appears
Stage   Cluster 1   Cluster 2   Coefficient    Cluster 1   Cluster 2   Next stage
  1        14          16         1.000000         0           0           6
  2         6           7         2.000000         0           0           7
  3         2          13         3.500000         0           0          15
  4         5          11         5.000000         0           0          11
  5         3           8         6.500000         0           0          16
  6        10          14         8.160000         0           1           9
  7         6          12        10.166667         2           0          10
  8         9          20        13.000000         0           0          11
  9         4          10        15.583000         0           6          12
 10         1           6        18.500000         6           7          13
 11         5           9        23.000000         4           8          15
 12         4          19        27.750000         9           0          17
 13         1          17        33.100000        10           0          14
 14         1          15        41.333000        13           0          16
 15         2           5        51.833000         3          11          18
 16         1           3        64.500000        14           5          19
 17         4          18        79.667000        12           0          18
 18         2           4       172.662000        15          17          19
 19         1           2       328.600000        16          18           0


    Results of Hierarchical Clustering

Table 20.2, cont.: Cluster Membership of Cases Using Ward's Procedure

              Number of Clusters
Label case     4     3     2
    1          1     1     1
    2          2     2     2
    3          1     1     1
    4          3     3     2
    5          2     2     2
    6          1     1     1
    7          1     1     1
    8          1     1     1
    9          2     2     2
   10          3     3     2
   11          2     2     2
   12          1     1     1
   13          2     2     2
   14          3     3     2
   15          1     1     1
   16          3     3     2
   17          1     1     1
   18          4     3     2
   19          3     3     2
   20          2     2     2


Fig. 20.7: Vertical Icicle Plot Using Ward's Method


Fig. 20.8: Dendrogram Using Ward's Method


Conducting Cluster Analysis: Decide on the Number of Clusters

Theoretical, conceptual, or practical considerations may suggest a certain number of clusters.

In hierarchical clustering, the distances at which clusters are combined can be used as criteria. This information can be obtained from the agglomeration schedule or from the dendrogram.

In nonhierarchical clustering, the ratio of total within-group variance to between-group variance can be plotted against the number of clusters. The point at which an elbow or a sharp bend occurs indicates an appropriate number of clusters.

The relative sizes of the clusters should be meaningful.
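A minimal sketch of the elbow heuristic just described, plotting within-cluster sum of squares against the number of clusters (random stand-in data):

```python
# Elbow plot: within-cluster sum of squares versus number of clusters.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(4).normal(size=(20, 6))  # stand-in data

ks = range(1, 8)
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), wss, marker="o")  # look for the sharp bend (elbow)
plt.xlabel("Number of clusters")
plt.ylabel("Within-cluster sum of squares")
plt.show()
```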



Conducting Cluster Analysis: Interpreting and Profiling the Clusters

Interpreting and profiling clusters involves examining the cluster centroids. The centroids enable us to describe each cluster by assigning it a name or label.

It is often helpful to profile the clusters in terms of variables that were not used for clustering. These may include demographic, psychographic, product usage, media usage, or other variables.
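A small pandas sketch of profiling: group the cases by cluster and average both the clustering variables and any outside variables (all values and the income column here are purely illustrative).

```python
# Profiling clusters: per-cluster means of clustering and outside variables.
import pandas as pd

df = pd.DataFrame({
    "V1":      [6, 2, 7, 4, 1, 6],
    "V2":      [4, 3, 2, 6, 3, 4],
    "income":  [55, 30, 60, 41, 28, 52],    # hypothetical profiling variable
    "cluster": [1, 2, 1, 3, 2, 1],          # assignments from any procedure
})
print(df.groupby("cluster").mean())  # per-cluster means = centroids and profiles
```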


    Cluster Centroids

Table 20.3: Means of Variables

Cluster     V1      V2      V3      V4      V5      V6
   1       5.750   3.625   6.000   3.125   1.750   3.875
   2       1.667   3.000   1.833   3.500   5.500   3.333
   3       3.500   5.833   3.333   6.000   3.500   6.000


Conducting Cluster Analysis: Assess Reliability and Validity

1. Perform cluster analysis on the same data using different distance measures. Compare the results across measures to determine the stability of the solutions.

2. Use different methods of clustering and compare the results.

3. Split the data randomly into halves. Perform clustering separately on each half. Compare cluster centroids across the two subsamples (a minimal sketch of this check follows the list).

4. Delete variables randomly. Perform clustering based on the reduced set of variables. Compare the results with those obtained by clustering based on the entire set of variables.

5. In nonhierarchical clustering, the solution may depend on the order of cases in the data set. Make multiple runs using different orders of cases until the solution stabilizes.
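A minimal sketch of check 3, the split-half comparison, on random stand-in data. Note that cluster labels may be permuted between halves, so centroids must be matched up before comparing.

```python
# Split-half reliability check: cluster each half, compare centroids.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(5).normal(size=(20, 6))  # stand-in data

idx = np.random.default_rng(6).permutation(len(X))
half1, half2 = X[idx[:10]], X[idx[10:]]

c1 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(half1).cluster_centers_
c2 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(half2).cluster_centers_
print(c1, c2, sep="\n")  # similar (possibly reordered) centers suggest stability
```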


Table 20.4: Results of Nonhierarchical Clustering

Initial Cluster Centers

          Cluster
       1     2     3
V1     4     2     7
V2     6     3     2
V3     3     2     6
V4     7     4     4
V5     2     7     1
V6     7     2     3

Iteration History(a)

             Change in Cluster Centers
Iteration      1        2        3
    1        2.154    2.102    2.550
    2        0.000    0.000    0.000

a. Convergence achieved due to no or small distance change. The maximum distance by which any center has changed is 0.000. The current iteration is 2. The minimum distance between initial centers is 7.746.


Results of Nonhierarchical Clustering

Table 20.4, cont.: Cluster Membership

Case Number   Cluster   Distance
      1          3        1.414
      2          2        1.323
      3          3        2.550
      4          1        1.404
      5          2        1.848
      6          3        1.225
      7          3        1.500
      8          3        2.121
      9          2        1.756
     10          1        1.143
     11          2        1.041
     12          3        1.581
     13          2        2.598
     14          1        1.404
     15          3        2.828
     16          1        1.624
     17          3        2.598
     18          1        3.555
     19          1        2.154
     20          2        2.102


Results of Nonhierarchical Clustering

Table 20.4, cont.: Final Cluster Centers

          Cluster
       1     2     3
V1     4     2     6
V2     6     3     4
V3     3     2     6
V4     6     4     3
V5     4     6     2
V6     6     3     4

Distances between Final Cluster Centers

Cluster      1        2        3
   1                5.568    5.698
   2       5.568             6.928
   3       5.698    6.928


Results of Nonhierarchical Clustering

Table 20.4, cont.: ANOVA

        Cluster               Error
      Mean Square   df    Mean Square   df       F       Sig.
V1      29.108       2      0.608       17    47.888    0.000
V2      13.546       2      0.630       17    21.505    0.000
V3      31.392       2      0.833       17    37.670    0.000
V4      15.713       2      0.728       17    21.585    0.000
V5      22.537       2      0.816       17    27.614    0.000
V6      12.171       2      1.071       17    11.363    0.001

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this, and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.

Number of Cases in each Cluster

Cluster   1      6.000
          2      6.000
          3      8.000
Valid           20.000
Missing          0.000


    Results of Two-Step Clustering

Table 20.5: Auto-Clustering

Number of   Akaike's Information   AIC         Ratio of AIC   Ratio of Distance
Clusters    Criterion (AIC)        Change(a)   Changes(b)     Measures(c)
    1          104.140
    2          101.171             -2.969         1.000           .847
    3           97.594             -3.577         1.205          1.583
    4          116.896             19.302        -6.502          2.115
    5          138.230             21.335        -7.187          1.222
    6          158.586             20.355        -6.857          1.021
    7          179.340             20.755        -6.991          1.224
    8          201.628             22.288        -7.508          1.006
    9          224.055             22.426        -7.555          1.111
   10          246.522             22.467        -7.568          1.588
   11          269.570             23.048        -7.764          1.001
   12          292.718             23.148        -7.798          1.055
   13          316.120             23.402        -7.883          1.002
   14          339.223             23.103        -7.782          1.044
   15          362.650             23.427        -7.892          1.004

a. The changes are from the previous number of clusters in the table.
b. The ratios of changes are relative to the change for the two-cluster solution.
c. The ratios of distance measures are based on the current number of clusters against the previous number of clusters.


Table 20.5, cont.: Cluster Distribution

Cluster        N    % of Combined   % of Total
   1           6        30.0%          30.0%
   2           6        30.0%          30.0%
   3           8        40.0%          40.0%
Combined      20       100.0%         100.0%
Total         20                      100.0%


Table 20.5, cont.: Cluster Profiles

            Fun                 Bad for Budget        Eating Out
Cluster     Mean   Std. Dev.    Mean   Std. Dev.      Mean   Std. Dev.
   1        1.67     .516       3.00     .632         1.83     .753
   2        3.50     .548       5.83     .753         3.33     .816
   3        5.75    1.035       3.63     .916         6.00    1.069
Combined    3.85    1.899       4.10    1.410         3.95    2.012

            Best Buys           Don't Care            Compare Prices
Cluster     Mean   Std. Dev.    Mean   Std. Dev.      Mean   Std. Dev.
   1        3.50    1.049       5.50    1.049         3.33     .816
   2        6.00     .632       3.50     .837         6.00    1.549
   3        3.13     .835       1.88     .835         3.88     .641
Combined    4.10    1.518       3.45    1.761         4.35    1.496


    Clustering Variables

In this instance, the units used for analysis are the variables, and the distance measures are computed for all pairs of variables.

Hierarchical clustering of variables can aid in the identification of unique variables, or variables that make a unique contribution to the data.

Clustering can also be used to reduce the number of variables. Associated with each cluster is a linear combination of the variables in the cluster, called the cluster component. A large set of variables can often be replaced by the set of cluster components with little loss of information. However, a given number of cluster components does not generally explain as much variance as the same number of principal components.
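A hedged sketch of clustering variables rather than cases: treat 1 - |r| between each pair of variables as a distance and feed the condensed matrix to hierarchical clustering. This is one common choice of variable distance, not the only one.

```python
# Clustering variables: distance = 1 - |correlation| between variable pairs.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

X = np.random.default_rng(7).normal(size=(100, 6))  # rows = cases, cols = variables

corr = np.corrcoef(X, rowvar=False)   # 6 x 6 correlation matrix
dist = 1.0 - np.abs(corr)             # highly correlated variables -> small distance
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist, checks=False), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))  # which variables group together
```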


    SPSS Windows

To select this procedure using SPSS for Windows, click:

    Analyze>Classify>Hierarchical Cluster

    Analyze>Classify>K-Means Cluster

    Analyze>Classify>Two-Step Cluster


    SPSS Windows: Hierarchical Clustering

1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then HIERARCHICAL CLUSTER.

3. Move Fun [v1], Bad for Budget [v2], Eating Out [v3], Best Buys [v4], Don't Care [v5], and Compare Prices [v6] into the VARIABLES box.

4. In the CLUSTER box, check CASES (default option). In the DISPLAY box, check STATISTICS and PLOTS (default options).

5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OF SOLUTIONS. Then, for MINIMUM NUMBER OF CLUSTERS, enter 2, and for MAXIMUM NUMBER OF CLUSTERS, enter 4. Click CONTINUE.

6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE box, check ALL CLUSTERS (default). In the ORIENTATION box, check VERTICAL. Click CONTINUE.

7. Click on METHOD. For CLUSTER METHOD, select WARD'S METHOD. In the MEASURE box, check INTERVAL and select SQUARED EUCLIDEAN DISTANCE. Click CONTINUE.

8. Click OK.
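For readers without SPSS, roughly the same analysis can be sketched in Python with SciPy. This is an approximation of the steps above, not SPSS itself: SciPy's ward works from raw observations and uses Euclidean geometry internally, so merge heights differ from SPSS coefficients.

```python
# Approximate Python equivalent of the SPSS hierarchical-clustering steps.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(8).normal(size=(20, 6))  # substitute the Table 20.1 data

Z = linkage(X, method="ward")               # step 7: Ward's method

for k in (2, 3, 4):                         # step 5: range of solutions, 2 to 4
    print(k, fcluster(Z, t=k, criterion="maxclust"))

dendrogram(Z)                               # step 6: dendrogram
plt.show()
```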


    SPSS Windows: K-Means Clustering

1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then K-MEANS CLUSTER.

3. Move Fun [v1], Bad for Budget [v2], Eating Out [v3], Best Buys [v4], Don't Care [v5], and Compare Prices [v6] into the VARIABLES box.

4. For NUMBER OF CLUSTERS, select 3.

5. Click on OPTIONS. In the pop-up window, in the STATISTICS box, check INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE. Click CONTINUE.

6. Click OK.


    SPSS Windows: Two-Step Clustering

1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then TWO-STEP CLUSTER.

3. Move Fun [v1], Bad for Budget [v2], Eating Out [v3], Best Buys [v4], Don't Care [v5], and Compare Prices [v6] into the CONTINUOUS VARIABLES box.

4. For DISTANCE MEASURE, select EUCLIDEAN.

5. For NUMBER OF CLUSTERS, select DETERMINE AUTOMATICALLY.

6. For CLUSTERING CRITERION, select AKAIKE'S INFORMATION CRITERION (AIC).

7. Click OK.
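SPSS's TwoStep algorithm itself is not available in the common Python libraries, but the auto-clustering idea (pick the number of clusters that minimizes AIC, as in Table 20.5) can be sketched with Gaussian mixtures in scikit-learn; this is an analogous approach, not the same procedure.

```python
# Choosing the number of clusters automatically by minimizing AIC.
# Gaussian mixtures stand in for SPSS's TwoStep procedure here.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(9).normal(size=(20, 6))  # stand-in data

aic = {k: GaussianMixture(n_components=k, covariance_type="diag",
                          random_state=0).fit(X).aic(X)
       for k in range(1, 6)}
best_k = min(aic, key=aic.get)   # smallest AIC wins
print(aic, best_k)
```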


    SAS Learning Edition

To select this procedure using SAS Learning Edition, click:

    Analyze>Classify>Cluster Analysis



    SAS Learning Edition: Hierarchical Clustering

1. Select ANALYZE from the SAS Learning Edition menu bar.

2. Select Multivariate>Cluster Analysis.

3. Move V1-V6 to the Analysis variables task role.

4. Click Cluster and select Ward's minimum variance method under Cluster method.

5. Click Results and select Simple summary statistics.

6. Click Run.



SAS Learning Edition: K-Means Clustering

1. Select ANALYZE from the SAS Learning Edition menu bar.

2. Select Multivariate>Cluster Analysis.

3. Move V1-V6 to the Analysis variables task role.

4. Click Cluster and select K-means algorithm as the cluster method and 3 for the Maximum number of clusters.

5. Click Run.


Copyright 2010 Pearson Education, Inc., publishing as Prentice Hall. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.