Quantitative Methods for Economics, Finance and Management
Lecture 12
How many components should be retained?
1. eigenvalues greater than 1
2. ratio between the number of components and the number of variables (about 1/3)
3. percentage of variance explained (at least 60%)
4. the SCREE PLOT (eigenvalues plotted against the number of factors). If the plot shows an "elbow", it is plausible to assume that a latent structure exists; if the curve is almost a straight line, the factors are merely a transformation of the manifest variables. The relevant factors are those above the elbow (optionally including the one at the elbow). If no factors dominate, the criterion is not suitable.
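The two scree-plot shapes described in criterion 4 can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions: the data are simulated, and the sample size, the number of variables and the 0.8/0.6 weights are arbitrary choices that do not come from the lecture.

```python
import numpy as np

def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix, sorted in decreasing order."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

rng = np.random.default_rng(0)
n, p = 2000, 6

# Case 1: six independent variables, so there is no latent structure.
independent = rng.normal(size=(n, p))

# Case 2: six variables all driven by one common factor plus noise.
factor = rng.normal(size=(n, 1))
latent = 0.8 * factor + 0.6 * rng.normal(size=(n, p))

eig_independent = correlation_eigenvalues(independent)
eig_latent = correlation_eigenvalues(latent)

print(np.round(eig_independent, 2))  # all close to 1: an almost flat scree
print(np.round(eig_latent, 2))       # one dominant eigenvalue, then an elbow
```

In the first case no component stands out, which is the situation where the scree criterion is not informative.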
Factor analysis
How many components should be retained? (continued)
5. Communalities:
- compare the communalities of several solutions
- the share of each variable's variance explained by the chosen solution must be satisfactory
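As a minimal sketch of how communalities can be obtained from a principal components solution, the example below uses simulated data (the weights, the noise level and the choice of k = 2 retained components are hypothetical). The communality of a variable is the sum of its squared loadings on the retained components, and the loadings are the correlations between the variables and the components.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 500, 6, 2  # k = number of retained components (hypothetical choice)

# Synthetic data: two latent factors driving six observed variables.
factors = rng.normal(size=(n, 2))
weights = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                    [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
data = factors @ weights.T + 0.5 * rng.normal(size=(n, p))

corr = np.corrcoef(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Loadings: correlations between each variable and each retained component.
loadings = eigenvectors[:, :k] * np.sqrt(eigenvalues[:k])

# Communality of a variable: sum of its squared loadings over the k components.
communalities = (loadings ** 2).sum(axis=1)

print(np.round(communalities, 3))
# The total communality equals the sum of the retained eigenvalues.
print(communalities.sum(), eigenvalues[:k].sum())
```

Dividing the total communality by the number of variables gives the overall share of variance explained, which is why comparing communalities across solutions complements the variance criterion.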
How should the components be interpreted?
1. Rotation of the components. An orthogonal rotation in the factor space does not affect the validity of the model; we exploit this property to obtain factors that are easier to interpret.
- The Varimax method of rotation, suggested by Kaiser, aims to minimize the number of variables with high saturations (correlations) on each factor.
- The Quartimax method attempts to minimize the number of factors strongly correlated with each variable.
- The Equimax method is a cross between Varimax and Quartimax.
2. Correlations between the principal components and the original variables.
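To make the rotation step concrete, here is a minimal sketch of an orthogonal rotation using the common SVD-based iteration. It is not the routine used by any particular statistical package: the function name `orthomax`, the example loadings and the stopping rule are assumptions made for illustration, and Kaiser row normalization is not applied. With `gamma=1.0` the criterion is Varimax; `gamma=0.0` gives Quartimax.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x factors) loading matrix.

    gamma=1.0 corresponds to Varimax, gamma=0.0 to Quartimax.
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the orthomax criterion.
        target = rotated ** 3 - (gamma / p) * rotated * (rotated ** 2).sum(axis=0)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # closest orthogonal matrix to the target direction
        current = s.sum()
        if previous != 0.0 and current < previous * (1.0 + tol):
            break
        previous = current
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: every variable loads on both factors.
unrotated = np.array([[0.7, 0.5], [0.6, 0.6], [0.7, 0.4],
                      [0.6, -0.5], [0.7, -0.5], [0.5, -0.6]])
rotated, rotation = orthomax(unrotated)

print(np.round(rotated, 3))
# Communalities (row sums of squared loadings) are unchanged by the rotation.
print(np.round((unrotated ** 2).sum(axis=1) - (rotated ** 2).sum(axis=1), 10))
```

Because the rotation matrix is orthogonal, the communalities and the total variance explained are unchanged; only the way that variance is distributed among the factors changes.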
Examples of factor analysis from past group projects
The goal of the research is to understand which are the main information media, how well each one is rated, and which topics attract the most interest.
Factor analysis: the variables considered are the 14 parameters that influence the choice of channel and the choice of the type of source.
Example: Importance of information and how it is acquired
What determines your choice of channel? On a scale from 1 to 10 (where 1 = not at all and 10 = very much), rate the importance of:
- simplicity                     1 2 3 4 5 6 7 8 9 10
- cost                           1 2 3 4 5 6 7 8 9 10
- speed of acquisition           1 2 3 4 5 6 7 8 9 10
- convenience                    1 2 3 4 5 6 7 8 9 10
- update time                    1 2 3 4 5 6 7 8 9 10
What determines your choice of sources? On a scale from 1 to 10 (where 1 = not at all and 10 = very much), rate the importance of:
- political orientation          1 2 3 4 5 6 7 8 9 10
- topics covered                 1 2 3 4 5 6 7 8 9 10
- geographic area of interest    1 2 3 4 5 6 7 8 9 10
- editor-in-chief                1 2 3 4 5 6 7 8 9 10
- format / style                 1 2 3 4 5 6 7 8 9 10
- who you live with              1 2 3 4 5 6 7 8 9 10
- editorial staff                1 2 3 4 5 6 7 8 9 10
- journalists / speakers         1 2 3 4 5 6 7 8 9 10
- service quality                1 2 3 4 5 6 7 8 9 10
Eigenvalues of the correlation matrix: Total = 14, Mean = 1

      Eigenvalue   Difference   Proportion   Cumulative
  1   3.16944223   0.52227941   0.2264       0.2264
  2   2.64716281   1.35701039   0.1891       0.4155
  3   1.29015243   0.02599489   0.0922       0.5076
  4   1.26415754   0.2604549    0.0903       0.5979
  5   1.00370264   0.20036187   0.0717       0.6696
  6   0.80334077   0.01216326   0.0574       0.727
  7   0.79117751   0.13231428   0.0565       0.7835
  8   0.65886322   0.03460029   0.0471       0.8306
  9   0.62426293   0.12396136   0.0446       0.8752
 10   0.50030158   0.09138333   0.0357       0.9109
 11   0.40891825   0.04258591   0.0292       0.9401
 12   0.36633234   0.09276211   0.0262       0.9663
 13   0.27357023   0.07495472   0.0195       0.9858
 14   0.19861552   0            0.0142       1
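The first three selection criteria can be checked directly on the eigenvalues reported above. This is a minimal sketch: the eigenvalues are copied from the table, while the variable names and the rounding used for the 1/3 rule are illustrative choices.

```python
# Eigenvalues of the correlation matrix, copied from the table above.
eigenvalues = [3.16944223, 2.64716281, 1.29015243, 1.26415754, 1.00370264,
               0.80334077, 0.79117751, 0.65886322, 0.62426293, 0.50030158,
               0.40891825, 0.36633234, 0.27357023, 0.19861552]

n_variables = len(eigenvalues)
total = sum(eigenvalues)  # equals the number of variables for a correlation matrix

# Criterion 1: eigenvalues greater than 1.
n_kaiser = sum(1 for value in eigenvalues if value > 1)

# Criterion 2: about one component for every three variables.
n_ratio = round(n_variables / 3)

# Criterion 3: smallest number of components explaining at least 60% of variance.
cumulative = 0.0
n_variance = 0
for value in eigenvalues:
    cumulative += value / total
    n_variance += 1
    if cumulative >= 0.60:
        break

print(n_kaiser, n_ratio, n_variance)  # prints: 5 5 5
```

On these data all three criteria point to 5 components; the fourth component alone would stop just short of the 60% threshold (cumulative 0.5979).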
[Figure: SCREE PLOT. x-axis: factors (1 to 14); y-axis: eigenvalues (0 to 3.5)]
COMPARISON OF FINAL COMMUNALITIES

Variable   Description                    n=5      n=6
D_17_s     simplicity                     0.70     0.70
D_17_c     cost                           0.73     0.73
D_17_v     speed of acquisition           0.82     0.84
D_17_com   convenience                    0.66     0.71
D_17_tda   update time                    0.73     0.81
D_20_orp   political orientation          0.80     0.81
D_20_tt    topics covered                 0.65     0.70
D_20_ag    geographic area of interest    0.68     0.84
D_20_d     editor-in-chief                0.72     0.73
D_20_fs    format/style                   0.56     0.63
D_20_ccv   who you live with              0.54     0.56
D_20_r     editorial staff                0.70     0.73
D_20_gs    journalists/speakers           0.63     0.67
D_20_qs    service quality                0.46     0.74
Total                                     9.37     10.18
%                                         66.96%   72.70%
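One way to read the comparison is to look for the variable that is explained worst by the 5-factor solution and the variable that gains most from a sixth factor. The sketch below uses the communalities from the table; the dictionary layout and the names are illustrative.

```python
# Final communalities copied from the comparison table: (n=5, n=6).
communalities = {
    "D_17_s": (0.70, 0.70), "D_17_c": (0.73, 0.73), "D_17_v": (0.82, 0.84),
    "D_17_com": (0.66, 0.71), "D_17_tda": (0.73, 0.81), "D_20_orp": (0.80, 0.81),
    "D_20_tt": (0.65, 0.70), "D_20_ag": (0.68, 0.84), "D_20_d": (0.72, 0.73),
    "D_20_fs": (0.56, 0.63), "D_20_ccv": (0.54, 0.56), "D_20_r": (0.70, 0.73),
    "D_20_gs": (0.63, 0.67), "D_20_qs": (0.46, 0.74),
}

# Gain in communality when moving from 5 to 6 factors.
gains = {name: round(six - five, 2) for name, (five, six) in communalities.items()}

worst_with_5 = min(communalities, key=lambda name: communalities[name][0])
best_gain = max(gains, key=gains.get)

print(worst_with_5, communalities[worst_with_5][0])  # D_20_qs 0.46
print(best_gain, gains[best_gain])                   # D_20_qs 0.28
```

Service quality (D_20_qs) is both the least explained variable with 5 factors and the one that benefits most from a sixth, which is the kind of trade-off this criterion is meant to expose.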
Factor pattern

Variable   Description                    Factor1   Factor2    Factor3   Factor4    Factor5
D_17_s     simplicity                     0.56626   .          .         .          0.46051
D_17_c     cost                           0.35685   .          0.65469   .          0.3875
D_17_v     speed                          0.75292   .          .         .          .
D_17_com   convenience                    0.68764   -0.36206   .         .          .
D_17_tda   update time                    0.5326    -0.43612   .         .          -0.38524
D_20_orp   political orientation          .         0.54298    .         0.53024    .
D_20_tt    topics covered                 0.41299   .          .         0.53419    .
D_20_ag    geographic area of interest    .         .          -0.5248   .          0.38026
D_20_d     editor-in-chief                .         0.74874    .         .          .
D_20_fs    format/style                   0.38261   .          .         -0.43544   .
D_20_ccv   who you live with              .         0.50515    .         .          .
D_20_r     editorial staff                .         0.72899    .         .          .
D_20_gs    journalists/speakers           0.58604   0.49902    .         .          .
D_20_qs    service quality                0.63683   .          .         .          .

Values smaller than 0.35 in absolute value are not printed.
The 5-factor pattern, as it stands, is hard to interpret; it is therefore advisable to rotate the factors with an appropriate method (VARIMAX).
ROTATED FACTOR PATTERN

Variable   Description                    Factor1   Factor2   Factor3   Factor4   Factor5
D_17_v     speed                          0.8578    .         .         .         .
D_17_tda   update time                    0.7885    .         .         .         .
D_17_com   convenience                    0.70345   .         .         0.39398   .
D_20_qs    service quality                0.53133   .         .         .         .
D_20_r     editorial staff                .         0.74824   .         .         .
D_20_fs    format/style                   .         0.71171   .         .         .
D_20_ccv   who you live with              .         0.70059   .         .         .
D_20_gs    journalists/speakers           .         0.62098   0.36737   .         .
D_20_orp   political orientation          .         .         0.8923    .         .
D_20_d     editor-in-chief                .         .         0.77647   .         .
D_17_c     cost                           .         .         .         0.83334   .
D_17_s     simplicity                     .         .         .         0.65037   0.45187
D_20_ag    geographic area of interest    .         .         .         .         0.7622
D_20_tt    topics covered                 .         .         .         .         0.71198

Values smaller than 0.35 in absolute value are not printed.
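Interpretation of the rotated factors:
- Factor1: speed of acquisition and quality of the service offered
- Factor2: presentation of the information
- Factor3: political/ideological affinity
- Factor4: accessibility of the service
- Factor5: attractiveness of the topics covered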
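The naming of the factors follows from grouping each variable with the factor on which it has its largest loading in absolute value. A minimal sketch of that grouping, using the rotated loadings of this example (loadings that were not printed because they are below 0.35 are entered as 0.0, which is an assumption that does not affect which loading is largest):

```python
# Rotated loadings from the example; unprinted values (< 0.35) are set to 0.0.
rotated = {
    "D_17_v":   [0.8578, 0.0, 0.0, 0.0, 0.0],
    "D_17_tda": [0.7885, 0.0, 0.0, 0.0, 0.0],
    "D_17_com": [0.70345, 0.0, 0.0, 0.39398, 0.0],
    "D_20_qs":  [0.53133, 0.0, 0.0, 0.0, 0.0],
    "D_20_r":   [0.0, 0.74824, 0.0, 0.0, 0.0],
    "D_20_fs":  [0.0, 0.71171, 0.0, 0.0, 0.0],
    "D_20_ccv": [0.0, 0.70059, 0.0, 0.0, 0.0],
    "D_20_gs":  [0.0, 0.62098, 0.36737, 0.0, 0.0],
    "D_20_orp": [0.0, 0.0, 0.8923, 0.0, 0.0],
    "D_20_d":   [0.0, 0.0, 0.77647, 0.0, 0.0],
    "D_17_c":   [0.0, 0.0, 0.0, 0.83334, 0.0],
    "D_17_s":   [0.0, 0.0, 0.0, 0.65037, 0.45187],
    "D_20_ag":  [0.0, 0.0, 0.0, 0.0, 0.7622],
    "D_20_tt":  [0.0, 0.0, 0.0, 0.0, 0.71198],
}

# Assign each variable to the factor with the largest absolute loading.
groups = {}
for name, row in rotated.items():
    dominant = max(range(len(row)), key=lambda j: abs(row[j]))
    groups.setdefault(f"Factor{dominant + 1}", []).append(name)

for factor_name, members in groups.items():
    print(factor_name, members)
```

After the rotation every variable has one clearly dominant factor, which is what makes the five labels above possible.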
Coffee Consumption in Italy
Factor Analysis
We ran a factor analysis on two numerical questions from the survey whose variables we expected to be correlated: Q15 (“What are your general coffee preferences?”) and Q16 (“If you drink your coffee outside (in a bar/coffee place) which are the main factors that, in general, influence your decision on where you drink your coffee?”).
•We used the principal components method to address the multicollinearity among our variables and to summarize them in a smaller number of uncorrelated factors (standardized by definition, with mean 0 and standard deviation 1), so as to better explain and understand coffee consumption.
•This is a preliminary phase for the cluster analysis and the regression analysis.
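The properties mentioned above (uncorrelated factors with mean 0 and standard deviation 1) can be verified numerically. The sketch below uses simulated, deliberately collinear data, not the coffee survey; the sample size, the number of variables and the choice of k = 2 components are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 300, 2

# Synthetic, deliberately collinear variables (hypothetical data).
base = rng.normal(size=(n, 2))
data = np.column_stack([
    base[:, 0] + 0.3 * rng.normal(size=n),
    base[:, 0] + 0.3 * rng.normal(size=n),
    base[:, 1] + 0.3 * rng.normal(size=n),
    base[:, 1] + 0.3 * rng.normal(size=n),
    base[:, 0] + base[:, 1] + 0.3 * rng.normal(size=n),
])

# Standardize the variables and diagonalize their correlation matrix.
z = (data - data.mean(axis=0)) / data.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1][:k]

# Standardized component scores: divide each component by sqrt(eigenvalue).
scores = z @ eigenvectors[:, order] / np.sqrt(eigenvalues[order])

print(np.round(scores.mean(axis=0), 6))                # means close to 0
print(np.round(scores.std(axis=0), 6))                 # standard deviations close to 1
print(np.round(np.corrcoef(scores, rowvar=False), 6))  # close to the identity matrix
```

Scores of this kind can be passed to a regression or a cluster analysis without the collinearity present in the original variables.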
Initial variables used for the analysis
On the right are the 21 initial variables (taken from Q15 and Q16) that we selected for the factor analysis.
Judging by the SPSS correlation matrix (not shown on the slide because of its size; please see the output to check it), many of the variables are significantly correlated.
Need for FACTOR REDUCTION! Start the actual factor analysis!
Choosing the right number of factors
1. 1/3 criterion: 21/3 = 7 factors
2. Variance explained (60%-75%): 7, 8, 9, 10 factors
3. Scree plot: 6, 8, 10 factors
4. Eigenvalues: 6, 7, 8 factors
The optimal values seem to be 7 or 8 factors.
Choosing the right number of factors (continued)
The scree plot shown here corresponds to criterion 3 for selecting the number of factors on the previous slide.
Factor Analysis with 8 Factors
After analyzing the Communalities table, we identified one variable that is not adequately explained by the 8 selected factors (a communality of 0.387 is not satisfactory). This variable is Price, which we consider important for our analysis.
Decreasing the number of factors to 7 will not improve how well the factors explain the Price variable.
We decided to exclude the Price variable from this factor analysis and to treat it as a separate factor (given its very high importance from our qualitative point of view) in the subsequent cluster and regression analyses.
Factor Analysis with 20 Variables (after eliminating the Price variable)
1. 1/3 criterion: 20/3 ≈ 6.7, rounded down to 6 factors
2. Variance explained (60%-75%): 7, 8, 9 factors
3. Scree plot: 6, 7, 9 factors
4. Eigenvalues: 6, 7, 8 factors
The optimal choice seems to be 7 factors.
Factor Analysis with 20 Variables (after eliminating the Price variable, continued)
The scree plot shown here corresponds to criterion 3 for selecting the number of factors on the previous slide.
Factor Analysis with 7 Factors
After analyzing the Communalities table, we see that the 7 factors explain the initial variables adequately. All communalities are above 0.400, which is a good result.
We are ready to look at the Rotated Component Matrix to see whether the factors make sense and can be interpreted.
Factors - explained
• The method used for rotation was Varimax.
• After closely analyzing the Rotated Component Matrix, we tried to give meaning to our 7 factors.
• The names of the respective factors are the following:
1. Socialization factor
2. Internet/Trendiness factor
3. Close meeting place factor
4. Intellectual/Non-smoking factor
5. Familiarity factor
6. Variety/To Go factor
7. Traditionality & Addiction factor
Factors – explained - continued -
1. Socialization Factor Socialize, sit down, being with friends, cozy atmosphere
2. Internet/Trendiness Factor Wi-Fi availability, internet, trendy place
3. Close meeting place Factor Close to home/work/school, ability to meet people, quality of coffee not important
4. Intellectual/Non-smoking Factor Non-smokers, usually snack, love to read
5. Familiarity Factor Go to the same bar, do not like trying new places, concerned about quality of coffee
6. Variety/To-go Factor Variety and coffee to go, non traditional Italian coffee, preference for taking coffee alone
7. Traditionality/Addiction Factor Italian coffee preference, addicts
The Consumption of Digital Music and Its Impact on the Music Industry
Factor Analysis
We took questions 4, 9 and 10 into consideration, which gives 24 variables.
We asked interviewees to give a score from 1 to 9 (1: “I don’t like it”, 9: “I love it”) or to use percentages.
Question 4: score. Question 9: score. Question 10: %.
1. Home
2. Car
3. Outside in general
4. Office/University
5. Shops
6. Restaurants
7. Bars/Discotheque
8. Record player
9. Cassette player
10. CD player
11. Digital player
12. Car stereo
13. House stereo
14. Radio
15. Mobile phone
16. USE record player
17. USE cassette player
18. USE CD player
19. USE digital player
20. USE car stereo
21. USE house stereo
22. USE radio
23. USE PC
24. USE mobile phone
Total Variance Explained

            Initial Eigenvalues                 Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total       % of Var.   Cumul. %    Total    % of Var.   Cumul. %         Total    % of Var.   Cumul. %
 1          3.389       14.120      14.120      3.389    14.120      14.120           2.427    10.114      10.114
 2          2.768       11.533      25.653      2.768    11.533      25.653           2.126    8.857       18.972
 3          1.970       8.209       33.862      1.970    8.209       33.862           1.991    8.297       27.268
 4          1.542       6.425       40.287      1.542    6.425       40.287           1.877    7.820       35.088
 5          1.539       6.411       46.698      1.539    6.411       46.698           1.659    6.912       42.000
 6          1.388       5.782       52.480      1.388    5.782       52.480           1.647    6.861       48.861
 7          1.355       5.646       58.126      1.355    5.646       58.126           1.568    6.535       55.396
 8          1.164       4.850       62.976      1.164    4.850       62.976           1.457    6.072       61.469
 9          1.058       4.408       67.385      1.058    4.408       67.385           1.420    5.916       67.385
10          0.956       3.985       71.369
11          0.919       3.831       75.201
12          0.823       3.427       78.628
13          0.714       2.975       81.603
14          0.689       2.872       84.475
15          0.600       2.498       86.973
16          0.565       2.353       89.326
17          0.504       2.098       91.424
18          0.464       1.935       93.359
19          0.455       1.894       95.254
20          0.355       1.480       96.733
21          0.321       1.336       98.070
22          0.253       1.053       99.122
23          0.211       0.878       100.000
24          -7.9E-017   -3.31E-016  100.000

Extraction Method: Principal Component Analysis.
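The last eigenvalue in the table is zero up to floating-point error, which indicates an exact linear dependency among the 24 variables. One possible source, which is an assumption and is not stated in the slides, is that the percentage variables add up to a constant for each respondent. The sketch below shows on simulated shares that such a constraint makes the correlation matrix singular.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400

# Hypothetical usage shares of four devices that sum to 100 for each respondent.
raw = rng.random(size=(n, 4))
shares = 100 * raw / raw.sum(axis=1, keepdims=True)

corr = np.corrcoef(shares, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

print(eigenvalues)  # the last eigenvalue is zero up to floating-point error
```

With such a dependency one component carries no variance at all, so the remaining components account for 100% of the total.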
First hypothesis:
Number of factors: 9
Extraction: Principal Component Analysis
Max number of iterations: 25
Rotation: Varimax
Ratio between the number of components and the number of variables: ADEQUATE
For a set of 17 variables, the ideal number of components is 4-5. In this case, for a set of 24 variables, we considered 9 components.
Percentage of global variance explained: OK
About 68%; the optimal range is 60%-70%.
Communalities: ADEQUATE
The values range between 0.456 and 0.917.
Looking at the rotated component matrix, we found a problem: the CORRELATIONS BETWEEN COMPONENTS AND ORIGINAL VARIABLES are NOT OPTIMAL, with a problematic 9th component.
Total Variance Explained

            Initial Eigenvalues                 Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total       % of Var.   Cumul. %    Total    % of Var.   Cumul. %         Total    % of Var.   Cumul. %
 1          3.389       14.120      14.120      3.389    14.120      14.120           2.634    10.974      10.974
 2          2.768       11.533      25.653      2.768    11.533      25.653           2.339    9.744       20.718
 3          1.970       8.209       33.862      1.970    8.209       33.862           1.891    7.880       28.598
 4          1.542       6.425       40.287      1.542    6.425       40.287           1.810    7.541       36.139
 5          1.539       6.411       46.698      1.539    6.411       46.698           1.776    7.399       43.538
 6          1.388       5.782       52.480      1.388    5.782       52.480           1.721    7.171       50.709
 7          1.355       5.646       58.126      1.355    5.646       58.126           1.486    6.191       56.900
 8          1.164       4.850       62.976      1.164    4.850       62.976           1.458    6.077       62.976
 9          1.058       4.408       67.385
10          0.956       3.985       71.369
11          0.919       3.831       75.201
12          0.823       3.427       78.628
13          0.714       2.975       81.603
14          0.689       2.872       84.475
15          0.600       2.498       86.973
16          0.565       2.353       89.326
17          0.504       2.098       91.424
18          0.464       1.935       93.359
19          0.455       1.894       95.254
20          0.355       1.480       96.733
21          0.321       1.336       98.070
22          0.253       1.053       99.122
23          0.211       0.878       100.000
24          -3.1E-016   -1.30E-015  100.000

Extraction Method: Principal Component Analysis.
Second hypothesis:
Number of factors: 8
Extraction: Principal Component Analysis
Max number of iterations: 25
Rotation: Varimax
Ratio between the number of components and the number of variables: ADEQUATE
For a set of 17 variables, the ideal number of components is 4-5. In this case, for a set of 24 variables, we considered 8 components.
Percentage of global variance explained: OK
About 63%; the optimal range is 60%-70%.
Communalities: ACCEPTABLE
The values range between 0.431 and 0.870.
Communalities

Variable              Initial   Extraction
Home                  1.000     0.626
Car                   1.000     0.546
Outside in general    1.000     0.522
Office/University     1.000     0.450
Shops                 1.000     0.623
Restaurants           1.000     0.630
Bars/Discotheque      1.000     0.450
Record player         1.000     0.657
Cassette player       1.000     0.797
CD player             1.000     0.545
Digital player        1.000     0.670
Car stereo            1.000     0.664
House stereo          1.000     0.646
Radio                 1.000     0.516
Mobile phone          1.000     0.736
USE record player     1.000     0.355
USE cassette player   1.000     0.431
USE CD player         1.000     0.632
USE digital player    1.000     0.870
USE car stereo        1.000     0.828
USE house stereo      1.000     0.679
USE radio             1.000     0.685
USE PC                1.000     0.814
USE mobile phone      1.000     0.747

Extraction Method: Principal Component Analysis.
Scree plot: ADEQUATE (“quite linear slope”)
From the 9th component onward, each additional component adds little explained variance.
Rotated Component Matrix (a)
Components 1 to 8. For each variable the displayed loadings are listed from left to right; the component column of each value is not recoverable here.

Variable              Displayed loadings
Home                  0.252, -0.364, 0.499, 0.409
Car                   0.157, 0.164, 0.668, -0.195
Outside in general    0.547, 0.306, 0.296
Office/University     0.638
Shops                 0.760
Restaurants           0.726, -0.168, 0.218
Bars/Discotheque      0.536, 0.310
Record player         0.179, 0.757, 0.164
Cassette player       0.242, 0.827, 0.180
CD player             0.433, -0.375, 0.150, 0.348, 0.151, -0.198
Digital player        0.289, 0.448, -0.570, 0.198
Car stereo            0.242, -0.203, -0.279, 0.654, 0.156
House stereo          0.206, 0.742
Radio                 0.246, 0.570, 0.332
Mobile phone          0.211, 0.825
USE record player     0.527, -0.206
USE cassette player   0.584, -0.245
USE CD player         -0.264, 0.264, -0.171, 0.277, 0.360, -0.490
USE digital player    -0.294, 0.665, -0.365, -0.190, -0.371
USE car stereo        -0.857, -0.207
USE house stereo      -0.248, 0.773
USE radio             0.816
USE PC                0.892
USE mobile phone      -0.160, 0.824

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 10 iterations.
Factor Analysis: Interpretation
1. The problem with the 9th component is resolved.
2. We chose the Varimax option to minimize the number of variables with high saturations on each factor.
WE CHOOSE THE SECOND HYPOTHESIS
Factor Analysis: Interpretation
- OUTSIDE LISTENING: Office/University, Shops, Restaurants, Bars/Discotheque
- STEREO: Record player, Use record player, Cassette player, Use cassette player
- DIGITAL PLAYER: Digital player, Use digital player
- RADIO: Radio, Use radio
- CAR LISTENING: Car, Car stereo, CD player, Use CD player
- HOUSE LISTENING: Home, House stereo, Use house stereo
- PC: Outside in general, Use PC
- MOBILE PHONE: Mobile phone, Use mobile phone