Mirio De Rosa recently published an excellent article on Data Science Central about “How to go from data to information to insights and unleash the power of strategic thinking”. He walks through the procedures of
- understanding the raw structure of the data,
- surveying various possible “analytical technolog[ies]” to fit the given data,
- selecting the most sensible model, and
- interpreting the results in order to
- deliver insight, and
- decide on an informed strategy.
It will help to read his blog post before continuing, but it’s not essential to my point. In the last section of the article, De Rosa presents an incisive analysis of the model results and, after delivering his insight, issues a judicious recommendation on how to deploy that insight for strategic planning. In brief, he used correspondence analysis to fit survey data on automobile brand perception, layered a clustering algorithm on top of it, and from the resulting perceptual map derived qualitative segmentations that are critical to marketers for brand positioning and for communicating the desired perceptions.
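For readers new to the technique, the pipeline described above can be sketched in base R: correspondence analysis via a singular value decomposition of the standardized residuals, then k-means on the brand and attribute coordinates of the first two dimensions. This is a minimal sketch on a small hypothetical count table (the names `BrandA`, `AttrX` and so on are placeholders, not De Rosa’s data, and `ca_coords` is a helper written for illustration). It is not MM4XL’s implementation, though the principal coordinates it produces should match those of a standard CA routine up to the sign of each axis.

```r
# Hypothetical brand-by-attribute counts (placeholders, not the survey data)
counts <- matrix(c(20,  5, 10,
                    6, 18,  9,
                    8,  7, 22,
                   15, 12,  4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("BrandA", "BrandB", "BrandC", "BrandD"),
                                 c("AttrX", "AttrY", "AttrZ")))

# Hypothetical helper: correspondence analysis via SVD of standardized residuals
ca_coords <- function(N, ndim = 2) {
  P <- N / sum(N)                        # correspondence matrix
  r_mass <- rowSums(P)                   # row masses
  c_mass <- colSums(P)                   # column masses
  E <- outer(r_mass, c_mass)             # expected proportions under independence
  S <- (P - E) / sqrt(E)                 # standardized residuals
  dec <- svd(S)
  k <- seq_len(ndim)
  # Principal coordinates: divide singular vectors by sqrt(mass), scale by singular values
  rows <- sweep(dec$u[, k, drop = FALSE] / sqrt(r_mass), 2, dec$d[k], "*")
  cols <- sweep(dec$v[, k, drop = FALSE] / sqrt(c_mass), 2, dec$d[k], "*")
  dimnames(rows) <- list(rownames(N), paste0("Dim", k))
  dimnames(cols) <- list(colnames(N), paste0("Dim", k))
  list(rows = rows, cols = cols)
}

res <- ca_coords(counts)
coords <- rbind(res$cols, res$rows)      # attributes and brands on one map

# Layer k-means on top of the map (k = 2 chosen arbitrarily for this toy table)
set.seed(42)
km <- kmeans(coords, centers = 2, nstart = 25)
print(km$cluster)
```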
While the analysis is solid and the recommendation is sound, the nuanced nature of the subject in question (what image people think a brand evokes) also raises the question of whether different interpretations of the results are possible. To this end I reproduced the data and modeling process to see whether I would reach at least a variation on the same conclusions.
De Rosa used the marketing tool MM4XL. I used R. But I also wanted to work as if I were using enterprise software, to present an open-source alternative to MM4XL, at least with respect to a GUI. So I used R Commander with FactoMineR to run correspondence analysis with clustering of the principal components. Here is what I found.
As expected, the factor map on the first two dimensions (above) reproduces De Rosa’s exactly. There are, however, two key differences between these results and those De Rosa assembled with MM4XL.
The first is the clustering around Audi, Mercedes, and BMW. De Rosa groups Audi with BMW in what he calls the “Expensive emotion” segment, which comprises the Attractive, Good feeling, and Great image attributes, and places Mercedes Benz alone within HiTech. In my results, Mercedes Benz switches positions with Audi to join BMW, and Audi sits in its own subclass that corresponds more closely with the technological profile. All three cars reside within what De Rosa labels the Emotional-Expensive (North East) quadrant.
The second is the classification of all the brands west of the y-axis. Opel, Renault, and Ford are tightly packed around the Environment attribute. VW and Peugeot are associated with Brand I like; De Rosa places FIAT with this attribute as well, but my results do not. FIAT is an outlier like Volvo and Citroen, more closely associated with Ford, and certainly much closer to Environment than to Brand I like. In fact, the raw survey results show that of the 106 responses for FIAT, 23 were for Environment versus 16 for Brand I like.
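Correspondence analysis positions each brand by its row profile, the share of its responses that falls under each attribute, so the FIAT comparison is really a comparison of proportions. Using only the two FIAT counts quoted in the text (the rest of FIAT’s row is not reproduced here), a quick check in R:

```r
# FIAT counts quoted in the text: 106 responses in total
fiat_total <- 106
fiat_counts <- c(Environment = 23, `Brand I like` = 16)

# Share of FIAT's responses falling under each attribute (part of its row profile)
fiat_share <- round(fiat_counts / fiat_total, 3)
print(fiat_share)   # about 0.217 for Environment and 0.151 for Brand I like
```

The factor map places FIAT relative to the average profile across all brands, so these raw shares are only part of the picture.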
…
Several conditions may explain the difficulty with FIAT. A Harvard Business Review report, “Fiat’s Extreme Makeover”, even gives color to the evidence we find in our data about this brand’s uniqueness. The article notes that “In a world of low-priced apples, Fiat is striving to be an orange….Many people don’t fit Fiat’s mold, and even individuals well-targeted by demographic criteria…might not be early adopters of a new brand.”
It’s possible that there is a hierarchical structure in the attributes themselves. Specifically, some respondents might interpret Environment and Nice ad as subclasses of Brand I like: I like Brand X by virtue of its environmentally conscious image. In cases where the distinction between some attributes is not of polarity but of degree, a hierarchical multiple factor analysis method would be appropriate (Le Dien and Pagès 2003). More generally, when dealing with survey data, it’s important to consider questions of survey methodology and address them by extending the analytical methods accordingly.
Clustering the attributes themselves reveals a possible cognitive structure behind this perceptual taxonomy. In the figure above, we see that Environment and Brand I like are closely related as attributes of a car brand. So are Good feeling & Great image, and Reliable & Hi-Tech. The latter pairing points to a probable tautology in the survey logic: a car is perceived as high-tech because it is believed reliable, and believed reliable because it is perceived as high-tech.
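One way to cluster the attributes on their own is to compare their column profiles with the chi-square distance that correspondence analysis uses, then apply Ward’s hierarchical clustering as in the R code at the end of this post. The sketch below assumes a small hypothetical table with placeholder attribute names (`AttrA` to `AttrF`). Unlike that code, which clusters the first two CA dimensions, it uses the full chi-square distance (equivalent to Euclidean distance on all CA dimensions), so results on real data may differ.

```r
# Hypothetical counts: rows = brands, columns = attributes (placeholder names)
counts <- matrix(c(12,  4, 11,  3,  9,  2,
                    3, 10,  2, 11,  4, 12,
                    9,  5, 10,  4,  8,  3,
                    2, 11,  3, 12,  2, 10),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Brand", 1:4),
                                 paste0("Attr", LETTERS[1:6])))

# Column profiles: each attribute's responses spread across brands (columns sum to 1)
col_profiles <- sweep(counts, 2, colSums(counts), "/")
row_mass <- rowSums(counts) / sum(counts)

# Weighting each brand by 1/sqrt(mass) turns Euclidean distance into chi-square distance
scaled <- t(col_profiles / sqrt(row_mass))   # one row per attribute
d_attr <- dist(scaled, method = "euclidean")

# Ward's method, then cut into 3 groups
hc_attr <- hclust(d_attr, method = "ward.D2")
attr_groups <- cutree(hc_attr, k = 3)
print(attr_groups)
```

Calling `plot(hc_attr)` draws the corresponding attribute dendrogram.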
I stress this point: surveys are highly susceptible to variations in how respondents interpret the questions, which charges the collected data with subjective bias. It is therefore prudent to acknowledge the open-ended nature of one’s analysis. Doing so helps to neutralize the conclusion’s valence and maintain impartiality. De Rosa is shrewd to concede as much in his notes: “I wasn’t quite sure about my final interpretation, so I left room open for new labels.” Ultimately, I point out these nuances to show that data-driven recommendations are best prescribed with a healthy dose of skepticism throughout the analytical process.
Notes
Although the dependency between brand and attribute is statistically significant (at the 0.05 level), the correlation coefficient, measured as the square root of the trace (the total inertia), is rather weak at 0.18. Also, NbClust selected 3 as the best number of clusters (see the Brand-Attribute map below), which corresponds with the 3 clusters of the attributes themselves shown in the “Hierarchical Clustering” dendrogram above.
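Both statistics in this note rest on one identity: the trace (total inertia, the sum of the CA eigenvalues) equals the Pearson chi-square statistic divided by the grand total n, which is how the R code below recovers the chi-square test from the CA output. A self-contained base-R sketch on a hypothetical table (not the survey data) computes the eigenvalues directly and cross-checks them against `chisq.test()`; the square root of the trace is the quantity reported above as 0.18 for the real table.

```r
# Hypothetical counts for illustration; substitute the brand-by-attribute table
counts <- matrix(c(20,  5, 10,
                    6, 18,  9,
                    8,  7, 22,
                   15, 12,  4), nrow = 4, byrow = TRUE)

n <- sum(counts)                             # grand total
P <- counts / n
E <- outer(rowSums(P), colSums(P))           # expected proportions under independence
eigenvalues <- svd((P - E) / sqrt(E))$d^2    # CA eigenvalues (principal inertias)

total_inertia <- sum(eigenvalues)            # the trace
chi2 <- total_inertia * n                    # Pearson chi-square statistic
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)
strength <- sqrt(total_inertia)              # correlation-like measure used in the note

# Cross-check against base R's chi-square test
ref <- chisq.test(counts)
print(c(chi2 = chi2, p_value = p_value, sqrt_trace = strength))
```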
References
Le Dien S, Pagès J (2003). “Hierarchical Multiple Factor Analysis: application to the comparison of sensory profiles.” Food Quality and Preference, 14, 397–403.
Featured image: 1910 Fiat S76. Photo credit: Stefan Marjoram. Flickr Creative Commons license. Source: https://www.flickr.com/photos/stefanmarjoram/24936480825/in/album-72157625974670240/
R Code
setwd("~/Google Drive/Coding R/My Blog")
carBrand <- read.csv("AutomobilesBrandPerception.csv", header = TRUE, sep = ",")
CAcarbrand <- carBrand[, 2:9]
row.names(CAcarbrand) <- carBrand[, 1]
head(carBrand)
# Chisq to test dependence between rows and columns
chisq <- chisq.test(CAcarbrand)
chisq # p-value = 0.01342
# ======================================================================= #
# CORRESPONDENCE ANALYSIS. Requires FactoMineR and factoextra
install.packages("FactoMineR")
library(FactoMineR)
install.packages("factoextra")
library(factoextra)
res.ca <- CA(CAcarbrand, graph = FALSE)
print(res.ca)
summary(res.ca)
# Significance test of association between rows and columns
# Trace is the table’s total inertia, sum of the eigenvalues.
# Square root of trace ~ correlation coefficient btw row and columns
trace <- sum(get_eigenvalue(res.ca)$eigenvalue)
sqrt(trace)
# chi-square statistic = trace * n, grand total of the table
chisq2 <- trace*sum(as.matrix(CAcarbrand))
df<- (nrow(CAcarbrand)-1)*(ncol(CAcarbrand)-1)
pval <- pchisq(chisq2, df=df, lower.tail = FALSE)
# Dim eigenvalue
head(round(get_eigenvalue(res.ca),2))
# Plot
plot.CA(res.ca, axes = c(1, 2), col.row = "blue", col.col = "red")
## Cluster Analysis & Visualization
## ----
# K-Means
res.coords <- data.frame(rbind(
res.ca$col$coord[,1:2],
res.ca$row$coord[,1:2]))
# optimal number of clusters
fviz_nbclust(res.coords, kmeans, method = "gap_stat")
# Visualize optimal number of clusters (k = best by majority vote of indices)
install.packages("NbClust")
library("NbClust")
res.nbclust <- NbClust(res.coords, distance = "euclidean",
min.nc = 2, max.nc = 8,
method = "complete", index = "all")
factoextra::fviz_nbclust(res.nbclust) + theme_minimal()
# Select K and Visualize clustered results
k=as.numeric(names(sort(table(res.nbclust$Best.nc[1,]), decreasing = TRUE)[1]))
km.res <- kmeans(res.coords, k, nstart=25)
# Visualize factor map with clustering on top
fviz_cluster(km.res, data = res.coords, ellipse.type = "convex",
main = "Brand-Attribute Map") +
theme_minimal()
# Hierarchical Clustering
# Dissimilarity matrix
d <- dist(res.coords, method = "euclidean")
# Hierarchical clustering using Ward’s method
res.hc <- hclust(d, method = "ward.D2")
# Cut tree into k groups
grp <- cutree(res.hc, k = k)
# Visualize
plot(res.hc, cex = 0.6, main = "Hierarchical Clustering Dendrogram")
rect.hclust(res.hc, k=k, border=2:5)
# ======================================================================= #
# R Commander GUI
install.packages("Rcmdr")
install.packages("RcmdrPlugin.FactoMineR")
library(Rcmdr)
library(RcmdrPlugin.FactoMineR)
# Open
Commander()