Problem with cor() creating NA's in #149

Open
GuillaumeBot opened this issue May 25, 2018 · 1 comment

Comments


GuillaumeBot commented May 25, 2018

Hi all,
Hi @Anton262 & @VarunKShetty & @tevgeniou,

I wanted to apply MarketSegmentationProcessInClassParts1and2.Rmd to run unsupervised learning on our data set for the final project. However, I am having difficulty running the code, since my data are of different kinds (mainly integers and factors), unlike the assignment 3 boat data.
The error log is the following (line 224):

3. stop("supply both 'x' and 'y' or a matrix-like 'x'")
2. cor(r, use = "pairwise")
1. principal(ProjectDataFactor, nfactors = max(factors_selected), rotate = rotation_used, score = TRUE)

The error occurs here:

Rotated_Factors<-round(Rotated_Results$loadings,2)
Rotated_Factors<-as.data.frame(unclass(Rotated_Factors))
colnames(Rotated_Factors)<-paste("Comp.",1:ncol(Rotated_Factors),sep="")

sorted_rows <- sort(Rotated_Factors[,1], decreasing = TRUE, index.return = TRUE)$ix
Rotated_Factors <- Rotated_Factors[sorted_rows,]

iprint.df(Rotated_Factors, scale=TRUE)
write.csv(Rotated_Factors, file = "Rotated_Factors.csv")

but I believe the cor() call shown in the error log is the root cause. So I tried replacing cor() with cor2(), which should handle the different types: https://www.rdocumentation.org/packages/ParallelPC/versions/1.2/topics/cor2

thecor = round(cor2(ProjectDataFactor),2) # cor2 is supposed to handle the different variable types
iprint.df(round(thecor,2), scale=TRUE)
write.csv(round(thecor,2), file = "thecor.csv")

Any ideas?

@tevgeniou
Contributor

The problem is not with cor, but with "principal", which works only for numeric data. The choices are either to generate new, meaningful numeric features and use those, or (less preferred) to try what is called "correspondence analysis" (https://en.wikipedia.org/wiki/Correspondence_analysis).
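
As a rough illustration of the first option, here is a minimal base-R sketch that expands factor columns into 0/1 indicator columns so that every column is numeric. The toy data frame and its column names are hypothetical stand-ins for the real ProjectDataFactor, and indicator coding is only one possible way to build numeric features; whether the resulting features are meaningful depends on the data.

```r
# Toy stand-in for ProjectDataFactor (hypothetical data): integers and factors mixed.
ProjectDataFactor <- data.frame(
  age     = c(23L, 35L, 41L, 29L, 52L, 38L),
  visits  = c(2L, 5L, 1L, 4L, 3L, 6L),
  segment = factor(c("a", "b", "a", "c", "b", "c")),
  owner   = factor(c("yes", "no", "no", "yes", "yes", "no"))
)

# List the columns that are not numeric; these are the ones principal() cannot use.
non_numeric <- names(ProjectDataFactor)[!sapply(ProjectDataFactor, is.numeric)]
print(non_numeric)

# Expand each factor into 0/1 indicator columns; "- 1" drops the intercept column.
ProjectDataNumeric <- as.data.frame(model.matrix(~ . - 1, data = ProjectDataFactor))

# Every column is now numeric, so cor() accepts the data.
stopifnot(all(sapply(ProjectDataNumeric, is.numeric)))
thecor <- round(cor(ProjectDataNumeric, use = "pairwise"), 2)
print(thecor)
```

The numeric data frame could then be passed to principal() in place of the original one. Note that the indicator columns of a single factor are linearly dependent (here the three segment columns always sum to 1), so it may be worth dropping one level per factor before extracting components.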
