Training Data Format and Class Label for kmeans #6

Open
GoogleCodeExporter opened this issue May 15, 2015 · 1 comment
Comments

@GoogleCodeExporter

Hi,

I have converted my training data to the sparse data format you mentioned and ran:
./sofia-kmeans --k 1000 --init_type random --opt_type batch_kmeans --iterations 1000 --objective_after_init --training_file demo/SMLFAutoTrain1s512val.txt --model_out demo/CSMLFAutoTrain1s512val.txt
However, I am getting the following errors:
Reading data from: demo/SMLFAutoTrain1s512val.txt
Error reading file demo/SMLFAutoTrain1s512val.txt
I opened your demo.train and saw that there is a square box at the end of every 
vector. How can I change my data format to match yours, given that the square 
box at the end may not be the only one? I also tried to load your demo.train 
file in MATLAB, but that did not work either.
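
For readers hitting the same read error, here is a minimal sketch of writing dense vectors to a sparse text file with an explicit line feed at the end of every line. It assumes the sparse format is the SVM-light style layout `label id:value id:value ...` with feature ids starting at 1; neither detail is confirmed in this thread. The function names `to_sparse_line` and `write_sparse_file` are hypothetical helpers, not part of sofia-ml.

```python
def to_sparse_line(vector, label=1):
    """Format one dense vector as 'label id:value id:value ...'."""
    # Assumption: feature ids start at 1; zero-valued features are omitted.
    pairs = [f"{i}:{v:g}" for i, v in enumerate(vector, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


def write_sparse_file(path, vectors, label=1):
    """Write one vector per line, each line terminated by a plain '\\n'."""
    # newline="\n" keeps Python from translating the line ending to the
    # platform default (for example '\r\n' on Windows).
    with open(path, "w", newline="\n") as f:
        for vector in vectors:
            f.write(to_sparse_line(vector, label) + "\n")
```

For example, `to_sparse_line([0.5, 0, 2.0])` produces the line `1 1:0.5 3:2`.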

For the kmeans example:
> ./sofia-kmeans --k 5 --init_type random --opt_type mini_batch_kmeans --mini_batch_size 100 --iterations 500 --objective_after_init --objective_after_training --training_file demo/demo.train --model_out demo/clusters.txt
The above command will return the five centroid locations, right?
In that case, since it only produces the 5 cluster center locations, the class 
label in the training data (demo.train) can be assigned any value, right? 
For example, I chose 1 for every vector, out of the values 1, 0, -1.

I look forward to your clarification. 

Thank you,


Fred

Original issue reported on code.google.com by [email protected] on 23 Sep 2011 at 3:56


@GoogleCodeExporter
Author

I have solved the training data problem by ending every line of my training 
data (SMLFAutoTrain1s512val.txt) with '\n'. But I now find a lot of zeros in 
every line after my 78 dimensions in each vector in the output file 
(CSMLFAutoTrain1s512val.txt). How can I run the kmeans program without getting 
so many zeros in every line? What is the first field in every line of my output 
data, since it is always zero? I assume that is the class label. Please correct 
me if I am wrong here. 
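
As a possible workaround on the reading side, the trailing zeros can be dropped after the file is loaded. This is a minimal sketch that assumes each line of the output file is one cluster center written as whitespace-separated numbers; the thread does not confirm that layout, and `trim_trailing_zeros` and `read_centers` are hypothetical helper names. It does not answer what the first field means, so that field is left in place.

```python
def trim_trailing_zeros(line):
    """Parse one line of numbers and drop the zeros at its end."""
    values = [float(token) for token in line.split()]
    # Remove zeros from the right until a non-zero value is reached.
    while values and values[-1] == 0.0:
        values.pop()
    return values


def read_centers(path):
    """Read every non-empty line of the file as one trimmed vector."""
    with open(path) as f:
        return [trim_trailing_zeros(line) for line in f if line.strip()]
```

Note that a genuine zero in the last real dimension of a center would also be removed, so padding each result back to the known dimensionality (78 here) may be needed.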

Original comment by [email protected] on 23 Sep 2011 at 4:48
