
The magic number 0.43 #8

Closed
thanish opened this issue Oct 27, 2016 · 5 comments

@thanish
Contributor

thanish commented Oct 27, 2016

Which test dataset was the prediction run on to reach an accuracy of 0.43? I mean, was the prediction done on the blind data set with the SHANKLE well, the NEWBY well, or some other data?

@kwinkunks
Member

kwinkunks commented Oct 27, 2016

Please use the printed article as the intended view of performance, not the notebooks. So the blind well is SHANKLE and the F1 score is 0.43. I will try to make the notebooks consistent today.

Notwithstanding this, if we are able to replace the 'true' blind test with new data (see #2; we are still waiting to resolve this), then of course the performance of Brendon's model will change.

@kwinkunks
Member

Reopening because, while b9f2b15 has resolved the mismatch between the article and the notebook, I cannot reproduce the F1 score in the article.

Right now, it's 0.39 (cf. 0.43 in the article). I'm hoping I just missed something when I merged the final version of the notebook. Maybe @brendonhall can check the notebook and make sure it's the same as his. I'm thinking the main differences could be the test/train random seed and the values of the hyperparameters C and gamma.
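
For anyone trying to track down this kind of mismatch, here is a minimal sketch of how the split seed and the SVM hyperparameters can each move the F1 score. Everything in it is an assumption for illustration: the data is synthetic (from `make_classification`), the `C` and `gamma` values are placeholders rather than the article's settings, and micro-averaged F1 is just one reasonable reading of "overall F1".

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic multi-class data standing in for the well-log features (assumption).
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

def f1_for(seed, C, gamma, test_size=0.1):
    """Split with the given seed, fit an RBF SVM, return micro-averaged F1."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    # Fit the scaler on the training part only, then apply it to both parts.
    scaler = StandardScaler().fit(X_train)
    clf = SVC(C=C, gamma=gamma).fit(scaler.transform(X_train), y_train)
    predictions = clf.predict(scaler.transform(X_test))
    return f1_score(y_test, predictions, average="micro")

# Same hyperparameters, different seeds: the score depends on the split.
by_seed = {seed: f1_for(seed, C=10, gamma=1) for seed in range(5)}

# Same seed, different (placeholder) hyperparameters.
by_params = {(C, gamma): f1_for(0, C=C, gamma=gamma)
             for C in (1, 10) for gamma in (0.1, 1)}

print(by_seed)
print(by_params)
```

If the scores spread across seeds by a few hundredths, a gap like 0.39 vs 0.43 could plausibly come from the split alone, so pinning `random_state` is the first thing worth checking.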

@LukasMosser
Contributor

See this closed issue by @CannedGeo: #10, "Training_data.csv?"

@brendonhall
Contributor

Hi everyone, sorry for being late to the party on this. The problem is that the main notebook and the article are a little out of sync. I streamlined the narrative of the notebook quite a bit for the article. One of the things I did was create a dataset for the article that only contains samples with a complete set of well logs. facies_vectors.csv is the original training set from the website I obtained the data from. I removed the vectors that don't have a PE value, and saved that dataset as training_data.csv. That is what I used for the article, and I have changed the notebook to match.
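
The filtering step described above can be sketched roughly as follows. This is not the exact code used for the article: it assumes pandas, the tiny in-memory table is made up so the snippet runs on its own, and every column name other than `PE` (as well as the well names) is hypothetical.

```python
import io

import pandas as pd

# Made-up stand-in for facies_vectors.csv; two rows have no PE value.
raw_csv = io.StringIO(
    "Facies,Well Name,Depth,GR,PE\n"
    "3,WELL_A,2793.0,77.45,4.6\n"
    "3,WELL_B,2887.5,88.71,\n"
    "2,WELL_A,2793.5,78.26,4.1\n"
    "8,WELL_B,2888.0,61.16,\n"
)

def drop_missing_pe(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows (feature vectors) that have a PE value."""
    return frame.dropna(subset=["PE"]).reset_index(drop=True)

facies_vectors = pd.read_csv(raw_csv)
training_data = drop_missing_pe(facies_vectors)

# To save the filtered set the way the comment describes:
# training_data.to_csv("training_data.csv", index=False)
print(len(facies_vectors), "rows before,", len(training_data), "rows after")
```

With the real file, `pd.read_csv("facies_vectors.csv")` would replace the in-memory table.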

Second, in the article I used test_size=0.1 when splitting the training and cross-validation sets. The notebook had a value of 0.2, so I have changed this to match.
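
To illustrate what the `test_size` change does, here is a small sketch using scikit-learn's `train_test_split` from `sklearn.model_selection`. The arrays are random placeholders rather than the facies data, and the seed value 42 is an arbitrary assumption, not the one used in the article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random placeholder features and labels (not the real well logs).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(1, 10, size=200)

def split(test_size, seed=42):
    """Return X_train, X_cv, y_train, y_cv for a given held-out fraction."""
    return train_test_split(X, y, test_size=test_size, random_state=seed)

# Article setting: 10% held out for cross-validation.
X_train_a, X_cv_a, y_train_a, y_cv_a = split(0.1)
# Earlier notebook setting: 20% held out.
X_train_n, X_cv_n, y_train_n, y_cv_n = split(0.2)

print(len(X_train_a), len(X_cv_a))
print(len(X_train_n), len(X_cv_n))
```

The two settings train on different numbers of samples, so the fitted classifier and its score can differ even when everything else is identical.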

After these changes, I'm getting an overall F1 of 0.42 for the classifier. Not quite what is reported in the article. When I run the pure article code again I'm also getting 0.42. Perhaps there has been some change in the ordering/randomization of the data? Maybe some libraries have been updated? It's Halloween???

@kwinkunks
Member

Awesome, thanks @brendonhall.

As far as I'm concerned, 0.42 is the same as 0.43. I am very confident that someone will get a better fit than 0.43 soon anyway, if they haven't already. So if this isn't a non-issue already, I think it will be soon.

@thanish Thank you again for raising this. I'll close it now. The notebook is a good reflection of the 'base case'. We can of course re-open if need be; I trust you will let me know. Cheers!
