
Add function for cross-validation of bias parameters #2

Open
connormayer opened this issue Jul 10, 2021 · 5 comments

@connormayer (Owner)

This will involve splitting the data into training and validation sets and comparing parameter values.
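
For concreteness, a minimal sketch of the splitting step (type-based, for simplicity), assuming the data sit in a data frame with one row per input-output pair; the column handling and `train_prop` default are assumptions, not the package's actual implementation:

```r
# Sketch only: a random type-based train/validation split.
# `tableaux` is assumed to have one row per input-output pair.
split_types <- function(tableaux, train_prop = 0.8, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(tableaux)
  train_idx <- sample(n, size = floor(train_prop * n))
  list(
    train = tableaux[train_idx, , drop = FALSE],
    validation = tableaux[-train_idx, , drop = FALSE]
  )
}
```

Candidate bias values would then be fit on `train` and compared by their fit to `validation`.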

@connormayer (Owner, Author)

Do we sample tokens or types? Tokens may give a better approximation of actual acquisition, though type sampling is more straightforward to implement. Maybe offer an option for both.

@adelrtan (Contributor)

Just a thought: Maybe make this function available to compare different temperature values too!

Or perhaps even different combinations of bias & temperature values!

@adelrtan (Contributor)

More thoughts: Perhaps apply the softmax function (similar to what you did for the AIC/BIC/AIC-C weights -- I thought that was really cool!) to quantify the conditional probability of each candidate hyperparameter value being the best one.
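
A hedged sketch of that idea, analogous to how Akaike weights exponentiate and renormalize relative scores; the inputs here are assumed to be validation log-likelihoods, one per candidate bias value (names and numbers hypothetical):

```r
# Softmax over candidate hyperparameter scores. Subtracting the max
# before exponentiating keeps the computation numerically stable.
softmax_weights <- function(log_scores) {
  shifted <- log_scores - max(log_scores)
  exp(shifted) / sum(exp(shifted))
}

# Hypothetical example: validation log-likelihoods for three bias values
val_ll <- c(mu0 = -152.3, mu1 = -149.8, mu5 = -161.0)
round(softmax_weights(val_ll), 3)
#>   mu0   mu1   mu5
#> 0.076 0.924 0.000
```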

@adelrtan (Contributor)

> Do we sample tokens or types? Tokens may give a better approximation of actual acquisition, though type sampling is more straightforward to implement. Maybe offer an option for both.

Yeah, I think it'll be great to have both options.

Re token sampling: I discovered the utility of the `sample()` function while writing the code for `monte_carlo.R`. It makes random draws according to a probability distribution.
We just need to create a probability distribution over input-output pairs and make random draws based on it.
Then update the frequencies for the train and validation sets, and delete any resulting "empty" tableaux (i.e., tableaux with 0 tokens) from each set. A sketch of this is below.
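
A sketch of that procedure, assuming a data frame with `Input` and `Freq` columns (hypothetical names, not the package's actual implementation). Rather than passing a distribution to `sample()`'s `prob` argument, this version expands the data to one entry per token so the split is exact:

```r
# Token-based train/validation split (sketch only).
token_split <- function(tableaux, train_prop = 0.8, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # One entry per token: row i appears Freq[i] times
  token_rows <- rep(seq_len(nrow(tableaux)), times = tableaux$Freq)
  n_train <- round(train_prop * length(token_rows))
  train_idx <- sample(length(token_rows), size = n_train)

  # Recount frequencies for each set
  recount <- function(rows) tabulate(rows, nbins = nrow(tableaux))
  train <- tableaux
  train$Freq <- recount(token_rows[train_idx])
  validation <- tableaux
  validation$Freq <- recount(token_rows[-train_idx])

  # Drop "empty" tableaux: inputs whose candidates received 0 tokens.
  # Zero-frequency candidates under a surviving input are kept, since
  # they are still (losing) competitors.
  drop_empty <- function(d) d[ave(d$Freq, d$Input, FUN = sum) > 0, , drop = FALSE]
  list(train = drop_empty(train), validation = drop_empty(validation))
}
```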

@connormayer (Owner, Author)

A few thoughts about your comments @adelrtan:

  • Does it make sense to do cross-validation on the temperature parameter? I think the idea with this parameter is that wug tests tend to be less categorical in a way that's (perhaps) independent of the grammar. Fitting the temperature value to non-wug data seems to contradict this. If the user wants to find the temperature value that works best for a wug data set, it's easy enough for them to do that by looping over possible values (see the sketch after this list). We could add a function that does this, but it doesn't seem high priority.
  • The cross-validation I added is for tokens rather than types. I'm not sure whether type-based cross-validation really makes sense, but we should talk about it.
  • Adding the softmax function for cross-validation is a cool idea, but I'm unsure whether it can be interpreted in the same way as AIC/BIC weights.
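
The kind of loop the first bullet has in mind might look like this, where `wug_loglik()` is a hypothetical helper that evaluates the fitted grammar on the wug data at a given temperature and returns its log-likelihood:

```r
# Grid search over candidate temperature values (sketch only).
temps <- seq(0.5, 5, by = 0.5)
lls <- vapply(temps, wug_loglik, numeric(1))
best_temp <- temps[which.max(lls)]
```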

@connormayer self-assigned this Nov 9, 2022