Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document manual MCMC processing #113

Open
lgatto opened this issue Jun 11, 2018 · 14 comments
Open

Document manual MCMC processing #113

lgatto opened this issue Jun 11, 2018 · 14 comments
Assignees

Comments

@lgatto
Copy link
Owner

lgatto commented Jun 11, 2018

Currently there's the tagmMcmcProcess function, but we need to document in details how to process the chains manually (before starting with the GUI).

I would suggest to add a section in the vignette. That section won't be reproducible, because a full MCMCParam object, with four (or more) full chains is too large to store with the package. What we can do is the put that object on a server so that people can download it manually and reproduce the results. The code chunks and the outputs will have to be copied from the console, unfortunately.

@ococrook - could you start this, please.

@ococrook
Copy link
Collaborator

  • Add normalisation to MCMC documentation
  • Map documentation, including convergence
  • Expand MCMC documentation including functions

@LaurentGatto
Copy link

I am working on helper functions (partly pushed), although not yet incorporated in the TAGM vignette. Here are some suggestions for the vignette:

  • In the TAGM MCMC details section, consider clarifying this sentence:

They all oscillate around an average of 360 outliers.

The number of outliers should be similar for each chain, but what number of proportion is acceptable. How should my run compare to 360? Can I, for example, have 2000 for similar training parameters?

  • For the potential scale reduction factors, what if I have 1.07; is this too much? What about 2? What should I do if my value is too high? Possibly try thinning?

  • When plotting the posteriors for each protein at the end, one chain out of the four possible is used. It happens to be the last one that was initiated within to for loop that discarded some iterations. I assume it doesn't matter which one, as they all have been quality checked. This needs clarification though.

Reminder: any changes need to be done in inst/extdoc/mcmc.R.

@ococrook
Copy link
Collaborator

ococrook commented Sep 24, 2018

Yes, some clarification is needed. Thanks for adding the helper functions - though I think we should rename
mcmc_thin_chains to mcmc_burn_chains, since I think this function was to remove the front portion of the chain. We should reserve mcmc_thin_chains for when we sub-sample the chain.

I'll think the Rhat statistic will need more detail. I thought about adding some more diagnostic test too. A review of most methods are given here:
http://www.math.pitt.edu/~swigon/Homework/brooks97assessing.pdf

http://www.stat.columbia.edu/~gelman/research/published/brooksgelman2.pdf argue that 1.2 for the statistics is sufficient, however this is based on an heurisitic argument rather than a "proof". I think if it is less than 1.1 no question would be asked. I have personal preference for 1.05.

@lgatto
Copy link
Owner Author

lgatto commented Sep 24, 2018

Thanks - replaced thin with burn - it would be useful to review these names anyway later.

Feel free to update the mcmc.Rmd files with some clarification and link you suggest above.

@ococrook
Copy link
Collaborator

When plotting the posteriors for each protein at the end, one chain out of the four possible is used. It happens to be the last one that was initiated within to for loop that discarded some iterations. I assume it doesn't matter which one, as they all have been quality checked. This needs clarification though.

I'll write a function that pools the 4 chains into 1, so that all chains are used. This will give better quality results.

@lgatto
Copy link
Owner Author

lgatto commented Oct 2, 2018

The pooling function is not merged (see #116). Do you still need to update tagmMcmcProcess to make use of it, as it seems that there's some code that is duplicated now? Do you also plan to update the MCMC in details section to demonstrate this new function?

@lgatto
Copy link
Owner Author

lgatto commented Oct 2, 2018

By the way, it's probably worth later simplifying the names of these helper function to thin, pool, plot (which I have already done), ... (without making generics/methods out of them though). But this is something that can be ironed out later.

@ococrook
Copy link
Collaborator

ococrook commented Oct 2, 2018

Some code it duplicated, but these function might change slightly in the future so I don't want one to depend on the other and it come back a bite us later. Yes, I plan on updating the documentation. I want to write a few more helper functions so that the analysis is streamlined as possible but still flexible.

@lgatto
Copy link
Owner Author

lgatto commented Oct 2, 2018

Re code duplication, it is the point of one function using the other, to avoid that the manual processing (including pooling) is different from running tagmMcmcProcess. Otherwise, you'll have to change the code twice.

@ococrook
Copy link
Collaborator

ococrook commented Oct 4, 2018

I've added the helper functions I think we need, including access to more diagnostics help. Do I re-write tagmMcmcProcess to use the pool function? It would be good to do some code review and polish these functions. From here I think I can write up some more detailed documentation. More function may need to be added but they are not obvious to me at the moment.

@lgatto
Copy link
Owner Author

lgatto commented Oct 4, 2018 via email

@lmsimp
Copy link
Collaborator

lmsimp commented Oct 24, 2018

hi @ococrook I see it's already on the list above to add to the vignette but I second @lgatto's comment about needing more information about the Rhat statistic's. Using TAGM for the first time and looking at results from Claire's data german.diag gives

> gelman.diag(out)
Potential scale reduction factors:

     Point est. Upper C.I.
[1,]       1.07        1.2

I have no idea if this seems okay, according to you reference in the above comment, if the point estimate is < 1.2, this is okay? In the vignette you just say around 1.

Thanks!

@ococrook
Copy link
Collaborator

ococrook commented Oct 25, 2018

Hi @lmsimp !
The honest answer is assessing convergence is quite challenging and we should be mildly paranoid about convergence. I'd produce some plots of the iterations just to check they are rapidly oscilating around there mean. The Gelman diagnostic is a good tool and if you quote that the statistic was less than 1.2, then everything is probably good as suggest in the paper (though we can never say for certain!).

@lgatto
Copy link
Owner Author

lgatto commented Nov 3, 2018

@ococrook - could you update the roxygen documentation for freq in mcmc_thin_chains (here), specifying that it should be a numeric (between 0 and 1, I assume), ....

Do you think you could also provide defaults for the extra helper function parameters?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants