Document manual MCMC processing #113

lgatto · 2018-06-11T08:23:30Z

Currently there's the tagmMcmcProcess function, but we need to document in details how to process the chains manually (before starting with the GUI).

I would suggest to add a section in the vignette. That section won't be reproducible, because a full MCMCParam object, with four (or more) full chains is too large to store with the package. What we can do is the put that object on a server so that people can download it manually and reproduce the results. The code chunks and the outputs will have to be copied from the console, unfortunately.

@ococrook - could you start this, please.

The text was updated successfully, but these errors were encountered:

ococrook · 2018-08-26T14:18:59Z

Add normalisation to MCMC documentation
Map documentation, including convergence
Expand MCMC documentation including functions

LaurentGatto · 2018-09-23T08:04:25Z

I am working on helper functions (partly pushed), although not yet incorporated in the TAGM vignette. Here are some suggestions for the vignette:

In the TAGM MCMC details section, consider clarifying this sentence:

They all oscillate around an average of 360 outliers.

The number of outliers should be similar for each chain, but what number of proportion is acceptable. How should my run compare to 360? Can I, for example, have 2000 for similar training parameters?

For the potential scale reduction factors, what if I have 1.07; is this too much? What about 2? What should I do if my value is too high? Possibly try thinning?
When plotting the posteriors for each protein at the end, one chain out of the four possible is used. It happens to be the last one that was initiated within to for loop that discarded some iterations. I assume it doesn't matter which one, as they all have been quality checked. This needs clarification though.

Reminder: any changes need to be done in inst/extdoc/mcmc.R.

ococrook · 2018-09-24T12:10:01Z

Yes, some clarification is needed. Thanks for adding the helper functions - though I think we should rename
mcmc_thin_chains to mcmc_burn_chains, since I think this function was to remove the front portion of the chain. We should reserve mcmc_thin_chains for when we sub-sample the chain.

I'll think the Rhat statistic will need more detail. I thought about adding some more diagnostic test too. A review of most methods are given here:
http://www.math.pitt.edu/~swigon/Homework/brooks97assessing.pdf

http://www.stat.columbia.edu/~gelman/research/published/brooksgelman2.pdf argue that 1.2 for the statistics is sufficient, however this is based on an heurisitic argument rather than a "proof". I think if it is less than 1.1 no question would be asked. I have personal preference for 1.05.

lgatto · 2018-09-24T16:41:32Z

Thanks - replaced thin with burn - it would be useful to review these names anyway later.

Feel free to update the mcmc.Rmd files with some clarification and link you suggest above.

ococrook · 2018-09-29T17:39:08Z

When plotting the posteriors for each protein at the end, one chain out of the four possible is used. It happens to be the last one that was initiated within to for loop that discarded some iterations. I assume it doesn't matter which one, as they all have been quality checked. This needs clarification though.

I'll write a function that pools the 4 chains into 1, so that all chains are used. This will give better quality results.

lgatto · 2018-10-02T05:03:22Z

The pooling function is not merged (see #116). Do you still need to update tagmMcmcProcess to make use of it, as it seems that there's some code that is duplicated now? Do you also plan to update the MCMC in details section to demonstrate this new function?

lgatto · 2018-10-02T05:04:52Z

By the way, it's probably worth later simplifying the names of these helper function to thin, pool, plot (which I have already done), ... (without making generics/methods out of them though). But this is something that can be ironed out later.

ococrook · 2018-10-02T08:49:08Z

Some code it duplicated, but these function might change slightly in the future so I don't want one to depend on the other and it come back a bite us later. Yes, I plan on updating the documentation. I want to write a few more helper functions so that the analysis is streamlined as possible but still flexible.

lgatto · 2018-10-02T11:20:12Z

Re code duplication, it is the point of one function using the other, to avoid that the manual processing (including pooling) is different from running tagmMcmcProcess. Otherwise, you'll have to change the code twice.

ococrook · 2018-10-04T12:31:05Z

I've added the helper functions I think we need, including access to more diagnostics help. Do I re-write tagmMcmcProcess to use the pool function? It would be good to do some code review and polish these functions. From here I think I can write up some more detailed documentation. More function may need to be added but they are not obvious to me at the moment.

lgatto · 2018-10-04T15:54:09Z

Thanks, I'll have a look. In the future, if you want explicit review of the code (and I think it's generally a good thing to do), I suggest you clone the repo, send a pull request, and as for an explicit review as part of the merging process. Helps to track the review and resulting changes and updates better.

lmsimp · 2018-10-24T11:51:31Z

hi @ococrook I see it's already on the list above to add to the vignette but I second @lgatto's comment about needing more information about the Rhat statistic's. Using TAGM for the first time and looking at results from Claire's data german.diag gives

> gelman.diag(out)
Potential scale reduction factors:

     Point est. Upper C.I.
[1,]       1.07        1.2

I have no idea if this seems okay, according to you reference in the above comment, if the point estimate is < 1.2, this is okay? In the vignette you just say around 1.

Thanks!

ococrook · 2018-10-25T08:56:13Z

Hi @lmsimp !
The honest answer is assessing convergence is quite challenging and we should be mildly paranoid about convergence. I'd produce some plots of the iterations just to check they are rapidly oscilating around there mean. The Gelman diagnostic is a good tool and if you quote that the statistic was less than 1.2, then everything is probably good as suggest in the paper (though we can never say for certain!).

lgatto · 2018-11-03T17:05:39Z

@ococrook - could you update the roxygen documentation for freq in mcmc_thin_chains (here), specifying that it should be a numeric (between 0 and 1, I assume), ....

Do you think you could also provide defaults for the extra helper function parameters?

lgatto added the enhancement label Jun 11, 2018

lgatto assigned ococrook Jun 11, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document manual MCMC processing #113

Document manual MCMC processing #113

lgatto commented Jun 11, 2018

ococrook commented Aug 26, 2018

LaurentGatto commented Sep 23, 2018

ococrook commented Sep 24, 2018 •

edited

Loading

lgatto commented Sep 24, 2018

ococrook commented Sep 29, 2018

lgatto commented Oct 2, 2018

lgatto commented Oct 2, 2018

ococrook commented Oct 2, 2018

lgatto commented Oct 2, 2018

ococrook commented Oct 4, 2018

lgatto commented Oct 4, 2018 via email

lmsimp commented Oct 24, 2018

ococrook commented Oct 25, 2018 •

edited

Loading

lgatto commented Nov 3, 2018 •

edited

Loading

Document manual MCMC processing #113

Document manual MCMC processing #113

Comments

lgatto commented Jun 11, 2018

ococrook commented Aug 26, 2018

LaurentGatto commented Sep 23, 2018

ococrook commented Sep 24, 2018 • edited Loading

lgatto commented Sep 24, 2018

ococrook commented Sep 29, 2018

lgatto commented Oct 2, 2018

lgatto commented Oct 2, 2018

ococrook commented Oct 2, 2018

lgatto commented Oct 2, 2018

ococrook commented Oct 4, 2018

lgatto commented Oct 4, 2018 via email

lmsimp commented Oct 24, 2018

ococrook commented Oct 25, 2018 • edited Loading

lgatto commented Nov 3, 2018 • edited Loading

ococrook commented Sep 24, 2018 •

edited

Loading

ococrook commented Oct 25, 2018 •

edited

Loading

lgatto commented Nov 3, 2018 •

edited

Loading