-
-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP]: Rethinking_2 Chp_04: update trace_to_dataframe() to InferenceData #143
base: main
Are you sure you want to change the base?
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Hey @joannadiong Does it work to do this in 4.32 trace_df = pm_data.posterior.to_dataframe()
trace_df.cov() ? Alternatively, maybe in 4.30 you could do trace_4_1 = pm.sample(1000, tune=1000, return_inferencedata=True) so that 4.32 just becomes
? |
Hi @MarcoGorelli, Many thanks for the suggestions. There are 2 traces in the chapter that call trace_4_1 I received an error for the first approach: # Code 4.32
trace_df = pm.posterior.to_dataframe()
>>> AttributeError: module 'pymc3' has no attribute 'posterior' But the second solution seemed to work for # 4.30
trace_4_1 = pm.sample(1000, tune=1000, return_inferencedata=True)
# 4.32
trace_df = trace_4_1.posterior.to_dataframe()
trace_df.cov() In 4.34, I had to change Overall, I think the fix for this trace is done. The other trace is more problematic. trace_4_3 Doing this in the notebook: # 4.42
trace_4_3 = pm.sample(1000, tune=1000, return_inferencedata=True)
# 4.45
trace_4_3_df = trace_4_3.posterior.to_dataframe()
# 4.50
mu_at_50 = trace_4_3_df["a"] + trace_4_3_df["b"] * (50 - d2.weight.mean())
# 4.52
az.hpd(mu_at_50) Gives this error at 4.52: ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-55-676334186adf> in <module>
----> 1 az.hpd(mu_at_50)
2
~/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/arviz/stats/stats.py in hpd(ary, hdi_prob, circular, multimodal, skipna, group, var_names, filter_vars, coords, max_modes, dask_kwargs, **kwargs)
341 ("hpd will be deprecated " "Please replace hdi"),
342 )
--> 343 return hdi(
344 ary,
345 hdi_prob,
~/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/arviz/stats/stats.py in hdi(ary, hdi_prob, circular, multimodal, skipna, group, var_names, filter_vars, coords, max_modes, dask_kwargs, **kwargs)
498 ary = np.expand_dims(ary, 0)
499
--> 500 ary = convert_to_dataset(ary, group=group)
501 if coords is not None:
502 ary = get_coords(ary, coords)
~/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/arviz/data/converters.py in convert_to_dataset(obj, group, coords, dims)
177 xarray.Dataset
178 """
--> 179 inference_data = convert_to_inference_data(obj, group=group, coords=coords, dims=dims)
180 dataset = getattr(inference_data, group, None)
181 if dataset is None:
~/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/arviz/data/converters.py in convert_to_inference_data(obj, group, coords, dims, **kwargs)
129 "cmdstanpy fit",
130 )
--> 131 raise ValueError(
132 "Can only convert {} to InferenceData, not {}".format(
133 ", ".join(allowable_types), obj.__class__.__name__
ValueError: Can only convert xarray dataarray, xarray dataset, dict, netcdf filename, numpy array, pystan fit, pymc3 trace, emcee fit, pyro mcmc fit, numpyro mcmc fit, cmdstan fit csv filename, cmdstanpy fit to InferenceData, not Series I wanted to run model m4_3 separately in a Python file to inspect the objects: # 4.42
xbar = d2.weight.mean()
with pm.Model() as m4_3:
a = pm.Normal("a", mu=178, sd=20)
b = pm.Lognormal("b", mu=0, sd=1)
sigma = pm.Uniform("sigma", 0, 50)
mu = a + b * (d2.weight - xbar)
height = pm.Normal("height", mu=mu, sd=sigma, observed=d2.height)
trace_4_3 = pm.sample(1000, tune=1000, return_inferencedata=True) but interestingly, the m4_3 code produces an error when run in the Python file: Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, b, a]
█/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/math.py:246: RuntimeWarning: divide by zero encountered in log1p
return np.where(x < 0.6931471805599453, np.log(-np.expm1(-x)), np.log1p(-np.exp(-x)))
/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/math.py:246: RuntimeWarning: divide by zero encountered in log1p
return np.where(x < 0.6931471805599453, np.log(-np.expm1(-x)), np.log1p(-np.exp(-x)))
/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/math.py:246: RuntimeWarning: divide by zero encountered in log1p
return np.where(x < 0.6931471805599453, np.log(-np.expm1(-x)), np.log1p(-np.exp(-x)))
/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/math.py:246: RuntimeWarning: divide by zero encountered in log1p
return np.where(x < 0.6931471805599453, np.log(-np.expm1(-x)), np.log1p(-np.exp(-x)))
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.
WARNING (theano.tensor.opt): Failed to infer_shape from Op DeepCopyOp.
Input shapes: [(TensorConstant{352},)]
Exception encountered during infer_shape: <class 'ValueError'>
Exception message: Cannot compute test value: input 0 (DeepCopyOp.0) of Op Shape_i{0}(DeepCopyOp.0) missing default value.
Traceback: Traceback (most recent call last):
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/theano/graph/op.py", line 81, in compute_test_value
storage_map[ins] = [ins.get_test_value()]
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/theano/graph/basic.py", line 422, in get_test_value
raise TestValueError(f"{self} has no test value {detailed_err_msg}")
theano.graph.utils.TestValueError: DeepCopyOp.0 has no test value
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/theano/tensor/opt.py", line 1097, in get_node_infer_shape
o_shapes = shape_infer(
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/theano/tensor/opt.py", line 1204, in default_infer_shape
rval.append(self.shape_tuple(r))
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/theano/tensor/opt.py", line 1191, in shape_tuple
return tuple([self.shape_ir(i, r) for i in range(r.ndim)])
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/theano/tensor/opt.py", line 1191, in <listcomp>
return tuple([self.shape_ir(i, r) for i in range(r.ndim)])
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/theano/tensor/opt.py", line 1179, in shape_ir
s = Shape_i(i)(r)
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/theano/graph/op.py", line 253, in __call__
compute_test_value(node)
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/theano/graph/op.py", line 94, in compute_test_value
raise ValueError(
ValueError: Cannot compute test value: input 0 (DeepCopyOp.0) of Op Shape_i{0}(DeepCopyOp.0) missing default value. The issues here and here seem the most closely related, but I did not import Theano and PyMC3 libraries separately. What might be the best way to proceed? Thanks in advance. |
I don't know about that theano error you've received, but Alex and Osvaldo are subscribed to this thread so they may have some more useful input. Perhaps if you share your entire Python script, as well versions of PyMC3 and theano-pymc, then that will help with debugging |
Thanks @MarcoGorelli! Code 4.32
I used Error at Code 4.52 Interestingly, I ran model m4_3 in Python script afresh and the Theano error did not replicate! (Not sure how it got magicked away, but it's one less problem.) The code still fails at 4.52. I ran these relevant bits in a script: import arviz as az
import pandas as pd
import pymc3 as pm
# 4.26
d = pd.read_csv("Data/Howell1.csv", sep=";", header=0)
d2 = d[d.age >= 18]
# 4.42
xbar = d2.weight.mean()
with pm.Model() as m4_3:
a = pm.Normal("a", mu=178, sd=20)
b = pm.Lognormal("b", mu=0, sd=1)
sigma = pm.Uniform("sigma", 0, 50)
mu = a + b * (d2.weight - xbar)
height = pm.Normal("height", mu=mu, sd=sigma, observed=d2.height)
trace_4_3 = pm.sample(1000, tune=1000, return_inferencedata=True)
# Code 4.45
trace_4_3_df = trace_4_3.posterior.to_dataframe()
trace_4_3_df.cov().round(3)
# Code 4.50
mu_at_50 = trace_4_3_df["a"] + trace_4_3_df["b"] * (50 - d2.weight.mean())
# Code 4.52
az.hpd(mu_at_50) This produced the following ValueError: Backend TkAgg is interactive backend. Turning interactive mode on.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, b, a]
█/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/math.py:246: RuntimeWarning: divide by zero encountered in log1p
return np.where(x < 0.6931471805599453, np.log(-np.expm1(-x)), np.log1p(-np.exp(-x)))
/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/math.py:246: RuntimeWarning: divide by zero encountered in log1p
return np.where(x < 0.6931471805599453, np.log(-np.expm1(-x)), np.log1p(-np.exp(-x)))
/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/math.py:246: RuntimeWarning: divide by zero encountered in log1p
return np.where(x < 0.6931471805599453, np.log(-np.expm1(-x)), np.log1p(-np.exp(-x)))
/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/math.py:246: RuntimeWarning: divide by zero encountered in log1p
return np.where(x < 0.6931471805599453, np.log(-np.expm1(-x)), np.log1p(-np.exp(-x)))
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 3 seconds.
/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/arviz/stats/stats.py:340: UserWarning: hpd will be deprecated Please replace hdi
warnings.warn(
Traceback (most recent call last):
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3418, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-f0b5e524fe36>", line 27, in <module>
az.hpd(mu_at_50)
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/arviz/stats/stats.py", line 343, in hpd
return hdi(
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/arviz/stats/stats.py", line 500, in hdi
ary = convert_to_dataset(ary, group=group)
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/arviz/data/converters.py", line 179, in convert_to_dataset
inference_data = convert_to_inference_data(obj, group=group, coords=coords, dims=dims)
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/arviz/data/converters.py", line 131, in convert_to_inference_data
raise ValueError(
ValueError: Can only convert xarray dataarray, xarray dataset, dict, netcdf filename, numpy array, pystan fit, pymc3 trace, emcee fit, pyro mcmc fit, numpyro mcmc fit, cmdstan fit csv filename, cmdstanpy fit to InferenceData, not Series Versions in case helpful: Name: pymc3 Name: Theano-PyMC |
You defined it in 4.32: trace_df = pm.trace_to_dataframe(trace_4_1)
trace_df.cov()
+ # update: trace saved with InferenceData format
+ # trace_to_dataframe() retained until update to InferenceData is stable
+ with pm.Model() as m4_1:
+ pm_data = az.from_pymc3(trace=trace_4_1) |
Got it, thanks! Fixed the structure of trace_df = pm_data.posterior.to_dataframe()
trace_df.cov() It dealt with the long error. The notebook now breaks at Code 4.59, 4.60: height_pred = pm.sample_posterior_predictive(trace_4_3, 200, m4_3)
height_pred_hpd = az.hpd(height_pred["height"])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-61-0390e3860878> in <module>
----> 1 height_pred_hpd = az.hpd(height_pred["height"])
2
KeyError: 'height' Tried height_pred = pm.sample_posterior_predictive(trace_4_3_df, 200, m4_3)
/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/sampling.py:1687: UserWarning: samples parameter is smaller than nchains times ndraws, some draws and/or chains may not be represented in the returned posterior predictive sample
warnings.warn(
█Traceback (most recent call last):
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3418, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-15-f0c44fac5ce3>", line 1, in <module>
height_pred = pm.sample_posterior_predictive(trace_4_3_df, 200, m4_3)
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pymc3/sampling.py", line 1729, in sample_posterior_predictive
param = _trace[idx % len_trace]
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/joanna/Dropbox/Sketchbook/python/resources/env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 0 |
worked well for me and also means that we don't run into errors with using |
In:
I have ported part of this code to save the trace as an InferenceData object:
The pm_data object will be needed in Code sections 4.33, 4.34, 4.36. (
trace_to_dataframe
is retained for now so the notebook does not break.)I am unsure how to calculate the covariance matrix in
trace_df.cov
as above. I may have missed it, but there does not seem to be a function in arviz that does this. So the covariance matrix might need to be computed frompm_data.mu.values
andpm_data.sigma.values
.Also, by default, the sampler generates 4 chains with 1000 draws. Should all chains be involved in computing the covariance matrix?
Could you advise what is the best approach to get the covariance matrix, for later separation into variances and correlations? Thanks for the pointers!
Ping @AlexAndorra or @aloctavodia