diff --git a/data/metropolis_hastings_example.csv b/data/metropolis_hastings_example.csv new file mode 100644 index 0000000..7c1e00e --- /dev/null +++ b/data/metropolis_hastings_example.csv @@ -0,0 +1,100 @@ +5.133879959574598 +4.937817051434988 +5.277966400350326 +4.978073616723295 +4.768264765712152 +5.00325432375216 +5.140401818212519 +5.220085220743337 +4.9103261367115305 +4.916817929108131 +4.676230504106508 +4.681155942296321 +4.939456345611404 +5.404674908950762 +5.0000565051183505 +4.946723670230919 +5.071484229192757 +4.702765071407966 +4.990440219285413 +4.757565482906698 +4.90342085169882 +4.909731907856895 +4.7502350067852035 +4.735636087526655 +4.831835910148419 +5.083641260592571 +4.941212006801077 +4.836347213445427 +4.962098544360349 +4.686040420643092 +5.124470580419098 +5.066542799622644 +5.116490362362727 +4.978068545964261 +4.733571541168396 +4.799377562666428 +4.731238173387429 +4.927630515268987 +5.276180964383851 +4.605853804427195 +4.895306070517735 +5.125441726077501 +5.048540015619509 +5.134475090531417 +5.020340763177987 +4.744641578259603 +4.9620314387166395 +4.664576116370324 +5.244742999934165 +5.279274831080141 +5.089479340563577 +5.071743145634021 +4.942265997797657 +5.089398760768866 +5.458231752210997 +4.786148013337675 +4.7972136274342025 +4.93152718719522 +4.978778063916119 +4.801232311921279 +4.686526370326429 +5.105113175288019 +4.8016552715132645 +4.978252685586892 +4.903595625784314 +5.150389785891813 +4.957095292110073 +4.851463539442262 +5.18128245975816 +5.139918945662961 +5.020216551825798 +4.84115302966151 +4.975904991006625 +5.401969287491859 +4.811230682660985 +4.7993454128378925 +5.08505429688905 +4.6388175324133565 +5.363488556427758 +4.825053384694479 +5.304806718052644 +5.321881179050992 +5.044592651869508 +4.944522400406804 +4.9930450107605076 +5.400020830329514 +5.162848379281823 +4.851419760797657 +5.021789943039787 +4.831523320170004 +4.870328937789049 +4.927266223569667 +5.023563865935118 +4.969113692374906 +4.997160262493862 +4.747684042472069 +5.070321872431043 +4.899256187007241 +5.111542353099519 +5.201047790285543 diff --git a/docs/img/dimension_plot.png b/docs/img/dimension_plot.png new file mode 100644 index 0000000..6aeabc2 Binary files /dev/null and b/docs/img/dimension_plot.png differ diff --git a/docs/img/dimension_sphere_diagram.png b/docs/img/dimension_sphere_diagram.png new file mode 100644 index 0000000..6f0a970 Binary files /dev/null and b/docs/img/dimension_sphere_diagram.png differ diff --git a/docs/img/markov_chain.png b/docs/img/markov_chain.png new file mode 100644 index 0000000..60c894b Binary files /dev/null and b/docs/img/markov_chain.png differ diff --git a/docs/metropolis-hastings.html b/docs/metropolis-hastings.html index 39fbc4e..84a7582 100644 --- a/docs/metropolis-hastings.html +++ b/docs/metropolis-hastings.html @@ -4956,7 +4956,10 @@

On this page

  • Part 1: sampling from a normal distribution
  • Part 2: determining mean and standard deviation from data
  • -
  • Hyperspheres in hyperspace don’t make much sense
  • +
  • Volume in hyperspace doesn’t make much sense +
  • @@ -4987,10 +4990,8 @@

    Introduction

    Welcome back!

    Plan for today:

      -
    1. Problem Statement
    2. Markov Chains
    3. Metropolis-Hastings explained
    4. -
    5. What are the limitations?

    Recap from last week:

    What is MCMC trying to solve: MCMC

    @@ -5005,7 +5006,7 @@

    Markov Chains

    For example:

    The transitions happen in discrete time steps and can be described with the following matrix representation

    -
    +
    import numpy as np
     T = np.matrix([[0.1, 0.1, 0.8], [0.5, 0.1, 0.4], [0.5, 0.2, 0.3]])
     print(T)
    @@ -5016,7 +5017,7 @@

    Markov Chains

    Given an initial probability distribution over the states

    -
    +
    v0 = np.matrix([0.1, 0.4, 0.5])
     print(v0)
    @@ -5024,7 +5025,7 @@

    Markov Chains

    We can compute the probabilities of the next step by multiplying the current state by the transition matrix.

    -
    +
    v1 = v0*T
     print(v1)
    @@ -5032,7 +5033,7 @@

    Markov Chains

    Applying the transition matrix again gives the state probabilities after two steps

    -
    +
    v2 = v1*T
     print(v2)
    @@ -5040,21 +5041,21 @@

    Markov Chains

    There’s a convenient way to calculate the state probabilities after any number of steps directly from the starting condition: raise the transition matrix to that power.

    -
    +
    print(v0*T**2)
    [[0.316 0.139 0.545]]

    What happens if we continue doing this for a long time?

    -
    +
    print(v0*T**100)
    [[0.35714286 0.14935065 0.49350649]]

    And how does this change when taking the next step?

    -
    +
    print(v0*T**101)
    [[0.35714286 0.14935065 0.49350649]]
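
    The vector has stopped changing: it is the stationary distribution of the chain, which any starting distribution converges to here because every entry of T is positive. As a check (a sketch, not part of the original slides), the same vector can be computed directly as the left eigenvector of T with eigenvalue 1:

    import numpy as np

    T = np.array([[0.1, 0.1, 0.8], [0.5, 0.1, 0.4], [0.5, 0.2, 0.3]])
    # pi is stationary when pi @ T = pi, i.e. pi is a left eigenvector of T
    # with eigenvalue 1, equivalently an ordinary eigenvector of T.T
    eigvals, eigvecs = np.linalg.eig(T.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
    pi = pi / pi.sum()  # normalise so the probabilities sum to one
    print(pi)  # [0.35714286 0.14935065 0.49350649]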
    @@ -5119,7 +5120,7 @@

    Define probability density function

    -
    +
    import numpy as np
     from scipy.stats import norm
     
    @@ -5129,21 +5130,21 @@ 

    Define

    Define proposal distribution

    -
    +
    def proposal(x):
       return norm.rvs(x, 1, 1)[0]

    Initialise sampler

    -
    +
    current = 0.0
     samples = [current]

    Sample from distribution

    -
    +
    for i in range(10000):
         prop = proposal(current)
         accept_reject = prob(prop)/prob(current)
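
    The diff hunk cuts the loop off here; the accept/reject step that completes it (shown in full in the search.json portion of this diff) is:

        if accept_reject > 1:  # always accept a proposal that is more probable
            samples.append(prop)
            current = prop
        else:  # otherwise accept with probability equal to the ratio
            cutoff = np.random.rand(1)[0]
            if accept_reject > cutoff:
                samples.append(prop)
                current = prop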
    @@ -5159,21 +5160,21 @@ 

    Sample from distr

    Plot distribution

    -
    +
    import matplotlib.pyplot as plt
     plt.hist(samples)
    -
    (array([   7.,   31.,  192.,  669., 1232., 1248., 1012.,  471.,  100.,
    -          22.]),
    - array([0.        , 0.37565749, 0.75131498, 1.12697247, 1.50262996,
    -        1.87828745, 2.25394494, 2.62960243, 3.00525992, 3.38091741,
    -        3.7565749 ]),
    +
    (array([   6.,   45.,  377., 1127., 1581., 1315.,  503.,  105.,    9.,
    +           2.]),
    + array([0.        , 0.43380605, 0.8676121 , 1.30141815, 1.7352242 ,
    +        2.16903025, 2.6028363 , 3.03664235, 3.4704484 , 3.90425445,
    +        4.3380605 ]),
      <BarContainer object of 10 artists>)
    -

    +

    @@ -5181,13 +5182,13 @@

    Plot distribution

    Trace plot

    -
    +
    draw = [draw for draw, _ in enumerate(samples)]
     plt.plot(draw, samples)
    -

    +

    @@ -5196,13 +5197,36 @@

    Trace plot

    Part 2: determining mean and standard deviation from data

    +

    I suggest using logs due to numerical issues. Here’s an example function which you can use to evaluate the probability of the data.

    +
    +
    def eval_prob(data, mu, sigma):
    +    return np.log(norm.pdf(data,mu,sigma)).sum()
    +
    +
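
    To see why logs matter here: with a hundred data points, multiplying the individual densities together can underflow 64-bit floats to exactly 0.0, which makes the acceptance ratio useless, while the sum of log densities stays finite. A quick demonstration (a sketch, not from the original materials; the simulated data stands in for the CSV added by this diff):

    import numpy as np
    from scipy.stats import norm

    data = np.random.normal(5, 0.2, 100)  # stand-in for the course data file
    # evaluating under a poor parameter guess (mu = 0) gives tiny densities
    print(np.prod(norm.pdf(data, 0.0, 0.2)))       # underflows to exactly 0.0
    print(np.log(norm.pdf(data, 0.0, 0.2)).sum())  # about -31000, still usable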

    Here’s also a multivariate random number generator to generate proposals.

    +
    +
    def proposal_multi(mu, sigma):
    +    mean = [mu, sigma]
    +    cov = [[0.2, 0], [0, 0.2]]  # diagonal covariance
    +    return np.random.multivariate_normal(mean, cov, 1)[0]
    +
    +

    Here is how you’d call the proposal function:

    +
    prop_mu, prop_sigma = proposal_multi(current_mu, current_sigma)
    +

    The accept_reject probability should also be updated to account for the log-probability: a ratio of probabilities becomes the exponential of a difference of log-probabilities:

    +
    accept_reject = np.exp(prob_prop - prob_current)
    +

    You should sample a 95% interval including \(\mu = 5\) and \(\sigma = 0.2\). This may be difficult to sample at first, and I would recommend initialising at these values.
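
    Putting the pieces together, here is a minimal sketch of the full two-parameter sampler (not from the original materials: it restates the helpers above and assumes the data file added by this diff, data/metropolis_hastings_example.csv):

    import numpy as np
    from scipy.stats import norm

    def eval_prob(data, mu, sigma):
        return np.log(norm.pdf(data, mu, sigma)).sum()

    def proposal_multi(mu, sigma):
        mean = [mu, sigma]
        cov = [[0.2, 0], [0, 0.2]]  # diagonal covariance
        return np.random.multivariate_normal(mean, cov, 1)[0]

    data = np.loadtxt("data/metropolis_hastings_example.csv")

    current_mu, current_sigma = 5.0, 0.2  # initialise at the recommended values
    prob_current = eval_prob(data, current_mu, current_sigma)
    samples = [(current_mu, current_sigma)]

    for _ in range(10000):
        prop_mu, prop_sigma = proposal_multi(current_mu, current_sigma)
        if prop_sigma > 0:  # a non-positive sigma has probability zero, so skip it
            prob_prop = eval_prob(data, prop_mu, prop_sigma)
            accept_reject = np.exp(prob_prop - prob_current)
            if accept_reject > np.random.rand():
                current_mu, current_sigma = prop_mu, prop_sigma
                prob_current = prob_prop
        samples.append((current_mu, current_sigma))  # record the state every draw

    mus, sigmas = np.array(samples).T
    print(np.percentile(mus, [2.5, 97.5]))     # 95% interval, should cover 5
    print(np.percentile(sigmas, [2.5, 97.5]))  # 95% interval, should cover 0.2

    Note that the chain records its current state on every iteration, including rejections, so the sample frequencies stay proportional to the target density.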

    -
    -

    Hyperspheres in hyperspace don’t make much sense

    -

    Why volume in higher dimensions is non-intuitive.

    +
    +

    Volume in hyperspace doesn’t make much sense

    +

    Given an n-dimensional cube of side length 2, place spheres of diameter 1 in each of the corners, then place another sphere in the middle.

    +

    How does the size of the middle sphere change as dimensions increase?

    +

    +
    +

    Radius of middle sphere as dimension increases

    +

    +
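
    The geometry can be checked numerically. In the cube \([-1, 1]^n\) the corner spheres have radius 1/2 and are centred at \((\pm 1/2, \dots, \pm 1/2)\), a distance of \(\sqrt{n}/2\) from the origin, so the middle sphere touching them has radius \((\sqrt{n} - 1)/2\). A short sketch (not from the original materials):

    import numpy as np

    for n in [1, 2, 3, 9, 10, 100]:
        r = (np.sqrt(n) - 1) / 2  # middle-sphere radius in n dimensions
        note = "  (pokes outside the cube!)" if r > 1 else ""
        print(f"n = {n:>3}: radius = {r:.3f}{note}")

    The radius grows without bound, and beyond n = 9 the middle sphere extends outside the cube that contains all the corner spheres.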
    diff --git a/docs/search.json b/docs/search.json index 5f5d659..54825a7 100644 --- a/docs/search.json +++ b/docs/search.json @@ -301,7 +301,7 @@ "href": "metropolis-hastings.html", "title": "Metropolis Hastings", "section": "", - "text": "Welcome back!\nPlan for today:\n\nProblem Statement\nMarkov Chains\nMetropolis-Hastings explained\nWhat are the limitations?\n\nRecap from last week:\nWhat is MCMC trying to solve: MCMC", + "text": "Welcome back!\nPlan for today:\n\nMarkov Chains\nMetropolis-Hastings explained\n\nRecap from last week:\nWhat is MCMC trying to solve: MCMC", "crumbs": [ "Course materials", "Metropolis Hastings" @@ -312,7 +312,7 @@ "href": "metropolis-hastings.html#introduction", "title": "Metropolis Hastings", "section": "", - "text": "Welcome back!\nPlan for today:\n\nProblem Statement\nMarkov Chains\nMetropolis-Hastings explained\nWhat are the limitations?\n\nRecap from last week:\nWhat is MCMC trying to solve: MCMC", + "text": "Welcome back!\nPlan for today:\n\nMarkov Chains\nMetropolis-Hastings explained\n\nRecap from last week:\nWhat is MCMC trying to solve: MCMC", "crumbs": [ "Course materials", "Metropolis Hastings" @@ -367,18 +367,18 @@ "href": "metropolis-hastings.html#coding-the-metropolis-hastings-in-practice", "title": "Metropolis Hastings", "section": "Coding the Metropolis Hastings in practice", - "text": "Coding the Metropolis Hastings in practice\n\nPart 1: sampling from a normal distribution\nGiven a \\(p(x) = N(2, 0.5)\\) how would draw samples using the Metropolis-Hastings algorithm?\n\nChoose proposal value\nEvaluate probability ratio\nAccept-Reject\nincrement step\n\n\nDefine probability density function\n\nimport numpy as np\nfrom scipy.stats import norm\n\ndef prob(x):\n return norm.pdf(x, 2, 0.5)\n\n\n\nDefine proposal distribution\n\ndef proposal(x):\n return norm.rvs(x, 1, 1)[0]\n\n\n\nInitialise sampler\n\ncurrent = 0.0\nsamples = [current]\n\n\n\nSample from distribution\n\nfor i in range(10000):\n prop = proposal(current)\n accept_reject = prob(prop)/prob(current)\n if accept_reject > 1:\n samples.append(prop)\n current = prop\n else:\n cutoff = np.random.rand(1)[0]\n if accept_reject > cutoff:\n samples.append(prop)\n current = prop\n\n\n\nPlot distribution\n\nimport matplotlib.pyplot as plt\nplt.hist(samples)\n\n(array([ 7., 31., 192., 669., 1232., 1248., 1012., 471., 100.,\n 22.]),\n array([0. 
, 0.37565749, 0.75131498, 1.12697247, 1.50262996,\n 1.87828745, 2.25394494, 2.62960243, 3.00525992, 3.38091741,\n 3.7565749 ]),\n <BarContainer object of 10 artists>)\n\n\n\n\n\n\n\n\n\n\n\nTrace plot\n\ndraw = [draw for draw, _ in enumerate(samples)]\nplt.plot(draw, samples)\n\n\n\n\n\n\n\n\n\n\n\nPart 2: determining mean and standard deviation from data", + "text": "Coding the Metropolis Hastings in practice\n\nPart 1: sampling from a normal distribution\nGiven a \\(p(x) = N(2, 0.5)\\) how would draw samples using the Metropolis-Hastings algorithm?\n\nChoose proposal value\nEvaluate probability ratio\nAccept-Reject\nincrement step\n\n\nDefine probability density function\n\nimport numpy as np\nfrom scipy.stats import norm\n\ndef prob(x):\n return norm.pdf(x, 2, 0.5)\n\n\n\nDefine proposal distribution\n\ndef proposal(x):\n return norm.rvs(x, 1, 1)[0]\n\n\n\nInitialise sampler\n\ncurrent = 0.0\nsamples = [current]\n\n\n\nSample from distribution\n\nfor i in range(10000):\n prop = proposal(current)\n accept_reject = prob(prop)/prob(current)\n if accept_reject > 1:\n samples.append(prop)\n current = prop\n else:\n cutoff = np.random.rand(1)[0]\n if accept_reject > cutoff:\n samples.append(prop)\n current = prop\n\n\n\nPlot distribution\n\nimport matplotlib.pyplot as plt\nplt.hist(samples)\n\n(array([ 6., 45., 377., 1127., 1581., 1315., 503., 105., 9.,\n 2.]),\n array([0. , 0.43380605, 0.8676121 , 1.30141815, 1.7352242 ,\n 2.16903025, 2.6028363 , 3.03664235, 3.4704484 , 3.90425445,\n 4.3380605 ]),\n <BarContainer object of 10 artists>)\n\n\n\n\n\n\n\n\n\n\n\nTrace plot\n\ndraw = [draw for draw, _ in enumerate(samples)]\nplt.plot(draw, samples)\n\n\n\n\n\n\n\n\n\n\n\nPart 2: determining mean and standard deviation from data\nI suggest using logs due to numerical issues. Here’s an example function which you can use to evaluate the probability of the data.\n\ndef eval_prob(data, mu, sigma):\n return np.log(norm.pdf(data,mu,sigma)).sum()\n\nHere’s also a multivariate random number generator to generate proposals.\n\ndef proposal_multi(mu, sigma):\n mean = [mu, sigma]\n cov = [[0.2, 0], [0, 0.2]] # diagonal covariance\n return np.random.multivariate_normal(mean, cov, 1)[0]\n\nHere is how you’d call the proposal function\nprop_mu, prop_sigma = proposal_multi(current_mu, current_sigma)\nthe accept_reject probability should also be updated to account for the log-probability\naccept_reject = np.exp(prob_prop - prob_current)\nYou should sample a 95% interval including a \\(\\mu = 5\\) and a \\(\\sigma = 0.2\\). 
This may be difficult at first to sample and I would recommend initialising at these values.", "crumbs": [ "Course materials", "Metropolis Hastings" ] }, { - "objectID": "metropolis-hastings.html#hyperspheres-in-hyperspace-dont-make-much-sense", - "href": "metropolis-hastings.html#hyperspheres-in-hyperspace-dont-make-much-sense", + "objectID": "metropolis-hastings.html#volume-in-hyperspace-dont-make-much-sense", + "href": "metropolis-hastings.html#volume-in-hyperspace-dont-make-much-sense", "title": "Metropolis Hastings", - "section": "Hyperspheres in hyperspace don’t make much sense", - "text": "Hyperspheres in hyperspace don’t make much sense\nWhy volume in higher dimensions is non-intuitive.", + "section": "Volume in hyperspace don’t make much sense", + "text": "Volume in hyperspace don’t make much sense\nGiven an n-dimensional cube of length=2 you place spheres of diameter=1 in each of the corners and then place another sphere in the middle.\nHow does the size of the middle sphere change as dimensions increase?\n\n\nRadius of middle sphere as dimension increases", "crumbs": [ "Course materials", "Metropolis Hastings" diff --git a/docs/week3.html b/docs/week3.html index df2096a..508b52c 100644 --- a/docs/week3.html +++ b/docs/week3.html @@ -5147,7 +5147,7 @@

    Example

    We’ll go through some diagnostics using arviz.

    Step one is to load some data. Rather than going through a whole modelling workflow, we’ll just take one of the example MCMC outputs that arviz provides via the function load_arviz_data.

    This particular MCMC output has to do with measurements of soil radioactivity in the USA. You can read more about it here.

    -
    +
    import arviz as az
     import numpy as np  # numpy is for numerical programming
     import xarray as xr  # xarray is for manipulating n dimensional tables; arviz uses it a lot! 
    @@ -5163,8 +5163,8 @@ 

    Example

    [The diff hunks here touch only auto-generated element IDs inside the xarray HTML widgets that render the InferenceData groups; the metadata those widgets display is unchanged: created_at 2020-07-24, arviz_version 0.9.0, inference_library pymc3, inference_library_version 3.9.2, sampling_time 18.096983432769775, tuning_steps 1000, and a g_coef coordinate index (['intercept', 'slope']).]

  • @@ -8197,7 +8197,7 @@

    Example

    For example, if you click on the dropdown arrows above you will see that the group posterior contains a variable called a_county that has three dimensions called chain, draw and County. There are 85 counties and the first one is labelled 'AITKIN'.
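
    A short sketch of how you might inspect that group programmatically (the dataset name "radon" passed to load_arviz_data is an assumption here; the rest follows the variable and dimension names above):

    import arviz as az

    idata = az.load_arviz_data("radon")  # dataset name assumed

    a_county = idata.posterior["a_county"]  # dimensions: chain, draw, County
    print(a_county.sizes["County"])         # 85
    print(float(a_county.sel(County="AITKIN").mean()))  # posterior mean for one county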

    The function az.summary lets us look at some useful summary statistics, including \(\hat{R}\), divergent transitions and MCSE.

    The variable lp, which you can find in the group sample_stats is the model’s total log probability density. It’s not very meaningful on its own, but is useful for judging overall convergence. diverging counts the number of divergent transitions.

    -
    +
    az.summary(idata.sample_stats, var_names=["lp", "diverging"])
    /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/.venv/lib/python3.12/site-packages/arviz/stats/diagnostics.py:592: RuntimeWarning: invalid value encountered in scalar divide
    @@ -8256,7 +8256,7 @@ 

    Example

    Sometimes it’s useful to summarise individual parameters. This can be done by pointing az.summary at the group where the parameters of interest live. In this case the group is called posterior.

    -
    +
    az.summary(idata.posterior, var_names=["sigma", "g"])
    @@ -8324,7 +8324,7 @@

    Example

    In this case there were no post-warmup diverging transitions, and the \(\hat{R}\) statistic for the lp variable is pretty close to 1: great!

    Now we can start evaluating the model. First we check to see whether replicated measurements from the model’s posterior predictive distribution broadly agree with the observed measurements, using the arviz function plot_lm:

    -
    +
    az.style.use("arviz-doc")
     az.plot_lm(
         y=idata.observed_data["y"],
    @@ -8336,13 +8336,13 @@ 

    Example

    -

    +

    The function az.loo performs approximate leave-one-out cross validation, which can be useful for evaluating how well the model might make predictions. Watch out for the warning column, which can tell you if the approximation is likely to be incorrect. It’s usually a good idea to set the pointwise argument to True, as this allows for more detailed analysis at the per-observation level.

    -
    +
    az.loo(idata, var_name="y", pointwise=True)
    Computed from 2000 posterior samples and 919 observations log-likelihood matrix.
    @@ -8361,7 +8361,7 @@ 

    Example

    The function az.compare is useful for comparing different out of sample log likelihood estimates.

    -
    +
    idata.log_likelihood["fake"] = xr.DataArray(
         # generate some fake log likelihoods
         np.random.normal(0, 2, [4, 500, 919]),
    @@ -8410,7 +8410,7 @@ 

    Example

    -1027.176982 26.817161 0.000000 -0.98484 +0.985377 28.848998 0.000000 False @@ -8419,12 +8419,12 @@

    Example

    fake 1 --1797.015886 -3626.709645 -769.838905 -0.01516 -3.334786 -29.069816 +-1799.350872 +3621.756304 +772.173891 +0.014623 +3.523104 +28.906533 True log @@ -8436,7 +8436,7 @@

    Example

    The function az.plot_compare shows these results on a nice graph:

    -
    +
    az.plot_compare(comparison);
    /Users/tedgro/repos/biosustain/bayesian_statistics_for_computational_biology/.venv/lib/python3.12/site-packages/arviz/plots/backends/matplotlib/compareplot.py:87: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
    @@ -8445,7 +8445,7 @@ 

    Example

    -

    +

    diff --git a/materials/data.csv b/materials/data.csv new file mode 100644 index 0000000..7c1e00e --- /dev/null +++ b/materials/data.csv @@ -0,0 +1,100 @@ +5.133879959574598 +4.937817051434988 +5.277966400350326 +4.978073616723295 +4.768264765712152 +5.00325432375216 +5.140401818212519 +5.220085220743337 +4.9103261367115305 +4.916817929108131 +4.676230504106508 +4.681155942296321 +4.939456345611404 +5.404674908950762 +5.0000565051183505 +4.946723670230919 +5.071484229192757 +4.702765071407966 +4.990440219285413 +4.757565482906698 +4.90342085169882 +4.909731907856895 +4.7502350067852035 +4.735636087526655 +4.831835910148419 +5.083641260592571 +4.941212006801077 +4.836347213445427 +4.962098544360349 +4.686040420643092 +5.124470580419098 +5.066542799622644 +5.116490362362727 +4.978068545964261 +4.733571541168396 +4.799377562666428 +4.731238173387429 +4.927630515268987 +5.276180964383851 +4.605853804427195 +4.895306070517735 +5.125441726077501 +5.048540015619509 +5.134475090531417 +5.020340763177987 +4.744641578259603 +4.9620314387166395 +4.664576116370324 +5.244742999934165 +5.279274831080141 +5.089479340563577 +5.071743145634021 +4.942265997797657 +5.089398760768866 +5.458231752210997 +4.786148013337675 +4.7972136274342025 +4.93152718719522 +4.978778063916119 +4.801232311921279 +4.686526370326429 +5.105113175288019 +4.8016552715132645 +4.978252685586892 +4.903595625784314 +5.150389785891813 +4.957095292110073 +4.851463539442262 +5.18128245975816 +5.139918945662961 +5.020216551825798 +4.84115302966151 +4.975904991006625 +5.401969287491859 +4.811230682660985 +4.7993454128378925 +5.08505429688905 +4.6388175324133565 +5.363488556427758 +4.825053384694479 +5.304806718052644 +5.321881179050992 +5.044592651869508 +4.944522400406804 +4.9930450107605076 +5.400020830329514 +5.162848379281823 +4.851419760797657 +5.021789943039787 +4.831523320170004 +4.870328937789049 +4.927266223569667 +5.023563865935118 +4.969113692374906 +4.997160262493862 +4.747684042472069 +5.070321872431043 +4.899256187007241 +5.111542353099519 +5.201047790285543 diff --git a/materials/img/dimension_plot.png b/materials/img/dimension_plot.png new file mode 100644 index 0000000..6aeabc2 Binary files /dev/null and b/materials/img/dimension_plot.png differ diff --git a/materials/img/dimension_sphere_diagram.png b/materials/img/dimension_sphere_diagram.png new file mode 100644 index 0000000..6f0a970 Binary files /dev/null and b/materials/img/dimension_sphere_diagram.png differ diff --git a/materials/metropolis-hastings.qmd b/materials/metropolis-hastings.qmd index 0ddd6b5..32b48c3 100644 --- a/materials/metropolis-hastings.qmd +++ b/materials/metropolis-hastings.qmd @@ -6,10 +6,8 @@ Welcome back! Plan for today: -1. [Problem Statement](#why-integrals) -2. [Markov Chains](#markov-chains) -4. [Metropolis-Hastings explained](#metropolis-hastings) -5. [What are the limitations?](#metropolis-hastings-limitations) +1. [Markov Chains](#markov-chains) +2. [Metropolis-Hastings explained](#metropolis-hastings) Recap from last week: @@ -211,6 +209,52 @@ plt.plot(draw, samples) ``` ### Part 2: determining mean and standard deviation from data +I suggest using logs due to numerical issues. +Here's an example function which you can use to evaluate +the probability of the [data](https://github.com/biosustain/bayesian_statistics_for_computational_biology/blob/main/data/metropolis_hastings_example.csv). -## Hyperspheres in hyperspace don't make much sense -Why volume in higher dimensions is non-intuitive. 
+```{python}
+def eval_prob(data, mu, sigma):
+    return np.log(norm.pdf(data, mu, sigma)).sum()
+```
+
+Here's also a multivariate random number generator to generate
+proposals.
+
+```{python}
+def proposal_multi(mu, sigma):
+    mean = [mu, sigma]
+    cov = [[0.2, 0], [0, 0.2]]  # diagonal covariance
+    return np.random.multivariate_normal(mean, cov, 1)[0]
+
+```
+
+Here is how you'd call the proposal function:
+```
+prop_mu, prop_sigma = proposal_multi(current_mu, current_sigma)
+```
+
+The accept_reject probability should also be updated to account
+for the log-probability: a ratio of probabilities becomes the
+exponential of a difference of log-probabilities.
+```
+accept_reject = np.exp(prob_prop - prob_current)
+```
+
+You should sample a 95% interval including $\mu = 5$
+and $\sigma = 0.2$. This may be difficult to sample at first, and
+I would recommend initialising at these values.
+
+## Volume in hyperspace doesn't make much sense
+
+Given an n-dimensional cube of side length 2, place spheres
+of diameter 1 in each of the corners, then place another
+sphere in the middle.
+
+How does the size of the middle sphere change as dimensions increase?
+
+![](img/dimension_sphere_diagram.png)
+
+### Radius of middle sphere as dimension increases
+
+![](img/dimension_plot.png)