@model function pPCA_ARD(X)
# Dimensionality of the problem.
@@ -1938,544 +1938,544 @@
diff --git a/pr-previews/553/search.json b/pr-previews/553/search.json
index ed6653b42..d2302d521 100644
--- a/pr-previews/553/search.json
+++ b/pr-previews/553/search.json
@@ -907,7 +907,7 @@
"href": "tutorials/11-probabilistic-pca/index.html#the-gene-expression-example",
"title": "Probabilistic Principal Component Analysis (p-PCA)",
"section": "The gene expression example",
- "text": "The gene expression example\nIn the first example, we illustrate:\n\nhow to specify the probabilistic model and\nhow to perform inference on \\(\\mathbf{W}\\), \\(\\boldsymbol{\\mu}\\) and \\(\\sigma\\) using MCMC.\n\nWe use simulated gemnome data to demonstrate these. The simulation is inspired by biological measurement of expression of genes in cells, and each cell is characterized by different gene features. While the human genome is (mostly) identical between all the cells in the body, there exist interesting differences in gene expression in different human tissues and disease conditions. One way to investigate certain diseases is to look at differences in gene expression in cells from patients and healthy controls (usually from the same tissue).\nUsually, we can assume that the changes in gene expression only affect a subset of all genes (and these can be linked to diseases in some way). One of the challenges for this kind of data is to explore the underlying structure, e.g. to make the connection between a certain state (healthy/disease) and gene expression. This becomes difficult when the dimensions is very large (up to 20000 genes across 1000s of cells). So in order to find structure in this data, it is useful to project the data into a lower dimensional space.\nRegardless of the biological background, the more abstract problem formulation is to project the data living in high-dimensional space onto a representation in lower-dimensional space where most of the variation is concentrated in the first few dimensions. We use PCA to explore underlying structure or pattern which may not necessarily be obvious from looking at the raw data itself.\n\nStep 1: configuration of dependencies\nFirst, we load the dependencies used.\n\nusing Turing\nusing ReverseDiff\nusing LinearAlgebra, FillArrays\n\n# Packages for visualization\nusing DataFrames, StatsPlots, Measures\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(1789);\n\nAll packages used in this tutorial are listed here. You can install them via using Pkg; Pkg.add(\"package_name\").\n\n\n\n\n\n\nPackage usages:\n\n\n\nWe use DataFrames for instantiating matrices, LinearAlgebra and FillArrays to perform matrix operations; Turing for model specification and MCMC sampling, ReverseDiff for setting the automatic differentiation backend when sampling. StatsPlots for visualising the resutls. , Measures for setting plot margin units. As all examples involve sampling, for reproducibility we set a fixed seed using the Random standard library.\n\n\n\n\nStep 2: Data generation\nHere, we simulate the biological gene expression problem described earlier. We simulate 60 cells, each cell has 9 gene features. This is a simplified problem with only a few cells and genes for demonstration purpose, which is not comparable to the complexity in real-life (e.g. thousands of features for each individual). Even so, spotting the structures or patterns in a 9-feature space would be a challenging task; it would be nice to reduce the dimentionality using p-PCA.\nBy design, we mannually divide the 60 cells into two groups. the first 3 gene features of the first 30 cells have mean 10, while those of the last 30 cells have mean 10. These two groups of cells differ in the expression of genes.\n\nn_genes = 9 # D\nn_cells = 60 # N\n\n# create a diagonal block like expression matrix, with some non-informative genes;\n# not all features/genes are informative, some might just not differ very much between cells)\nmat_exp = randn(n_genes, n_cells)\nmat_exp[1:(n_genes ÷ 3), 1:(n_cells ÷ 2)] .+= 10\nmat_exp[(2 * (n_genes ÷ 3) + 1):end, (n_cells ÷ 2 + 1):end] .+= 10\n\n3×30 view(::Matrix{Float64}, 7:9, 31:60) with eltype Float64:\n 11.7413 10.5735 11.3817 8.50923 … 7.38716 8.7734 11.4717\n 9.28533 11.1225 9.43421 10.8904 11.6846 10.7264 9.64063\n 9.92113 8.42122 9.59885 9.90799 9.40715 8.40956 10.2522\n\n\nTo visualize the \\((D=9) \\times (N=60)\\) data matrix mat_exp, we use the heatmap plot.\n\nheatmap(\n mat_exp;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote that:\n\nWe have made distinct feature differences between these two groups of cells (it is fairly obvious from looking at the raw data), in practice and with large enough data sets, it is often impossible to spot the differences from the raw data alone.\nIf you have some patience and compute resources you can increase the size of the dataset, or play around with the noise levels to make the problem increasingly harder.\n\n\n\nStep 3: Create the pPCA model\nHere we construct the probabilistic model pPCA(). As per the p-PCA formula, we think of each row (i.e. each gene feature) following a \\(N=60\\) dimensional multivariate normal distribution centered around the corresponding row of \\(\\mathbf{W}_{D \\times k} \\times \\mathbf{Z}_{k \\times N} + \\boldsymbol{\\mu}_{D \\times N}\\).\n\n@model function pPCA(X::AbstractMatrix{<:Real}, k::Int)\n # retrieve the dimension of input matrix X.\n N, D = size(X)\n\n # weights/loadings W\n W ~ filldist(Normal(), D, k)\n\n # latent variable z\n Z ~ filldist(Normal(), k, N)\n\n # mean offset\n μ ~ MvNormal(Eye(D))\n genes_mean = W * Z .+ reshape(μ, n_genes, 1)\n return X ~ arraydist([MvNormal(m, Eye(N)) for m in eachcol(genes_mean')])\nend;\n\nThe function pPCA() accepts:\n\nan data array \\(\\mathbf{X}\\) (with no. of instances x dimension no. of features, NB: it is a transpose of the original data matrix);\nan integer \\(k\\) which indicates the dimension of the latent space (the space the original feature matrix is projected onto).\n\nSpecifically:\n\nit first extracts the dimension \\(D\\) and number of instances \\(N\\) of the input matrix;\ndraw samples of each entries of the projection matrix \\(\\mathbf{W}\\) from a standard normal;\ndraw samples of the latent variable \\(\\mathbf{Z}_{k \\times N}\\) from an MND;\ndraw samples of the offset \\(\\boldsymbol{\\mu}\\) from an MND, assuming uniform offset for all instances;\nFinally, we iterate through each gene dimension in \\(\\mathbf{X}\\), and define an MND for the sampling distribution (i.e. likelihood).\n\n\n\nStep 4: Sampling-based inference of the pPCA model\nHere we aim to perform MCMC sampling to infer the projection matrix \\(\\mathbf{W}_{D \\times k}\\), the latent variable matrix \\(\\mathbf{Z}_{k \\times N}\\), and the offsets \\(\\boldsymbol{\\mu}_{N \\times 1}\\).\nWe run the inference using the NUTS sampler, of which the chain length is set to be 500, target accept ratio 0.65 and initial stepsize 0.1. By default, the NUTS sampler samples 1 chain. You are free to try different samplers.\n\nsetprogress!(false)\n\n\nk = 2 # k is the dimension of the projected space, i.e. the number of principal components/axes of choice\nppca = pPCA(mat_exp', k) # instantiate the probabilistic model\nchain_ppca = sample(ppca, NUTS(;adtype=AutoReverseDiff()), 500);\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\nThe samples are saved in the Chains struct chain_ppca, whose shape can be checked:\n\nsize(chain_ppca) # (no. of iterations, no. of vars, no. of chains) = (500, 159, 1)\n\n(500, 159, 1)\n\n\nThe Chains struct chain_ppca also contains the sampling info such as r-hat, ess, mean estimates, etc. You can print it to check these quantities.\n\n\nStep 5: posterior predictive checks\nWe try to reconstruct the input data using the posterior mean as parameter estimates. We first retrieve the samples for the projection matrix W from chain_ppca. This can be done using the Julia group(chain, parameter_name) function. Then we calculate the mean value for each element in \\(W\\), averaging over the whole chain of samples.\n\n# Extract parameter estimates for predicting x - mean of posterior\nW = reshape(mean(group(chain_ppca, :W))[:, 2], (n_genes, k))\nZ = reshape(mean(group(chain_ppca, :Z))[:, 2], (k, n_cells))\nμ = mean(group(chain_ppca, :μ))[:, 2]\n\nmat_rec = W * Z .+ repeat(μ; inner=(1, n_cells))\n\n9×60 Matrix{Float64}:\n 6.91219 6.83052 7.5909 … 3.76186 2.63971 1.49009\n 7.3069 7.22114 8.02146 4.00178 2.81851 1.60901\n 6.76656 6.68951 7.40406 3.78882 2.73783 1.65666\n -0.0379137 -0.0404184 -0.0131903 -0.127216 -0.1722 -0.212237\n 0.321033 0.3136 0.387767 0.0436086 -0.0719438 -0.182642\n -0.0218712 -0.0231275 -0.0112274 … -0.0699528 -0.0877641 -0.105697\n 2.8036 2.88933 2.08779 6.10451 7.29146 8.50235\n 2.96181 3.04241 2.29292 6.07293 7.1778 8.31124\n 2.65166 2.73713 1.9412 5.9486 7.12336 8.32671\n\n\n\nheatmap(\n mat_rec;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can quantitatively check the absolute magnitudes of the column average of the gap between mat_exp and mat_rec:\n\ndiff_matrix = mat_exp .- mat_rec\nfor col in 4:6\n @assert abs(mean(diff_matrix[:, col])) <= 0.5\nend\n\nWe observe that, using posterior mean, the recovered data matrix mat_rec has values align with the original data matrix - particularly the same pattern in the first and last 3 gene features are captured, which implies the inference and p-PCA decomposition are successful. This is satisfying as we have just projected the original 9-dimensional space onto a 2-dimensional space - some info has been cut off in the projection process, but we haven’t lost any important info, e.g. the key differences between the two groups. The is the desirable property of PCA: it picks up the principal axes along which most of the (original) data variations cluster, and remove those less relevant. If we choose the reduced space dimension \\(k\\) to be exactly \\(D\\) (the original data dimension), we would recover exactly the same original data matrix mat_exp, i.e. all information will be preserved.\nNow we have represented the original high-dimensional data in two dimensions, without lossing the key information about the two groups of cells in the input data. Finally, the benefits of performing PCA is to analyse and visualise the dimension-reduced data in the projected, low-dimensional space. we save the dimension-reduced matrix \\(\\mathbf{Z}\\) as a DataFrame, rename the columns and visualise the first two dimensions.\n\ndf_pca = DataFrame(Z', :auto)\nrename!(df_pca, Symbol.([\"z\" * string(i) for i in collect(1:k)]))\ndf_pca[!, :type] = repeat([1, 2]; inner=n_cells ÷ 2)\n\nscatter(df_pca[:, :z1], df_pca[:, :z2]; xlabel=\"z1\", ylabel=\"z2\", group=df_pca[:, :type])\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe see the two groups are well separated in this 2-D space. As an unsupervised learning method, performing PCA on this dataset gives membership for each cell instance. Another way to put it: 2 dimensions is enough to capture the main structure of the data.\n\n\nFurther extension: automatic choice of the number of principal components with ARD\nA direct question arises from above practice is: how many principal components do we want to keep, in order to sufficiently represent the latent structure in the data? This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent that data in the latent space. In the case of PCA, there exist a lot of heuristics to make that choice. For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [3].\nFor p-PCA, this can be done in an elegant and principled way, using a technique called Automatic Relevance Determination (ARD). ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features [4]. Essentially, we are using a specific prior over the factor loadings \\(\\mathbf{W}\\) that allows us to prune away dimensions in the latent space. The prior is determined by a precision hyperparameter \\(\\alpha\\). Here, smaller values of \\(\\alpha\\) correspond to more important components. You can find more details about this in e.g. [5].\n\n@model function pPCA_ARD(X)\n # Dimensionality of the problem.\n N, D = size(X)\n\n # latent variable Z\n Z ~ filldist(Normal(), D, N)\n\n # weights/loadings w with Automatic Relevance Determination part\n α ~ filldist(Gamma(1.0, 1.0), D)\n W ~ filldist(MvNormal(zeros(D), 1.0 ./ sqrt.(α)), D)\n\n mu = (W' * Z)'\n\n tau ~ Gamma(1.0, 1.0)\n return X ~ arraydist([MvNormal(m, 1.0 / sqrt(tau)) for m in eachcol(mu)])\nend;\n\nInstead of drawing samples of each entry in \\(\\mathbf{W}\\) from a standard normal, this time we repeatedly draw \\(D\\) samples from the \\(D\\)-dimensional MND, forming a \\(D \\times D\\) matrix \\(\\mathbf{W}\\). This matrix is a function of \\(\\alpha\\) as the samples are drawn from the MND parameterized by \\(\\alpha\\). We also introduce a hyper-parameter \\(\\tau\\) which is the precision in the sampling distribution. We also re-paramterise the sampling distribution, i.e. each dimension across all instances is a 60-dimensional multivariate normal distribution. Re-parameterisation can sometimes accelrate the sampling process.\nWe instantiate the model and ask Turing to sample from it using NUTS sampler. The sample trajectories of \\(\\alpha\\) is plotted using the plot function from the package StatsPlots.\n\nppca_ARD = pPCA_ARD(mat_exp') # instantiate the probabilistic model\nchain_ppcaARD = sample(ppca_ARD, NUTS(;adtype=AutoReverseDiff()), 500) # sampling\nplot(group(chain_ppcaARD, :α); margin=6.0mm)\n\n┌ Info: Found initial step size\n└ ϵ = 0.05\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAgain, we do some inference diagnostics. Here we look at the convergence of the chains for the \\(α\\) parameter. This parameter determines the relevance of individual components. We see that the chains have converged and the posterior of the \\(\\alpha\\) parameters is centered around much smaller values in two instances. In the following, we will use the mean of the small values to select the relevant dimensions (remember that, smaller values of \\(\\alpha\\) correspond to more important components.). We can clearly see from the values of \\(\\alpha\\) that there should be two dimensions (corresponding to \\(\\bar{\\alpha}_3=\\bar{\\alpha}_5≈0.05\\)) for this dataset.\n\n# Extract parameter mean estimates of the posterior\nW = permutedims(reshape(mean(group(chain_ppcaARD, :W))[:, 2], (n_genes, n_genes)))\nZ = permutedims(reshape(mean(group(chain_ppcaARD, :Z))[:, 2], (n_genes, n_cells)))'\nα = mean(group(chain_ppcaARD, :α))[:, 2]\nplot(α; label=\"α\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can inspect α to see which elements are small (i.e. high relevance). To do this, we first sort α using sortperm() (in ascending order by default), and record the indices of the first two smallest values (among the \\(D=9\\) \\(\\alpha\\) values). After picking the desired principal directions, we extract the corresponding subset loading vectors from \\(\\mathbf{W}\\), and the corresponding dimensions of \\(\\mathbf{Z}\\). We obtain a posterior predicted matrix \\(\\mathbf{X} \\in \\mathbb{R}^{2 \\times 60}\\) as the product of the two sub-matrices, and compare the recovered info with the original matrix.\n\nα_indices = sortperm(α)[1:2]\nk = size(α_indices)[1]\nX_rec = W[:, α_indices] * Z[α_indices, :]\n\ndf_rec = DataFrame(X_rec', :auto)\nheatmap(\n X_rec;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe observe that, the data in the original space is recovered with key information, the distinct feature values in the first and last three genes for the two cell groups, are preserved. We can also examine the data in the dimension-reduced space, i.e. the selected components (rows) in \\(\\mathbf{Z}\\).\n\ndf_pro = DataFrame(Z[α_indices, :]', :auto)\nrename!(df_pro, Symbol.([\"z\" * string(i) for i in collect(1:k)]))\ndf_pro[!, :type] = repeat([1, 2]; inner=n_cells ÷ 2)\nscatter(\n df_pro[:, 1], df_pro[:, 2]; xlabel=\"z1\", ylabel=\"z2\", color=df_pro[:, \"type\"], label=\"\"\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThis plot is very similar to the low-dimensional plot above, with the relevant dimensions chosen based on the values of \\(α\\) via ARD. When you are in doubt about the number of dimensions to project onto, ARD might provide an answer to that question.",
+ "text": "The gene expression example\nIn the first example, we illustrate:\n\nhow to specify the probabilistic model and\nhow to perform inference on \\(\\mathbf{W}\\), \\(\\boldsymbol{\\mu}\\) and \\(\\sigma\\) using MCMC.\n\nWe use simulated gemnome data to demonstrate these. The simulation is inspired by biological measurement of expression of genes in cells, and each cell is characterized by different gene features. While the human genome is (mostly) identical between all the cells in the body, there exist interesting differences in gene expression in different human tissues and disease conditions. One way to investigate certain diseases is to look at differences in gene expression in cells from patients and healthy controls (usually from the same tissue).\nUsually, we can assume that the changes in gene expression only affect a subset of all genes (and these can be linked to diseases in some way). One of the challenges for this kind of data is to explore the underlying structure, e.g. to make the connection between a certain state (healthy/disease) and gene expression. This becomes difficult when the dimensions is very large (up to 20000 genes across 1000s of cells). So in order to find structure in this data, it is useful to project the data into a lower dimensional space.\nRegardless of the biological background, the more abstract problem formulation is to project the data living in high-dimensional space onto a representation in lower-dimensional space where most of the variation is concentrated in the first few dimensions. We use PCA to explore underlying structure or pattern which may not necessarily be obvious from looking at the raw data itself.\n\nStep 1: configuration of dependencies\nFirst, we load the dependencies used.\n\nusing Turing\nusing ReverseDiff\nusing LinearAlgebra, FillArrays\n\n# Packages for visualization\nusing DataFrames, StatsPlots, Measures\n\n# Set a seed for reproducibility.\nusing Random\nRandom.seed!(1789);\n\nAll packages used in this tutorial are listed here. You can install them via using Pkg; Pkg.add(\"package_name\").\n\n\n\n\n\n\nPackage usages:\n\n\n\nWe use DataFrames for instantiating matrices, LinearAlgebra and FillArrays to perform matrix operations; Turing for model specification and MCMC sampling, ReverseDiff for setting the automatic differentiation backend when sampling. StatsPlots for visualising the resutls. , Measures for setting plot margin units. As all examples involve sampling, for reproducibility we set a fixed seed using the Random standard library.\n\n\n\n\nStep 2: Data generation\nHere, we simulate the biological gene expression problem described earlier. We simulate 60 cells, each cell has 9 gene features. This is a simplified problem with only a few cells and genes for demonstration purpose, which is not comparable to the complexity in real-life (e.g. thousands of features for each individual). Even so, spotting the structures or patterns in a 9-feature space would be a challenging task; it would be nice to reduce the dimentionality using p-PCA.\nBy design, we mannually divide the 60 cells into two groups. the first 3 gene features of the first 30 cells have mean 10, while those of the last 30 cells have mean 10. These two groups of cells differ in the expression of genes.\n\nn_genes = 9 # D\nn_cells = 60 # N\n\n# create a diagonal block like expression matrix, with some non-informative genes;\n# not all features/genes are informative, some might just not differ very much between cells)\nmat_exp = randn(n_genes, n_cells)\nmat_exp[1:(n_genes ÷ 3), 1:(n_cells ÷ 2)] .+= 10\nmat_exp[(2 * (n_genes ÷ 3) + 1):end, (n_cells ÷ 2 + 1):end] .+= 10\n\n3×30 view(::Matrix{Float64}, 7:9, 31:60) with eltype Float64:\n 11.7413 10.5735 11.3817 8.50923 … 7.38716 8.7734 11.4717\n 9.28533 11.1225 9.43421 10.8904 11.6846 10.7264 9.64063\n 9.92113 8.42122 9.59885 9.90799 9.40715 8.40956 10.2522\n\n\nTo visualize the \\((D=9) \\times (N=60)\\) data matrix mat_exp, we use the heatmap plot.\n\nheatmap(\n mat_exp;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote that:\n\nWe have made distinct feature differences between these two groups of cells (it is fairly obvious from looking at the raw data), in practice and with large enough data sets, it is often impossible to spot the differences from the raw data alone.\nIf you have some patience and compute resources you can increase the size of the dataset, or play around with the noise levels to make the problem increasingly harder.\n\n\n\nStep 3: Create the pPCA model\nHere we construct the probabilistic model pPCA(). As per the p-PCA formula, we think of each row (i.e. each gene feature) following a \\(N=60\\) dimensional multivariate normal distribution centered around the corresponding row of \\(\\mathbf{W}_{D \\times k} \\times \\mathbf{Z}_{k \\times N} + \\boldsymbol{\\mu}_{D \\times N}\\).\n\n@model function pPCA(X::AbstractMatrix{<:Real}, k::Int)\n # retrieve the dimension of input matrix X.\n N, D = size(X)\n\n # weights/loadings W\n W ~ filldist(Normal(), D, k)\n\n # latent variable z\n Z ~ filldist(Normal(), k, N)\n\n # mean offset\n μ ~ MvNormal(Eye(D))\n genes_mean = W * Z .+ reshape(μ, n_genes, 1)\n return X ~ arraydist([MvNormal(m, Eye(N)) for m in eachcol(genes_mean')])\nend;\n\nThe function pPCA() accepts:\n\nan data array \\(\\mathbf{X}\\) (with no. of instances x dimension no. of features, NB: it is a transpose of the original data matrix);\nan integer \\(k\\) which indicates the dimension of the latent space (the space the original feature matrix is projected onto).\n\nSpecifically:\n\nit first extracts the dimension \\(D\\) and number of instances \\(N\\) of the input matrix;\ndraw samples of each entries of the projection matrix \\(\\mathbf{W}\\) from a standard normal;\ndraw samples of the latent variable \\(\\mathbf{Z}_{k \\times N}\\) from an MND;\ndraw samples of the offset \\(\\boldsymbol{\\mu}\\) from an MND, assuming uniform offset for all instances;\nFinally, we iterate through each gene dimension in \\(\\mathbf{X}\\), and define an MND for the sampling distribution (i.e. likelihood).\n\n\n\nStep 4: Sampling-based inference of the pPCA model\nHere we aim to perform MCMC sampling to infer the projection matrix \\(\\mathbf{W}_{D \\times k}\\), the latent variable matrix \\(\\mathbf{Z}_{k \\times N}\\), and the offsets \\(\\boldsymbol{\\mu}_{N \\times 1}\\).\nWe run the inference using the NUTS sampler, of which the chain length is set to be 500, target accept ratio 0.65 and initial stepsize 0.1. By default, the NUTS sampler samples 1 chain. You are free to try different samplers.\n\nsetprogress!(false)\n\n\nk = 2 # k is the dimension of the projected space, i.e. the number of principal components/axes of choice\nppca = pPCA(mat_exp', k) # instantiate the probabilistic model\nchain_ppca = sample(ppca, NUTS(;adtype=AutoReverseDiff()), 500);\n\n┌ Info: Found initial step size\n└ ϵ = 0.4\n\n\nThe samples are saved in the Chains struct chain_ppca, whose shape can be checked:\n\nsize(chain_ppca) # (no. of iterations, no. of vars, no. of chains) = (500, 159, 1)\n\n(500, 159, 1)\n\n\nThe Chains struct chain_ppca also contains the sampling info such as r-hat, ess, mean estimates, etc. You can print it to check these quantities.\n\n\nStep 5: posterior predictive checks\nWe try to reconstruct the input data using the posterior mean as parameter estimates. We first retrieve the samples for the projection matrix W from chain_ppca. This can be done using the Julia group(chain, parameter_name) function. Then we calculate the mean value for each element in \\(W\\), averaging over the whole chain of samples.\n\n# Extract parameter estimates for predicting x - mean of posterior\nW = reshape(mean(group(chain_ppca, :W))[:, 2], (n_genes, k))\nZ = reshape(mean(group(chain_ppca, :Z))[:, 2], (k, n_cells))\nμ = mean(group(chain_ppca, :μ))[:, 2]\n\nmat_rec = W * Z .+ repeat(μ; inner=(1, n_cells))\n\n9×60 Matrix{Float64}:\n 6.91219 6.83052 7.5909 … 3.76186 2.63971 1.49009\n 7.3069 7.22114 8.02146 4.00178 2.81851 1.60901\n 6.76656 6.68951 7.40406 3.78882 2.73783 1.65666\n -0.0379137 -0.0404184 -0.0131903 -0.127216 -0.1722 -0.212237\n 0.321033 0.3136 0.387767 0.0436086 -0.0719438 -0.182642\n -0.0218712 -0.0231275 -0.0112274 … -0.0699528 -0.0877641 -0.105697\n 2.8036 2.88933 2.08779 6.10451 7.29146 8.50235\n 2.96181 3.04241 2.29292 6.07293 7.1778 8.31124\n 2.65166 2.73713 1.9412 5.9486 7.12336 8.32671\n\n\n\nheatmap(\n mat_rec;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can quantitatively check the absolute magnitudes of the column average of the gap between mat_exp and mat_rec:\n\ndiff_matrix = mat_exp .- mat_rec\nfor col in 4:6\n @assert abs(mean(diff_matrix[:, col])) <= 0.5\nend\n\nWe observe that, using posterior mean, the recovered data matrix mat_rec has values align with the original data matrix - particularly the same pattern in the first and last 3 gene features are captured, which implies the inference and p-PCA decomposition are successful. This is satisfying as we have just projected the original 9-dimensional space onto a 2-dimensional space - some info has been cut off in the projection process, but we haven’t lost any important info, e.g. the key differences between the two groups. The is the desirable property of PCA: it picks up the principal axes along which most of the (original) data variations cluster, and remove those less relevant. If we choose the reduced space dimension \\(k\\) to be exactly \\(D\\) (the original data dimension), we would recover exactly the same original data matrix mat_exp, i.e. all information will be preserved.\nNow we have represented the original high-dimensional data in two dimensions, without lossing the key information about the two groups of cells in the input data. Finally, the benefits of performing PCA is to analyse and visualise the dimension-reduced data in the projected, low-dimensional space. we save the dimension-reduced matrix \\(\\mathbf{Z}\\) as a DataFrame, rename the columns and visualise the first two dimensions.\n\ndf_pca = DataFrame(Z', :auto)\nrename!(df_pca, Symbol.([\"z\" * string(i) for i in collect(1:k)]))\ndf_pca[!, :type] = repeat([1, 2]; inner=n_cells ÷ 2)\n\nscatter(df_pca[:, :z1], df_pca[:, :z2]; xlabel=\"z1\", ylabel=\"z2\", group=df_pca[:, :type])\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe see the two groups are well separated in this 2-D space. As an unsupervised learning method, performing PCA on this dataset gives membership for each cell instance. Another way to put it: 2 dimensions is enough to capture the main structure of the data.\n\n\nFurther extension: automatic choice of the number of principal components with ARD\nA direct question arises from above practice is: how many principal components do we want to keep, in order to sufficiently represent the latent structure in the data? This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent that data in the latent space. In the case of PCA, there exist a lot of heuristics to make that choice. For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained 5.\nFor p-PCA, this can be done in an elegant and principled way, using a technique called Automatic Relevance Determination (ARD). ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features 6. Essentially, we are using a specific prior over the factor loadings \\(\\mathbf{W}\\) that allows us to prune away dimensions in the latent space. The prior is determined by a precision hyperparameter \\(\\alpha\\). Here, smaller values of \\(\\alpha\\) correspond to more important components. You can find more details about this in, for example, Bishop (2006) 7.\n\n@model function pPCA_ARD(X)\n # Dimensionality of the problem.\n N, D = size(X)\n\n # latent variable Z\n Z ~ filldist(Normal(), D, N)\n\n # weights/loadings w with Automatic Relevance Determination part\n α ~ filldist(Gamma(1.0, 1.0), D)\n W ~ filldist(MvNormal(zeros(D), 1.0 ./ sqrt.(α)), D)\n\n mu = (W' * Z)'\n\n tau ~ Gamma(1.0, 1.0)\n return X ~ arraydist([MvNormal(m, 1.0 / sqrt(tau)) for m in eachcol(mu)])\nend;\n\nInstead of drawing samples of each entry in \\(\\mathbf{W}\\) from a standard normal, this time we repeatedly draw \\(D\\) samples from the \\(D\\)-dimensional MND, forming a \\(D \\times D\\) matrix \\(\\mathbf{W}\\). This matrix is a function of \\(\\alpha\\) as the samples are drawn from the MND parameterized by \\(\\alpha\\). We also introduce a hyper-parameter \\(\\tau\\) which is the precision in the sampling distribution. We also re-paramterise the sampling distribution, i.e. each dimension across all instances is a 60-dimensional multivariate normal distribution. Re-parameterisation can sometimes accelrate the sampling process.\nWe instantiate the model and ask Turing to sample from it using NUTS sampler. The sample trajectories of \\(\\alpha\\) is plotted using the plot function from the package StatsPlots.\n\nppca_ARD = pPCA_ARD(mat_exp') # instantiate the probabilistic model\nchain_ppcaARD = sample(ppca_ARD, NUTS(;adtype=AutoReverseDiff()), 500) # sampling\nplot(group(chain_ppcaARD, :α); margin=6.0mm)\n\n┌ Info: Found initial step size\n└ ϵ = 0.05\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAgain, we do some inference diagnostics. Here we look at the convergence of the chains for the \\(α\\) parameter. This parameter determines the relevance of individual components. We see that the chains have converged and the posterior of the \\(\\alpha\\) parameters is centered around much smaller values in two instances. In the following, we will use the mean of the small values to select the relevant dimensions (remember that, smaller values of \\(\\alpha\\) correspond to more important components.). We can clearly see from the values of \\(\\alpha\\) that there should be two dimensions (corresponding to \\(\\bar{\\alpha}_3=\\bar{\\alpha}_5≈0.05\\)) for this dataset.\n\n# Extract parameter mean estimates of the posterior\nW = permutedims(reshape(mean(group(chain_ppcaARD, :W))[:, 2], (n_genes, n_genes)))\nZ = permutedims(reshape(mean(group(chain_ppcaARD, :Z))[:, 2], (n_genes, n_cells)))'\nα = mean(group(chain_ppcaARD, :α))[:, 2]\nplot(α; label=\"α\")\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe can inspect α to see which elements are small (i.e. high relevance). To do this, we first sort α using sortperm() (in ascending order by default), and record the indices of the first two smallest values (among the \\(D=9\\) \\(\\alpha\\) values). After picking the desired principal directions, we extract the corresponding subset loading vectors from \\(\\mathbf{W}\\), and the corresponding dimensions of \\(\\mathbf{Z}\\). We obtain a posterior predicted matrix \\(\\mathbf{X} \\in \\mathbb{R}^{2 \\times 60}\\) as the product of the two sub-matrices, and compare the recovered info with the original matrix.\n\nα_indices = sortperm(α)[1:2]\nk = size(α_indices)[1]\nX_rec = W[:, α_indices] * Z[α_indices, :]\n\ndf_rec = DataFrame(X_rec', :auto)\nheatmap(\n X_rec;\n c=:summer,\n colors=:value,\n xlabel=\"cell number\",\n yflip=true,\n ylabel=\"gene feature\",\n yticks=1:9,\n colorbar_title=\"expression\",\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nWe observe that, the data in the original space is recovered with key information, the distinct feature values in the first and last three genes for the two cell groups, are preserved. We can also examine the data in the dimension-reduced space, i.e. the selected components (rows) in \\(\\mathbf{Z}\\).\n\ndf_pro = DataFrame(Z[α_indices, :]', :auto)\nrename!(df_pro, Symbol.([\"z\" * string(i) for i in collect(1:k)]))\ndf_pro[!, :type] = repeat([1, 2]; inner=n_cells ÷ 2)\nscatter(\n df_pro[:, 1], df_pro[:, 2]; xlabel=\"z1\", ylabel=\"z2\", color=df_pro[:, \"type\"], label=\"\"\n)\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nThis plot is very similar to the low-dimensional plot above, with the relevant dimensions chosen based on the values of \\(α\\) via ARD. When you are in doubt about the number of dimensions to project onto, ARD might provide an answer to that question.",
"crumbs": [
"Get Started",
"Users",
@@ -933,7 +933,7 @@
"href": "tutorials/11-probabilistic-pca/index.html#footnotes",
"title": "Probabilistic Principal Component Analysis (p-PCA)",
"section": "Footnotes",
- "text": "Footnotes\n\n\nGilbert Strang, Introduction to Linear Algebra, 5th Ed., Wellesley-Cambridge Press, 2016.↩︎\nGareth M. James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.↩︎\nProbabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎\nProbabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎",
+ "text": "Footnotes\n\n\nGilbert Strang, Introduction to Linear Algebra, 5th Ed., Wellesley-Cambridge Press, 2016.↩︎\nGareth M. James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.↩︎\nProbabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎\nProbabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎\nGareth M. James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.↩︎\nDavid Wipf, Srikantan Nagarajan, A New View of Automatic Relevance Determination, NIPS 2007.↩︎\nChristopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.↩︎",
"crumbs": [
"Get Started",
"Users",
diff --git a/pr-previews/553/sitemap.xml b/pr-previews/553/sitemap.xml
index 770967542..a982293b3 100644
--- a/pr-previews/553/sitemap.xml
+++ b/pr-previews/553/sitemap.xml
@@ -2,154 +2,154 @@
Note that:
@@ -1276,55 +1276,55 @@We can quantitatively check the absolute magnitudes of the column average of the gap between mat_exp
and mat_rec
:
We see the two groups are well separated in this 2-D space. As an unsupervised learning method, performing PCA on this dataset gives membership for each cell instance. Another way to put it: 2 dimensions is enough to capture the main structure of the data.
A direct question arises from above practice is: how many principal components do we want to keep, in order to sufficiently represent the latent structure in the data? This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent that data in the latent space. In the case of PCA, there exist a lot of heuristics to make that choice. For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [3].
-For p-PCA, this can be done in an elegant and principled way, using a technique called Automatic Relevance Determination (ARD). ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features [4]. Essentially, we are using a specific prior over the factor loadings \(\mathbf{W}\) that allows us to prune away dimensions in the latent space. The prior is determined by a precision hyperparameter \(\alpha\). Here, smaller values of \(\alpha\) correspond to more important components. You can find more details about this in e.g. [5].
+A direct question arises from above practice is: how many principal components do we want to keep, in order to sufficiently represent the latent structure in the data? This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent that data in the latent space. In the case of PCA, there exist a lot of heuristics to make that choice. For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained 5.
+For p-PCA, this can be done in an elegant and principled way, using a technique called Automatic Relevance Determination (ARD). ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features 6. Essentially, we are using a specific prior over the factor loadings \(\mathbf{W}\) that allows us to prune away dimensions in the latent space. The prior is determined by a precision hyperparameter \(\alpha\). Here, smaller values of \(\alpha\) correspond to more important components. You can find more details about this in, for example, Bishop (2006) 7.
@model function pPCA_ARD(X)
# Dimensionality of the problem.
@@ -1938,544 +1938,544 @@
We can inspect α
to see which elements are small (i.e. high relevance). To do this, we first sort α
using sortperm()
(in ascending order by default), and record the indices of the first two smallest values (among the \(D=9\) \(\alpha\) values). After picking the desired principal directions, we extract the corresponding subset loading vectors from \(\mathbf{W}\), and the corresponding dimensions of \(\mathbf{Z}\). We obtain a posterior predicted matrix \(\mathbf{X} \in \mathbb{R}^{2 \times 60}\) as the product of the two sub-matrices, and compare the recovered info with the original matrix.
We observe that, the data in the original space is recovered with key information, the distinct feature values in the first and last three genes for the two cell groups, are preserved. We can also examine the data in the dimension-reduced space, i.e. the selected components (rows) in \(\mathbf{Z}\).
@@ -3055,102 +3055,102 @@Gareth M. James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.↩︎
Probabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎
Probabilistic PCA by TensorFlow, “https://www.tensorflow.org/probability/examples/Probabilistic_PCA”.↩︎
Gareth M. James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.↩︎
David Wipf, Srikantan Nagarajan, A New View of Automatic Relevance Determination, NIPS 2007.↩︎
Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.↩︎