diff --git a/_freeze/posts/ribosome-tunnel-extraction/index/execute-results/html.json b/_freeze/posts/ribosome-tunnel-extraction/index/execute-results/html.json index eb48750..16a3f5e 100644 --- a/_freeze/posts/ribosome-tunnel-extraction/index/execute-results/html.json +++ b/_freeze/posts/ribosome-tunnel-extraction/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "73a739f2ee0a913f8f0aa8c6c3be6c32", + "hash": "bd656730de613e222448ae4fbafbaf0e", "result": { "engine": "jupyter", - "markdown": "---\ntitle: \"3D tessellation of biomolecular cavity\"\nsubtitle: \"Protocol for analyzing the ribosome exit tunnel\"\nbibliography: references.bib\ncsl: nature.csl\nengine: \"jupyter\"\nauthor:\n - name: \"Artem Kushner\" \n email: \"rtkushner@gmail.com\"\n affiliations:\n - name: KDD Group\n url: \"https://rtviii.xyz/\"\n\n - name: \"Khanh Dao Duc\" \n email: \"kdd@math.ubc.ca\"\n affiliations:\n - name: Department of Mathematics, UBC\n url: \"https://www.math.ubc.ca/\"\n - name: Department of Computer Science, UBC\n url: \"https://www.cs.ubc.ca/\"\n\ndate: \"29 June 2024\"\ncategories: [biology, bioinformatics, surface-reconstruction, computer graphics] \n\ncallout-icon: false\n# format:\n# pdf:\n# include-in-header:\n# text: |\n# \\usepackage{amsmath}\n\nexecute:\n echo: false\n freeze: auto\n pip: [\"pyvista\", \"open3d\", \"scikit-learn\", \"mendeleev\", \"compas\", \"matplotlib\"]\n\n---\n\n\n\n\n\n\n\n\n\n\n## Summary and Background\n\nWe present a protocol to extract the surface of a biomolecular cavity for shape analysis and molecular simulations.\n\nWe apply and illustrate the protocol on the ribosome structure, which contains a subcompartment known as the ribosome exit tunnel. More details on the tunnel features and biological importance can be found in our previous work [dao2018impact][dao2019differences]\n\n\n\n\n\n::: {layout=\"[[57,70] ]\"}\n![PDB 8OJ0. 
The structure of human ribosome.](./data/8OJ0.gif){fig-alt=\"\"}\n\n![The locations of the ribosome exit tunnel and the PTC.](./data/ptc_and_tunnel_illustration.png){fig-alt=\"\"}\n:::\n\n\n\n\n## Visual Protocol#\n\n\n![Schematic representation of the tunnel geometry surface reconstruction ](./data/visual_protocol.png){fig-alt=\"\"}\n\n\n\n## 0. Mole-based centerline extraction\n\n\nOne representation of the ribosome exit tunnel can be obtained via the [ MOLE ](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765717/) software, which is an implementation of the \"probe\"-based family of algorithms popular for extracting general biomolecular cavities and pores. This step is non-essential if other means of extracting the intial pointcloud are available, but for convenience this will be our starting point.\n\nA probe is a sphere of varying radius which is \"rolled\" by the algorithm against the walls of a given molecule tracing out a path and a radius.\n\n_The algorithm yields an array of varying x,y,z coordinates (henceforth, the $C$~$x,y,z$~) and radius R at each coordinate (henceforth, the $R$~$x,y,z$~)_\n\n:::{layout=\"[[1,1]]\"}\n\n![](./data/mole3.png){width=50%}\n\n![](./data/mole1.png){width=50%}\n:::\n\n## 1. Bounding Box \n\nThis step captures the subset of atoms enclosing the cavity of interest (the tunnel) from the original structure.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## Parameters\n\n$R_{\\mathbf{expansion}}$: define the radius of expansion to be the value added to $R_{x,y,z}$ of the centerline at every $C_{x,y,z}$\n\n$B$: The smallest bounding box containing all of the coordinates formed by the expanded centerline \n\n$R_{\\mathbf{vdw}}$: Van der Waals radius of an atom in Ångstroms. Default to the value of 2.\n\n$pc_{B}$: The pointcloud formed by applying $B$ to the original structure and widening each atom coordinate to include points withing a sphere with radius $R_{\\mathbf{vdw}}$\n\n:::\n\n1. 
For each $C_{x,y,z}$, capture atoms whose center belongs to the sphere formed by $R_{x,y,z} + R_{\\mathbf{expansion}}$, filter out duplicates. Call this coordinate set the ${\\textit{Centerline Expansion Atoms}}$.\n\n2. Calculate the coordinates of the bounding box $B$ around the $\\text{\\textit{Centerline Expansion Atoms}}$.\n\n3. Apply $B$ to the initial structure to extract all atoms that belong to $B$. Call this $\\mathbf{pc^{B}}$.\n\n4. Widen each coordinate (atom center) $C_{x,y,z}^{pc_{B}}$ inside $pc_{B}$ by $R_{vdw}$ to include a more realistic representation of atoms. This is done by creating a voxel subgrid whose bounding cube is between $(C_{x}^{pc_{B}}-R_{vdw},C_{y}^{pc_{B}}-R_{vdw},C_{z}^{pc_{B}}-R_{vdw})$ and $(C_{x}^{pc_{B}}+R_{vdw},C_{y}^{pc_{B}}+R_{vdw},C_{z}^{pc_{B}}+R_{vdw})$, a cube of indices centered at $C_{x,y,z}^{pc_{B}}$. The resultant coordinate set is $\\mathbf{pc_{B}}$.\n\n5. Anchor the coordinates of the $pc_{B}$ to the origin by subtracting the $\\mu(pc_{B})$ of the coordinate set from each $C_{x,y,z}^{pc_{B}}$ and then shifting each $C_{x,y,z}^{pc_{B}}$ upwards by $|\\min(x,y,z)|$. This is done to reduce the amount of empty voxel cells in the following steps, reduce compute. \n\n## 2. Voxelization\n\n\n1. Assume voxel size of $1$ in correspondence to the units of the dataset, Angstroms in our case. (Alternatively, atom-to-sphere expansion in step **1. Bounding Box** should be accordingly scaled). \n\n2. Create a boolean voxel grid with the dimensions of the ($pc_{B}$ + $1$), call this the $Grid_{index}$ (as opposed to $Grid_{coordinate}$)\n\n3. Set voxels at _index_ [$C_{x},C_{y},C_{z}$] for every $C$ in $pc_{B}$ in the $Grid_{index}$ to $1$. All other voxel are $0$.\n\n\n## 3. Inversion\n\nInvert the $Grid_{index}$ to create a representation of the _\"empty space\"_ inside the exit tunnel. \n\n## 4. DBSCAN\n\nThe aim in this step is to extract only the voxels belonging to the \"empty space\" inside the tunnel and no other. 
Given that we have a good idea of the Van der Waals radii of the atoms that constitute the walls and have control over the size of the voxel in the $Grid_{index}$, one method that we can apply is DBSCAN. \n\nDBSCAN is a density-based clustering non-parametric algorithm that is akin to UMAP/t-SNE. \n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## DBSCAN Parameters\n\n$eps$: The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.\n\n$min\\_nbrs$ ($min\\_samples$): If $min\\_samples$ is set to a higher value, DBSCAN will find denser clusters, whereas if it is set to a lower value, the found clusters will be more sparse. The metric to use when calculating distance between instances in a feature array.\n\n$metric$: The metric to use when calculating distance between instances in a feature array. We use the Euclidian distance.\n:::\n\n::: {#f0c511d1 .cell execution_count=1}\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-2-output-1.png){}\n:::\n:::\n\n\n## 5. Interior Surface via Delaunay Triangulation\n\nThe aim of this step is to extract a point cloud containing only the voxels on the surface of convex hull enclosing the interior space of the tunnel.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## Delaunay 3D parameters\n\n\n$alpha$: Controls the smoothness of the constructed alpha-shape. For a non-zero $alpha$ value, only vertices, edges, faces, or tetrahedra contained within the circumsphere (of radius $alpha$) will be output. Otherwise, only tetrahedra will be output.\n\n$tol$: Tolerance to control discarding of closely spaced points. This tolerance is specified as a fraction of the diagonal length of the bounding box of the points.\n\n$offset$: Multiplier to control the size of the initial, bounding Delaunay triangulation.\n\n:::\n\n## 6. 
Normal Estimation & Orientation \n\nThe aim of this step is to prepare the convex hull point cloud for the surface reconstruction algorithm. For the final mesh to be smooth and free of artifacts, this step has to assign a normal vector point outwards at each point of the convex hull thus defining a clear boundary between \"inner\" and \"outer\" space vis-a-vis the surface.\n\nOne popular method for normal estimation is a KDTree search and for smoothing their orientations a collection of tangent planes is used.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## KD Tree Search and Tangent Plane Orientation\n\n$kdtree\\_radius$: \n\n$kdtree\\_max\\_nn$:\n\n$tangent\\_planes\\_n$:\n\n:::\n\n\n## 7. Surface Reconstruction\n\n::: {#972a8143 .cell execution_count=2}\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-3-output-1.png){}\n:::\n:::\n\n\n## References\n\n", + "markdown": "---\ntitle: \"3D tessellation of biomolecular cavity\"\nsubtitle: \"Protocol for analyzing the ribosome exit tunnel\"\nbibliography: \"references.bib\"\ncsl: nature.csl\nengine: \"jupyter\"\nauthor:\n - name: \"Artem Kushner\" \n email: \"rtkushner@gmail.com\"\n affiliations:\n - name: KDD Group\n url: \"https://rtviii.xyz/\"\n\n - name: \"Khanh Dao Duc\" \n email: \"kdd@math.ubc.ca\"\n affiliations:\n - name: Department of Mathematics, UBC\n url: \"https://www.math.ubc.ca/\"\n - name: Department of Computer Science, UBC\n url: \"https://www.cs.ubc.ca/\"\n\ndate: \"29 June 2024\"\ncategories: [biology, bioinformatics, surface-reconstruction, computer graphics] \n\ncallout-icon: false\n# format:\n# pdf:\n# include-in-header:\n# text: |\n# \\usepackage{amsmath}\n\nexecute:\n echo: false\n freeze: auto\n pip: [\"pyvista\", \"open3d\", \"scikit-learn\", \"mendeleev\", \"compas\", \"matplotlib\"]\n\n---\n\n\n\n\n\n\n\n\n\n\n## Summary and Background\n\nWe present a protocol to extract the surface of a biomolecular cavity for shape analysis and molecular 
simulations.\n\nWe apply and illustrate the protocol on the ribosome structure, which contains a subcompartment known as the ribosome exit tunnel. More details on the tunnel features and biological importance can be found in our previous work [@dao2018impact][@dao2019differences]\n\n\n\n\n\n::: {layout=\"[[57,70] ]\"}\n![PDB 8OJ0. The structure of human ribosome.](./data/8OJ0.gif){fig-alt=\"\"}\n\n![The locations of the ribosome exit tunnel and the PTC.](./data/ptc_and_tunnel_illustration.png){fig-alt=\"\"}\n:::\n\n\n\n\n## Visual Protocol#\n\n\n![Schematic representation of the tunnel geometry surface reconstruction ](./data/visual_protocol.png){fig-alt=\"\"}\n\n\n\n## 0. Mole-based centerline extraction\n\n\nOne representation of the ribosome exit tunnel can be obtained via the [ MOLE ](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765717/) software, which is an implementation of the \"probe\"-based family of algorithms popular for extracting general biomolecular cavities and pores. This step is non-essential if other means of extracting the intial pointcloud are available, but for convenience this will be our starting point.\n\nA probe is a sphere of varying radius which is \"rolled\" by the algorithm against the walls of a given molecule tracing out a path and a radius.\n\n_The algorithm yields an array of varying x,y,z coordinates (henceforth, the $C$~$x,y,z$~) and radius R at each coordinate (henceforth, the $R$~$x,y,z$~)_\n\n:::{layout=\"[[1,1]]\"}\n\n![](./data/mole3.png){width=50%}\n\n![](./data/mole1.png){width=50%}\n:::\n\n## 1. 
Bounding Box \n\nThis step captures the subset of atoms enclosing the cavity of interest (the tunnel) from the original structure.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## Parameters\n\n$R_{\\mathbf{expansion}}$: define the radius of expansion to be the value added to $R_{x,y,z}$ of the centerline at every $C_{x,y,z}$\n\n$B$: The smallest bounding box containing all of the coordinates formed by the expanded centerline \n\n$R_{\\mathbf{vdw}}$: Van der Waals radius of an atom in Ångstroms. Default to the value of 2.\n\n$pc_{B}$: The pointcloud formed by applying $B$ to the original structure and widening each atom coordinate to include points withing a sphere with radius $R_{\\mathbf{vdw}}$\n\n:::\n\n1. For each $C_{x,y,z}$, capture atoms whose center belongs to the sphere formed by $R_{x,y,z} + R_{\\mathbf{expansion}}$, filter out duplicates. Call this coordinate set the ${\\textit{Centerline Expansion Atoms}}$.\n\n2. Calculate the coordinates of the bounding box $B$ around the $\\text{\\textit{Centerline Expansion Atoms}}$.\n\n3. Apply $B$ to the initial structure to extract all atoms that belong to $B$. Call this $\\mathbf{pc^{B}}$.\n\n4. Widen each coordinate (atom center) $C_{x,y,z}^{pc_{B}}$ inside $pc_{B}$ by $R_{vdw}$ to include a more realistic representation of atoms. This is done by creating a voxel subgrid whose bounding cube is between $(C_{x}^{pc_{B}}-R_{vdw},C_{y}^{pc_{B}}-R_{vdw},C_{z}^{pc_{B}}-R_{vdw})$ and $(C_{x}^{pc_{B}}+R_{vdw},C_{y}^{pc_{B}}+R_{vdw},C_{z}^{pc_{B}}+R_{vdw})$, a cube of indices centered at $C_{x,y,z}^{pc_{B}}$. The resultant coordinate set is $\\mathbf{pc_{B}}$.\n\n5. Anchor the coordinates of the $pc_{B}$ to the origin by subtracting the $\\mu(pc_{B})$ of the coordinate set from each $C_{x,y,z}^{pc_{B}}$ and then shifting each $C_{x,y,z}^{pc_{B}}$ upwards by $|\\min(x,y,z)|$. This is done to reduce the amount of empty voxel cells in the following steps, reduce compute. \n\n## 2. Voxelization\n\n\n1. 
Assume voxel size of $1$ in correspondence to the units of the dataset, Angstroms in our case. (Alternatively, atom-to-sphere expansion in step **1. Bounding Box** should be accordingly scaled). \n\n2. Create a boolean voxel grid with the dimensions of the ($pc_{B}$ + $1$), call this the $Grid_{index}$ (as opposed to $Grid_{coordinate}$)\n\n3. Set voxels at _index_ [$C_{x},C_{y},C_{z}$] for every $C$ in $pc_{B}$ in the $Grid_{index}$ to $1$. All other voxel are $0$.\n\n\n## 3. Inversion\n\nInvert the $Grid_{index}$ to create a representation of the _\"empty space\"_ inside the exit tunnel. \n\n## 4. DBSCAN\n\nThe aim in this step is to extract only the voxels belonging to the \"empty space\" inside the tunnel and no other. Given that we have a good idea of the Van der Waals radii of the atoms that constitute the walls and have control over the size of the voxel in the $Grid_{index}$, one method that we can apply is DBSCAN. \n\nDBSCAN is a density-based clustering non-parametric algorithm that is akin to UMAP/t-SNE. \n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## DBSCAN Parameters\n\n$eps$: The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.\n\n$min\\_nbrs$ ($min\\_samples$): If $min\\_samples$ is set to a higher value, DBSCAN will find denser clusters, whereas if it is set to a lower value, the found clusters will be more sparse. The metric to use when calculating distance between instances in a feature array.\n\n$metric$: The metric to use when calculating distance between instances in a feature array. We use the Euclidian distance.\n:::\n\n::: {#c34937a8 .cell execution_count=1}\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-2-output-1.png){}\n:::\n:::\n\n\n## 5. 
Interior Surface via Delaunay Triangulation\n\nThe aim of this step is to extract a point cloud containing only the voxels on the surface of convex hull enclosing the interior space of the tunnel.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## Delaunay 3D parameters\n\n\n$alpha$: Controls the smoothness of the constructed alpha-shape. For a non-zero $alpha$ value, only vertices, edges, faces, or tetrahedra contained within the circumsphere (of radius $alpha$) will be output. Otherwise, only tetrahedra will be output.\n\n$tol$: Tolerance to control discarding of closely spaced points. This tolerance is specified as a fraction of the diagonal length of the bounding box of the points.\n\n$offset$: Multiplier to control the size of the initial, bounding Delaunay triangulation.\n\n:::\n\n## 6. Normal Estimation & Orientation \n\nThe aim of this step is to prepare the convex hull point cloud for the surface reconstruction algorithm. For the final mesh to be smooth and free of artifacts, this step has to assign a normal vector point outwards at each point of the convex hull thus defining a clear boundary between \"inner\" and \"outer\" space vis-a-vis the surface.\n\nOne popular method for normal estimation is a KDTree search and for smoothing their orientations a collection of tangent planes is used.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## KD Tree Search and Tangent Plane Orientation\n\n$kdtree\\_radius$: \n\n$kdtree\\_max\\_nn$:\n\n$tangent\\_planes\\_n$:\n\n:::\n\n\n## 7. 
Surface Reconstruction\n\n::: {#46f64e0d .cell execution_count=2}\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-3-output-1.png){}\n:::\n:::\n\n\n## References\n\n", "supporting": [ "index_files/figure-html" ], diff --git a/posts/ribosome-tunnel-extraction/index.qmd b/posts/ribosome-tunnel-extraction/index.qmd index e870b60..a48bd13 100644 --- a/posts/ribosome-tunnel-extraction/index.qmd +++ b/posts/ribosome-tunnel-extraction/index.qmd @@ -1,7 +1,7 @@ --- title: "3D tessellation of biomolecular cavity" subtitle: "Protocol for analyzing the ribosome exit tunnel" -bibliography: references.bib +bibliography: "references.bib" csl: nature.csl engine: "jupyter" author: @@ -45,7 +45,7 @@ execute: We present a protocol to extract the surface of a biomolecular cavity for shape analysis and molecular simulations. -We apply and illustrate the protocol on the ribosome structure, which contains a subcompartment known as the ribosome exit tunnel. More details on the tunnel features and biological importance can be found in our previous work [dao2018impact][dao2019differences] +We apply and illustrate the protocol on the ribosome structure, which contains a subcompartment known as the ribosome exit tunnel. More details on the tunnel features and biological importance can be found in our previous work [@dao2018impact; @dao2019differences] :::