Showing 2 changed files with 6 additions and 5 deletions.
4 changes: 2 additions & 2 deletions
_freeze/posts/ribosome-tunnel-extraction/index/execute-results/html.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,8 @@ | ||
{ | ||
"hash": "73a739f2ee0a913f8f0aa8c6c3be6c32", | ||
"hash": "bd656730de613e222448ae4fbafbaf0e", | ||
"result": { | ||
"engine": "jupyter", | ||
"markdown": "---\ntitle: \"3D tessellation of biomolecular cavity\"\nsubtitle: \"Protocol for analyzing the ribosome exit tunnel\"\nbibliography: references.bib\ncsl: nature.csl\nengine: \"jupyter\"\nauthor:\n - name: \"Artem Kushner\" \n email: \"[email protected]\"\n affiliations:\n - name: KDD Group\n url: \"https://rtviii.xyz/\"\n\n - name: \"Khanh Dao Duc\" \n email: \"[email protected]\"\n affiliations:\n - name: Department of Mathematics, UBC\n url: \"https://www.math.ubc.ca/\"\n - name: Department of Computer Science, UBC\n url: \"https://www.cs.ubc.ca/\"\n\ndate: \"29 June 2024\"\ncategories: [biology, bioinformatics, surface-reconstruction, computer graphics] \n\ncallout-icon: false\n# format:\n# pdf:\n# include-in-header:\n# text: |\n# \\usepackage{amsmath}\n\nexecute:\n echo: false\n freeze: auto\n pip: [\"pyvista\", \"open3d\", \"scikit-learn\", \"mendeleev\", \"compas\", \"matplotlib\"]\n\n---\n\n\n\n\n\n\n\n\n\n\n## Summary and Background\n\nWe present a protocol to extract the surface of a biomolecular cavity for shape analysis and molecular simulations.\n\nWe apply and illustrate the protocol on the ribosome structure, which contains a subcompartment known as the ribosome exit tunnel. More details on the tunnel features and biological importance can be found in our previous work [dao2018impact][dao2019differences]\n\n\n<!--\nIt is central to the protein synthesis in all living organisms. The assembly of most proteins happens at the location known as the __Peptidyl Transferse Center__, where the peptide chain of any given protein is extended with another amino acid like a chain of beads, one bead a time. \n\nThe built protein exits the ribosome through a channel known as the __Ribosome Exit Tunnel__. 
The interior geometry of the tunnel influences the escape speed of proteins and can be blocked by ligands and antibiotics making it a crucial site for all processes of life.\n\n_Here, we are interested in obtaining a representation of the Exit Tunel's geometry and describe a protocol for doing so._\n-->\n\n\n::: {layout=\"[[57,70] ]\"}\n![PDB 8OJ0. The structure of human ribosome.](./data/8OJ0.gif){fig-alt=\"\"}\n\n![The locations of the ribosome exit tunnel and the PTC.](./data/ptc_and_tunnel_illustration.png){fig-alt=\"\"}\n:::\n\n\n\n\n## Visual Protocol#\n\n\n![Schematic representation of the tunnel geometry surface reconstruction ](./data/visual_protocol.png){fig-alt=\"\"}\n\n\n\n## 0. Mole-based centerline extraction\n\n\nOne representation of the ribosome exit tunnel can be obtained via the [ MOLE ](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765717/) software, which is an implementation of the \"probe\"-based family of algorithms popular for extracting general biomolecular cavities and pores. This step is non-essential if other means of extracting the intial pointcloud are available, but for convenience this will be our starting point.\n\nA probe is a sphere of varying radius which is \"rolled\" by the algorithm against the walls of a given molecule tracing out a path and a radius.\n\n_The algorithm yields an array of varying x,y,z coordinates (henceforth, the $C$~$x,y,z$~) and radius R at each coordinate (henceforth, the $R$~$x,y,z$~)_\n\n:::{layout=\"[[1,1]]\"}\n\n![](./data/mole3.png){width=50%}\n\n![](./data/mole1.png){width=50%}\n:::\n\n## 1. 
Bounding Box \n\nThis step captures the subset of atoms enclosing the cavity of interest (the tunnel) from the original structure.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## Parameters\n\n$R_{\\mathbf{expansion}}$: define the radius of expansion to be the value added to $R_{x,y,z}$ of the centerline at every $C_{x,y,z}$\n\n$B$: The smallest bounding box containing all of the coordinates formed by the expanded centerline \n\n$R_{\\mathbf{vdw}}$: Van der Waals radius of an atom in Ångstroms. Default to the value of 2.\n\n$pc_{B}$: The pointcloud formed by applying $B$ to the original structure and widening each atom coordinate to include points withing a sphere with radius $R_{\\mathbf{vdw}}$\n\n:::\n\n1. For each $C_{x,y,z}$, capture atoms whose center belongs to the sphere formed by $R_{x,y,z} + R_{\\mathbf{expansion}}$, filter out duplicates. Call this coordinate set the ${\\textit{Centerline Expansion Atoms}}$.\n\n2. Calculate the coordinates of the bounding box $B$ around the $\\text{\\textit{Centerline Expansion Atoms}}$.\n\n3. Apply $B$ to the initial structure to extract all atoms that belong to $B$. Call this $\\mathbf{pc^{B}}$.\n\n4. Widen each coordinate (atom center) $C_{x,y,z}^{pc_{B}}$ inside $pc_{B}$ by $R_{vdw}$ to include a more realistic representation of atoms. This is done by creating a voxel subgrid whose bounding cube is between $(C_{x}^{pc_{B}}-R_{vdw},C_{y}^{pc_{B}}-R_{vdw},C_{z}^{pc_{B}}-R_{vdw})$ and $(C_{x}^{pc_{B}}+R_{vdw},C_{y}^{pc_{B}}+R_{vdw},C_{z}^{pc_{B}}+R_{vdw})$, a cube of indices centered at $C_{x,y,z}^{pc_{B}}$. The resultant coordinate set is $\\mathbf{pc_{B}}$.\n\n5. Anchor the coordinates of the $pc_{B}$ to the origin by subtracting the $\\mu(pc_{B})$ of the coordinate set from each $C_{x,y,z}^{pc_{B}}$ and then shifting each $C_{x,y,z}^{pc_{B}}$ upwards by $|\\min(x,y,z)|$. This is done to reduce the amount of empty voxel cells in the following steps, reduce compute. \n\n## 2. Voxelization\n\n\n1. 
Assume voxel size of $1$ in correspondence to the units of the dataset, Angstroms in our case. (Alternatively, atom-to-sphere expansion in step **1. Bounding Box** should be accordingly scaled). \n\n2. Create a boolean voxel grid with the dimensions of the ($pc_{B}$ + $1$), call this the $Grid_{index}$ (as opposed to $Grid_{coordinate}$)\n\n3. Set voxels at _index_ [$C_{x},C_{y},C_{z}$] for every $C$ in $pc_{B}$ in the $Grid_{index}$ to $1$. All other voxel are $0$.\n\n\n## 3. Inversion\n\nInvert the $Grid_{index}$ to create a representation of the _\"empty space\"_ inside the exit tunnel. \n\n## 4. DBSCAN\n\nThe aim in this step is to extract only the voxels belonging to the \"empty space\" inside the tunnel and no other. Given that we have a good idea of the Van der Waals radii of the atoms that constitute the walls and have control over the size of the voxel in the $Grid_{index}$, one method that we can apply is DBSCAN. \n\nDBSCAN is a density-based clustering non-parametric algorithm that is akin to UMAP/t-SNE. \n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## DBSCAN Parameters\n\n$eps$: The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.\n\n$min\\_nbrs$ ($min\\_samples$): If $min\\_samples$ is set to a higher value, DBSCAN will find denser clusters, whereas if it is set to a lower value, the found clusters will be more sparse. The metric to use when calculating distance between instances in a feature array.\n\n$metric$: The metric to use when calculating distance between instances in a feature array. We use the Euclidian distance.\n:::\n\n::: {#f0c511d1 .cell execution_count=1}\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-2-output-1.png){}\n:::\n:::\n\n\n## 5. 
Interior Surface via Delaunay Triangulation\n\nThe aim of this step is to extract a point cloud containing only the voxels on the surface of convex hull enclosing the interior space of the tunnel.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## Delaunay 3D parameters\n\n\n$alpha$: Controls the smoothness of the constructed alpha-shape. For a non-zero $alpha$ value, only vertices, edges, faces, or tetrahedra contained within the circumsphere (of radius $alpha$) will be output. Otherwise, only tetrahedra will be output.\n\n$tol$: Tolerance to control discarding of closely spaced points. This tolerance is specified as a fraction of the diagonal length of the bounding box of the points.\n\n$offset$: Multiplier to control the size of the initial, bounding Delaunay triangulation.\n\n:::\n\n## 6. Normal Estimation & Orientation \n\nThe aim of this step is to prepare the convex hull point cloud for the surface reconstruction algorithm. For the final mesh to be smooth and free of artifacts, this step has to assign a normal vector point outwards at each point of the convex hull thus defining a clear boundary between \"inner\" and \"outer\" space vis-a-vis the surface.\n\nOne popular method for normal estimation is a KDTree search and for smoothing their orientations a collection of tangent planes is used.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## KD Tree Search and Tangent Plane Orientation\n\n$kdtree\\_radius$: \n\n$kdtree\\_max\\_nn$:\n\n$tangent\\_planes\\_n$:\n\n:::\n\n\n## 7. Surface Reconstruction\n\n::: {#972a8143 .cell execution_count=2}\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-3-output-1.png){}\n:::\n:::\n\n\n## References\n\n", | ||
"markdown": "---\ntitle: \"3D tessellation of biomolecular cavity\"\nsubtitle: \"Protocol for analyzing the ribosome exit tunnel\"\nbibliography: \"references.bib\"\ncsl: nature.csl\nengine: \"jupyter\"\nauthor:\n - name: \"Artem Kushner\" \n email: \"[email protected]\"\n affiliations:\n - name: KDD Group\n url: \"https://rtviii.xyz/\"\n\n - name: \"Khanh Dao Duc\" \n email: \"[email protected]\"\n affiliations:\n - name: Department of Mathematics, UBC\n url: \"https://www.math.ubc.ca/\"\n - name: Department of Computer Science, UBC\n url: \"https://www.cs.ubc.ca/\"\n\ndate: \"29 June 2024\"\ncategories: [biology, bioinformatics, surface-reconstruction, computer graphics] \n\ncallout-icon: false\n# format:\n# pdf:\n# include-in-header:\n# text: |\n# \\usepackage{amsmath}\n\nexecute:\n echo: false\n freeze: auto\n pip: [\"pyvista\", \"open3d\", \"scikit-learn\", \"mendeleev\", \"compas\", \"matplotlib\"]\n\n---\n\n\n\n\n\n\n\n\n\n\n## Summary and Background\n\nWe present a protocol to extract the surface of a biomolecular cavity for shape analysis and molecular simulations.\n\nWe apply and illustrate the protocol on the ribosome structure, which contains a subcompartment known as the ribosome exit tunnel. More details on the tunnel features and biological importance can be found in our previous work [@dao2018impact][@dao2019differences]\n\n\n<!--\nIt is central to the protein synthesis in all living organisms. The assembly of most proteins happens at the location known as the __Peptidyl Transferse Center__, where the peptide chain of any given protein is extended with another amino acid like a chain of beads, one bead a time. \n\nThe built protein exits the ribosome through a channel known as the __Ribosome Exit Tunnel__. 
The interior geometry of the tunnel influences the escape speed of proteins and can be blocked by ligands and antibiotics making it a crucial site for all processes of life.\n\n_Here, we are interested in obtaining a representation of the Exit Tunel's geometry and describe a protocol for doing so._\n-->\n\n\n::: {layout=\"[[57,70] ]\"}\n![PDB 8OJ0. The structure of human ribosome.](./data/8OJ0.gif){fig-alt=\"\"}\n\n![The locations of the ribosome exit tunnel and the PTC.](./data/ptc_and_tunnel_illustration.png){fig-alt=\"\"}\n:::\n\n\n\n\n## Visual Protocol#\n\n\n![Schematic representation of the tunnel geometry surface reconstruction ](./data/visual_protocol.png){fig-alt=\"\"}\n\n\n\n## 0. Mole-based centerline extraction\n\n\nOne representation of the ribosome exit tunnel can be obtained via the [ MOLE ](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765717/) software, which is an implementation of the \"probe\"-based family of algorithms popular for extracting general biomolecular cavities and pores. This step is non-essential if other means of extracting the intial pointcloud are available, but for convenience this will be our starting point.\n\nA probe is a sphere of varying radius which is \"rolled\" by the algorithm against the walls of a given molecule tracing out a path and a radius.\n\n_The algorithm yields an array of varying x,y,z coordinates (henceforth, the $C$~$x,y,z$~) and radius R at each coordinate (henceforth, the $R$~$x,y,z$~)_\n\n:::{layout=\"[[1,1]]\"}\n\n![](./data/mole3.png){width=50%}\n\n![](./data/mole1.png){width=50%}\n:::\n\n## 1. 
Bounding Box \n\nThis step captures the subset of atoms enclosing the cavity of interest (the tunnel) from the original structure.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## Parameters\n\n$R_{\\mathbf{expansion}}$: define the radius of expansion to be the value added to $R_{x,y,z}$ of the centerline at every $C_{x,y,z}$\n\n$B$: The smallest bounding box containing all of the coordinates formed by the expanded centerline \n\n$R_{\\mathbf{vdw}}$: Van der Waals radius of an atom in Ångstroms. Default to the value of 2.\n\n$pc_{B}$: The pointcloud formed by applying $B$ to the original structure and widening each atom coordinate to include points withing a sphere with radius $R_{\\mathbf{vdw}}$\n\n:::\n\n1. For each $C_{x,y,z}$, capture atoms whose center belongs to the sphere formed by $R_{x,y,z} + R_{\\mathbf{expansion}}$, filter out duplicates. Call this coordinate set the ${\\textit{Centerline Expansion Atoms}}$.\n\n2. Calculate the coordinates of the bounding box $B$ around the $\\text{\\textit{Centerline Expansion Atoms}}$.\n\n3. Apply $B$ to the initial structure to extract all atoms that belong to $B$. Call this $\\mathbf{pc^{B}}$.\n\n4. Widen each coordinate (atom center) $C_{x,y,z}^{pc_{B}}$ inside $pc_{B}$ by $R_{vdw}$ to include a more realistic representation of atoms. This is done by creating a voxel subgrid whose bounding cube is between $(C_{x}^{pc_{B}}-R_{vdw},C_{y}^{pc_{B}}-R_{vdw},C_{z}^{pc_{B}}-R_{vdw})$ and $(C_{x}^{pc_{B}}+R_{vdw},C_{y}^{pc_{B}}+R_{vdw},C_{z}^{pc_{B}}+R_{vdw})$, a cube of indices centered at $C_{x,y,z}^{pc_{B}}$. The resultant coordinate set is $\\mathbf{pc_{B}}$.\n\n5. Anchor the coordinates of the $pc_{B}$ to the origin by subtracting the $\\mu(pc_{B})$ of the coordinate set from each $C_{x,y,z}^{pc_{B}}$ and then shifting each $C_{x,y,z}^{pc_{B}}$ upwards by $|\\min(x,y,z)|$. This is done to reduce the amount of empty voxel cells in the following steps, reduce compute. \n\n## 2. Voxelization\n\n\n1. 
Assume voxel size of $1$ in correspondence to the units of the dataset, Angstroms in our case. (Alternatively, atom-to-sphere expansion in step **1. Bounding Box** should be accordingly scaled). \n\n2. Create a boolean voxel grid with the dimensions of the ($pc_{B}$ + $1$), call this the $Grid_{index}$ (as opposed to $Grid_{coordinate}$)\n\n3. Set voxels at _index_ [$C_{x},C_{y},C_{z}$] for every $C$ in $pc_{B}$ in the $Grid_{index}$ to $1$. All other voxel are $0$.\n\n\n## 3. Inversion\n\nInvert the $Grid_{index}$ to create a representation of the _\"empty space\"_ inside the exit tunnel. \n\n## 4. DBSCAN\n\nThe aim in this step is to extract only the voxels belonging to the \"empty space\" inside the tunnel and no other. Given that we have a good idea of the Van der Waals radii of the atoms that constitute the walls and have control over the size of the voxel in the $Grid_{index}$, one method that we can apply is DBSCAN. \n\nDBSCAN is a density-based clustering non-parametric algorithm that is akin to UMAP/t-SNE. \n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## DBSCAN Parameters\n\n$eps$: The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.\n\n$min\\_nbrs$ ($min\\_samples$): If $min\\_samples$ is set to a higher value, DBSCAN will find denser clusters, whereas if it is set to a lower value, the found clusters will be more sparse. The metric to use when calculating distance between instances in a feature array.\n\n$metric$: The metric to use when calculating distance between instances in a feature array. We use the Euclidian distance.\n:::\n\n::: {#c34937a8 .cell execution_count=1}\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-2-output-1.png){}\n:::\n:::\n\n\n## 5. 
Interior Surface via Delaunay Triangulation\n\nThe aim of this step is to extract a point cloud containing only the voxels on the surface of convex hull enclosing the interior space of the tunnel.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## Delaunay 3D parameters\n\n\n$alpha$: Controls the smoothness of the constructed alpha-shape. For a non-zero $alpha$ value, only vertices, edges, faces, or tetrahedra contained within the circumsphere (of radius $alpha$) will be output. Otherwise, only tetrahedra will be output.\n\n$tol$: Tolerance to control discarding of closely spaced points. This tolerance is specified as a fraction of the diagonal length of the bounding box of the points.\n\n$offset$: Multiplier to control the size of the initial, bounding Delaunay triangulation.\n\n:::\n\n## 6. Normal Estimation & Orientation \n\nThe aim of this step is to prepare the convex hull point cloud for the surface reconstruction algorithm. For the final mesh to be smooth and free of artifacts, this step has to assign a normal vector point outwards at each point of the convex hull thus defining a clear boundary between \"inner\" and \"outer\" space vis-a-vis the surface.\n\nOne popular method for normal estimation is a KDTree search and for smoothing their orientations a collection of tangent planes is used.\n\n:::{.callout-note appearance=\"simple\" collapse=\"true\"}\n\n## KD Tree Search and Tangent Plane Orientation\n\n$kdtree\\_radius$: \n\n$kdtree\\_max\\_nn$:\n\n$tangent\\_planes\\_n$:\n\n:::\n\n\n## 7. Surface Reconstruction\n\n::: {#46f64e0d .cell execution_count=2}\n\n::: {.cell-output .cell-output-display}\n![](index_files/figure-html/cell-3-output-1.png){}\n:::\n:::\n\n\n## References\n\n", | ||
"supporting": [ | ||
"index_files/figure-html" | ||
], | ||
|