Merge pull request #20 from bioshape-analysis/qiyu
kally99 authored Sep 13, 2024
2 parents 01330dd + a7a21a7 commit df2e55d
Showing 5 changed files with 44 additions and 2 deletions.
31 changes: 29 additions & 2 deletions posts/cryo_ET/demo.qmd
Expand Up @@ -13,15 +13,42 @@ The main advantage of cryo-ET is that it allows the cells and macromolecules to

![Tomographic slices of SARS-CoV-2 virions, with spike proteins embedded in the membrane[@Shi2023]](img/et_example.png){ width=65% style="display: block; margin-left: auto; margin-right: auto;" }

In order to reconstruct macromolecules, tomographic slices need to be processed through a pipeline. A typical cryo-ET data processing pipeline includes: tilt series alignment, CTF estimation, tomogram reconstruction, particle picking, iterative subtomogram alignment and averaging, and heterogeneity analysis. Unlike cryo-EM, many algorithms for cryo-ET processing are still under development. Therefore, a large database of cryo-ET to test and tune algorithms is important. Unfortunately, collecting cryo-ET data is both time and money-consuming, and the current database of cryo-ET is not large enough, especially for deep learning training which requires a large amount of data. Therefore, simulation becomes a substitute to generate a large amount of data in a short time and at low expense.
To reconstruct macromolecules, tomographic slices need to be processed through a pipeline. A typical cryo-ET data processing pipeline includes tilt series alignment, CTF estimation, tomogram reconstruction, particle picking, iterative subtomogram alignment and averaging, and heterogeneity analysis. Unlike in cryo-EM, many algorithms for cryo-ET processing are still under development, so a large database of cryo-ET data for testing and tuning algorithms is important. Unfortunately, collecting cryo-ET data is both time-consuming and expensive, and the current database is not large enough, especially for training deep learning models, which requires a large amount of data. Simulation therefore offers a way to generate a large amount of data quickly and at low cost. In this post, we will focus on the simulation of membrane-embedded proteins.

## Workflow
We will use the Membrane Embedded Proteins Simulator (MEPSi), a tool incorporated in PyCoAn to simulate SARS-CoV-2 spike protein. Before doing so, I will briefly go through the workflow of MEPSi.
We will use the Membrane Embedded Protein Simulator (MEPSi), a tool incorporated in PyCoAn, to simulate the SARS-CoV-2 spike protein [@mepsi2022]. Here, I will briefly go through the workflow of MEPSi.

### 1. Density modeling
In density modeling, atom coordinate lists of the macromolecules of interest are given, and a "ground-truth" volume representation is simulated by placing the macromolecules on a membrane with a specified geometry. The algorithm uses a 3D Archimedean spiral to place the molecules at approximately equidistant points along the membrane. Random translations within a bounding box defined by the equidistance and the maximum XY radius of the molecules are then applied. This ensures there is no overlap between macromolecules on the surface. The volume is generated using direct generation of the membrane density and Gaussian convolution of the atom positions.
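
To give a feel for the spiral placement, here is a minimal sketch that spreads points along a spiral on a sphere. It assumes a spherical membrane and uses a Bauer-style spherical spiral as a stand-in; the function name and parameters are hypothetical and are not taken from MEPSi, and the random translation step is left out.

```python
import math

def spiral_points_on_sphere(n_points, radius):
    """Return n_points (x, y, z) tuples along a spiral winding from pole to pole.

    Illustrative stand-in for spiral-based placement (not MEPSi's code).
    """
    points = []
    # Winding rate that keeps neighbouring turns about as far apart as
    # neighbouring points along the spiral.
    winding = math.sqrt(n_points * math.pi)
    for k in range(n_points):
        z = 1.0 - (2.0 * k + 1.0) / n_points  # evenly spaced heights in (-1, 1)
        polar = math.acos(z)
        azimuth = winding * polar
        r_xy = math.sin(polar)
        points.append((radius * r_xy * math.cos(azimuth),
                       radius * r_xy * math.sin(azimuth),
                       radius * z))
    return points

anchors = spiral_points_on_sphere(50, 10.0)
```

Each returned point lies on the sphere surface and could serve as the anchor position of one molecule before any random translation is applied.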

Optionally, a solvent model can be generated and added to the density. In order to keep the computational cost low, a continuum solvent model with an adjustable contrast tuning parameter is used. A 3D version of Laplacian pyramid blending is used to account for displacements of one object from another, which mitigates edge effects and emulates the existence of a hydration layer around the molecules.


### 2. Basis tilt series generation
In this step, an unperturbed basis tilt series is generated from the simulated volume. The individual tilt images are obtained by rotating the volume around the Y axis and projecting the density along the Z axis. A basis tilt series is generated before the final tomogram simulation to reduce computational cost: generating a perturbation-free basis tilt series first lets the user explore perturbation parameters (e.g. contrast transfer function and noise) cheaply before generating final tomograms from the perturbed basis tilt series.
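
The rotate-then-project idea can be sketched in a few lines. This is a toy version under stated assumptions: the density is a sparse dictionary of voxels with coordinates measured from the rotation axis, rotated positions are rounded to the nearest pixel instead of interpolated, and the function names are hypothetical rather than part of MEPSi or PyCoAn.

```python
import math
from collections import defaultdict

def project_tilted(density, tilt_deg):
    """Rotate a sparse density {(x, y, z): value} about Y and sum along Z.

    Returns a sparse image {(x, y): value}.
    """
    theta = math.radians(tilt_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    image = defaultdict(float)
    for (x, y, z), value in density.items():
        x_rot = x * cos_t + z * sin_t      # y is unchanged by a rotation about Y
        image[(round(x_rot), y)] += value  # ignoring rotated z is the projection along Z
    return dict(image)

def basis_tilt_series(density, tilt_angles_deg):
    """Return {angle: projected image} for every requested tilt angle."""
    return {angle: project_tilted(density, angle) for angle in tilt_angles_deg}

toy_density = {(0, 0, 0): 1.0, (0, 0, 5): 2.0, (3, 1, 0): 0.5}
series = basis_tilt_series(toy_density, [0, 90])
```

At 0 degrees the two voxels stacked along Z fall on the same pixel, while at 90 degrees they are separated along X; the total projected density is the same at every angle.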

### 3. CTF
One possible perturbation we can add to the basis tilt series is the contrast transfer function (CTF), which models the effect of the microscope optics. One major determinant of the CTF is the defocus value at the scattering event, which changes while the electrons traverse the specimen. To simplify the problem, we treat the simulated specimen as an infinitely thin slice, so only the focus changes caused by tilting need to be considered. Projected tilted specimen images are subjected to a CTF model in strips parallel to the tilt axis, with the defocus value modulated according to the position of the strip center.
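
The strip-wise defocus modulation can be illustrated with simple geometry. The sketch below assumes the tilt axis runs through the central column of the image and that a strip whose center sits a distance d from the axis (measured in the projected image) is displaced along the beam by d·tan(tilt). The sign convention, units, and function name are assumptions made for illustration, not MEPSi's actual interface.

```python
import math

def strip_defocus_values(image_width_px, strip_width_px, pixel_size_nm,
                         defocus_at_axis_nm, tilt_deg):
    """Return (start_px, stop_px, defocus_nm) for strips parallel to the tilt axis."""
    slope = math.tan(math.radians(tilt_deg))
    axis_px = image_width_px / 2.0  # assumed position of the tilt axis
    strips = []
    for start in range(0, image_width_px, strip_width_px):
        stop = min(start + strip_width_px, image_width_px)
        # Signed distance of the strip center from the tilt axis, in nm.
        center_offset_nm = ((start + stop) / 2.0 - axis_px) * pixel_size_nm
        strips.append((start, stop, defocus_at_axis_nm + center_offset_nm * slope))
    return strips

strips = strip_defocus_values(100, 50, 1.0, 3000.0, 45.0)
```

Each strip would then be filtered with a CTF evaluated at its own defocus value; at zero tilt all strips share the defocus at the axis.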

### 4. Noise

The noise model is expressed as a mixture of Gaussian and Laplacian distributions, in contrast to the additive white Gaussian noise used in many other simulation applications. The noise in the low-dose images contributing to a tilt series tends to have statistically significant non-zero skewness, which cannot be modeled by a Gaussian error model alone.

![Overlay of an experimental intensity histogram (blue) with noise modeling by Gaussian only (red) vs. with a mix of Gaussian and Laplacian noise (green)](img/noise_model.png){ width=45% style="display: block; margin-left: auto; margin-right: auto;" }
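
As a rough illustration of what sampling from such a mixture could look like, the sketch below draws each noise value from a Laplacian component with probability `laplace_weight` and from a Gaussian component otherwise. The parameterization, including the separate centers for the two components, is an assumption made for this example; how MEPSi actually combines and fits the two components is not described in this post.

```python
import random

def sample_mixed_noise(n, laplace_weight, gauss_mean, gauss_sigma,
                       laplace_loc, laplace_scale, seed=None):
    """Draw n samples from a two-component Gaussian/Laplacian mixture."""
    rng = random.Random(seed)
    rate = 1.0 / laplace_scale
    samples = []
    for _ in range(n):
        if rng.random() < laplace_weight:
            # The difference of two i.i.d. exponential variables is
            # Laplace-distributed with scale 1 / rate.
            value = laplace_loc + rng.expovariate(rate) - rng.expovariate(rate)
        else:
            value = rng.gauss(gauss_mean, gauss_sigma)
        samples.append(value)
    return samples

noise = sample_mixed_noise(20000, 0.3, 0.0, 1.0, 0.0, 2.0, seed=0)
```

With both components centered at the same value the mixture is symmetric but heavier-tailed than a Gaussian; giving the components different centers is one way such a mixture can become asymmetric.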

### 5. Tomogram generation
Finally, tomograms are simulated from the perturbed basis tilt series with a user-specified tilt range and increment.
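
The tilt range and increment simply determine which tilt angles enter the tomogram. A small helper, hypothetical and not part of MEPSi, makes the bookkeeping explicit; it covers only the choice of angles, not the reconstruction itself.

```python
import math

def tilt_angles(min_deg, max_deg, step_deg):
    """Return the tilt angles from min_deg up to at most max_deg, inclusive."""
    # The small epsilon guards against floating-point round-off in the division.
    count = int(math.floor((max_deg - min_deg) / step_deg + 1e-9)) + 1
    return [min_deg + i * step_deg for i in range(count)]

angles = tilt_angles(-60, 60, 1)
```

For a range of -60 to +60 degrees with a 1 degree increment this yields 121 angles.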

## Results
To demonstrate the capabilities of MEPSi, tomograms were simulated from a sample containing three different conformations of the SARS-CoV-2 spike protein, 6VXX, 6VYB and 6X2B, in a 1:1:2 ratio. Protein coordinate files in .pdb format were obtained from the RCSB PDB and preprocessed in ChimeraX to align them with the z-axis, so that they are modeled in the correct orientation in the density simulation.

![Three conformations of the prefusion trimer of the SARS-CoV-2 spike protein: all RBDs in the closed position (left, 6VXX); one RBD in the open position (center, 6VYB); two RBDs in the open position (right, 6X2B)](img/protein_structure.png){ width=65% style="display: block; margin-left: auto; margin-right: auto;" }

Solvent and CTF were added, and an SNR of 0.5 was used. Finally, we generated tilt images every
$1^\circ$ from $-60^\circ$ to $+60^\circ$. Four of the simulated images, at different tilt angles, are shown below.

![](img/simulated_tilt.png){ width=100% style="display: block; margin-left: auto; margin-right: auto;" }


## References

Binary file added posts/cryo_ET/img/noise_model.png
Binary file added posts/cryo_ET/img/protein_structure.png
Binary file added posts/cryo_ET/img/simulated_tilt.png
15 changes: 15 additions & 0 deletions posts/cryo_ET/references.bib
Expand Up @@ -24,4 +24,19 @@ @article{Shi2023
issn={1476-4687},
doi={10.1038/s41586-023-06273-4},
url={https://doi.org/10.1038/s41586-023-06273-4}
}

@article{mepsi2022,
title = {MEPSi: A tool for simulating tomograms of membrane-embedded proteins},
journal = {Journal of Structural Biology},
volume = {214},
number = {4},
pages = {107921},
year = {2022},
issn = {1047-8477},
doi = {10.1016/j.jsb.2022.107921},
url = {https://www.sciencedirect.com/science/article/pii/S1047847722000910},
author = {Borja {Rodríguez de Francisco} and Armel Bezault and Xiao-Ping Xu and Dorit Hanein and Niels Volkmann},
keywords = {Simulations, tomographic reconstruction, Image processing, Quality metrics, cryo-EM},
abstract = {The throughput and fidelity of cryogenic cellular electron tomography (cryo-ET) is constantly increasing through advances in cryogenic electron microscope hardware, direct electron detection devices, and powerful image processing algorithms. However, the need for careful optimization of sample preparations and for access to expensive, high-end equipment, make cryo-ET a costly and time-consuming technique. Generally, only after the last step of the cryo-ET workflow, when reconstructed tomograms are available, it becomes clear whether the chosen imaging parameters were suitable for a specific type of sample in order to answer a specific biological question. Tools for a-priory assessment of the feasibility of samples to answer biological questions and how to optimize imaging parameters to do so would be a major advantage. Here we describe MEPSi (Membrane Embedded Protein Simulator), a simulation tool aimed at rapid and convenient evaluation and optimization of cryo-ET data acquisition parameters for studies of transmembrane proteins in their native environment. We demonstrate the utility of MEPSi by showing how to detangle the influence of different data collection parameters and different orientations in respect to tilt axis and electron beam for two examples: (1) simulated plasma membranes with embedded single-pass transmembrane αIIbβ3 integrin receptors and (2) simulated virus membranes with embedded SARS-CoV-2 spike proteins.}
}
