correcting dates for blog posts
riddhibattu committed Nov 19, 2024
1 parent 415068f commit 642f9fc
Showing 6 changed files with 30 additions and 373 deletions.
@@ -0,0 +1,12 @@
{
"hash": "a02af8cd2448720819100b6854c78a30",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: \"Identifying R-loops in AFM imaging data\"\nauthor:\n - name: \"Berkant Cunnuk\"\n email: \"[email protected]\"\n url: \"https://cnnk.xyz\"\ndate: \"November 10 2024\"\ncategories: [\"biology\", \"AFM\"]\nbibliography: references.bib\nexecute:\n freeze: auto\n---\n\n## Context and motivation\n\nR-loops are three-stranded nucleic acid structures containing a DNA:RNA hybrid and an associated single DNA strand. They are normally created when DNA and RNA interact throughout the lifespan of a cell. Although their existence can be beneficial to a cell, an excessive formation of these objects is commonly associated with instability phenotypes.\n\nThe role of R-loop structures on genome stability is still not completely determined. The determining characteristics of harmful R-loops still remain to be defined. Their architecture is not very well-known either, and they are normally classified manually.\n\nIn this blog post, we will carry AFM data to the Kernell shape space and try to develop a method to detect and classify these objects using _geomstats_ [@geomstats]. We will also talk about a rather simple method that works reasonably well.\n\n<div style=\"text-align: center;\">\n![Fig.1 Pictures of DNA fragments at the gene _Airn_ in vitro. One of them was treated with RNase H and the other was not. The image on the bottom highlights the R-loops that were formed. [@carrasco2019]](data/rloops.png)\n</div>\n\n## Preparations before data analysis\n\nOriginal images will be edited to remove background noise. The figure below from the reference article tries to do that while maintaining some colors. This is useful to track the height of a particular spot.\n\n<div style=\"text-align: center;\">\n![Fig.2 A demonstration of background noise removal [@carrasco2019]](data/rloops-without-noise.png)\n</div>\n\nI went a step further and turned these images into binary images. 
In other words, the images we use here consist of black and white pixels, which correspond to 0 and 1 respectively. This makes coding a bit easier, but the height data (or the $z$ coordinate) will need to be stored in a different matrix.\n\n<div style=\"text-align: center;\">\n![Fig.3 Binarized images of R-loops, for the original image see Fig. 1](data/rloops-binary.png)\n</div>\n\nWe first import the necessary libraries.\n\n::: {#0169cf1a .cell execution_count=1}\n``` {.python .cell-code}\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport geomstats.backend as gs\ngs.random.seed(2024)\n```\n:::\n\n\nWe process our data and put it into matrices.\n\n::: {#8ff9287c .cell execution_count=2}\n``` {.python .cell-code}\ndata_original = plt.imread(\"original-data.png\")\ndata = plt.imread(\"edited-data.png\")\n\nx_values = []\ny_values = []\nz_values = []\ndata_points = []\n\nfor i,rows in enumerate(data_original):\n    for j,rgb in enumerate(rows):\n        # keep only pixels whose color falls inside the target RGB range\n        if not (rgb[0]*255 < 166 and rgb[0]*255 > 162):\n            continue\n        if not (rgb[1]*255 < 167 and rgb[1]*255 > 162):\n            continue\n        if not (rgb[2]*255 < 66 and rgb[2]*255 > 61):\n            continue\n        # store useful height data\n        z_values.append((i,j,rgb[0], rgb[1], rgb[2]))\n\nfor i,rows in enumerate(data):\n    for j,entry in enumerate(rows):\n        # take white pixels only (entry is a numpy array)\n        if (entry.all() == 1):\n            y_values.append(j+1)\n            x_values.append(i+1)\n            data_points.append([i,j])\n```\n:::\n\n\n## A primitive approach that surprisingly works\nA way to distinguish lines from loops is to count the number of white pixels in each column. This heavily depends on the orientation. To get a meaningful result, it is required to do this at least $2$ times, once for columns and once for rows. This is not bulletproof and will sometimes give false positives. 
However, it still gives us a good idea of possible places where there is an R-loop.\n\n::: {#57f34943 .cell execution_count=3}\n``` {.python .cell-code}\nwhite_pixel_counts = [0] * 500\n\ndata = plt.imread(\"data-1.png\")\n\nfor i,rows in enumerate(data):\n    for j,entry in enumerate(rows):\n        # count white pixels only\n        if (entry.all() == 1):\n            white_pixel_counts[j] += 1\n\nplt.plot(range(500), white_pixel_counts, linewidth=1, color=\"g\")\nplt.xlabel(\"columns\")\nplt.ylabel(\"white pixels\")\n\nplt.legend([\"Amount of white pixels\"])\nplt.show()\n```\n:::\n\n\n<div style=\"text-align: center;\">\n![Fig.4](data/white-pixels.png)\n</div>\n\nWe can see that in Figure $1$, the R-loops are mainly accumulated on the left side. There is a considerable number of them on the right side as well. There are some around the middle, but fewer. This is clearly reflected in Figure $4$.\n\nWith this approach, $2$ different white pixels in the same column will always be counted even if they are not connected at all, which gives us some false positives. To avoid this issue, we can define the following function taking the position of a white pixel as its input.\n\n$$ f((x,y)) = \\left\\lbrace \\begin{array}{r l}1, & \\text{if} ~~ \\exists\\, c_1,c_2,\\dots,c_{\\gamma} \\in [y-\\epsilon, y+\\epsilon] ~~ \\text{distinct, with every pixel } (x,c_i) \\text{ white} \\\\0, & \\text{otherwise}\\end{array} \\right.$$\n\nIn words, a white pixel is kept only if at least $\\gamma$ white pixels share its $x$ coordinate and lie within $\\epsilon$ of it. $\\epsilon$ and $\\gamma$ can be adjusted depending on the data at hand. This gives us a more precise prediction about likely places for an R-loop. In this case, choosing $\\gamma = 8$ and $\\epsilon = 10$ gives us the following graph.\n\n<div style=\"text-align: center;\">\n![Fig.5](data/white-pixels_f.png)\n</div>\n\nWe can see that Figures $4$ and $5$ are quite similar. The columns where the graph peaks are still the same, but we see a decrease in the values between these peaks, which is the expected result. 
This figure has fewer false positives compared to the previous one, so it is a step in the right direction.\n\n## An analysis using the Kendall pre-shape space\n\nInitialize the space and the metric on it. Create a _Kendall sphere_ using geomstats.\n\n::: {#79be6878 .cell execution_count=4}\n``` {.python .cell-code}\nfrom geomstats.geometry.pre_shape import PreShapeSpace, PreShapeMetric\nfrom geomstats.visualization.pre_shape import KendallSphere\n\nS_32 = PreShapeSpace(3,2)\nS_32.equip_with_group_action(\"rotations\")\nS_32.equip_with_quotient()\nmetric = PreShapeMetric(space=S_32)\nS_32.metric = metric\n\nprojected_points = S_32.projection(gs.array(data_points))\nS = KendallSphere()\nS.draw()\nS.add_points(projected_points)\nS.draw_points(alpha=0.1, color=\"green\", label=\"DNA matter\")\nS.ax.legend()\nplt.show()\n```\n:::\n\n\n<div style=\"text-align: center;\">\n![Fig.6 White pixels projected onto the pre-shape space](data/kendall_projected.png)\n</div>\n\nTaking a closer look reveals more details about where the points lie in the space.\n\n<div style=\"text-align: center;\">\n![Fig.7 White pixels projected onto the pre-shape space](data/kendall_projected_2.png)\n</div>\n\nThe upper part of the curve consists of points that are on the left side of the image, while the lower part consists of points closer to the middle. We see an inverse relationship between the number of R-loops and the density of these points. This is an expected result when we consider how the Kendall pre-shape space is defined.\n\nA pre-shape space is a hypersphere. In our case, it has dimension $3$. Hypothetically, if all of our points were placed at the vertices of triangles with similar side lengths, their projection to the Kendall pre-shape space would be approximately a single point. In the case of circular objects, there will be more pairs of points that are the same distance apart than we would see if the object were a straight line. 
Therefore, we expect points forming a loop (which is a deformed circle for our purposes) to be separated from the other points. In other words, the lower-density areas in the hypersphere correspond to areas with a higher likelihood of R-loop presence.\n\nThe presence of more R-loops does not indicate that there will be fewer points in the corresponding area of the pre-shape space. It just means that they are further apart and more uniformly spread.\n\n<div style=\"text-align: center;\">\n![Fig.8 A zoomed-in and rotated version of Figure 7. The left side has the lowest density followed by the right side. The middle part has a higher density of points, as expected.](data/kendall_projected_4.png)\n</div>\n\nPoints in the pre-shape space give us possible regions where we may find R-loops. However, they do not guarantee that there will be one in that location. This is evident when we look at the right end of this curve. It has a lower density of points than the left side, which is a result we did not want to see.\n\n<div style=\"text-align: center;\">\n![Fig.9 The right end of the curve in Figure 6](data/kendall_projected_3.png)\n</div>\n\nThis happens because there are more DNA fragments on the right side with a shape similar to a half circle. 
Most of them are not loops, but they are distinct enough from the rest that the corresponding projection in the pre-shape space has a low density of points, which are separated from the rest.\n\nWe can also take a look at the Fréchet mean of the projected points in the pre-shape space.\n\n::: {#7fcba7ef .cell execution_count=5}\n``` {.python .cell-code}\nfrom geomstats.learning.frechet_mean import FrechetMean\n\nprojected_points = S_32.projection(gs.array(data_points))\nS = KendallSphere(coords_type=\"extrinsic\")\nS.draw()\nS.add_points(projected_points)\nS.draw_points(alpha=0.1, color=\"green\", label=\"DNA matter\")\n\nS.clear_points()\nestimator = FrechetMean(S_32)\nestimator.fit(projected_points)\nS.add_points(estimator.estimate_)\nS.draw_points(color=\"orange\", label=\"Fréchet mean\", s=150)\nS.add_points(gs.array(S.pole))\nS.draw_curve(color=\"orange\", label=\"curve from the Fréchet mean to the north pole\")\n\nS.ax.legend()\nplt.show()\n```\n:::\n\n\n<div style=\"text-align: center;\">\n![Fig.10 Fréchet mean of the projected points](data/kendall_projected_mean.png)\n</div>\n\nThe point we find is located around the left side of the green curve, as expected.\n\n",
"supporting": [
"rloop-analysis_files"
],
"filters": [],
"includes": {}
}
}
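The neighbourhood filter $f$ described in the post above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the function names `column_counts` and `filtered_column_counts` are hypothetical, $x$ is taken to index columns and $y$ rows, and the pixel itself counts toward $\gamma$.

```python
import numpy as np


def column_counts(img):
    """Number of white (non-zero) pixels in each column of a 2D image."""
    return (np.asarray(img) > 0).sum(axis=0)


def filtered_column_counts(img, gamma=8, epsilon=10):
    """Per-column count of white pixels that pass the neighbourhood filter.

    A white pixel at (row y, column x) is kept only if at least `gamma`
    white pixels (itself included, by assumption) lie in column x within
    rows [y - epsilon, y + epsilon].
    """
    binary = (np.asarray(img) > 0).astype(int)
    n_rows, n_cols = binary.shape

    # prefix[k] holds, for every column, the number of white pixels in rows 0..k-1
    prefix = np.vstack([np.zeros((1, n_cols), dtype=int),
                        np.cumsum(binary, axis=0)])

    rows = np.arange(n_rows)
    lo = np.clip(rows - epsilon, 0, n_rows)        # first row of each window
    hi = np.clip(rows + epsilon + 1, 0, n_rows)    # one past the last row

    # window[y, x] = white pixels of column x inside the window centred on row y
    window = prefix[hi] - prefix[lo]

    keep = (binary == 1) & (window >= gamma)
    return keep.sum(axis=0)
```

On an image with one solid vertical segment and a few isolated white pixels, `column_counts` includes the isolated pixels while `filtered_column_counts` drops them, which matches the lower values between peaks that the post reports for Fig. 5. Running the same functions on the transposed image gives the row-wise counts.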
16 changes: 16 additions & 0 deletions _freeze/posts/tutorial/index/execute-results/html.json
@@ -0,0 +1,16 @@
{
"hash": "c1ac7beedad8483cd668aacb48a570a0",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: Tuturial\ndate: September 19 2024\nauthor:\n - name: Riddhi\ncategories:\n - Tuturial\nbibliography: references.bib\n---\n\n## Lets begin!\n\n::: {#de3c5ae1 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\ntitanic_df = pd.read_csv(\"data/Titanic.csv\")\ntitanic_df.head()\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>PassengerId</th>\n <th>Pclass</th>\n <th>Name</th>\n <th>Sex</th>\n <th>Age</th>\n <th>SibSp</th>\n <th>Parch</th>\n <th>Ticket</th>\n <th>Fare</th>\n <th>Embarked</th>\n <th>Survived</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>3</td>\n <td>Allison Hill</td>\n <td>male</td>\n <td>17</td>\n <td>4</td>\n <td>2</td>\n <td>43d75413-a939-4bd1-a516-b0d47d3572cc</td>\n <td>144.08</td>\n <td>Q</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>1</td>\n <td>Noah Rhodes</td>\n <td>male</td>\n <td>60</td>\n <td>2</td>\n <td>2</td>\n <td>6334fa2a-8b4b-47e7-a451-5ae01754bf08</td>\n <td>249.04</td>\n <td>S</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>3</td>\n <td>Angie Henderson</td>\n <td>male</td>\n <td>64</td>\n <td>0</td>\n <td>0</td>\n <td>61a66444-e2af-4629-9efb-336e2f546033</td>\n <td>50.31</td>\n <td>Q</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>3</td>\n <td>Daniel Wagner</td>\n <td>male</td>\n <td>35</td>\n <td>4</td>\n <td>0</td>\n <td>0b6c03c8-721e-4419-afc3-e6495e911b91</td>\n <td>235.20</td>\n <td>C</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>5</td>\n <td>1</td>\n <td>Cristian Santos</td>\n <td>female</td>\n <td>70</td>\n <td>0</td>\n <td>3</td>\n 
<td>436e3c49-770e-49db-b092-d40143675d58</td>\n <td>160.17</td>\n <td>C</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>\n```\n:::\n:::\n\n\n![Growth and Form](img/growth_and_form.jpg){#fig-growth}\n\nThis refers to @fig-growth.\n\n\nReferences here: [@poitevin2020structural]\n\n",
"supporting": [
"index_files"
],
"filters": [],
"includes": {
"include-in-header": [
"<script src=\"https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js\" integrity=\"sha512-c3Nl8+7g4LMSTdrm621y7kf9v3SDPnhxLNhcjFJbKECVnmZHTdo+IRO05sNLTH/D3vA6u1X32ehoLC7WFVdheg==\" crossorigin=\"anonymous\"></script>\n<script src=\"https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js\" integrity=\"sha512-bLT0Qm9VnAYZDflyKcBaQ2gg0hSYNQrJ8RilYldYQ1FxQYoCLtUjuuRuZo+fjqhx/qtq/1itJ0C2ejDxltZVFg==\" crossorigin=\"anonymous\" data-relocate-top=\"true\"></script>\n<script type=\"application/javascript\">define('jquery', [],function() {return window.jQuery;})</script>\n"
]
}
}
}