Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transects on the native MPAS grid #586

Closed
xylar opened this issue May 21, 2019 · 43 comments · Fixed by #718
Closed

transects on the native MPAS grid #586

xylar opened this issue May 21, 2019 · 43 comments · Fixed by #718

Comments

@xylar
Copy link
Collaborator

xylar commented May 21, 2019

I am frustrated with the interpolation in our transects, which I feel obscures both the complexity and the coarseness of the native grid and also gives a false sense of resolution.

I would at least like to support the option of transects on the native grid, meaning quadrilaterals that have corners at layer interfaces and edges, and a constant value corresponding to the field within the given cell. This would mean a truer representation of the ice draft and bathymetry to what is actually in the model as well.

To accomplish this, there would need to be an efficient way to find the nearest MPAS edge to a given point in a transect (possibly subdivided to make sure not to skip cells). A promising way to do this (used in COMPASS) is scipy's KDTree. A KDTree could be constructed with xEdge, yEdge and zEdge, a corresponding set of (x, y, z) vectors could constucted for points along the transect, and KDTree.query() could be used to find the edge index closes to each of these points. Then, redundant points could be removed and the cells connecting adjacent edges (if any) could be discovered.

It is fairly clear how this would be used to display model data. It is much less clear how to use this effectively to compare with WOCE or SOSE data, given that the data are sampled at different locations and at different spatial frequency. It may be appropriate to display each data set on its native grid and then use nearest neighbor, rather than linear, interpolation to a comparison grid for differencing. This will likely require some trial and error.

@milenaveneziani
Copy link
Collaborator

Well, we could do model-only transects on the native grid, and still interpolate when we do a direct comparison with obs. Really it's just the bias plot that would have to go, we could even consider plotting the WOCE transect on its own z,x data alongside the model transect.

@matthewhoffman
Copy link
Member

matthewhoffman commented Nov 26, 2019

@milenaveneziani , @xylar , after talking about this with Milena, I propose a workflow something like this:

So I see a workflow like this:

  1. Define line segments between pairs of points in lat/lon space that define the transect
  2. Convert each pair of points to x,y,z on the sphere.
  3. For each line segment, create an array of points at high resolution between the two end points along a great circle between the two points. (Define the equation of the great circle that intersects those two points in x,y,z space, and then create an array with spacing of D meters along that arc, where D is smaller than the smallest cell spacing in the entire mesh.)
  4. Create a kd tree for the entire MPAS mesh using x,y,zCell (or edge or vertex as appropriate) (If creating the kd tree for the entire MPAS mesh takes too long, a smaller one could be created using the subset of the points in the mesh that cover a box around the given line segment)
  5. Query the kd tree to get the nearest neighbor on the MPAS grid to each of the points in the array created in step 3.
  6. Remove duplicates from the resulting list of points
  7. The cleaned list can now be used to build the transect.

@xylar , maybe you've already thought about this in more detail and have other ideas.

@milenaveneziani
Copy link
Collaborator

Thanks @matthewhoffman, that all seems reasonable to me.

@xylar, to give you a bit of contest: I was thinking of trying out a 'simple' script, outside of MPAS-Analysis, just to get a feel of how this would work. My vision is to use this mostly for model visualization purposes, not for model-data comparisons (we have already a task for that). We could plot the obs (or climatology or other reanalysis data) alongside it, but just for visual comparison, not quantitative.

@milenaveneziani
Copy link
Collaborator

The most important sticking point is how efficient is kd-tree (I must admit I don't really understand how this k-dimensional tree is constructed).

@cbegeman
Copy link
Collaborator

@xylar I don't have a clear sense of what advantage kd-tree offers in this context over the field "cellsOnCell"

@milenaveneziani, I'd propose a modified version of @matthewhoffman's workflow:
(1)-(3) as above
4. Choose starting cell (index k) by minimizing distance to one end of the transect line.
5. Loop over cellsOnCell[k], choose the cell index that minimizes the distance to the "ideal" transect line (excluding previously chosen cells)
6. Stop when the other end of the transect line is reached (a cell is found within a certain distance of the end point? Not sure yet how to make this step robust)

@xylar
Copy link
Collaborator Author

xylar commented Nov 27, 2019

Thanks for these thoughts everyone!

I've been working on this for awhile (off and on) and my approach is mostly similar to @matthewhoffman's suggestion. It is important to use edges rather than cell centers because there needs to be a way to not only find the nearest cell but also to determine if you are outside of the nearest cell. I believe that's where I ran into some difficulty but I'll have to refresh my memory.

@cbegeman's approach runs into trouble when the transect intersects land, which is frequently the case for us unless transects are chosen very carefully (especially because land is different at different resolutions). It turns out to be exceedingly difficult in my experience to use cellsOnCell to try to traverse around land, which is one of the major issues with the transect code Doug Jacobsen wrote.

@cbegeman
Copy link
Collaborator

@xylar, I see w.r.t. land difficulties.

A potential issue I see is selecting too many cells near the transect (i.e., selected edges snake too much). It seems like we would need an additional cleaning step besides @matthewhoffman's (6) to make sure each cell shares at most 2 edges with other cells or some similar criterion.

@xylar
Copy link
Collaborator Author

xylar commented Nov 27, 2019

Yes, exactly! I vaguely recall that ended up really having to determine if a given edge intersects the transect, not just if its center is closest to the transect. That's obviously a bit trickier.

I'll try to get back to the work I was doing later this week and see where I left things. Assuming the path forward isn't obvious, I'll ask for advice.

@milenaveneziani
Copy link
Collaborator

Thanks @cbegeman and @xylar for the discussion. Xylar: I should have known you are thousand of miles ahead of me :) Let me know when you get to a point when it would make sense for me to try/test something or help with anything else.

@xylar xylar removed the low priority label Dec 8, 2019
@xylar
Copy link
Collaborator Author

xylar commented Dec 8, 2019

@cbegeman, @matthewhoffman and @milenaveneziani, here is my progress so far.

Steps:

  • Find edges that might intersect the transect:
    • compute Cartesian bounding boxes for each edge in the mesh
    • break transects into 10 km segments (so curvature is more or less negligible)
    • find all edge bounding boxes that intersect the bounding box of each transect segment (expanded with a buffer by the maximum cell size, just to be safe)
  • Find which edges actually intersect the transect and where
  • Find cells between edges, or if there is no shared cell this must be land
  • Interpolate the vertical coordinate for layer interfaces from cells to edges

Here is an image showing the trial transects I'm playing with on the EC60to30wLI grid (the old 100 vertical layer version):
transects
In orange, blue, green and red are the "candidate" edges for intersections. If you zoom in a lot, magenta points are the intersections between edges and transects, black lines are the edges, small green points are the cell centers of cells between these edges where properties will be taken from. Note the zig-zag but that's likely realistic...

Here are some example temperature transects.

transect0
transect1
transect2
transect3

Let me know what you think, keeping in mind this is still very much in prototype mode.

@cbegeman
Copy link
Collaborator

cbegeman commented Dec 8, 2019

@xylar If you use the adjacency matrix for the selected cells to identify and eliminate each cell that has 3 neighbors AND is a neighbor of a cell that also has 3 neighbors (updating the matrix after each removal) then you'll get rid of some of the zig-zagging. This still won't give you the straightest path possible though.

@xylar
Copy link
Collaborator Author

xylar commented Dec 8, 2019

I don't think we want to eliminate the zig zag. I think that's a realistic representation of a slice through an MPAS mesh with cellwise constant scalars.

@milenaveneziani
Copy link
Collaborator

This looks pretty decent to me, @xylar. My only question is about the last bullet point: wouldn't it be better to leave T,S onto the cell centers (basically plot what you have on the small green dots)?

@milenaveneziani
Copy link
Collaborator

And the white spaces in the upper water column in the first two figures are due to the transect going below a ice shelf, right? that's a pretty cool visual.

@xylar
Copy link
Collaborator Author

xylar commented Dec 9, 2019

My only question is about the last bullet point: wouldn't it be better to leave T,S onto the cell centers (basically plot what you have on the small green dots)?

Sorry for the confusion. I reworded the last bullet point to make clear that I'm interpolating interface locations, not tracers at cell centers.

And the white spaces in the upper water column in the first two figures are due to the transect going below a ice shelf, right? that's a pretty cool visual.

Yes, that's correct. I'll probably add brown for land below and maybe slightly blue white for ice above.

@cbegeman
Copy link
Collaborator

cbegeman commented Dec 9, 2019

I don't think we want to eliminate the zig zag. I think that's a realistic representation of a slice through an MPAS mesh with cellwise constant scalars.

Just to clarify, I wasn't referring to all zig-zag, just cases where the "band" of cells identified is wide. I think this could help make the profiles look less discontinuous.

That said, I think these profiles look good and this is a great plotting option to have. It could also be nice to have a couple coordinate labels along the x-axis especially if we're using this for very long transects.

@xylar
Copy link
Collaborator Author

xylar commented Dec 9, 2019

Just to clarify, I wasn't referring to all zig-zag, just cases where the "band" of cells identified is wide. I think this could help make the profiles look less discontinuous.

Ah, yeah, sorry. I was reading/responding on my phone before and didn't get the full picture. I've been careful to include all cells that a given transect intersects. The representation of the transect depicted shouldn't be thought of as going from cell center to cell center. Instead, it should be thought of as slicing through cells in a great-circle arc, but with constant scalar values in each cell. Perhaps that was already clear. Given the design, I don't think it's appropriate to eliminate any cells that a given transect passes through even when consecutive cell centers are far apart and far from the transect itself (some of the more egregious zig-zagging). Doing so would leave a gap in the transect that would need to be filled somehow (and not in a way that is consistent with the model).

It could also be nice to have a couple coordinate labels along the x-axis especially if we're using this for very long transects.

Yep, something is needed for sure! I would add the same lat and lon coordinates that we have at the top axis of our current transects. I'll also indicate the start and end with red and green y axes, respectively. And there will be an inset showing the transect on a map as usual.

@milenaveneziani
Copy link
Collaborator

I'd like to revive this discussion, including my recent experience with plotting vertical sections.
So, I started from our compute_transects.py, which we normally use as post-processing script to compute volume transport across the transects that are part of our transect mask file (Drake Passage, Florida Strait, Fram Strait, etc: about a dozen in total). This script does everything on edges. So, this is what I did for plotting my sections, for each transect:

  1. identified all edges, with transectEdgeGlobalIDs>0, which I suppose means 'all ocean edges';
  2. normalVelocity is then not-interpolated;
  3. T and S are instead interpolated by identifying the two cellsOnEdge that are adjacent to each edge, masking the ones that fall on land (cellsOnEdge=0);
  4. plot T,S,and normalVelocity, using z=refBottomDepth and x=approximate spherical distance.

I should also mention that my input data for this is the output of the climatology file that we calculate within MPAS-Analysis, which is computed through nco time averaging of the timeSeriesStatsMonthly files.

Now, while doing this, I encountered some problems related to land points or points that fall onto topography, and I'd like to share these with you to see if you have had similar issues before:

  • the first one is that, if cellsOnEdge=0 (land cell), the corresponding T and S values are not NaNs (or at least python does not recognized them as nans). So I masked these values myself.
  • the next one is related to points that fall onto topography: in particular, if I print the normalVelocity on the transect edges (no interpolation), the values that fall on topography are 0.00000, not a nan value or a big number. This is of course problematic, as vel=0 is a sensible number.. The only way to mask these values properly is to identify kmax.
  • which brings me to my last point, and that I tried to use maxLevelCell to identify a zmax for each transect edge, but for some reason the resulting zmax is much deeper that the actual zmax (the zmax that I estimate from masking T and S). No idea why. I understand that maxLevelCell is on cells, but is it possible that two adjacent cells have a very different zmax than the associated edge?

Anyways, here is an example of section that I've been able to produce so far, any input would be really appreciated:
Temp_FramStrait_E3SM-Arctic-OSI_60to10_ANN_years0166-0177

@xylar
Copy link
Collaborator Author

xylar commented Jun 18, 2020

@milenaveneziani, nice progress! I'd like to get back to this, too and this will help me get inspired.

In the meantime, a few responses to your questions and comments:

the first one is that, if cellsOnEdge=0 (land cell), the corresponding T and S values are not NaNs (or at least python does not recognized them as nans). So I masked these values myself.

Yes, this is the purpose of the collaboration with Charlie and Wenshan. In the meantime, you could use maxLevelCell to make a 3D cellMask and use that. That's the safer way than using values of T and S themselves, for example.

the next one is related to points that fall onto topography: in particular, if I print the normalVelocity on the transect edges (no interpolation), the values that fall on topography are 0.00000, not a nan value or a big number. This is of course problematic, as vel=0 is a sensible number.. The only way to mask these values properly is to identify kmax.

Yes, for velocities, maxLevelEdge is determined as the minimum of maxLevelCell on the two adjacent cells. That means that edges with one adjacent land cell are considered land edges. But for T and S, you probably want to use the maximum of maxLevelCell on the two cells and to just use the one neighbor whenever you don't have 2 valid neighbors. This would also be true for land edges adjacent to land cells, where you want to use the T and S values from the one valid neighboring cell so you don't end up with gaps. Maybe that's what you're already doing?

which brings me to my last point, and that I tried to use maxLevelCell to identify a zmax for each transect edge, but for some reason the resulting zmax is much deeper that the actual zmax (the zmax that I estimate from masking T and S). No idea why. I understand that maxLevelCell is on cells, but is it possible that two adjacent cells have a very different zmax than the associated edge?

That's a tricky one. layerThicknessEdge is not well defined on edges below maxLevelEdge in MPAS-O (because you can't interpolate layerThickness from the adjacent cells). I found it doesn't work well to extrapolate layer thickness from adjacent cells when you only have one neighbor. Instead, what I do is construct zInterfaceCell as the location of the top and bottom of each cell center (i.e. a 3D array that has nVertLevels+1 vertical entries) and then interpolate this (or keep it constant at edges without neighbors) get zInterfaceEdge. There is code for that in the paraview extractor:
https://github.com/MPAS-Dev/MPAS-Tools/blob/master/conda_package/mpas_tools/viz/paraview_extractor.py#L1813-L1914

I hope that helps some.

@milenaveneziani
Copy link
Collaborator

Thanks @xylar, that's helpful.
A few comments/questions:

In the meantime, you could use maxLevelCell to make a 3D cellMask and use that.

good idea. Although I have printed maxLevelCell and noticed that it is not 1 when the corresponding cellsOnEdge value was 0, so that is very confusing to me..

Yes, for velocities, maxLevelEdge is determined as the minimum of maxLevelCell on the two adjacent cells.

but maxLevelEdge is something that you calculate, right? I don't see it in the mesh file.

Instead, what I do is construct zInterfaceCell as the location of the top and bottom of each cell center (i.e. a 3D array that has nVertLevels+1 vertical entries) and then interpolate this (or keep it constant at edges without neighbors) get zInterfaceEdge.

Is this because the z levels of the edges are on the cells interfaces (not at the same z level as the cell centers)? I realize I should know this, I guess I have never thought about it.

@xylar
Copy link
Collaborator Author

xylar commented Jun 18, 2020

good idea. Although I have printed maxLevelCell and noticed that it is not 1 when the corresponding cellsOnEdge value was 0, so that is very confusing to me..

maxLevelCell is always a valid number because we cull all cells that are land. If cellsOnEdge is 0, there is no corresponding cell and no maxLevelCell to check on.

but maxLevelEdge is something that you calculate, right? I don't see it in the mesh file.

Correct, it is the minimum of maxLevelCell[cellsOnEdge] over the 2 cells, or 0 if only one valid neighbor is present. But you may want to compute both the min and the max for an edge. We don't store this because it's easy to compute and a waste of I/O.

Is this because the z levels of the edges are on the cells interfaces (not at the same z level as the cell centers)? I realize I should know this, I guess I have never thought about it.

If you didn't have maxLevelCell, zInterfaceEdge and zMidEdge should be computed by linear interpolation from cell centers. Taking into account maxLevelCell, there is no way to compute zInterface or zMid for cells below maxLevelCell. So there is not a clear definition of these z's at edges. So you have to make a choice. In practice, the mode doesn't need to know this information because no fluxes happen at these depths on these edges. So it's only for graphics that this matters.

@milenaveneziani
Copy link
Collaborator

maxLevelCell is always a valid number because we cull all cells that are land. If cellsOnEdge is 0, there is no corresponding cell and no maxLevelCell to check on.

ok, so, even though maxLevelCell exists when the cell number is 0, it is not valid?

I will digest all this and try to come up with a decent mask for both land points and points on topography. For what I understand, since I am choosing all edges that have a positive edgeID, I am already considering only ocean edges: is that correct?
So the point is to identify the land neighboring cells and an appropriate ocean bottom for each edge center.

@xylar
Copy link
Collaborator Author

xylar commented Jun 18, 2020

@milenaveneziani, Are you converting all the Fortran indices to python by subtracting one? If so, you'd be getting -1 for the cell index for invalid cells, and that will just grab the last cell in the array (python uses negative indices for counting from the end of the array). So it will be an arbitrary value, and definitely not what you want.

@milenaveneziani
Copy link
Collaborator

no no, I am basing the mask on cell number = 0 (I use the -1 only when I am retrieving the fields and need to take into account python indexing).

@milenaveneziani
Copy link
Collaborator

Yep, that's something we can just borrow from the existing transect plotting.

indeed, I have already done the 'borrowing' in the other scripts that I have to do regional time series and T,S,density profiles :)

@cbegeman
Copy link
Collaborator

This is because the normal velocity points in different directions at different edges. That is, it's not normal to the smooth transect, it's normal to the zig-zag of edges. That's fine for computing total transport but not great for plotting velocity transects. You would probably want use the reconstructed zonal and meridional velocities at cell centers and compute the normal to the "real" transect.

An alternative to consider: If you want to have a velocity transect that more closely resembles the computed flux normal to the transect, I've been using normalVelocity and angleEdge as well as the angle the transect makes with angleEdge, which removes these stripes.

@xylar
Copy link
Collaborator Author

xylar commented Jun 19, 2020

@cbegeman, how does that work when you don't have the tangential component of the velocity at the edge?

@cbegeman
Copy link
Collaborator

cbegeman commented Jun 19, 2020

@cbegeman, how does that work when you don't have the tangential component of the velocity at the edge?

It doesn't, it only works for the normal component.

@xylar
Copy link
Collaborator Author

xylar commented Jun 19, 2020

So effectively you're taking the velocity component normal to an edge, multiplying it by the normal vector, and then dotting it into a vector normal to the transect, is that the correct idea? I don't think that is equivalent to the velocity normal to the transect so I would be careful about using that.

@cbegeman
Copy link
Collaborator

@xylar You're right. I do use the approach you originally suggested for plotting and that seems like the right approach for Milena. Sorry for the diversion!

@milenaveneziani
Copy link
Collaborator

I think perhaps, one way could be if we had, for each edge that makes up the MPAS-grid transect, an associated 'real' transect piece, which is part of a binned spherical distance between the two end points that define the transect, basically this bullet in the methodology outlined by @xylar a few months ago:

break transects into 10 km segments (so curvature is more or less negligible)

Then, we would simply rotate the normalVelocity with respect to that.

Of course the script I have now doesn't do the initial transect binning, so I have no way of knowing about the piecewise 'real' transect.

I think I will leave the plot as is now, at least it shows the actual model velocity across each edge. Maybe I'll consider some simple filtering, just for pleasing the eye, but I'll see.

One last thing that would be nice to add is a few potential density contours overlapping these fields.
I'll make these few changes later today, and call it done for the moment.

@xylar
Copy link
Collaborator Author

xylar commented Jun 19, 2020

Then, we would simply rotate the normalVelocity with respect to that.

The trouble is that we simply don't have both components of the velocity at edges. MPAS has a way of constructing the tangential component of velocity at edges, but it is only for the very specific purpose of computing the Coriolis force and I have it on good authority from @toddringler that this velocity should not be treated as appropriate for reconstructing a velocity vector at an edge. Instead, we use radial basis functions to reconstruct the velocity components (either 3D Cartesian or zonal/meridional) at cell centers and those are the only full vector velocities available from MPAS. It would be nice to have something else based more directly on normalVelocity but I do not believe that any such thing is available.

at least it shows the actual model velocity across each edge.

So I think that's where my approach and yours have slightly different purposes. I was never intending to follow model edges. Instead, I wanted to show model properties as if you were slicing through cells where scalar properties are constant within the polygonal prism of each cell. This could be done with velocity (dotted into the local normal to the transect) as well.

I want to think about how to do smoother plotting if that's what we desire. I am not a fan of contourf, so I don't necessarily intend to take that route but I will try to make sure whatever approach I take could mimic that result even if it's produced with pcolormesh. I'm thinking about trying an approach similar to what I did in the horizontal here: MPAS-Dev/MPAS-Tools#281 (comment). The principle is essentially the same.

@milenaveneziani
Copy link
Collaborator

I started writing this comment below, but then I saw yours and am not sure it's relevant any longer, but since I'm done writing it, here it goes :D:

We haven't talked about the transect mask part, but I guess it will depend on whether we'll want to adopt this methodology to identify the transect or not.
If that's what we want to do, maybe it would be nice to add some transects to the StandardTransect group, especially for the SO: maybe a few corresponding with the ones we use for SOSE and WOCE transects, plus a few more that would be more helpful for cryo papers, such as at the entrance of Ronne-Filchner, Ross, etc.
Other options are:

  • compute the transect mask on the fly
  • do not rely on the transect mask at all, which is what I assume Xylar's script does.

I am personally fine either way, I chose to go this route only because it was simpler for me :)

@cbegeman
Copy link
Collaborator

Do we have general script(s) to accept start and end coordinates (or similar) and return both:
(1) cell indices - I think this is what @xylar's transect script does, and seems most convenient for plotting transect properties for the reasons he mentioned
(2) edge indices suitable for computing transect-normal fluxes - I think this is what @milenaveneziani's script does

@xylar
Copy link
Collaborator Author

xylar commented Jun 19, 2020

@cbegeman, the MpasMaskCreator.x (mpas_tools.conversion.mask in python) does this but it produces a less-than-straight transect so I was not happy with it. This is what @milenaveneziani is using as a starting point.

@xylar
Copy link
Collaborator Author

xylar commented Jul 20, 2020

After working on this off and on for several weeks, I'm almost there, see: MPAS-Dev/MPAS-Tools#324. I believe I should have it all wrapped up in a few more days.

@xylar
Copy link
Collaborator Author

xylar commented Aug 3, 2020

2 weeks later, I have this almost wrapped up...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants