Feature Req/Progress: Interpolation within FVCOM mesh triangles #7

Open

adamkemberling opened this issue Jul 26, 2024 · 21 comments

@adamkemberling (Collaborator)

Hi all!

I met with some of the Chen lab postdocs briefly last week and am now working on implementing some workflows in R based on what we chatted about. They're primarily a MATLAB group, so I have some MATLAB code that I'd like to implement in R/Python.

You may have already explored this functionality, but I am looking to add the ability to estimate/interpolate point values within triangles based on distance weights to the surrounding nodes, ideally in a vectorized way.

They use this approach to reproject data from the newer meshes onto the older GOM3 mesh when creating a cohesive full timeline. It could also be applied to point extractions, or to re-projecting onto a regular grid given equally spaced coordinates.

So far I am leveraging get_mesh_geometry() and st_join() to handle the triangle-to-point matching, and I'm working on calculating barycentric coordinates for the relevant nodes.
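
A minimal sketch of that matching step, assuming the package's get_mesh_geometry() and an open ncdf4 connection nc (the "what" argument and the elem/p1/p2/p3 column names are my assumptions):

library(sf)
library(fvcom)

# One sf polygon per mesh element, carrying elem, p1, p2, p3 node indices
mesh <- get_mesh_geometry(nc, what = "lonlat")

# Tag each point with the element (triangle) it falls inside
pts_matched <- st_join(pts_sf, mesh, join = st_within)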

In my head, the triangle matching and weight calculation could/should probably be separate from pulling the proper nc variables at the correct time steps, since the mesh coordinates don't change over time. Then a second function could do the date/time node lookups and apply the weighted interpolation.

You may have already done this, so I wanted to flag it for potential collaboration/team problem-solving.

Cheers!

@adamkemberling adamkemberling changed the title Feature Req/Progress: Feature Req/Progress: Distance-weighted interpolation within triangles Jul 26, 2024
@adamkemberling adamkemberling changed the title Feature Req/Progress: Distance-weighted interpolation within triangles Feature Req/Progress: Barycentric interpolation within FVCOM mesh triangles Jul 26, 2024
@adamkemberling (Collaborator, Author) commented Jul 26, 2024

This is the general idea of what I was getting at. I need to write it up into generalized functions, but this was the goal:
[image attachment]

@btupper (Member) commented Jul 30, 2024

Perhaps a dumb question, but why not rasterize onto a regular grid?

@adamkemberling (Collaborator, Author)

Not a dumb question.

I/we keep getting dragged into somewhat unproductive discussions around what resolution to rasterize to, and how the regridding to a raster should be performed. We have a project working at both nearshore and offshore scales, and there are tradeoffs: a resolution high enough to take advantage of the denser nodes nearshore may poorly represent the sparser coverage offshore. Add to that the file size and computation speed if we make a 4 km resolution product for the whole shelf.

Then there was a second roundabout discussion on which method to use for interpolating between the points: IDW, kriging, what data to use to fit those, whether that introduces biases that are inconsistent with the FVCOM data itself, etc.

I asked the Chen lab what methods they recommend/use when regridding, and they responded by asking me why I would even want to do that. They shared that they use linear interpolation methods, and that they would interpolate at the GOM3 node locations for time periods covered by newer FVCOM versions to create a consistent timeseries: https://github.com/SiqiLiOcean/matFVCOM/blob/main/interp_2d_calc_weight.m

So I embarked on this journey to recreate what they use, along with similar triangle-based methods, to avoid overthinking and having to defend complex interpolation solutions. Interpolating from the nearest nodes seems most true to the model outputs (it doesn't prescribe some spatial covariance structure) and should scale well to many points. That said, I don't expect it to give meaningfully different results than rasterizing to a thoughtful resolution chosen for project needs.

@adamkemberling (Collaborator, Author) commented Jul 31, 2024

A point of clarification:

The primary utility of this or other interpolation methods would be to extract a value or values at locations that do not fall directly on the mesh. This can be extended to recreate a regularly spaced grid, as an alternative to rasterizing, but primarily it is for getting point values directly, avoiding the need to fill the polygons or rasterize first.

@btupper (Member) commented Jul 31, 2024

So, using sf we can determine which polygon a point falls into, and then what? Use something like this? How do we go from Cartesian to barycentric coordinates? If we can do that, then the interpolation looks easy-ish.
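
For reference, the Cartesian-to-barycentric step is a small linear solve. For a triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, the barycentric coordinates $(\lambda_1, \lambda_2, \lambda_3)$ of a point $(x, y)$ satisfy

$$
\begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix}
=
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
$$

so the weights sum to 1, and the interpolated value is $\lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3$ for node values $v_i$. This is exactly the 3x3 solve that shows up in the code below.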

@adamkemberling (Collaborator, Author) commented Jul 31, 2024

Exactly. I have some messy code for doing either linear interpolation or barycentric that I need to clean up. The lead-in to either is the same use of sf::st_join(). The weights are method dependent and could be function options (linear, IDW, barycentric).

Then, when applying the interpolation, you could ideally select the relevant file or timestep and variable(s) from the nc connection to get the interpolated values.

These names are a mess, sorry:

# Required packages
library(sf)      # st_join, st_transform, st_coordinates
library(dplyr)   # bind_cols, mutate
library(tidyr)   # drop_na
library(purrr)   # map_dfr

# Function to add the linear interpolation weights based on node coordinates
triangulation_linear_weighting <- function(pts_sf, fvcom_mesh){

    # Identify the triangles that overlap each point:
    # Use st_join to assign elem, p1, p2, p3 IDs to the points
    pts_assigned <- st_join(
      st_transform(pts_sf, st_crs(fvcom_mesh)),
      fvcom_mesh,
      join = st_within) %>%
      drop_na(elem)

    # Iterate over the rows to add weights:
    pts_weighted <- pts_assigned %>%
      base::split(., seq_len(nrow(.))) %>%
      purrr::map_dfr(function(pt_assigned){

        # Subset the relevant triangle from the st_join info
        triangle_match <- fvcom_mesh[pt_assigned$elem, ]

        # Build matrices for the point to interpolate & the surrounding nodes:

        # Matrix for the triangle, from the sf geometry's node coordinates.
        # st_coordinates() returns X, Y, and ring-index columns; the ring index
        # is 1 for every vertex, so after transposing we get a 3x3 matrix:
        # row 1 = x coords, row 2 = y coords, row 3 = rep(1, 3)
        node_vertices <- t(st_coordinates(triangle_match[1, ])[1:3, 1:3])

        # Matrix for the point:
        # creates a 3x1 matrix: x, y, 1
        point_coords <- matrix(
          c(st_coordinates(pt_assigned[1, ]), 1),
          nrow = 3)

        #### For Linear Interpolation:

        # Invert the vertex matrix
        inverse_coordmat <- solve(node_vertices)

        # Solve for the weights
        node_wts <- (inverse_coordmat %*% point_coords) %>%
          t() %>%
          as.data.frame() %>%
          setNames(c("p1_wt", "p2_wt", "p3_wt"))

        # Return the point with its weights appended
        bind_cols(pt_assigned, node_wts)

      })
    # End rowwise iteration
    return(pts_weighted)
}


# Run for all points:
pts_weighted <- triangulation_linear_weighting(
  pts_sf = pts_df, 
  fvcom_mesh = gom3_mesh)

Then for interpolation:

####  Apply Interpolation Step  ####

# Requires ncdf4 for ncvar_get()
library(ncdf4)

interpolate_from_weights <- function(df_weighted, fvcom_nc, fvcom_varid, fvcom_siglev = 1, var_out = "interp_val"){

  # Get the values of the variable of interest (at one sigma level) as a vector
  node_vals <- ncvar_get(
    nc = fvcom_nc,
    varid = fvcom_varid)[, fvcom_siglev]

  # Weighted sum of the three surrounding node values
  df_weighted %>%
    mutate(
      {{var_out}} := node_vals[p1] * p1_wt + node_vals[p2] * p2_wt + node_vals[p3] * p3_wt
    )

}



# Run the interpolation
pts_interp <- interpolate_from_weights(
  df_weighted = pts_weighted, 
  fvcom_nc = gom3_early, 
  fvcom_varid = "temp", 
  fvcom_siglev = 1,
  var_out = "surf_temp")

@adamkemberling (Collaborator, Author) commented Jul 31, 2024

For barycentric I was using geometry::cart2bary(). The input matrices are a little different from those above, but can be constructed from the same inputs:

# cart2bary needs the triangle as a "simplex": its vertex coordinates as a matrix
library(geometry)

triangle_coord_test <- triangle_matches[1, ] %>% st_coordinates()
triangle_coord_test <- triangle_coord_test[c(1:3), c(1:2)]

# Make the simplex for the triangle, and another matrix for the point coords
simplex_test <- as.matrix(triangle_coord_test)
point_test <- pts_assigned[1, c("lon", "lat")] %>% st_drop_geometry() %>% as.matrix()

# Get the barycentric coordinates and label them with the node numbers
node_numbers <- triangle_matches[1, c("p1", "p2", "p3")] %>% st_drop_geometry()
as.data.frame(cart2bary(
  X = simplex_test,
  P = point_test)) %>%
  setNames(unlist(node_numbers))

My function for this is a needlessly confusing mess ATM
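
A cleaner version could mirror triangulation_linear_weighting() above; a hedged sketch (triangulation_barycentric_weighting is a name I'm making up, inputs as before):

library(sf)
library(dplyr)
library(tidyr)
library(purrr)
library(geometry)

triangulation_barycentric_weighting <- function(pts_sf, fvcom_mesh){

  # Same lead-in as the linear version: match points to triangles
  pts_assigned <- st_join(
    st_transform(pts_sf, st_crs(fvcom_mesh)),
    fvcom_mesh,
    join = st_within) %>%
    drop_na(elem)

  pts_assigned %>%
    base::split(., seq_len(nrow(.))) %>%
    purrr::map_dfr(function(pt_assigned){

      # Triangle vertices as a 3x2 simplex matrix
      simplex <- st_coordinates(fvcom_mesh[pt_assigned$elem, ])[1:3, 1:2]

      # Point coordinates as a 1x2 matrix
      pt_xy <- st_coordinates(pt_assigned[1, ])[, 1:2, drop = FALSE]

      # Barycentric coordinates: one weight per surrounding node
      node_wts <- cart2bary(X = simplex, P = pt_xy) %>%
        as.data.frame() %>%
        setNames(c("p1_wt", "p2_wt", "p3_wt"))

      bind_cols(pt_assigned, node_wts)
    })
}

For a 2D triangle these barycentric weights are identical to the linear weights from the 3x3 solve above, so the two options should agree.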

@btupper (Member) commented Jul 31, 2024

Oh, nice!

Let's build on what you have and consider adding a new file in the R directory with perhaps the following functions.

cart_to_bary = function(cart, mesh)

bary_to_cart = function(bary)

interpolate_mesh = function(cart, mesh, method = "", return_form = c("bary", "cart")[2])

The casual user would probably only call interpolate_mesh(), which by default returns the interpolation in Cartesian coords. What do you think of that?
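
A rough skeleton of that layout, just to make the proposal concrete (stubs only, with bodies to be filled in from the code above):

# Proposed R/interpolate.R stubs
cart_to_bary <- function(cart, mesh){
  # match points to mesh triangles, then solve for the barycentric weights
  # (e.g. via the st_join + cart2bary steps shown earlier in this thread)
}

bary_to_cart <- function(bary){
  # a weighted sum of the triangle vertex coordinates recovers the Cartesian point
}

interpolate_mesh <- function(cart, mesh, method = "",
                             return_form = c("bary", "cart")[2]){
  # top-level wrapper: weights + variable lookup + weighted sum
}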

Do you have a fork of the repos?

@adamkemberling adamkemberling changed the title Feature Req/Progress: Barycentric interpolation within FVCOM mesh triangles Feature Req/Progress: Interpolation within FVCOM mesh triangles Aug 1, 2024
@adamkemberling (Collaborator, Author) commented Oct 16, 2024

I'm trying to sort out the best way to handle matching time steps, and I'm not sure how to make it general enough to handle the different ways people may have downloaded the FVCOM data.

To use an example: the dataset I acquired for surface and bottom temperature from the Chen lab does not contain the siglev coordinate, which removes the need to provide a start or count argument for that dimension with ncdf4::ncvar_get().

It is also organized into annual files, which is convenient for me but won't always be the case, as size constraints may force people to store/access monthly files.

Here is the function I'm using for date-wise interpolation, just looping over yearly files:

#' Interpolate Values from FVCOM Mesh at Timestep
#' 
#' @description Takes a dataframe row containing node indices, a time index, and
#' interpolation weights, and returns it with the value interpolated at that time index.
#' 
#' Should contain the following columns:
#' time_idx = integer timestep to use for interpolation
#' p1 = integer index value for node 1 surrounding the interpolation location
#' p2 = integer index value for node 2 surrounding the interpolation location
#' p3 = integer index value for node 3 surrounding the interpolation location
#' p1_wt = linear interpolation weight for p1
#' p2_wt = linear interpolation weight for p2
#' p3_wt = linear interpolation weight for p3
#'
#' @param dated_points_weighted dataframe row containing time index, node index, and node weight columns
#' @param fvcom_nc FVCOM netcdf file to extract values from
#' @param fvcom_varid String indicating the variable in the FVCOM netcdf to interpolate
#' @param var_out String to use as the variable name in the returned dataframe
#'
#' @return dataframe row with the interpolated value appended as var_out
#' @export
#'
#' @examples
interpolate_at_timestep <- function(dated_points_weighted, fvcom_nc, fvcom_varid, var_out){

  # Get the values of the variable of interest, at one timestep, as a vector
  node_vals <- ncvar_get(
    nc = fvcom_nc,
    varid = fvcom_varid,
    start = c(1, dated_points_weighted[["time_idx"]]),
    count = c(-1, 1))

  # Interpolate using the node numbers and weights
  dated_interpolation <- dated_points_weighted %>%
    mutate(
      {{var_out}} := node_vals[p1] * p1_wt + node_vals[p2] * p2_wt + node_vals[p3] * p3_wt)

  return(dated_interpolation)
}

The linear interpolation method is hard-coded in.

The full workflow is:

  1. Overlay the locations onto the mesh to identify which nodes are relevant, and set the weights for the interpolation method
  2. Get the dates and time indices for the FVCOM collection, and join these into the point location dataframe using date
  3. Loop over years/fvcom_files, performing time-index-matched interpolation for each point as a within-year loop (sketched below)
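
A hedged sketch of that outer loop, assuming yearly files named like gom3_1980.nc in a local_fvcom_dir folder and a date column on the weighted points (the file naming and folder are assumptions):

library(dplyr)
library(ncdf4)
library(purrr)

yearly_interp <- pts_weighted %>%
  mutate(year = format(as.Date(date), "%Y")) %>%
  split(.$year) %>%
  imap_dfr(function(pts_yr, yr){

    # Open this year's file, close it on exit
    nc_yr <- nc_open(file.path(local_fvcom_dir, paste0("gom3_", yr, ".nc")))
    on.exit(nc_close(nc_yr))

    # Within-year loop: interpolate each point row at its matched timestep
    pts_yr %>%
      split(., seq_len(nrow(.))) %>%
      map_dfr(~ interpolate_at_timestep(
        dated_points_weighted = .x,
        fvcom_nc = nc_yr,
        fvcom_varid = "temp",
        var_out = "surf_temp"))
  })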

@btupper (Member) commented Oct 17, 2024

So, node_vals could be any value, but presumably fvcom_varid points to a time variable so you are getting time back?

For each row of dated_points_weighted are p1, p2 and p3 all from the same sigma layer and time step?

@adamkemberling (Collaborator, Author) commented Oct 17, 2024

Almost.

node_vals (again, just picking names on the fly) could be any variable, e.g. temp, salinity, etc.; fvcom_varid is the string that indicates which values to pull and interpolate.

I'm using the time index here: start = c(1, dated_points_weighted[["time_idx"]]), which I get in an earlier step, external to this function. This ensures values come from the right time step, e.g. to match the datetime a given trawl survey sample happened. If there were a siglev dimension in these files it would come into play here as well.

node_vals is a vector of fvcom_varid values for all lat/lon node pairs, at the timestep indicated by dated_points_weighted[["time_idx"]].

p1, p2, & p3 are the three FVCOM nodes that surround the point I want to interpolate at. Their values within dated_points_weighted are the node index numbers in the NetCDF file. Once I have node_vals I just index out these three values and apply the weights to interpolate.
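
For reference, that earlier external step might look something like the sketch below. FVCOM time is often stored as days since 1858-11-17 (modified Julian day), but check the units attribute rather than assuming it; the date column and pts_weighted name are mine:

library(dplyr)
library(ncdf4)
library(purrr)

# Inspect the time units before trusting any origin
time_units <- ncatt_get(nc_yr, "time", "units")$value

# Convert the time axis to dates (assuming an MJD-style origin here)
time_vals <- ncvar_get(nc_yr, "time")
nc_dates  <- as.Date(time_vals, origin = as.Date("1858-11-17"))

# Match each point's date to the nearest available timestep index
pts_dated <- pts_weighted %>%
  mutate(time_idx = map_int(as.Date(date), ~ which.min(abs(nc_dates - .x))))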

@btupper (Member) commented Oct 17, 2024

So, I think your approach looks right.

@adamkemberling (Collaborator, Author) commented Oct 17, 2024

The question I'm asking myself is whether to keep these staging steps separate, to work through many points efficiently, or to package all the steps into some "interpolate_at_date_and_place" function that takes lat/lon/time and does the rest. Potentially very inefficient for many points, but more hands-off.

@btupper (Member) commented Oct 17, 2024

Well, both!

interpolate_at_date_and_place() can be that top level function, and wouldn't it call your various speedster functions as needed?

@adamkemberling (Collaborator, Author)

The broken bit is really the time index part. You kind of need to know ahead of time whether a date is in the NetCDF file to even bother.
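
One cheap guard for that, sketched under the same time-units assumption as above (nc_covers_date is a made-up name):

library(ncdf4)

# TRUE if the file's time axis spans the requested date (assumes MJD-style units)
nc_covers_date <- function(nc, date){
  nc_dates <- as.Date(ncvar_get(nc, "time"), origin = as.Date("1858-11-17"))
  as.Date(date) >= min(nc_dates) & as.Date(date) <= max(nc_dates)
}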

@btupper (Member) commented Oct 17, 2024

Yeah - that is a problem

@btupper (Member) commented Oct 18, 2024

Would it be helpful to set up a time when we could sit down and work our way through this use case? I could come down and we could hack it out if that would be helpful (it would be for me).

@adamkemberling (Collaborator, Author)

Yes, let's do that. I could do something this Friday at the earliest or sometime next week before Thursday.

@btupper (Member) commented Oct 24, 2024

How about Tuesday morning at 9?

@adamkemberling (Collaborator, Author) commented Oct 29, 2024

I gave this a thumbs up, but it's actually not a great window of time now that it's here. I could do a quick check-in if you wanted to start.

@btupper (Member) commented Oct 29, 2024

Yeah - let's take this to email
