generalize decision tree implementation? #20

Open
7yl4r opened this issue Oct 4, 2019 · 2 comments

@7yl4r
Member

7yl4r commented Oct 4, 2019

Much of the code here implements a decision tree as a large nested list of if/else conditions.
This is both inefficient and nasty to look at.

The implementation of a decision tree like this could be generalized to take a tree-like data structure as input and compute the resulting classification raster.
I suspect a Python library for this already exists, and it would probably run much faster than our if/else nest.
As a bonus we could probably output pretty visualizations of the tree using the same data structure.
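
A minimal sketch of what "a tree-like data structure as input" could look like, using only the standard library. The `Split` class, the class names, and the two-band pixels are hypothetical and chosen for illustration; this evaluates one pixel at a time and is not the project's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Pixel = Sequence[float]  # band values for one pixel; index 0 is the first band


@dataclass
class Split:
    """Internal node: run `test` on a pixel, then follow one branch."""
    test: Callable[[Pixel], bool]
    if_true: Union["Split", str]   # a subtree, or a class name (leaf)
    if_false: Union["Split", str]


def classify_pixel(tree, pixel):
    """Walk the tree until a leaf (a class-name string) is reached."""
    node = tree
    while isinstance(node, Split):
        node = node.if_true if node.test(pixel) else node.if_false
    return node


def classify_raster(tree, raster):
    """Classify every pixel of a rows x cols grid of band tuples."""
    return [[classify_pixel(tree, px) for px in row] for row in raster]


# Hypothetical two-level tree over two-band pixels, for illustration only.
example_tree = Split(
    test=lambda px: px[0] > 0.5,
    if_true="class_a",
    if_false=Split(
        test=lambda px: px[1] > px[0],
        if_true="class_b",
        if_false="class_c",
    ),
)

example_raster = [
    [(0.9, 0.1), (0.2, 0.4)],
    [(0.1, 0.05), (0.6, 0.7)],
]
classified = classify_raster(example_tree, example_raster)
```

Because the tree is plain data, the same object could be handed to a classifier, a validator, or a diagram generator. A faster version would likely apply each test to whole arrays at once rather than looping over pixels.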

Additionally: Helen mentioned on the ICEBERG all-hands call today that she is looking for a way to accomplish this same thing (i.e. "export a raster or vector from a set of rrs > or < parameters") in Python instead of ArcGIS.

@7yl4r
Member Author

7yl4r commented Oct 22, 2019

I just wrote up some simple implementations in python: https://gist.github.com/7yl4r/1ccdafb1103d784e526379f85b08ee13

One thing I don't like here is the need to encode the node evaluation order (n) explicitly.
There are a number of ways to do this; the real question I have is: how the heck can one of these be implemented in MATLAB?
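
One possible way to drop the explicit `n`, assuming it only encodes the order in which sibling nodes are tested: let the order in which children are attached be the evaluation order. The `Node` class below is a hypothetical stand-in written for this sketch and is not taken from the gist.

```python
class Node:
    """Tree node whose children are tested in the order they were attached."""

    def __init__(self, name, parent=None, test=None):
        self.name = name
        self.test = test        # callable(value) -> bool, or None to act as "else"
        self.children = []      # list order doubles as evaluation order
        if parent is not None:
            parent.children.append(self)


def classify(node, value):
    """Descend into the first child whose test passes; stop when none match."""
    for child in node.children:
        if child.test is None or child.test(value):
            return classify(child, value)
    return node.name


root = Node("root")
low = Node("low", parent=root, test=lambda v: v < 0.2)
mid = Node("mid", parent=root, test=lambda v: v < 0.6)  # only reached if "low" failed
high = Node("high", parent=root)                         # no test: acts as "else"
```

With this layout, reordering the `Node(...)` lines changes the evaluation order, so there is no separate counter to keep in sync.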

7yl4r added the question label Dec 9, 2019
@7yl4r
Member Author

7yl4r commented Dec 9, 2019

@mjm8: I'd like to get away from maintaining two concurrent Python and MATLAB versions, but I understand that making the switch can be painful.
Maybe instead of trying to share some code structure between MATLAB and Python we could focus only on an abstraction of the decision tree diagram?
By this I mean that we can write the Python in a way that may be easier for you to get started with.

I think a start on this based on this file would look like:

root = Node("root")
mud_dev_sand = Node(
    "mud_dev_sand", parent=root, 
    fn="(Rrs(j,k,7) - Rrs(j,k,2))/(Rrs(j,k,7) + Rrs(j,k,2)) < 0.60 && Rrs(j,k,5) > Rrs(j,k,4) && Rrs(j,k,4) > Rrs(j,k,3)"
)
shadow = Node(
    "shadow", parent=mud_dev_sand, n=1,
    fn="Rrs(j,k,7) < Rrs(j,k,2) && Rrs(j,k,8) > Rrs(j,k,5)"
)
building_or_sand = Node(
    "building_or_sand", parent=shadow, n=2,
    fn="(Rrs(j,k,8) - Rrs(j,k,5))/(Rrs(j,k,8) + Rrs(j,k,5)) < 0.01 && Rrs(j,k,8) > 0.05"
)
# TODO: more here
not_mud_dev_sand = Node("not_mud_dev_sand", parent=root, fn="else")

If you are able to modify the decision tree in this form, then I can make the tree run efficiently in Python.
My hope is that writing the Python this way might also make it easier for you to modify the tree. We could clean this up and work with something like:

root = Node("root")
mud_dev_sand = Node(
    "mud_dev_sand", parent=root, 
    fn="(b7 - b2)/(b7 + b2) < 0.60 && b5 > b4 && b4 > b3"
)
shadow = Node(
    "shadow", parent=mud_dev_sand, n=1,
    fn="b7 < b2 && b8 > b5"
)
building_or_sand = Node(
    "building_or_sand", parent=shadow, n=2,
    fn="(b8 - b5)/(b8 + b5) < 0.01 && b8 > 0.05"
)
# TODO: more here
not_mud_dev_sand = Node("not_mud_dev_sand", parent=root, fn="else")
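
A rough sketch of how rule strings like the `fn` values above might be evaluated for a single pixel. The helper name `evaluate_rule` and the band values are made up for illustration; the sketch assumes the rules only use band names, numbers, comparisons, arithmetic, `&&`, and `||`, and it does not handle the special `"else"` rule.

```python
def evaluate_rule(fn, bands):
    """Evaluate a rule string such as "b7 < b2 && b8 > b5" for one pixel.

    `bands` maps names like "b7" to numeric values.
    """
    # Translate the MATLAB-style logical operators into Python keywords.
    expression = fn.replace("&&", " and ").replace("||", " or ")
    # An empty __builtins__ limits the expression to the band names.
    # This is not a security boundary; only use it on trusted rule strings.
    return bool(eval(expression, {"__builtins__": {}}, dict(bands)))


# Hypothetical band values for one pixel, not real data.
pixel = {"b2": 0.10, "b3": 0.02, "b4": 0.03, "b5": 0.04, "b7": 0.05, "b8": 0.06}

is_mud_dev_sand = evaluate_rule(
    "(b7 - b2)/(b7 + b2) < 0.60 && b5 > b4 && b4 > b3", pixel
)
is_shadow = evaluate_rule("b7 < b2 && b8 > b5", pixel)
```

Keeping the rules as strings is what lets the same text appear in a diagram label. The trade-off is that a typo in a rule only surfaces when it is evaluated.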

From here I could output nice diagrams and possibly other helpful analyses on the tree.
Does this code make sense to you?
How do you think we should move forward?
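
As a rough illustration of the diagram output mentioned above, the tree could be dumped as Graphviz DOT text. The `Node` class here is a minimal hypothetical stand-in (not a library class), and the `to_dot` helper is an assumption made for this sketch.

```python
class Node:
    """Minimal stand-in for the Node used above (hypothetical)."""

    def __init__(self, name, parent=None, fn=None):
        self.name = name
        self.fn = fn
        self.children = []
        if parent is not None:
            parent.children.append(self)


def to_dot(root):
    """Return DOT text with one edge per parent/child pair, labelled by rule."""
    lines = ["digraph tree {"]
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            # Escape double quotes so the rule is a valid DOT string.
            label = (child.fn or "").replace('"', '\\"')
            lines.append(f'    "{node.name}" -> "{child.name}" [label="{label}"];')
        # Reverse so children are visited in the order they were attached.
        stack.extend(reversed(node.children))
    lines.append("}")
    return "\n".join(lines)


root = Node("root")
mud_dev_sand = Node(
    "mud_dev_sand", parent=root,
    fn="(b7 - b2)/(b7 + b2) < 0.60 && b5 > b4 && b4 > b3",
)
shadow = Node("shadow", parent=mud_dev_sand, fn="b7 < b2 && b8 > b5")
not_mud_dev_sand = Node("not_mud_dev_sand", parent=root, fn="else")

dot_text = to_dot(root)
```

Saving `dot_text` to a file such as `tree.dot` and running `dot -Tpng tree.dot -o tree.png` (with Graphviz installed) should render the diagram.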

7yl4r removed their assignment Aug 31, 2021