Issue with timbertrek.transform_trie_to_rules #4

Closed
sarah-huestis opened this issue Nov 15, 2022 · 5 comments

@sarah-huestis

Trying to convert a trie calculated by the treeFARMS package into a rules JSON for timbertrek using code suggested in another ticket (#2), but getting an error. The trie has 80739 trees according to the message.

Code:

import json

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from treefarms.model.threshold_guess import compute_thresholds, cut
from treefarms import TREEFARMS
from treefarms.model.model_set import ModelSetContainer
import timbertrek

# `data`, `df`, and `metric_cols` are defined earlier in the notebook.
X = data
# h = list(X.columns)
for met in metric_cols:
    y = df[met]
    config = {
        # Regularization penalizes trees with more leaves. A relatively
        # high value is recommended to find a sparse tree.
        "regularization": 0.02,
        # The Rashomon bound multiplier controls how large a Rashomon set is returned.
        "rashomon_bound_multiplier": 0.25,
        "depth_budget": 0,
    }
    model = TREEFARMS(config)
    print('configed')
    model.fit(X, y)
    print('fitted')
    # Get the Rashomon set in a trie structure
    trie = model.model_set.to_trie()
    print('trie-d')
    # Use a separate name so `df` is not overwritten for the next metric
    dataset_df = model.dataset
    # Convert the trie to decision paths
    feature_names = dataset_df.columns
    decision_paths = timbertrek.transform_trie_to_rules(
        trie, dataset_df, feature_names=feature_names
    )
    # Save the decision paths in a JSON file
    with open('tree_for_' + str(met) + '.json', 'w') as f:
        json.dump(decision_paths, f)

Error:

IndexError Traceback (most recent call last)
/tmp/ipykernel_16824/365095990.py in <module>
26 trie,
27 df,
---> 28 feature_names=feature_names,
29 )
30 # Save the decision paths in a JSON file

/opt/conda/lib/python3.7/site-packages/timbertrek/timbertrek.py in transform_trie_to_rules(trie, data_df, feature_names, feature_description)
683 # Construct trees
684 decision_rule_hierarchy, tree_map = get_decision_rule_hierarchy_dict(
--> 685 trie, keep_position=False
686 )
687 new_tree_map = get_tree_map_hierarchy(tree_map)

/opt/conda/lib/python3.7/site-packages/timbertrek/timbertrek.py in get_decision_rule_hierarchy_dict(trie, keep_position)
483 for i in tree_map["map"]:
484 cur_string = tree_map["map"][i][0]
--> 485 all_rules = get_decision_rules(cur_string)
486
487 # Iterate the set and build the hierarchy dict

/opt/conda/lib/python3.7/site-packages/timbertrek/timbertrek.py in get_decision_rules(tree_strings)
238 cur_feature, pre_features = working_queue.popleft()
239
--> 240 cur_string = tree_strings[i]
241 cur_string_split = cur_string.split()
242

IndexError: list index out of range

Including screenshots. [Three images, not preserved.]

@xiaohk
Member

xiaohk commented Nov 15, 2022

Hi @sarah-huestis, hmm, it is a bit unusual to have -9 and -3 in the trie strings. Is it possible to save the trie dictionary as a JSON file and share the file with us (e.g., post it as a private gist)?
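
A minimal sketch of how the trie could be written to a JSON file for sharing. It assumes the object returned by `model.model_set.to_trie()` is a plain nested dictionary that the standard `json` module can serialize; the helper name `save_trie`, the file name, and the stand-in dictionary are hypothetical and do not reflect the real trie layout.

```python
import json


def save_trie(trie: dict, path: str) -> None:
    """Write a trie dictionary to `path` as JSON."""
    with open(path, "w") as f:
        json.dump(trie, f)


# Hypothetical stand-in for illustration only. With the code from this
# issue, the argument would be: trie = model.model_set.to_trie()
example_trie = {"root": {"child": {"objective": 0.1}}}
save_trie(example_trie, "trie.json")
```

If the dictionary holds values that `json` cannot serialize (for example NumPy scalars), `json.dump` raises a `TypeError`, and those values would need converting first.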

@sarah-huestis
Author

Thanks for the quick response! File here: https://gist.github.com/sarah-huestis/74627cae7cd6ef9f036d5b2d0dfbc1aa

@xiaohk
Member

xiaohk commented Nov 15, 2022

It seems -9 and -3 come from here (thanks for the help, @zbw8388!):

https://github.com/ubc-systopia/treeFarms/blob/d6d1363d1d4a4cf4bc768c98a481b6b8f484a153/treefarms/model/model_set.py#L31-L47

TimberTrek only supports binary prediction. Is your model a multi-class classifier with labels 2 and 8? Maybe try to make sure the labels are either 0 or 1.
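
One way to get 0/1 labels before fitting is sketched below. It assumes the target really has only two distinct values (such as the 2 and 8 guessed above); the helper name `to_binary_labels` is hypothetical and not part of TimberTrek or treeFARMS.

```python
from typing import Hashable, Iterable, List


def to_binary_labels(labels: Iterable[Hashable], positive_label: Hashable) -> List[int]:
    """Map a two-class label sequence to 0/1, with `positive_label` becoming 1."""
    labels = list(labels)
    classes = set(labels)
    # Refuse genuinely multi-class targets instead of silently merging classes.
    if len(classes) > 2:
        raise ValueError(f"expected at most 2 classes, found {len(classes)}")
    if positive_label not in classes:
        raise ValueError(f"positive label {positive_label!r} not present in labels")
    return [1 if label == positive_label else 0 for label in labels]


y_raw = [2, 8, 8, 2, 8]
y_binary = to_binary_labels(y_raw, positive_label=8)
print(y_binary)  # [0, 1, 1, 0, 1]
```

With a pandas Series, the same remapping could be written as `df[met].map({2: 0, 8: 1})`, again assuming those are the only two labels present.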

@sarah-huestis
Author

Oh I wasn't aware it didn't support multi-class classification! Thanks for checking it out!

@xiaohk
Member

xiaohk commented Nov 16, 2022

Yup. Supporting multi-class is actually not too difficult—I will look into it when I have more bandwidth. A note to myself or people who want to contribute:

  1. Support multi-class in trie parsing
  2. Compute multi-class accuracies during trie parsing
  3. Change the +/- signs to class labels in the visualization

xiaohk closed this as completed Nov 16, 2022