Fix pipenv installation #36

Merged: 2 commits, Dec 15, 2023

gillens (Contributor) commented Dec 1, 2023

Adds scikit-learn as a dependency and requires Python 3.10.

Scikit-learn

The file resources/model.joblib is a pickled scikit-learn model, so loading it requires scikit-learn. Because scikit-learn was missing from the Pipfile, I got this error the first time I ran the project after running pipenv install and then pipenv shell:

scikit-learn import error
$ python psplot.py
App is running on QT version 5.15.2
Traceback (most recent call last):
  File "/home/sean/cs/PSplot/psplot.py", line 663, in <module>
    main()
  File "/home/sean/cs/PSplot/psplot.py", line 657, in main
    window = PsPlot()
  File "/home/sean/cs/PSplot/psplot.py", line 59, in __init__
    self._setup_variables()
  File "/home/sean/cs/PSplot/psplot.py", line 78, in _setup_variables
    self.clf = joblib.load("./resources/model.joblib")
  File "/home/sean/.local/share/virtualenvs/PSplot-ZEt-0E32/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/sean/.local/share/virtualenvs/PSplot-ZEt-0E32/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/usr/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/usr/lib/python3.10/pickle.py", line 1538, in load_stack_global
    self.append(self.find_class(module, name))
  File "/usr/lib/python3.10/pickle.py", line 1580, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn'
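As the traceback shows, a missing dependency only surfaces deep inside pickle's find_class. A loader could instead check up front that the modules the pickle needs are importable and fail with a clearer message. The following is a minimal sketch using the standard library's importlib.util.find_spec. The helper name require_modules and its message text are hypothetical and not part of PSplot.

```python
import importlib.util


def require_modules(*names: str) -> None:
    """Raise ModuleNotFoundError with an install hint if any module is missing.

    Hypothetical helper: PSplot could call require_modules("sklearn") before
    joblib.load("./resources/model.joblib"), because unpickling the model
    imports sklearn.
    """
    # find_spec returns None for a top-level module that cannot be found.
    missing = [name for name in names if importlib.util.find_spec(name) is None]
    if missing:
        raise ModuleNotFoundError(
            f"Missing module(s) needed to load the model: {', '.join(missing)}. "
            "Add the matching package (e.g. scikit-learn for sklearn) to the "
            "Pipfile and run pipenv install."
        )


# Assumed call site, mirroring psplot.py line 78:
# require_modules("sklearn")
# self.clf = joblib.load("./resources/model.joblib")
```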

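I tried installing the latest scikit-learn (1.3.2), but then got an InconsistentVersionWarning saying the estimators in the joblib file (the DecisionTreeClassifier trees inside the RandomForestClassifier) were pickled with version 1.0.2. After that, psplot crashed again: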

scikit-learn version error
$ python psplot.py
App is running on QT version 5.15.2
/home/sean/.local/share/virtualenvs/PSplot-jf6YzhwB/lib/python3.10/site-packages/sklearn/base.py:348: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.0.2 when using version 1.3.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Traceback (most recent call last):
  File "/home/sean/PSplot/psplot.py", line 663, in <module>
    main()
  File "/home/sean/PSplot/psplot.py", line 657, in main
    window = PsPlot()
  File "/home/sean/PSplot/psplot.py", line 59, in __init__
    self._setup_variables()
  File "/home/sean/PSplot/psplot.py", line 78, in _setup_variables
    self.clf = joblib.load("./resources/model.joblib")
  File "/home/sean/.local/share/virtualenvs/PSplot-jf6YzhwB/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/sean/.local/share/virtualenvs/PSplot-jf6YzhwB/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/usr/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/home/sean/.local/share/virtualenvs/PSplot-jf6YzhwB/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 402, in load_build
    Unpickler.load_build(self)
  File "/usr/lib/python3.10/pickle.py", line 1718, in load_build
    setstate(state)
  File "sklearn/tree/_tree.pyx", line 728, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1434, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
- got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]

I pinned scikit-learn to 1.0.2 in the Pipfile, and it works. This older scikit-learn uses a deprecated feature, though, so it generates warnings. At some point I could update the model file to a recent version, but I'm not yet sure how to reproduce the classifier.
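In the Pipfile, this pin is an entry of the form scikit-learn = "==1.0.2" under [packages]. As an extra guard, a startup check could compare the installed scikit-learn against the version the model was pickled with, using importlib.metadata from the standard library. This is a sketch. The constant and function names are hypothetical, and exact string equality is a deliberate simplification with no version-specifier parsing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# scikit-learn version the estimators in resources/model.joblib were pickled
# with, taken from the InconsistentVersionWarning in the log above.
MODEL_SKLEARN_VERSION = "1.0.2"


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of distribution `dist`, or None if absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def model_version_mismatch(dist: str = "scikit-learn",
                           expected: str = MODEL_SKLEARN_VERSION) -> Optional[str]:
    """Return a warning message if `dist` is missing or not exactly `expected`.

    Returns None when the installed version matches the pin.
    """
    found = installed_version(dist)
    if found is None:
        return f"{dist} is not installed; the Pipfile should pin {dist} == {expected}."
    if found != expected:
        return (f"{dist} {found} is installed, but the model was pickled with "
                f"{expected}; unpickling may fail.")
    return None
```

PSplot could print the returned message (if any) before calling joblib.load, which makes the version problem obvious instead of surfacing as a dtype ValueError.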

Use Python 3.10, not 3.8

Because the Pipfile specifies Python 3.8, pipenv install pauses for a while searching the disk for a 3.8 interpreter. Even if 3.8 is installed (or the user installs it), the project fails to run, because settings.py merges dictionaries with the | operator, which was introduced in Python 3.9. So I set the version to 3.10 to match the other instructions.
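To illustrate why 3.8 fails: dict union with | (PEP 584) only exists from Python 3.9. On 3.8, the same expression is not a syntax error but raises TypeError at runtime. The sketch below contrasts it with the {**a, **b} unpacking form, which works on older versions. The settings dictionaries and the merge_settings name are hypothetical stand-ins, not PSplot's actual settings.py code. This PR requires 3.10 instead of adding such a fallback.

```python
import sys

# Hypothetical settings dicts; PSplot's real ones are in settings.py.
DEFAULTS = {"theme": "light", "port": 8080}
OVERRIDES = {"port": 9090}


def merge_settings(base: dict, extra: dict) -> dict:
    """Return a new dict with keys from `extra` winning on conflicts."""
    if sys.version_info >= (3, 9):
        # PEP 584 dict union. On 3.8 this expression raises
        # TypeError: unsupported operand type(s) for |: 'dict' and 'dict'.
        return base | extra
    # Fallback with the same result on Python 3.5+.
    return {**base, **extra}
```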

Pipfile.lock

It's probably worth committing the Pipfile.lock file. It makes installation more deterministic and should make it a bit quicker. The lock file pins the exact package versions, so users know they are running PSplot with the same dependencies as the developers. The pipenv docs recommend this:

Keep both Pipfile and Pipfile.lock in version control.

I can add that change here if others agree. I know the goal is to package PSplot so users don't need pipenv at all (#13), but committing the lock file should still help move toward that.
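To show what "keeps track of the package versions" means in practice: Pipfile.lock is a JSON file. Its "default" section (and "develop" for dev packages) records an exact "==" version and hashes for every resolved package, including transitive dependencies, and pipenv sync installs exactly those pins. The sketch below reads the pins. The sample lock content is hypothetical and trimmed, since real entries carry hashes and more _meta fields.

```python
import json

# Trimmed, hypothetical Pipfile.lock content; only the scikit-learn pin
# comes from this PR. Real files list hashes for every package.
SAMPLE_LOCK = """
{
  "_meta": {"requires": {"python_version": "3.10"}},
  "default": {
    "scikit-learn": {"hashes": [], "version": "==1.0.2"}
  },
  "develop": {}
}
"""


def pinned_versions(lock_text: str, section: str = "default") -> dict:
    """Map each package in a Pipfile.lock section to its pinned version spec.

    Entries without a "version" key (e.g. VCS or editable installs) map to "".
    """
    lock = json.loads(lock_text)
    return {name: entry.get("version", "")
            for name, entry in lock.get(section, {}).items()}
```

Against a real checkout, this would be called with the contents of Pipfile.lock read from the repository root.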

Commits:

Pipfile does not include scikit-learn, which leads to an import error
when running:
> python psplot.py
Tried installing the latest scikit-learn, but that resulted in errors
because the resources/model.joblib dump file was created with 1.0.2.
Therefore pinning that version in the Pipfile.

Confirmed that Python >= 3.9 is needed: on a Python 3.8 install,
settings.py failed when merging dictionaries with the "|" operator.
Setting the Python version to 3.10 in both the Pipfile and the README
for consistency (docs.plasticscanner.com already states 3.10).

gillens marked this pull request as ready for review on December 1, 2023, 19:53.
Jerzeek merged commit 7209c1a into Plastic-Scanner:main on Dec 15, 2023.
gillens deleted the fix-pipenv branch on December 15, 2023, 16:18.
2 participants