Spell check + some minor changes
fraimondo committed Oct 23, 2024
1 parent 31c2d3a commit f9b3070
Showing 2 changed files with 35 additions and 65 deletions.
40 changes: 0 additions & 40 deletions docs/conf.py
@@ -211,45 +211,6 @@

# -- Sphinx-Gallery configuration --------------------------------------------


# class SubSectionTitleOrder:
# """Sort example gallery by title of subsection.

# Assumes README.txt exists for all subsections and uses the subsection with
# dashes, '---', as the adornment.
# """

# def __init__(self, src_dir):
# self.src_dir = src_dir
# self.regex = re.compile(r"^([\w ]+)\n-", re.MULTILINE)

# def __reduce__(self):
# return (self.__class__, (self.src_dir, ))

# def __repr__(self):
# return f"<{self.__class__.__name__}>"

# def __call__(self, directory):
# src_path = os.path.normpath(os.path.join(self.src_dir, directory))

# # Forces Release Highlights to the top
# if os.path.basename(src_path) == "release_highlights":
# return "0"

# readme = os.path.join(src_path, "README.txt")

# try:
# with open(readme) as f:
# content = f.read()
# except FileNotFoundError:
# return directory

# title_match = self.regex.search(content)
# if title_match is not None:
# return title_match.group(1)
# return directory


ex_dirs = [
"00_starting",
"01_model_comparison",
@@ -272,7 +233,6 @@
"examples_dirs": example_dirs,
"gallery_dirs": gallery_dirs,
"nested_sections": True,
# "subsection_order": SubSectionTitleOrder("../examples"),
"filename_pattern": "/(plot|run)_",
"download_all_examples": False,
}
60 changes: 35 additions & 25 deletions docs/selected_deeper_topics/joblib.rst
@@ -9,7 +9,7 @@ Parallelizing julearn with Joblib
.. warning::
Make sure you are using the latest version of ``julearn``, as we are
actively developing and fine-tuning these packages to improve performance.
-Older versions of julearn might have a huge computational impact when used
+Older versions of ``julearn`` might have a huge computational impact when used
with joblib.

As with `scikit-learn`_, ``julearn`` allows you to parallelize your code using
@@ -28,7 +28,7 @@ training and testing of each fold are independent from the other folds.
Most modern computers have multiple processors or *cores*, which makes it
possible to run multiple tasks at the same time. If you are familiar with
scikit-learn, you might have noticed that it already has a parallelization
mechanism using
-the ``n_jobs`` parameter. Julearn is actually using scikit-learn, so it is
+the ``n_jobs`` parameter. ``julearn`` is actually using scikit-learn, so it is
possible to use the ``n_jobs`` parameter. If you want to read more about how
scikit-learn parallelizes its code, you can check the section
:external+sklearn:ref:`parallelism` on scikit-learn's documentation.
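
As a minimal sketch of this mechanism in plain scikit-learn (the toy dataset
and classifier below are only for illustration), passing ``n_jobs`` tells
joblib how many processes may be used to fit the folds in parallel:

.. code-block:: python

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_validate
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=42)

    # With n_jobs=4, up to 4 of the 5 folds are fitted in parallel processes.
    scores = cross_validate(SVC(), X, y, cv=5, n_jobs=4)
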
@@ -63,7 +63,7 @@ with 4 processors.
Importantly, this sets the number of parallel processes that joblib dispatches
-but does not set the number of threads (i.e. another low-level parallelisation
+but does not set the number of threads (i.e. another low-level parallelization
mechanism) that each process uses. The ``"loky"`` backend is quite intelligent,
but can't always determine the optimal number of threads to use if you don't
want to use 100% of the resources. You can set the number of threads per
@@ -89,8 +89,8 @@ process uses only 1 thread, you can do the following:
problem_type="classification",
)
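
A minimal sketch of such a configuration (``X``, ``y``, ``df`` and ``creator``
are placeholders for the data and the pipeline defined as in the rest of this
chapter):

.. code-block:: python

    from joblib import parallel_config
    from julearn import run_cross_validation

    # 4 parallel processes, each restricted to a single thread.
    with parallel_config(backend="loky", n_jobs=4, inner_max_num_threads=1):
        scores = run_cross_validation(
            X=X,
            y=y,
            data=df,
            model=creator,
            problem_type="classification",
        )
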
-Massively parallelizing julearn with Joblib and HTcondor
---------------------------------------------------------
+Massively parallelizing ``julearn`` with Joblib and HTcondor
+------------------------------------------------------------

Sometimes even with multiple processors, the computation can take a long time.
As an example, assuming a model that takes 1 hour to fit, a 5 times 5-fold
@@ -108,7 +108,7 @@ hundreds of processors to obtain results within reasonable time spans.

.. csv-table::
:align: center
-:header: "Total core-hours" , "Number of processors", "Time (approx)"
+:header: "Total core-hours" , "Number of processors", "Time (approx.)"

"1250", "1", "52 days"
"1250", "4", "13 days"
@@ -125,7 +125,7 @@ allows joblib to submit each task as a job in an HTCondor queue, allowing to
massively parallelize computation.

By simply calling ``register_htcondor`` from the ``joblib_htcondor`` package
-and configuring the backend, we can easily parallelise the computations as in
+and configuring the backend, we can easily parallelize the computations as in
the following example:


@@ -134,7 +134,7 @@ the following example:
from joblib import parallel_config
from joblib_htcondor import register_htcondor
-register_htcondor("INFO")
+register_htcondor("INFO") # Set logging level to INFO
creator = PipelineCreator(problem_type="classification")
creator.add("zscore")
@@ -162,6 +162,15 @@ done in parallel. The ``request_cpus`` parameter specifies the number of CPUs
to request for each job, and the ``request_mem`` parameter specifies the
amount of memory to request.
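
Condensed into a single sketch (assuming the backend is registered under the
name ``"htcondor"`` and that its parameters are passed through
``parallel_config``; the resource values are arbitrary and the data and
pipeline variables are the same placeholders as above):

.. code-block:: python

    from joblib import parallel_config
    from joblib_htcondor import register_htcondor
    from julearn import run_cross_validation

    register_htcondor("INFO")

    # Each parallel task becomes an HTCondor job with its own resources.
    with parallel_config(
        backend="htcondor",
        n_jobs=-1,
        request_cpus=1,
        request_mem="8GB",
    ):
        scores = run_cross_validation(
            X=X, y=y, data=df, model=creator, problem_type="classification"
        )
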


.. note::
Note that the ``register_htcondor`` function sets the logging level to
``"INFO"``, which means that you will see information regarding the HTCondor
backend. If you want to see less information, you can set the logging level to
``"WARNING"``. If you believe that there might be an issue with the backend,
you can set the logging level to ``"DEBUG"`` to see more information that
can be later shared with the developers.
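
For example, to get the most detailed output while troubleshooting:

.. code-block:: python

    from joblib_htcondor import register_htcondor

    register_htcondor("DEBUG")  # verbose output, useful when reporting issues
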

Nevertheless, as it is, this will submit as many jobs as there are outer folds
in the cross-validation, which will rarely work for large projects, as we need
to take other factors into account.
@@ -212,12 +221,12 @@ Pool
~~~~

As in any computational cluster, most probably you will be required to submit
-a job to a queue, which will then run the `run_cross_validation` function that
-will then submit more jobs to the queue. This is not a problem, but it needs to
-be possible to submit jobs from within a job. Check with your cluster's admin
-team and ask for furtther instructions. Most probably you'll need to also
-specify to which `pool` the jobs will be submitted. This can be done with the
-``pool`` parameter. For us, this is ``head2.htc.inm7.de``:
+a job to a queue, which will then run the :func:`run_cross_validation`
+function that will then submit more jobs to the queue. This is not a problem,
+but it needs to be possible to submit jobs from within a job. Check with your
+cluster's admin team and ask for further instructions. Most probably you'll
+also need to specify to which `pool` the jobs will be submitted. This can be
+done with the ``pool`` parameter. For us, this is ``head2.htc.inm7.de``:


.. code-block:: python
@@ -311,7 +320,7 @@ by setting the ``max_recursion_level`` parameter to ``1``:
detail.
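
A compact sketch of what this can look like (assuming ``max_recursion_level``
is passed through ``parallel_config`` like the other backend parameters;
placeholders as above):

.. code-block:: python

    with parallel_config(
        backend="htcondor",
        n_jobs=-1,
        request_cpus=1,
        request_mem="8GB",
        max_recursion_level=1,  # also dispatch the inner (search) CV as jobs
    ):
        scores = run_cross_validation(
            X=X, y=y, data=df, model=creator, problem_type="classification"
        )
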


-But beware!, this will submit 254 (5 times 49) jobs for each outer fold. For a
+But beware! This will submit 245 (5 times 49) jobs for each outer fold. For a
5 times 5-fold CV, this means 6125 jobs. This can be a lot of jobs, but not
for HTCondor. It is, however, an issue for data transfer. If each job requires
500 MB of data, this means 3.1 TB of data transfer, which requires 3.1 TB of
@@ -325,7 +334,7 @@ Indeed, even if we queue all of the previous 6125 jobs at once, we are also
limited by the number of slots in the cluster. We can *throttle* the number of
jobs that are submitted at once by setting the ``throttle`` parameter. This
parameter specifies the number of jobs that can be either running or queued at
-the same time, thus also liminting the number of files in the shared directory.
+the same time, thus also limiting the number of files in the shared directory.
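
A short sketch (placeholders as above; the throttle value here is arbitrary):

.. code-block:: python

    # At most 100 jobs will be running or queued at any given time.
    with parallel_config(
        backend="htcondor",
        n_jobs=-1,
        request_cpus=1,
        request_mem="8GB",
        throttle=100,
    ):
        scores = run_cross_validation(
            X=X, y=y, data=df, model=creator, problem_type="classification"
        )
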

`joblib-htcondor`_ will submit jobs until the throttle is reached, and then it
will wait until a job finishes to submit a new one. The complicated part is
@@ -384,14 +393,15 @@ The overhead of submitting jobs to a cluster is not negligible. Each job
submission requires some time, and the data transfer also requires some time.
This overhead can be quite significant, and it is important to take it into
account when parallelizing your code. The `joblib-htcondor`_ backend is
-intended to be used for large computations, where the overhead is negligible.
+intended to be used for large computations, where the overhead is negligible
+with respect to the computation.

Scikit-learn parallelization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The rule is quite simple: if it has an ``n_jobs`` parameter, it can be
parallelized using joblib. This is the case for most of scikit-learn's
-algorithms. While the devevelopers of scikit-learn are doing a great job and
+algorithms. While the developers of scikit-learn are doing a great job and
currently working on documenting how this is done for each algorithm, this is
still not that evident.

@@ -408,7 +418,7 @@ relatively small jobs which will take a lot of time to complete, given the
overhead.


-The following is a non-exahustive list of scikit-learn's algorithms that might
+The following is a non-exhaustive list of scikit-learn's algorithms that might
make sense to set the ``n_jobs`` parameter to ``-1`` or leave as default:

* :external:class:`~sklearn.ensemble.StackingClassifier` and
@@ -423,12 +433,12 @@ make sense to set the ``n_jobs`` parameter to ``-1`` or leave as default:

Similar to the stacking models.

-* Hyperparameter searchers:
-
-Most of the hyperparameter searchers in scikit-learn will parallelize the
-search for the best hyperparameters. Either at the internal CV level or at
-both the hyperparameter search space and the internal CV level, depending
-on the searcher.
+* Hyperparameter searchers
+Most of the hyperparameter searchers in scikit-learn will parallelize the
+search for the best hyperparameters. Either at the internal CV level or at
+both the hyperparameter search space and the internal CV level, depending
+on the searcher.
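
For instance (a plain scikit-learn sketch, independent of the HTCondor
backend), a grid search can evaluate its candidates in parallel:

.. code-block:: python

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=42)

    # Candidate parameter combinations and internal CV folds run in parallel.
    search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5, n_jobs=-1)
    search.fit(X, y)
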


Visualizing Progress
