Merge branch 'NannyML:main' into main
Duncan-Hunter authored Sep 6, 2024
2 parents 63b3a4d + 0a35cf2 commit 8ad2cfd
Showing 8 changed files with 141 additions and 36 deletions.
15 changes: 15 additions & 0 deletions README.md
@@ -263,6 +263,21 @@ figure.show()

We want to build NannyML together with the community! The easiest way to contribute at the moment is to propose new features or log bugs under [issues](https://github.com/NannyML/nannyml/issues). For more information, have a look at [how to contribute](CONTRIBUTING.rst).

+Thanks to all of our contributors!
+
+[<img alt="CoffiDev" src="https://avatars.githubusercontent.com/u/6456756?v=4&s=117" width="117">](https://github.com/CoffiDev)[<img alt="smetam" src="https://avatars.githubusercontent.com/u/17511767?v=4&s=117" width="117">](https://github.com/smetam)[<img alt="amrit110" src="https://avatars.githubusercontent.com/u/8986523?v=4&s=117" width="117">](https://github.com/amrit110)[<img alt="bgalvao" src="https://avatars.githubusercontent.com/u/17158288?v=4&s=117" width="117">](https://github.com/bgalvao)[<img alt="SoyGema" src="https://avatars.githubusercontent.com/u/24204714?v=4&s=117" width="117">](https://github.com/SoyGema)
+
+[<img alt="sebasmos" src="https://avatars.githubusercontent.com/u/31293221?v=4&s=117" width="117">](https://github.com/sebasmos)[<img alt="shezadkhan137" src="https://avatars.githubusercontent.com/u/1761188?v=4&s=117" width="117">](https://github.com/shezadkhan137)[<img alt="highstepper" src="https://avatars.githubusercontent.com/u/22987068?v=4&s=117" width="117">](https://github.com/highstepper)[<img alt="WojtekNML" src="https://avatars.githubusercontent.com/u/100422459?v=4&s=117" width="117">](https://github.com/WojtekNML)[<img alt="YYYasin19" src="https://avatars.githubusercontent.com/u/26421646?v=4&s=117" width="117">](https://github.com/YYYasin19)
+
+[<img alt="giodavoli" src="https://avatars.githubusercontent.com/u/79570860?v=4&s=117" width="117">](https://github.com/giodavoli)[<img alt="mireiar" src="https://avatars.githubusercontent.com/u/105557052?v=4&s=117" width="117">](https://github.com/mireiar)[<img alt="baskervilski" src="https://avatars.githubusercontent.com/u/7703701?v=4&s=117" width="117">](https://github.com/baskervilski)[<img alt="rfrenoy" src="https://avatars.githubusercontent.com/u/12834432?v=4&s=117" width="117">](https://github.com/rfrenoy)[<img alt="jrggementiza" src="https://avatars.githubusercontent.com/u/30363148?v=4&s=117" width="117">](https://github.com/jrggementiza)
+
+[<img alt="PieDude12" src="https://avatars.githubusercontent.com/u/86422883?v=4&s=117" width="117">](https://github.com/PieDude12)[<img alt="hakimelakhrass" src="https://avatars.githubusercontent.com/u/100148105?v=4&s=117" width="117">](https://github.com/hakimelakhrass)[<img alt="maciejbalawejder" src="https://avatars.githubusercontent.com/u/47450700?v=4&s=117" width="117">](https://github.com/maciejbalawejder)[<img alt="dependabot[bot]" src="https://avatars.githubusercontent.com/in/29110?v=4&s=117" width="117">](https://github.com/apps/dependabot)[<img alt="Dbhasin1" src="https://avatars.githubusercontent.com/u/56479884?v=4&s=117" width="117">](https://github.com/Dbhasin1)
+
+[<img alt="alexnanny" src="https://avatars.githubusercontent.com/u/124191512?v=4&s=117" width="117">](https://github.com/alexnanny)[<img alt="santiviquez" src="https://avatars.githubusercontent.com/u/10890881?v=4&s=117" width="117">](https://github.com/santiviquez)[<img alt="cartgr" src="https://avatars.githubusercontent.com/u/86645043?v=4&s=117" width="117">](https://github.com/cartgr)[<img alt="BobbuAbadeer" src="https://avatars.githubusercontent.com/u/94649276?v=4&s=117" width="117">](https://github.com/BobbuAbadeer)[<img alt="jnesfield" src="https://avatars.githubusercontent.com/u/23704688?v=4&s=117" width="117">](https://github.com/jnesfield)
+
+[<img alt="NeoKish" src="https://avatars.githubusercontent.com/u/66986430?v=4&s=117" width="117">](https://github.com/NeoKish)[<img alt="michael-nml" src="https://avatars.githubusercontent.com/u/124588413?v=4&s=117" width="117">](https://github.com/michael-nml)[<img alt="jakubnml" src="https://avatars.githubusercontent.com/u/100147443?v=4&s=117" width="117">](https://github.com/jakubnml)[<img alt="nikml" src="https://avatars.githubusercontent.com/u/89025229?v=4&s=117" width="117">](https://github.com/nikml)[<img alt="nnansters" src="https://avatars.githubusercontent.com/u/94110348?v=4&s=117" width="117">](https://github.com/nnansters)
+
+
# 🙋 Get help

The best place to ask for help is in the [community slack](https://join.slack.com/t/nannymlbeta/shared_invite/zt-16fvpeddz-HAvTsjNEyC9CE6JXbiM7BQ). Feel free to join and ask questions or raise issues. Someone will definitely respond to you.
7 changes: 7 additions & 0 deletions docs/how_it_works/business_value.rst
@@ -55,6 +55,13 @@ observations in that cell of the :term:`confusion matrix<Confusion Matrix>`. Usi
matrix notation the element on the i-th row and j-th column of the business value matrix tells us the value
of the i-th target when we have predicted the j-th value.

+.. note::
+    In multiclass classification the classes are ordered alphanumerically.
+    This ordering is used when creating the confusion matrix: its rows
+    represent target values in alphanumerical order, and its columns
+    represent predicted classes in the same alphanumerical order.
+    Therefore the elements of the business value matrix should be constructed accordingly.
+
For binary classification this formula is easier to manage, hence we will use it as an example. Classification problems
with more classes follow the same pattern.
Using the `sklearn confusion matrix convention`_ we designate label 0 as negative and label 1 as positive.
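The ordering rule added in the note above can be illustrated with a small, library-free sketch. It is not NannyML's implementation: the helper names, the class labels and the values in the matrix are all hypothetical, and Python's built-in `sorted` is assumed to be an adequate stand-in for "alphanumerical" order on string labels.

```python
from typing import List, Sequence


def sorted_classes(y_true: Sequence[str], y_pred: Sequence[str]) -> List[str]:
    """Return every class label seen in targets or predictions, sorted."""
    return sorted(set(y_true) | set(y_pred))


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str]) -> List[List[int]]:
    """Rows are target classes, columns are predicted classes, both sorted."""
    classes = sorted_classes(y_true, y_pred)
    index = {label: position for position, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for target, prediction in zip(y_true, y_pred):
        matrix[index[target]][index[prediction]] += 1
    return matrix


def business_value(cm: List[List[int]], value_matrix: List[List[float]]) -> float:
    """Sum of element-wise products of the confusion and value matrices."""
    return sum(
        cm[i][j] * value_matrix[i][j]
        for i in range(len(cm))
        for j in range(len(cm[i]))
    )


y_true = ["prepaid_card", "highstreet_card", "upmarket_card", "prepaid_card", "upmarket_card"]
y_pred = ["prepaid_card", "highstreet_card", "prepaid_card", "upmarket_card", "upmarket_card"]

# Hypothetical values. Row and column order must follow the sorted classes:
# highstreet_card, prepaid_card, upmarket_card.
value_matrix = [
    [10, -5, -5],
    [-2, 4, -3],
    [-8, -6, 20],
]

cm = confusion_matrix(y_true, y_pred)
total = business_value(cm, value_matrix)
```

If the value matrix were written in a different class order than the sorted one, every off-diagonal value would be applied to the wrong kind of error, which is why the note asks for the matrix to be constructed accordingly.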
Original file line number Diff line number Diff line change
@@ -88,7 +88,10 @@ the following parameter specifications:
The format of the business value matrix must be specified so that each element represents the business
value of its respective confusion matrix element. Hence the element on the i-th row and j-th column of the
business value matrix tells us the value of the i-th target when we have predicted the j-th value.
-It can be provided as a list of lists or a numpy array.
+The target values that each column and row refer to are sorted alphanumerically for both
+the confusion matrix and the business value matrix.
+
+The business value matrix can be provided as a list of lists or a numpy array.
For more information about the business value matrix,
check out the :ref:`Business Value "How it Works" page<business-value-deep-dive>`.
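For the binary case, the row/column rule described in this hunk yields the layout sketched below. This is only an illustration under stated assumptions: labels are 0 (negative) and 1 (positive), the monetary values are made up, and `total_business_value` is a hypothetical helper rather than a NannyML function.

```python
from typing import List, Sequence

# Hypothetical values per outcome; rows are targets, columns are predictions.
TN_VALUE, FP_VALUE, FN_VALUE, TP_VALUE = 0, -10, -50, 100

business_value_matrix: List[List[float]] = [
    [TN_VALUE, FP_VALUE],  # target 0: predicted 0, predicted 1
    [FN_VALUE, TP_VALUE],  # target 1: predicted 0, predicted 1
]


def total_business_value(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    value_matrix: List[List[float]],
) -> float:
    """Look up the value of each (target, prediction) pair and add them up."""
    return sum(value_matrix[target][prediction] for target, prediction in zip(y_true, y_pred))


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 0, 1]

# One TN, one FP, two TP and one FN.
total = total_business_value(y_true, y_pred, business_value_matrix)
```

The same nested list could be wrapped in a numpy array without changing its meaning, since both accepted formats share the row-is-target, column-is-prediction convention.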

Original file line number Diff line number Diff line change
@@ -80,7 +80,10 @@ parameters:
The format of the business value matrix must be specified so that each element represents the business
value of its respective confusion matrix element. Hence the element on the i-th row and j-th column of the
business value matrix tells us the value of the i-th target when we have predicted the j-th value.
-It can be provided as a list of lists or a numpy array.
+The target values that each column and row refer to are sorted alphanumerically for both
+the confusion matrix and the business value matrix.
+
+The business value matrix can be provided as a list of lists or a numpy array.
For more information about the business value matrix,
check out the :ref:`Business Value "How it Works" page<business-value-deep-dive>`.

6 changes: 3 additions & 3 deletions docs/usage_logging.rst
@@ -65,7 +65,7 @@ What about personal data
Apart from the hardware ID, there is nothing to link back to your machine, let alone to your identity.
You have our word on this: we will never collect any Personally Identifiable Information.
And don't just take our word: verify it! We invite you to review the implementation at
-https://github.com/NannyML/nannyml/blob/feature/usage_logging/nannyml/usage_logging.py.
+https://github.com/NannyML/nannyml/blob/main/nannyml/usage_logging.py.

What about my dataset?
######################
@@ -113,7 +113,7 @@ How usage logging works
We'll give a very brief overview of how we've implemented usage analytics.

1. We've created a `usage_logging` module within the library. It contains all the functionality related to usage analytics.
-Feel free to browse the source code at https://github.com/NannyML/nannyml/blob/feature/usage_logging/nannyml/usage_logging.py.
+Feel free to browse the source code at https://github.com/NannyML/nannyml/blob/main/nannyml/usage_logging.py.
2. We instrument our library by adding a `log_usage` decorator to our key functions, sometimes also providing some additional data (e.g. metric names).
3. Upon calling one of these key functions, the decorator will capture the required information. Our `usage_logging` module will then try to send it over
to **Segment**, a third-party service provider specializing in customer data.
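The decorator-based instrumentation described in steps 2 and 3 can be sketched roughly as follows. Only the name `log_usage` comes from the documentation; the signature, the event fields and the in-memory `CAPTURED_EVENTS` list (standing in for the call to the third-party service) are assumptions made for illustration, not the code in `usage_logging.py`.

```python
import functools
import platform
from typing import Any, Callable, Dict, List, Optional

# Stand-in for the network call: events are collected here instead of being sent.
CAPTURED_EVENTS: List[Dict[str, Any]] = []


def log_usage(event_name: str, metadata: Optional[Dict[str, Any]] = None) -> Callable:
    """Hypothetical decorator that records an event each time the function is called."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                CAPTURED_EVENTS.append(
                    {
                        "event": event_name,
                        "python_version": platform.python_version(),
                        **(metadata or {}),
                    }
                )
            except Exception:
                # Usage logging should never break the decorated function.
                pass
            return func(*args, **kwargs)

        return wrapper

    return decorator


@log_usage("calculator_fit", metadata={"metric_names": ["roc_auc"]})
def fit(values: List[float]) -> float:
    """Toy key function; returns the mean of the given values."""
    return sum(values) / len(values)


result = fit([1, 2, 3])
```

Wrapping the capture step in `try`/`except` reflects the wording "will then try to send it": a failure to record or transmit an event is swallowed so that the instrumented function still runs.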
@@ -129,7 +129,7 @@ We'll give a very brief overview of how we've implemented usage analytics.
Whilst our team at NannyML saw the need for usage analytics, we did have some deeper discussions about how to present
this to you, the end user.

-Do we disable usage analytics collection by default and have the end user explicitly opt in? ]
+Do we disable usage analytics collection by default and have the end user explicitly opt in?
Whilst it felt very intuitive and "correct" to do so, we asked ourselves the following question.
"Would I go through the trouble of explicitly enabling this every time I use NannyML?"
Our answer was no, we probably wouldn't bother. And if we wouldn't, it is only fair we don't expect you to.
12 changes: 12 additions & 0 deletions nannyml/data_quality/unseen/calculator.py
@@ -35,6 +35,8 @@ def __init__(
         self,
         column_names: Union[str, List[str]],
         normalize: bool = True,
+        y_pred_column_name: Optional[str] = None,
+        y_true_column_name: Optional[str] = None,
         timestamp_column_name: Optional[str] = None,
         chunk_size: Optional[int] = None,
         chunk_number: Optional[int] = None,
@@ -96,6 +98,10 @@ def __init__(
                 "column_names should be either a column name string or a list of columns names strings, "
                 f"found\n{column_names}"
             )
+
+        self.y_pred_column_name = y_pred_column_name
+        self.y_true_column_name = y_true_column_name
+
         self.result: Optional[Result] = None
         # Threshold strategy is the same across all columns
         # By default for unseen values there is no lower threshold or threshold limit.
@@ -135,6 +141,12 @@ def _fit(self, reference_data: pd.DataFrame, *args, **kwargs):
         # Included columns of dtype=int should be considered categorical. We'll try converting those explicitly.
         reference_data = _convert_int_columns_to_categorical(reference_data, self.column_names, self._logger)

+        # y_true and y_pred columns are treated as categorical for the purpose of this calculator
+        if self.y_pred_column_name:
+            reference_data[self.y_pred_column_name] = reference_data[self.y_pred_column_name].astype('category')
+        if self.y_true_column_name:
+            reference_data[self.y_true_column_name] = reference_data[self.y_true_column_name].astype('category')
+
         # All provided columns must be categorical
         continuous_column_names, categorical_column_names = _split_features_by_type(reference_data, self.column_names)
         if not set(self.column_names) == set(categorical_column_names):
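Why the prediction and target columns are cast to categorical can be seen from a library-free sketch of what an unseen-values check does. The functions below are hypothetical and are not the calculator's code; `normalize` is assumed to switch between a ratio and an absolute count, mirroring the constructor parameter of the same name.

```python
from typing import Hashable, Iterable, Set


def fit_seen_values(reference: Iterable[Hashable]) -> Set[Hashable]:
    """Remember every distinct value observed in the reference data."""
    return set(reference)


def unseen_values(chunk: Iterable[Hashable], seen: Set[Hashable], normalize: bool = True) -> float:
    """Count, or give the ratio of, chunk values that never appeared in the reference data."""
    values = list(chunk)
    unseen = sum(1 for value in values if value not in seen)
    if normalize:
        return unseen / len(values) if values else 0.0
    return float(unseen)


# Integer class labels are treated as categories, not as continuous numbers.
reference_y_pred = [0, 1, 1, 0, 2]
analysis_y_pred = [0, 3, 1, 3]

seen = fit_seen_values(reference_y_pred)
ratio = unseen_values(analysis_y_pred, seen)
count = unseen_values(analysis_y_pred, seen, normalize=False)
```

An integer label column would otherwise look continuous and fail the "all provided columns must be categorical" check that follows in `_fit`, which appears to be the motivation for the explicit cast in this hunk.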