feat: GLWE mat mul for hybrid model #868

Merged: 4 commits, Oct 18, 2024
9 changes: 5 additions & 4 deletions deps_licenses/licenses_linux_user.txt
@@ -4,15 +4,16 @@ MarkupSafe, 2.1.5, BSD License
PyYAML, 6.0.2, MIT License
brevitas, 0.10.2, UNKNOWN
certifi, 2024.8.30, Mozilla Public License 2.0 (MPL 2.0)
charset-normalizer, 3.3.2, MIT License
charset-normalizer, 3.4.0, MIT License
coloredlogs, 15.0.1, MIT License
concrete-ml-extensions, 0.1.2, BSD-3-Clause-Clear
Collaborator Author:
Only for Linux; we'll see in the weekly what happens on Mac (it should fall back to CP compilation).

Collaborator:
Did you run make licenses?

I thought we were only supposed to do that through GitHub Actions. Or was it for another purpose?

Collaborator Author:
Yes, I generated the license files in the action, but this library only works on Linux, so only the Linux license file shows it.

concrete-python, 2.8.1, BSD-3-Clause
dependencies, 2.0.1, BSD License
dill, 0.3.8, BSD License
dill, 0.3.9, BSD License
filelock, 3.16.1, The Unlicense (Unlicense)
flatbuffers, 2.0.7, Apache Software License
fsspec, 2024.9.0, BSD License
huggingface-hub, 0.25.1, Apache Software License
huggingface-hub, 0.25.2, Apache Software License
humanfriendly, 10.0, MIT License
hummingbird-ml, 0.4.11, MIT License
idna, 3.10, BSD License
@@ -32,7 +33,7 @@ nvidia-curand-cu12, 10.3.2.106, Other/Proprietary License
nvidia-cusolver-cu12, 11.4.5.107, Other/Proprietary License
nvidia-cusparse-cu12, 12.1.0.106, Other/Proprietary License
nvidia-nccl-cu12, 2.20.5, Other/Proprietary License
nvidia-nvjitlink-cu12, 12.6.68, Other/Proprietary License
nvidia-nvjitlink-cu12, 12.6.77, Other/Proprietary License
nvidia-nvtx-cu12, 12.1.105, Other/Proprietary License
onnx, 1.16.1, Apache License v2.0
onnxconverter-common, 1.13.0, MIT License
2 changes: 1 addition & 1 deletion deps_licenses/licenses_linux_user.txt.md5
@@ -1 +1 @@
ac76836858506534a0dc01cae9341f7d
8ea8aec4f5aac03565c2dcb9f3f8a1da
6 changes: 3 additions & 3 deletions deps_licenses/licenses_mac_intel_user.txt
@@ -4,15 +4,15 @@ MarkupSafe, 2.1.5, BSD License
PyYAML, 6.0.2, MIT License
brevitas, 0.10.2, UNKNOWN
certifi, 2024.8.30, Mozilla Public License 2.0 (MPL 2.0)
charset-normalizer, 3.3.2, MIT License
charset-normalizer, 3.4.0, MIT License
coloredlogs, 15.0.1, MIT License
concrete-python, 2.8.1, BSD-3-Clause
dependencies, 2.0.1, BSD License
dill, 0.3.8, BSD License
dill, 0.3.9, BSD License
filelock, 3.16.1, The Unlicense (Unlicense)
flatbuffers, 2.0.7, Apache Software License
fsspec, 2024.9.0, BSD License
huggingface-hub, 0.25.1, Apache Software License
huggingface-hub, 0.25.2, Apache Software License
humanfriendly, 10.0, MIT License
hummingbird-ml, 0.4.11, MIT License
idna, 3.10, BSD License
2 changes: 1 addition & 1 deletion deps_licenses/licenses_mac_intel_user.txt.md5
@@ -1 +1 @@
ac76836858506534a0dc01cae9341f7d
8ea8aec4f5aac03565c2dcb9f3f8a1da
6 changes: 3 additions & 3 deletions deps_licenses/licenses_mac_silicon_user.txt
@@ -4,15 +4,15 @@ MarkupSafe, 2.1.5, BSD License
PyYAML, 6.0.2, MIT License
brevitas, 0.10.2, UNKNOWN
certifi, 2024.8.30, Mozilla Public License 2.0 (MPL 2.0)
charset-normalizer, 3.3.2, MIT License
charset-normalizer, 3.4.0, MIT License
coloredlogs, 15.0.1, MIT License
concrete-python, 2.8.1, BSD-3-Clause
dependencies, 2.0.1, BSD License
dill, 0.3.8, BSD License
dill, 0.3.9, BSD License
filelock, 3.16.1, The Unlicense (Unlicense)
flatbuffers, 2.0.7, Apache Software License
fsspec, 2024.9.0, BSD License
huggingface-hub, 0.25.1, Apache Software License
huggingface-hub, 0.25.2, Apache Software License
humanfriendly, 10.0, MIT License
hummingbird-ml, 0.4.11, MIT License
idna, 3.10, BSD License
2 changes: 1 addition & 1 deletion deps_licenses/licenses_mac_silicon_user.txt.md5
@@ -1 +1 @@
ac76836858506534a0dc01cae9341f7d
8ea8aec4f5aac03565c2dcb9f3f8a1da
2 changes: 1 addition & 1 deletion docs/deep-learning/fhe_assistant.md
@@ -77,7 +77,7 @@ concrete_clf.compile(X, debug_config)

#### 3. Quantization import failed

**Error message**: `Error occurred during quantization aware training (QAT) import [...] Could not determine a unique scale for the quantization!`.
**Error message**: `Error occurred during quantization aware training (QAT) import [...] Are you missing a QuantIdentity layer in your Brevitas model?`.
Collaborator:
We also have a non-intuitive error that occurs when using 'view' instead of 'reshape'.

Error: ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size ... is different from ...)

Maybe we could add it? CC @andrei-stoian-zama

Collaborator Author:
That's not related; it's due to shape inference during the ONNX export.
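For context, here is a minimal hypothetical sketch of the pattern the reviewer describes (the module and shapes are illustrative, not from the repository); the practical takeaway from the thread is simply to prefer `reshape` over `view` before ONNX export:

```python
import torch
import torch.nn as nn

class Flattener(nn.Module):
    """Illustrative module: flatten activations before a linear layer."""

    def forward(self, x):
        # Using x.view(x.size(0), -1) here has been reported to trigger the
        # matmul core-dimension mismatch above during ONNX export shape
        # inference; reshape avoids it.
        return x.reshape(x.shape[0], -1)

print(Flattener()(torch.randn(4, 3, 8)).shape)  # torch.Size([4, 24])
```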


**Cause**: This error occurs when the model imported as a quantized-aware training model lacks quantization operators. See [this guide](../deep-learning/fhe_friendly_models.md) on how to use Brevitas layers. This error message indicates that some layers do not take inputs quantized through `QuantIdentity` layers.

3 changes: 2 additions & 1 deletion docs/guides/prediction_with_fhe.md
@@ -112,11 +112,12 @@ class FCSmall(nn.Module):
super().__init__()
self.quant_input = qnn.QuantIdentity(bit_width=3)
self.fc1 = qnn.QuantLinear(in_features=input_output, out_features=input_output, weight_bit_width=3, bias=True)
self.quant_2 = qnn.QuantIdentity(bit_width=3)
self.act_f = nn.ReLU()
self.fc2 = qnn.QuantLinear(in_features=input_output, out_features=input_output, weight_bit_width=3, bias=True)

def forward(self, x):
return self.fc2(self.act_f(self.fc1(self.quant_input(x))))
return self.fc2(self.quant_2(self.act_f(self.fc1(self.quant_input(x)))))
Collaborator Author:
This was bad all along and only worked because the QAT quantization was guessed from the calibration data.

Collaborator:
I'm really not a fan of using QuantIdentity layers after quant layers.

It works fine without them, and including them just adds unnecessary complexity in my opinion.

Collaborator Author:
It used to work because of some automatic detection of quant parameters that we were doing, but I removed that detection since it was slow, so now we need QuantIdentity in the right places.
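To make the rule concrete, here is a small sketch of the pattern being discussed (hypothetical layer sizes; the structure mirrors the FCSmall fix above): every float-valued op such as ReLU that feeds a QuantLinear gets a QuantIdentity in between, so the QAT import no longer has to guess quantization parameters from calibration data.

```python
import brevitas.nn as qnn
import torch.nn as nn

# Hypothetical sizes; only the ordering of the layers matters here.
block = nn.Sequential(
    qnn.QuantIdentity(bit_width=3),                          # quantize the block input
    qnn.QuantLinear(8, 8, weight_bit_width=3, bias=True),
    nn.ReLU(),
    qnn.QuantIdentity(bit_width=3),                          # re-quantize after the float activation
    qnn.QuantLinear(8, 8, weight_bit_width=3, bias=True),
)
```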


torch_model = FCSmall(3)

1,026 changes: 590 additions & 436 deletions poetry.lock

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions pyproject.toml
@@ -39,6 +39,9 @@ python = ">=3.8.1,<3.12"
# https://python-poetry.org/docs/1.7/repositories#project-configuration
# concrete-python = {version="==2.7.0", source = "zama-pypi-cpu"}
concrete-python = {version="==2.8.1", source = "zama-pypi-cpu"}
concrete-ml-extensions = [
{version = "0.1.2", platform = "linux" }
]
setuptools = "65.6.3"
skops = {version = "0.5.0"}
xgboost = "1.6.2"
@@ -152,6 +155,7 @@ filterwarnings = [
"ignore:You are using `torch.load`*",
"ignore:open_text is deprecated.*:DeprecationWarning",
"ignore:read_text is deprecated.*:DeprecationWarning",
"ignore:open_binary is deprecated.*:DeprecationWarning",
]

[tool.semantic_release]
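Since the dependency above is declared only for the "linux" platform, downstream code would typically guard its use at runtime. A hedged sketch (the import name is assumed, and the fallback mirrors the "falls back to CP compilation" remark earlier in the thread):

```python
# Hypothetical runtime guard, not part of this diff.
try:
    import concrete_ml_extensions  # assumed import name of the concrete-ml-extensions wheel
    HAS_GLWE_BACKEND = True
except ImportError:  # e.g. on macOS, where the package is not installed
    HAS_GLWE_BACKEND = False
```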
3 changes: 2 additions & 1 deletion script/make_utils/licenses.sh
@@ -161,6 +161,7 @@ then
# And check with a white-list
# Brevitas has an "UNKNOWN" license, but is actually a BSD, so it is ignored in this test
# pkg-resources reports UNKNOWN due to a Ubuntu bug, but is Apache - ignore
# concrete-ml-extensions has the same license as Concrete ML, so skip checking
LICENSES_WHITELIST="new BSD 3-Clause"
LICENSES_WHITELIST="${LICENSES_WHITELIST};3-Clause BSD License"
LICENSES_WHITELIST="${LICENSES_WHITELIST};new BSD"
@@ -181,7 +182,7 @@ then
LICENSES_WHITELIST="${LICENSES_WHITELIST};ISC License (ISCL)"
LICENSES_WHITELIST="${LICENSES_WHITELIST};The Unlicense (Unlicense)"

pip-licenses --allow-only="${LICENSES_WHITELIST}" --ignore-packages brevitas pkg-resources pkg_resources concrete-ml-extensions-brevitas
pip-licenses --allow-only="${LICENSES_WHITELIST}" --ignore-packages brevitas pkg-resources pkg_resources concrete-ml-extensions

deactivate

11 changes: 11 additions & 0 deletions src/concrete/ml/common/utils.py
@@ -105,6 +105,17 @@ def is_valid(fhe: Union["FheMode", str]) -> bool:
return fhe in FheMode.__members__.values()


class HybridFHEMode(enum.Enum):
"""Simple enum for different modes of execution of HybridModel."""

DISABLE = "disable" # Use torch weights
REMOTE = "remote" # Use remote FHE server
SIMULATE = "simulate" # Use FHE simulation
CALIBRATE = "calibrate" # Use calibration (to run before FHE compilation)
EXECUTE = "execute" # Use FHE execution
TORCH = "torch" # Use torch layers


def replace_invalid_arg_name_chars(arg_name: str) -> str:
"""Sanitize arg_name, replacing invalid chars by _.

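For illustration, the new enum can be resolved from a plain string, which is how a caller might select the execution mode of a hybrid model (the helper below is hypothetical, not part of the diff):

```python
from concrete.ml.common.utils import HybridFHEMode

def resolve_mode(fhe: str) -> HybridFHEMode:
    """Hypothetical helper: map a user-provided string to a HybridFHEMode."""
    return HybridFHEMode(fhe)  # raises ValueError for unknown strings

assert resolve_mode("remote") is HybridFHEMode.REMOTE
assert resolve_mode("disable") is HybridFHEMode.DISABLE
```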
24 changes: 13 additions & 11 deletions src/concrete/ml/pytest/torch_models.py
@@ -63,11 +63,13 @@ def forward(self, inputs):
class FCSmall(nn.Module):
"""Torch model for the tests."""

def __init__(self, input_output, activation_function):
def __init__(self, input_output, activation_function, hidden=None):
super().__init__()
self.fc1 = nn.Linear(in_features=input_output, out_features=input_output)

hidden_size = input_output if hidden is None else hidden
self.fc1 = nn.Linear(in_features=input_output, out_features=hidden_size)
self.act_f = activation_function()
self.fc2 = nn.Linear(in_features=input_output, out_features=input_output)
self.fc2 = nn.Linear(in_features=hidden_size, out_features=input_output)

def forward(self, x):
"""Forward pass.
@@ -850,7 +852,7 @@ def forward(self, x):
return x


class SimpleQAT(nn.Module):
class StepFunctionPTQ(nn.Module):
"""Torch model implements a step function that needs Greater, Cast and Where."""

def __init__(self, input_output, activation_function, n_bits=2, disable_bit_check=False):
@@ -1354,17 +1356,17 @@ def __init__(
super().__init__()

self.n_blocks = n_blocks
self.quant_1 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
self.quant_1 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=False)
self.fc1 = qnn.QuantLinear(input_shape, hidden_shape, bias=False, weight_bit_width=n_bits)

self.quant_concat = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
self.quant_concat = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=False)

self.quant_2 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
self.quant_2 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=False)
self.fc2 = qnn.QuantLinear(
hidden_shape * self.n_blocks, hidden_shape, bias=True, weight_bit_width=n_bits
)

self.quant_3 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=True)
self.quant_3 = qnn.QuantIdentity(bit_width=n_bits, return_quant_tensor=False)
self.fc4 = qnn.QuantLinear(hidden_shape, output_shape, bias=True, weight_bit_width=n_bits)

def forward(self, x):
@@ -1379,9 +1381,9 @@ def forward(self, x):
x_pre = []

for i in range(self.n_blocks):
x_block = x[:, i, :]
q1_out = self.quant_1(x_block)
fc1_out = self.fc1(q1_out)
q_x = self.quant_1(x)
Collaborator Author:
This was wrong: the input to the slicing should be quantized with QuantIdentity.

q_x_block = q_x[:, i, :]
fc1_out = self.fc1(q_x_block)
q_concat_out = self.quant_concat(fc1_out)

x_pre.append(q_concat_out)
75 changes: 60 additions & 15 deletions src/concrete/ml/quantization/base_quantized_op.py
@@ -18,6 +18,7 @@
QuantizationOptions,
QuantizedArray,
UniformQuantizationParameters,
UniformQuantizer,
)

# pylint: disable=too-many-lines
@@ -559,7 +560,10 @@ def _prepare_quantized_input(self, input_: QuantizedArray) -> QuantizedArray:
# but when parsing the ONNX graph, some options can be overwritten. Thus
# when evaluating QAT layers we ignore one of these options to allow the
# override.
if quant_opts.is_equal(input_.quantizer.quant_options, ignore_sign_qat=True):
if (
quant_opts.is_equal(input_.quantizer.quant_options, ignore_sign_qat=True)
or input_.quantizer.quant_options.is_precomputed_qat
Collaborator Author:
Before, we were overriding the op input quantization mainly for the model input; now we override every time there is a BrevitasQuant layer before the op.

Collaborator:
Thanks for the explanation, maybe you should update the comment accordingly.

):
# Pass-through the input quantizer when the input is already quantized in
# the manner that this op requires: this makes the op use the qvalues directly,
# in q_impl and will avoid a TLU to re-quantize.
@@ -661,7 +665,9 @@ def _prepare_inputs_with_constants(
elif calibrate or is_clear_value:
# This is used during calibration with numpy.ndarrays
# or then the input is raw (not quantized)
prepared_inputs[curr_input_fill_idx] = input_
prepared_inputs[curr_input_fill_idx] = (
input_.values if isinstance(input_, QuantizedArray) else input_
)
elif quantize_actual_values:
# This is used by mixing (conv/gemm) or value re-arranging ops (reshape)
input_ = cast(QuantizedArray, input_)
@@ -674,9 +680,6 @@ def _prepare_inputs_with_constants(
new_input.quantizer.is_qat
and not input_.quantizer.is_precomputed_qat
and self.error_tracker is not None
Collaborator Author:
Basically:

  • Only mixing ops and reshape ops (ops that quantize their inputs) call this function with quantize_actual_values.
  • Only QuantizedArrays that have is_precomputed_qat are now accepted by these functions when is_qat is set (QAT import was requested by the user).
  • You don't need QuantIdentity before reshape/mixing layers applied on other mixing ops' outputs, because of the "passthrough" for is_precomputed_qat implemented above.

and not new_input.quantizer.check_is_uniform_quantized(
new_input.quantizer.quant_options
)
):
self.error_tracker.append(input_idx)

@@ -700,7 +703,7 @@ def _prepare_inputs_with_constants(

return prepared_inputs

def calibrate(self, *inputs: numpy.ndarray) -> numpy.ndarray:
def calibrate(self, *inputs: Union[QuantizedArray, numpy.ndarray]) -> numpy.ndarray:
Collaborator Author:
Allow calibration with QuantizedArray, so that the calibration can be performed analytically.

This is similar to what Brevitas does with QuantTensor, which determines output quant parameters based on the input ones, without calibration.
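To illustrate what "analytical" calibration means for a matmul/gemm, here is a standalone sketch of the standard affine-quantization algebra (not the actual Concrete ML code): the output scale is simply the product of the input and weight scales, so no data has to be pushed through the layer.

```python
def analytical_matmul_output_quantizer(x_scale: float, w_scale: float):
    """Sketch: derive the dequantization parameters of q_x @ q_w analytically.

    With an affine input x = x_scale * (q_x - x_zero_point) and symmetric weights
    w = w_scale * q_w, the integer accumulator
        acc = q_x @ q_w - x_zero_point * q_w.sum(axis=0)
    dequantizes back to float with a single scale and no offset, so calibration
    needs no forward pass over data.
    """
    out_scale = x_scale * w_scale  # scale that dequantizes the accumulator
    out_zero_point = 0             # the input offset is folded into the accumulator
    return out_scale, out_zero_point
```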

"""Create corresponding QuantizedArray for the output of the activation function.

Args:
@@ -712,6 +715,8 @@ def calibrate(self, *inputs: numpy.ndarray) -> numpy.ndarray:

# Here we need the actual values of the constants, we need to pass through
# the numpy.ndarrays in the computation graph
# Mixing ops may be calibrated using QuantizedArray inputs, in order
# to pre-compute analytical output quantization
prepared_inputs = self._prepare_inputs_with_constants(
*inputs, calibrate=True, quantize_actual_values=False
)
@@ -720,12 +725,48 @@ def calibrate(self, *inputs: numpy.ndarray) -> numpy.ndarray:
if isinstance(raw_result, RawOpOutput):
return raw_result

quantized_samples = QuantizedArray(self.n_bits, raw_result)
# If the caller passes only QuantizedArray it means
# that they are asking to quantize using analytical
# formulas
requested_analytical_quant = all(
isinstance(qv, QuantizedArray) for qv in inputs
) and isinstance(self, QuantizedMixingOp)
if requested_analytical_quant:
assert_true(
self.supported_by_linear_backend(),
"Calibration using QuantizedArray is only possible"
" for operations that can calibrate analytically",
)
q_prepared_inputs = self._prepare_inputs_with_constants(
*inputs, calibrate=False, quantize_actual_values=True
)
quantizer = self.calibrate_analytical_output(*q_prepared_inputs)
self.output_quant_params = quantizer.quant_params
self.output_quant_stats = quantizer.quant_stats
Comment on lines +743 to +745
Collaborator:
Why are we doing this? We should always have the output based on the calibrated quantization params, since we want a specific bit-width, no? This analytical output quantizer would mean we don't re-quantize the output to a lower bit-width.

Collaborator Author:
Yes, exactly: the analytically computed quantizers are only used for dequantization.

Collaborator Author:
> I am not sure I understand the whole analytical part of this PR. Why are we doing this?

Because doing this avoids having to actually perform the computation of the layer outputs for the calibration, so it's much faster. Since we only need the parameters for dequantization, we can skip the actual computation.

Collaborator:
Alright, so that's an optimization that will only be applied to linear-only circuits. Though I am surprised it works, since the condition is just for the op to be a QuantizedMixingOp. Why isn't e.g. QuantizedAdd making our tests fail, since it doesn't have an implementation for this calibrate_analytical_output?

Collaborator Author:
Because we only test the QuantizedGemm / QuantizedMatmul case, while for the other mixing ops we test that an error is raised if one attempts to call calibrate_analytically on them.

Collaborator:
The quantized add must be tested in torch. Why isn't this failing?

else:
# These output quantization parameters are only used
# for operations that produce a graph output
# and are non-linear
quantized_samples = QuantizedArray(self.n_bits, raw_result)
Collaborator Author:
This is the previous "data-based calibration" that was done for op outputs. These values aren't actually used except for univariate functions (mixing ops apply their input quantization).


self.output_quant_params = quantized_samples.quantizer.quant_params
self.output_quant_stats = quantized_samples.quantizer.quant_stats

return raw_result

self.output_quant_params = quantized_samples.quantizer.quant_params
self.output_quant_stats = quantized_samples.quantizer.quant_stats
def calibrate_analytical_output(self, *inputs: QuantizedArray) -> UniformQuantizer:
"""Calibrate output quantization based on analytical formulas.

return quantized_samples.values
Args:
*inputs (QuantizedArray): quantized operation inputs. Quantized weights
are stored in the op instance

Raises:
AssertionError: if the operation does not support analytical calibration
"""
raise AssertionError(
f"calibrate_analytical_output: not implemented for {self._impl_for_op_named} op"
)

def prepare_output(self, qoutput_activation: numpy.ndarray) -> QuantizedArray:
"""Quantize the output of the activation function.
@@ -817,6 +858,15 @@ def _get_output_quant_opts(self):
output_quant_opts.is_qat = False
return output_quant_opts

@classmethod
def supported_by_linear_backend(cls) -> bool:
"""Indicate if this op can be executed on the GLWE linear backend.

Returns:
bool: True if the op can be executed with GLWE.
"""
return False
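One plausible use of this hook (a sketch, not taken from this PR): a compilation pass could check that a purely linear sub-graph supports the backend before dispatching it to GLWE.

```python
from typing import Iterable

def can_use_glwe_backend(quantized_ops: Iterable) -> bool:
    """Hypothetical check: True only if every op supports the GLWE linear backend."""
    return all(op.supported_by_linear_backend() for op in quantized_ops)
```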


class QuantizedOpUnivariateOfEncrypted(QuantizedOp, is_utility=True):
"""An univariate operator of an encrypted value.
@@ -931,11 +981,6 @@ def make_output_quant_parameters(
Returns:
QuantizedArray: the quantized array that will be passed to the QuantizedModule output.
"""

out_opts = self._get_output_quant_opts()
out_opts.is_signed = False
out_opts.is_symmetric = False

# Since we don't know the real bit-width of these quantized values,
# return a quantizer that has zero offset
out_params = UniformQuantizationParameters(