TFX Transform component receives "Error 413 (Request Entity Too Large)!!1" from Dataflow #242

Open
mbernico opened this issue Jun 23, 2021 · 2 comments
@mbernico

Creating a TFX pipeline for a structured data model with 1,621 features, I receive this error from TFX 0.30.0 / TensorFlow Transform 0.30.0:

ERROR:absl:udf_utils.get_fn {'module_file': None, 'module_path': 'tfx_util@gs://redacted/_wheels/tfx_user_code_Transform-0.0+9f052e692cc2c8a7d7411a095329ab307d215d22c7010cda7474824c1988ccc9-py3-none-any.whl', 'preprocessing_fn': None} 'preprocessing_fn'
WARNING:tensorflow:From /home/jupyter/.local/lib/python3.7/site-packages/tensorflow_transform/tf_utils.py:266: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.
WARNING:tensorflow:Tables initialized inside a tf.function will be re-initialized on every invocation of the function. This re-initialization can have significant impact on performance. Consider lifting them out of the graph context using `tf.init_scope`.
WARNING:root:This output type hint will be ignored and not used for type-checking purposes. Typically, output type hints for a PTransform are single (or nested) types wrapped by a PCollection, PDone, or None. Got: Tuple[Dict[str, Union[NoneType, _Dataset]], Union[Dict[str, Dict[str, PCollection]], NoneType]] instead.
WARNING:tensorflow:Tensorflow version (2.4.2) found. Note that Tensorflow Transform support for TF 2.0 is currently in beta, and features such as tf.function may not work as intended. 
WARNING:apache_beam.typehints.typehints:Ignoring send_type hint: <class 'NoneType'>
WARNING:apache_beam.typehints.typehints:Ignoring return_type hint: <class 'NoneType'>
WARNING:apache_beam.runners.portability.stager:The .whl package "/tmp/tmpqx90xuwj/tfx_user_code_Transform-0.0+9f052e692cc2c8a7d7411a095329ab307d215d22c7010cda7474824c1988ccc9-py3-none-any.whl" is provided in --extra_package. This functionality is not officially supported. Since wheel packages are binary distributions, this package must be binary-compatible with the worker environment (e.g. Python 2.7 running on an x64 Linux host).
WARNING:apache_beam.runners.portability.stager:The .whl package "/tmp/tmph8oewj3m/tfx_user_code_Transform-0.0+9f052e692cc2c8a7d7411a095329ab307d215d22c7010cda7474824c1988ccc9-py3-none-any.whl" is provided in --extra_package. This functionality is not officially supported. Since wheel packages are binary distributions, this package must be binary-compatible with the worker environment (e.g. Python 2.7 running on an x64 Linux host).
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 3.0584546420241447 seconds before retrying submit_job_description because we caught exception: BrokenPipeError: [Errno 32] Broken pipe
 Traceback for above exception (most recent call last):
  File "/home/jupyter/.local/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
    return fun(*args, **kwargs)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 785, in submit_job_description
    response = self._client.projects_locations_jobs.Create(request)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py", line 903, in Create
    config, request, global_params=global_params)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 729, in _RunMethod
    http, http_request, **opts)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apitools/base/py/http_wrapper.py", line 350, in MakeRequest
    check_response_func=check_response_func)
  File "/home/jupyter/.local/lib/python3.7/site-packages/apitools/base/py/http_wrapper.py", line 400, in _MakeRequestNoRetry
    redirections=redirections, connection_type=connection_type)
  File "/opt/conda/lib/python3.7/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/opt/conda/lib/python3.7/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/opt/conda/lib/python3.7/site-packages/oauth2client/transport.py", line 175, in new_request
    redirections, connection_type)
  File "/opt/conda/lib/python3.7/site-packages/oauth2client/transport.py", line 282, in request
    connection_type=connection_type)
  File "/opt/conda/lib/python3.7/site-packages/httplib2/__init__.py", line 1709, in request
    conn, authority, uri, request_uri, method, body, headers, redirections, cachekey,
  File "/opt/conda/lib/python3.7/site-packages/httplib2/__init__.py", line 1424, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/opt/conda/lib/python3.7/site-packages/httplib2/__init__.py", line 1347, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/opt/conda/lib/python3.7/http/client.py", line 1277, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1323, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1272, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1071, in _send_output
    self.send(chunk)
  File "/opt/conda/lib/python3.7/http/client.py", line 993, in send
    self.sock.sendall(data)
  File "/opt/conda/lib/python3.7/ssl.py", line 1034, in sendall
    v = self.send(byte_view[count:])
  File "/opt/conda/lib/python3.7/ssl.py", line 1003, in send
    return self._sslobj.write(data)

---------------------------------------------------------------------------
HttpError                                 Traceback (most recent call last)
<ipython-input-39-efea3de47a8e> in <module>
----> 1 context.run(transform)

~/.local/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run_if_ipython(*args, **kwargs)
     66       # __IPYTHON__ variable is set by IPython, see
     67       # https://ipython.org/ipython-doc/rel-0.10.2/html/interactive/reference.html#embedding-ipython.
---> 68       return fn(*args, **kwargs)
     69     else:
     70       absl.logging.warning(

~/.local/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run(self, component, enable_cache, beam_pipeline_args)
    186         telemetry_utils.LABEL_TFX_RUNNER: runner_label,
    187     }):
--> 188       execution_id = launcher.launch().execution_id
    189 
    190     return execution_result.ExecutionResult(

~/.local/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py in launch(self)
    207                          copy.deepcopy(execution_decision.input_dict),
    208                          execution_decision.output_dict,
--> 209                          copy.deepcopy(execution_decision.exec_properties))
    210 
    211     absl.logging.info('Running publisher for %s',

~/.local/lib/python3.7/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py in _run_executor(self, execution_id, input_dict, output_dict, exec_properties)
     70     # output_dict can still be changed, specifically properties.
     71     executor.Do(
---> 72         copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))

~/.local/lib/python3.7/site-packages/tfx/components/transform/executor.py in Do(self, input_dict, output_dict, exec_properties)
    490       label_outputs[labels.CACHE_OUTPUT_PATH_LABEL] = cache_output
    491     status_file = 'status_file'  # Unused
--> 492     self.Transform(label_inputs, label_outputs, status_file)
    493     absl.logging.debug('Cleaning up temp path %s on executor success',
    494                        temp_path)

~/.local/lib/python3.7/site-packages/tfx/components/transform/executor.py in Transform(***failed resolving arguments***)
   1025                       output_cache_dir, compute_statistics,
   1026                       per_set_stats_output_paths, materialization_format,
-> 1027                       len(analyze_data_paths))
   1028   # TODO(b/122478841): Writes status to status file.
   1029 

~/.local/lib/python3.7/site-packages/tfx/components/transform/executor.py in _RunBeamImpl(self, analyze_data_list, transform_data_list, preprocessing_fn, stats_options_updater_fn, force_tf_compat_v1, input_dataset_metadata, transform_output_path, raw_examples_data_format, temp_path, input_cache_dir, output_cache_dir, compute_statistics, per_set_stats_output_paths, materialization_format, analyze_paths_count)
   1338                      Executor._RecordBatchToExamples)
   1339                  | 'Materialize[{}]'.format(infix) >> self._WriteExamples(
-> 1340                      materialization_format, dataset.materialize_output_path))
   1341 
   1342     return _Status.OK()

~/.local/lib/python3.7/site-packages/apache_beam/pipeline.py in __exit__(self, exc_type, exc_val, exc_tb)
    583     try:
    584       if not exc_type:
--> 585         self.result = self.run()
    586         self.result.wait_until_finish()
    587     finally:

~/.local/lib/python3.7/site-packages/apache_beam/pipeline.py in run(self, test_runner_api)
    538             self.to_runner_api(use_fake_coders=True),
    539             self.runner,
--> 540             self._options).run(False)
    541 
    542       if (self._options.view_as(TypeOptions).runtime_type_check and

~/.local/lib/python3.7/site-packages/apache_beam/pipeline.py in run(self, test_runner_api)
    562         finally:
    563           shutil.rmtree(tmpdir)
--> 564       return self.runner.run_pipeline(self, self._options)
    565     finally:
    566       shutil.rmtree(self.local_tempdir, ignore_errors=True)

~/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py in run_pipeline(self, pipeline, options)
    580     # raise an exception.
    581     result = DataflowPipelineResult(
--> 582         self.dataflow_client.create_job(self.job), self)
    583 
    584     # TODO(BEAM-4274): Circular import runners-metrics. Requires refactoring.

~/.local/lib/python3.7/site-packages/apache_beam/utils/retry.py in wrapper(*args, **kwargs)
    251       while True:
    252         try:
--> 253           return fun(*args, **kwargs)
    254         except Exception as exn:  # pylint: disable=broad-except
    255           if not retry_filter(exn):

~/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py in create_job(self, job)
    682 
    683     if not template_location:
--> 684       return self.submit_job_description(job)
    685 
    686     _LOGGER.info(

~/.local/lib/python3.7/site-packages/apache_beam/utils/retry.py in wrapper(*args, **kwargs)
    251       while True:
    252         try:
--> 253           return fun(*args, **kwargs)
    254         except Exception as exn:  # pylint: disable=broad-except
    255           if not retry_filter(exn):

~/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py in submit_job_description(self, job)
    783 
    784     try:
--> 785       response = self._client.projects_locations_jobs.Create(request)
    786     except exceptions.BadStatusCodeError as e:
    787       _LOGGER.error(

~/.local/lib/python3.7/site-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py in Create(self, request, global_params)
    901       config = self.GetMethodConfig('Create')
    902       return self._RunMethod(
--> 903           config, request, global_params=global_params)
    904 
    905     Create.method_config = lambda: base_api.ApiMethodInfo(

~/.local/lib/python3.7/site-packages/apitools/base/py/base_api.py in _RunMethod(self, method_config, request, global_params, upload, upload_config, download)
    729                 http, http_request, **opts)
    730 
--> 731         return self.ProcessHttpResponse(method_config, http_response, request)
    732 
    733     def ProcessHttpResponse(self, method_config, http_response, request=None):

~/.local/lib/python3.7/site-packages/apitools/base/py/base_api.py in ProcessHttpResponse(self, method_config, http_response, request)
    735         return self.__client.ProcessResponse(
    736             method_config,
--> 737             self.__ProcessHttpResponse(method_config, http_response, request))

~/.local/lib/python3.7/site-packages/apitools/base/py/base_api.py in __ProcessHttpResponse(self, method_config, http_response, request)
    602                                              http_client.NO_CONTENT):
    603             raise exceptions.HttpError.FromResponse(
--> 604                 http_response, method_config=method_config, request=request)
    605         if http_response.status_code == http_client.NO_CONTENT:
    606             # TODO(craigcitro): Find out why _replace doesn't seem to work

HttpError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/redacted-dev-datascience/locations/us-central1/jobs?alt=json>: response: <{'content-type': 'text/html; charset=UTF-8', 'referrer-policy': 'no-referrer', 'content-length': '2477', 'date': 'Wed, 23 Jun 2021 19:48:48 GMT', 'connection': 'close', 'status': '413'}>, content <<!DOCTYPE html>
<html lang=en>
  <title>Error 413 (Request Entity Too Large)!!1</title>
  <p><b>413.</b> <ins>That’s an error.</ins>
  <p>Your client issued a request that was too large.
  <ins>That’s all we know.</ins>

InteractiveContext is the orchestrator, and each component runs on Cloud Dataflow.
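Roughly, the setup looks like this (a minimal sketch; the component names and beam args are illustrative, and the real project and bucket values are redacted):

from tfx.components import Transform
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

context = InteractiveContext()

transform = Transform(
    examples=example_gen.outputs['examples'],  # upstream ExampleGen
    schema=schema_gen.outputs['schema'],       # upstream SchemaGen
    module_file=module_file)                   # module defining preprocessing_fn

# Each component is handed beam args pointing at Dataflow, e.g.:
context.run(
    transform,
    beam_pipeline_args=[
        '--runner=DataflowRunner',
        '--project=<redacted>',
        '--region=us-central1',
        '--temp_location=gs://<redacted>/tmp',
    ])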

The TFX preprocessing_fn is:

import absl.logging
import tensorflow_transform as tft

# get_keys, _convert_to_dense, MAX_VOCAB_SIZE and OOV_SIZE are defined
# elsewhere in the user module.


def preprocessing_fn(inputs):
  """tf.transform's callback function for preprocessing inputs.

  Args:
    inputs: map from feature keys to raw not-yet-transformed features.

  Returns:
    Map from string feature key to transformed feature operations.
  """
  features = get_keys()
  absl.logging.debug(inputs.keys())

  outputs = {}

  # Scale each continuous feature to its z-score.
  for key in features['continuous']:
    outputs[key] = tft.scale_to_z_score(_convert_to_dense(inputs[key]))

  # Build a vocabulary for each categorical feature and map it to integer ids.
  for key in features['vocab']:
    outputs[key] = tft.compute_and_apply_vocabulary(
        _convert_to_dense(inputs[key]),
        top_k=MAX_VOCAB_SIZE,
        num_oov_buckets=OOV_SIZE,
        vocab_filename=key)

  # Pass identity features through unchanged (densified).
  for key in features['identity']:
    outputs[key] = _convert_to_dense(inputs[key])

  return outputs
• There are 20 categorical features; the rest are continuous.
• I'm not able to share the dataset.
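_convert_to_dense is not shown above; a plausible sketch of such a helper, assuming it fills in missing values and densifies a [batch, 1] SparseTensor in the style of the canonical TFT _fill_in_missing example:

import tensorflow as tf

def _convert_to_dense(x):
  """Converts a SparseTensor to dense, filling missing values with a default."""
  if not isinstance(x, tf.sparse.SparseTensor):
    return x
  default_value = '' if x.dtype == tf.string else 0
  dense = tf.sparse.to_dense(
      tf.sparse.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
      default_value=default_value)
  return tf.squeeze(dense, axis=1)
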
@zhitaoli

Hi @mbernico,

I see that you are using DataflowRunner for this component. Can you share the beam_pipeline_args used (they may be set at either the pipeline level or the component level)?

Also, can you try adding --experiments=upload_graph to the beam_pipeline_args and let us know whether the issue disappears?
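upload_graph tells Dataflow to stage the serialized job graph in GCS rather than embedding it in the CreateJob request, which is usually what exceeds the request size limit on very wide pipelines like this one. A minimal sketch of where the flag would go (assuming the InteractiveContext setup above; your other args stay unchanged):

# Hypothetical: append the experiment to whatever beam args are already in use.
beam_pipeline_args = [
    '--runner=DataflowRunner',
    # ... existing project/region/temp_location args ...
    '--experiments=upload_graph',
]
context.run(transform, beam_pipeline_args=beam_pipeline_args)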

@arghyaganguly

@mbernico, could you please follow up on @zhitaoli's comment above?
