Problems with null values on prediction time #52

Open
pachoning opened this issue May 17, 2019 · 2 comments
pachoning commented May 17, 2019

I have trained a classification model with two categorical variables:

  • country
  • browser

I trained it in Datalab with TensorFlow 1.8. Since both variables can have missing values, I used default values in the input-reading function:

CSV_COLUMNS = ['country', 'browser', 'indDownloaded']
LABEL_COLUMN = 'indDownloaded'
DEFAULTS = [['Zimbabwe'], ['Opera'], [0]]

def read_dataset(filename, mode, batch_size=512):
    def decode_csv(value_column):
        columns = tf.decode_csv(value_column, record_defaults=DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))
        label = features.pop(LABEL_COLUMN)
        return features, label

    filenames_dataset = tf.data.Dataset.list_files(filename)
    textlines_dataset = filenames_dataset.flat_map(tf.data.TextLineDataset)
    dataset = textlines_dataset.map(decode_csv)

    if mode == tf.estimator.ModeKeys.TRAIN:
        num_epochs = None  # repeat indefinitely
        dataset = dataset.shuffle(buffer_size=10 * batch_size)
    else:
        num_epochs = 1

    dataset = dataset.repeat(num_epochs).batch(batch_size)
    return dataset
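As a sanity check of what I expect from `record_defaults`, here is a plain-Python sketch (no TensorFlow, and `decode_csv_line` is a name I made up just for illustration): an empty CSV field falls back to its default, which is the behavior I rely on at training time.

```python
# Plain-Python mimic of tf.decode_csv's record_defaults behavior:
# empty fields are replaced by their per-column default value.
CSV_COLUMNS = ['country', 'browser', 'indDownloaded']
DEFAULTS = ['Zimbabwe', 'Opera', 0]

def decode_csv_line(line):
    fields = line.split(',')
    # substitute the default wherever the field is empty
    filled = [f if f != '' else d for f, d in zip(fields, DEFAULTS)]
    # cast the label back to int, mirroring the [0] default
    filled[-1] = int(filled[-1])
    return dict(zip(CSV_COLUMNS, filled))

print(decode_csv_line('Venezuela,,1'))
# {'country': 'Venezuela', 'browser': 'Opera', 'indDownloaded': 1}
```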

I wanted to export the model to use it in real time. In the serving function I used placeholder_with_default to handle missing values:

def serving_input_fn():
    json_feature_placeholders = {
        'country': tf.placeholder_with_default(
            input=['Zimbabwe'],
            shape=[None]
        ),
        'browser': tf.placeholder_with_default(
            input=['Opera'],
            shape=[None]
        )
    }

    features = json_feature_placeholders  # no transformation needed
    return tf.estimator.export.ServingInputReceiver(features, json_feature_placeholders)
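Until this is resolved, my client-side workaround idea is to fill absent or null fields with the same defaults before building each JSON instance. This is a plain-Python sketch; `INSTANCE_DEFAULTS` and `complete_instance` are names I made up here, mirroring the serving-time defaults above.

```python
import json

# Hypothetical client-side defaults, matching placeholder_with_default above.
INSTANCE_DEFAULTS = {'country': 'Zimbabwe', 'browser': 'Opera'}

def complete_instance(instance):
    """Fill missing or null fields with defaults before sending to the model."""
    merged = dict(INSTANCE_DEFAULTS)
    # keep only the caller's non-null values; nulls fall back to defaults
    merged.update({k: v for k, v in instance.items() if v is not None})
    return merged

print(json.dumps(complete_instance({'country': 'Venezuela'})))
# {"country": "Venezuela", "browser": "Opera"}
```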

Once the model was ready to be deployed, I wanted to check what happens when one variable is missing (country is present, browser is missing). Here is how I did it:

%%writefile ./test.json
{"country": "Venezuela"}
%%bash
gcloud ml-engine local predict \
    --model-dir=path_to_the_model \
    --json-instances=./test.json

Everything is OK with local prediction. However, when I deploy to ML Engine and test with missing values, I get errors.

Input 1

%%writefile ./test.json
{"country": "Venezuela"} 

Output1

{
  "error": "Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"input size does not match signature\")"
}

Input 2

%%writefile ./test.json
{"country": "Venezuela", "browser" : null} 

Output 2

{
  "error": "Prediction failed: Error processing input: Expected string, got None of type '_Message' instead."
}

I am using the following command:

%%bash
gcloud ml-engine predict --model=${MODEL_NAME} --version=${MODEL_VERSION} --json-instances=./test.json

So, locally I manage to leave one variable out (or set it to null) and still get a prediction, but on the cloud that is not possible. Can you help me, please? I think this is related to https://github.com/tensorflow/tensorflow/issues/10014
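For reference, a JSON null becomes a Python None once the request is parsed, which matches the "Expected string, got None" error above. A quick check of how my two test lines parse (the `{"instances": [...]}` wrapping is my understanding of the online prediction request body, worth double-checking against the docs):

```python
import json

# Each line of test.json is one instance; the online service batches them
# into a request body of the form {"instances": [...]}.
lines = ['{"country": "Venezuela"}',
         '{"country": "Venezuela", "browser": null}']
body = {"instances": [json.loads(line) for line in lines]}

print(body["instances"][1]["browser"])
# None -- not a string, which is what the deployed graph rejects
```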

@andrewferlitsch
Contributor

@dizcology could you follow up on this?

@andrewferlitsch
Contributor

@dizcology PTAL
