Problems with null values on prediction time #52

Open
pachoning opened this issue May 17, 2019 · 2 comments
pachoning commented May 17, 2019

I have trained a classification model with two categorical variables:

  • country
  • browser

I trained it in Datalab with TensorFlow 1.8. Since both variables can have missing values, I used default values in the input-reading function:

CSV_COLUMNS = ['country', 'browser', 'indDownloaded']
LABEL_COLUMN = 'indDownloaded'
DEFAULTS = [['Zimbabwe'], ['Opera'], [0]]

def read_dataset(filename, mode, batch_size=512):
    def decode_csv(value_column):
        columns = tf.decode_csv(value_column, record_defaults=DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))
        label = features.pop(LABEL_COLUMN)
        return features, label

    filenames_dataset = tf.data.Dataset.list_files(filename)
    textlines_dataset = filenames_dataset.flat_map(tf.data.TextLineDataset)
    dataset = textlines_dataset.map(decode_csv)

    if mode == tf.estimator.ModeKeys.TRAIN:
        num_epochs = None  # repeat indefinitely
        dataset = dataset.shuffle(buffer_size=10 * batch_size)
    else:
        num_epochs = 1

    dataset = dataset.repeat(num_epochs).batch(batch_size)
    return dataset
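As a sanity check of what I expect from `record_defaults`, here is a plain-Python sketch (no TensorFlow, and `decode_csv_line` is a name I made up just for illustration): an empty CSV field falls back to its default, which is the behavior I rely on at training time.

```python
# Plain-Python mimic of tf.decode_csv's record_defaults behavior:
# empty fields are replaced by their per-column default value.
CSV_COLUMNS = ['country', 'browser', 'indDownloaded']
DEFAULTS = ['Zimbabwe', 'Opera', 0]

def decode_csv_line(line):
    fields = line.split(',')
    # substitute the default wherever the field is empty
    filled = [f if f != '' else d for f, d in zip(fields, DEFAULTS)]
    # cast the label back to int, mirroring the [0] default
    filled[-1] = int(filled[-1])
    return dict(zip(CSV_COLUMNS, filled))

print(decode_csv_line('Venezuela,,1'))
# {'country': 'Venezuela', 'browser': 'Opera', 'indDownloaded': 1}
```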

I wanted to export the model to use it in real time. In the serving function I used placeholder_with_default to handle missing values:

def serving_input_fn():
    json_feature_placeholders = {
        'country': tf.placeholder_with_default(
            input=['Zimbabwe'],
            shape=[None]
        ),
        'browser': tf.placeholder_with_default(
            input=['Opera'],
            shape=[None]
        )
    }

    features = json_feature_placeholders  # no transformation needed
    return tf.estimator.export.ServingInputReceiver(features, json_feature_placeholders)
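Until this is resolved, my client-side workaround idea is to fill absent or null fields with the same defaults before building each JSON instance. This is a plain-Python sketch; `INSTANCE_DEFAULTS` and `complete_instance` are names I made up here, mirroring the serving-time defaults above.

```python
import json

# Hypothetical client-side defaults, matching placeholder_with_default above.
INSTANCE_DEFAULTS = {'country': 'Zimbabwe', 'browser': 'Opera'}

def complete_instance(instance):
    """Fill missing or null fields with defaults before sending to the model."""
    merged = dict(INSTANCE_DEFAULTS)
    # keep only the caller's non-null values; nulls fall back to defaults
    merged.update({k: v for k, v in instance.items() if v is not None})
    return merged

print(json.dumps(complete_instance({'country': 'Venezuela'})))
# {"country": "Venezuela", "browser": "Opera"}
```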

Once the model was ready to be deployed, I wanted to check what happens when one variable is missing (country is present, browser is missing). Here is how I did it:

%%writefile ./test.json
{"country": "Venezuela"}
%%bash
gcloud ml-engine local predict \
    --model-dir=path_to_the_model \
    --json-instances=./test.json

Everything is OK with local prediction. However, when I deploy to ML Engine and test with missing values, I get errors.

Input 1

%%writefile ./test.json
{"country": "Venezuela"} 

Output1

{
  "error": "Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"input size does not match signature\")"
}

Input 2

%%writefile ./test.json
{"country": "Venezuela", "browser" : null} 

Output 2

{
  "error": "Prediction failed: Error processing input: Expected string, got None of type '_Message' instead."
}

I am using the following command:

%%bash
gcloud ml-engine predict --model=${MODEL_NAME} --version=${MODEL_VERSION} --json-instances=./test.json

So, locally I manage to leave one variable out (or set it to null) and still get a prediction, but on the cloud that is not possible. Can you help me, please? I think this is related to https://github.com/tensorflow/tensorflow/issues/10014
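For reference, a JSON null becomes a Python None once the request is parsed, which matches the "Expected string, got None" error above. A quick check of how my two test lines parse (the `{"instances": [...]}` wrapping is my understanding of the online prediction request body, worth double-checking against the docs):

```python
import json

# Each line of test.json is one instance; the online service batches them
# into a request body of the form {"instances": [...]}.
lines = ['{"country": "Venezuela"}',
         '{"country": "Venezuela", "browser": null}']
body = {"instances": [json.loads(line) for line in lines]}

print(body["instances"][1]["browser"])
# None -- not a string, which is what the deployed graph rejects
```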

@andrewferlitsch
Contributor

@dizcology could you follow up on this?

@andrewferlitsch
Contributor

@dizcology PTAL
