Preprocessing in Titus

Jim Pivarski edited this page Dec 7, 2015 · 7 revisions

Motivation

Often, you want a PFA scoring engine to reproduce exactly the preprocessing (or postprocessing) steps used during model production. For instance, if you normalize vectors before building k-means clusters, you must normalize incoming data with the same offset and scale when scoring.

PFA preprocessing steps are expressed using PFA functions, but the model producer may use its own language to express them. If you're building models in Python/Numpy, you're probably preprocessing the training data as Python lists or Numpy arrays.
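To make the normalization example concrete, here is a minimal sketch of the producer side in plain Python. The helper names (fit_offset_scale, normalize) and the sample numbers are hypothetical and are not part of Titus or PFA. The point is only that the offset and scale are computed once from the training data and then reused unchanged on data seen at scoring time.

```python
# Hypothetical helpers for illustration; not part of the Titus API.

def fit_offset_scale(column):
    """Return (offset, scale) as the mean and population standard deviation."""
    n = float(len(column))
    offset = sum(column) / n
    variance = sum((x - offset) ** 2 for x in column) / n
    return offset, variance ** 0.5

def normalize(column, offset, scale):
    """Apply a previously fitted offset and scale to a list of numbers."""
    return [(x - offset) / scale for x in column]

# Producer side: fit the parameters on the training data only.
training = [2.0, 4.0, 6.0, 8.0]
offset, scale = fit_offset_scale(training)
train_norm = normalize(training, offset, scale)

# Scoring side: reuse the same offset and scale; do not refit on new data.
new_data = [5.0, 10.0]
score_norm = normalize(new_data, offset, scale)
```

In a real deployment the scoring-side step would be expressed in PFA rather than Python, which is the duplication the rest of this page is about.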

The titus.producer.transformation.Transformation class is intended to help coordinate offline (producer) transformations in Python/Numpy with online (scoring) transformations in PFA. It uses a Python-to-PFA syntax translation engine based on Python's built-in AST.

Before you begin...

Download and install Titus and Numpy. This article was tested with Titus 0.8.1 and Numpy 1.8.2; newer versions should work with no modification. Python >= 2.6 and < 3.0 is required.

Launch a Python prompt and import the Transformation class from titus.producer.transformation:

Python 2.7.6
Type "help", "copyright", "credits" or "license" for more information.
>>> from titus.producer.transformation import Transformation

Defining a transformation
