Task

The task is to predict the price of houses.

Dataset

We use the California Housing dataset. It has 8 numerical features and no categorical features, and contains 20,640 samples in total. The test split used for evaluation is the same as in Revisiting Deep Learning Models for Tabular Data.

Model

We use the FT-Transformer model with its default parameter settings.

Performance

We evaluate using the Root Mean Squared Error (RMSE). Our model reaches an RMSE of 0.397 ± 0.006. The search grid used to find the optimal hyperparameters can be found here.
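The metric and the "mean ± spread" notation above can be sketched as follows. This is a minimal illustration, not the repository's evaluation code: the helper names `rmse` and `summarize` are hypothetical, and it assumes the spread is the population standard deviation over per-seed results, which the text does not state.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def summarize(per_seed_scores):
    """Return (mean, std) over per-seed scores.

    Assumption: population standard deviation (ddof=0); the README does
    not say which convention the reported spread uses.
    """
    scores = np.asarray(per_seed_scores, dtype=float)
    return float(scores.mean()), float(scores.std())
```

With one test RMSE per seed collected in a list, `summarize(scores)` returns the two numbers that would be written as `mean ± std`.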

Performance Comparison

We compare our model against the results reported in Revisiting Deep Learning Models for Tabular Data, where the authors report an RMSE of 0.459 (see Table 2 of the paper). Reproducing their exact setup is difficult because they tuned the hyperparameters with Optuna and did not state the optimal values found. Using their default hyperparameter settings, we achieve an RMSE of 0.404 ± 0.004. The gap might be explained by the choice of preprocessing. While the authors state that they use scikit-learn's QuantileTransformer, the performance reported in the paper is closer to what we achieve with the StandardScaler: preprocessing with the StandardScaler and their default hyperparameters yields an RMSE of 0.453 ± 0.015, which is much closer to the reported value. To reproduce the values for this comparison, use the following config:

task:
  name: tabular
  output_dir_name: tabular_reference
  train_transforms:
    - normalizer: quantile
      noise: 1.e-3
    - normalizer: standard
      noise: 0.0
optimizer:
  name: adamw_baseline
  learning_rate: 1.e-4
  weight_decay: 1.e-5
  eta_min_factor: 1.0
engine:
  seed: [1, 2, 3]
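The two normalizers compared in this config correspond to scikit-learn's `QuantileTransformer` and `StandardScaler`. The sketch below shows one way the two preprocessing variants could be fitted. It is an assumption-laden illustration, not this repository's implementation: the function name `fit_normalizer` is hypothetical, and it assumes the `noise` option means Gaussian noise added to the training features before fitting the normalizer.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, StandardScaler


def fit_normalizer(X_train, kind="quantile", noise=1e-3, seed=0):
    """Fit a feature normalizer on the training split only.

    Assumption: `noise` is the scale of Gaussian noise added to the
    features used for fitting; the original features are left untouched.
    """
    X_train = np.asarray(X_train, dtype=float)
    if kind == "quantile":
        normalizer = QuantileTransformer(
            output_distribution="normal",
            # n_quantiles must not exceed the number of training samples
            n_quantiles=min(1000, X_train.shape[0]),
            random_state=seed,
        )
    elif kind == "standard":
        normalizer = StandardScaler()
    else:
        raise ValueError(f"unknown normalizer: {kind!r}")

    X_fit = X_train
    if noise > 0:
        rng = np.random.default_rng(seed)
        X_fit = X_train + noise * rng.standard_normal(X_train.shape)
    return normalizer.fit(X_fit)
```

The fitted object is then applied with `normalizer.transform(...)` to the training, validation, and test features, so that no statistics from the evaluation splits leak into preprocessing.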