TabPFN Client


TabPFN is a foundation model for tabular data that outperforms traditional methods while being dramatically faster. This client library provides easy access to the TabPFN API, enabling state-of-the-art tabular machine learning in just a few lines of code.

📚 For detailed usage examples and best practices, check out our Interactive Colab Tutorial

⚠️ Alpha Release Note

This is an alpha release. While we've tested it thoroughly in our use cases, you may encounter occasional issues. We appreciate your understanding and feedback as we continue to improve the service.

This is a cloud-based service. Your data will be sent to our servers for processing.

  • Do NOT upload any Personally Identifiable Information (PII)
  • Do NOT upload any sensitive or confidential data
  • Do NOT upload any data you don't have permission to share
  • Consider anonymizing or pseudonymizing your data before upload
  • Review your organization's data sharing policies before use
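As a minimal sketch of the pseudonymization suggestion above, direct identifiers can be replaced with keyed hashes before any data leaves your machine. Everything here is illustrative: the `pseudonymize` helper, the `customer_id` column, and the key are hypothetical and not part of tabpfn-client. Pseudonymized data can still be re-identified in some cases, so the rules listed above continue to apply.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Map an identifier to a stable token using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
    # A truncated hex digest is enough to act as an opaque identifier.
    return digest.hexdigest()[:16]


# Hypothetical rows; "customer_id" stands in for any direct identifier.
rows = [
    {"customer_id": "alice@example.com", "age": 34, "churned": 0},
    {"customer_id": "bob@example.com", "age": 51, "churned": 1},
]

# Keep the key local and never upload it; this value is a placeholder.
SECRET_KEY = b"replace-with-a-locally-stored-secret"

safe_rows = [
    {**row, "customer_id": pseudonymize(row["customer_id"], SECRET_KEY)}
    for row in rows
]
```

In many cases an identifier column carries no predictive signal, and dropping it entirely is the simpler option.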

🌐 TabPFN Ecosystem

Choose the right TabPFN implementation for your needs:

  • TabPFN Client (this repo): Easy-to-use API client for cloud-based inference
  • TabPFN Extensions: Community extensions and integrations
  • TabPFN: Core implementation for local deployment and research
  • TabPFN UX: No-code TabPFN usage

🏁 Quick Start

Installation

pip install tabpfn-client

Basic Usage

from tabpfn_client import init, TabPFNClassifier, TabPFNRegressor
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load an example dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Use it like any sklearn model
model = TabPFNClassifier()
model.fit(X_train, y_train)
# Get predictions
predictions = model.predict(X_test)
# Get probability estimates
probabilities = model.predict_proba(X_test)

Best Results

For the best predictive performance, use AutoTabPFNClassifier or AutoTabPFNRegressor from the TabPFN Extensions repository. Both apply post-hoc ensembling, which combines multiple TabPFN models into a single ensemble.

Steps for Best Results:

  1. Install the extensions:

    git clone https://github.com/priorlabs/tabpfn-extensions.git
    pip install -e tabpfn-extensions

  2. Fit the ensemble and predict:

    from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier

    clf = AutoTabPFNClassifier(max_time=120)  # 120 seconds tuning time
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)

See our Colab example

🔑 Authentication

Load Your Token

import tabpfn_client
token = tabpfn_client.get_access_token()

To log in on another machine without the interactive flow, pass that access token to:

tabpfn_client.set_access_token(token)

🤝 Join Our Community

We're building the future of tabular machine learning and would love your involvement! Here's how you can participate and get help:

  1. Try TabPFN: Use it in your projects and share your experience
  2. Connect & Learn:
  3. Contribute:
    • Report bugs or request features through issues
    • Submit pull requests (see development guide below)
    • Share your success stories and use cases
  4. Stay Updated: Star the repo and join Discord for the latest updates

📊 Usage Limits

API Cost Calculation

Each API request consumes usage credits based on the following formula:

api_cost = (num_train_rows + num_test_rows) * num_cols * n_estimators

Where n_estimators defaults to:

  • 4 for classification tasks
  • 8 for regression tasks

The current prediction allowance is 5,000,000 cells per day. We will adjust this limit based on usage patterns.
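The formula above can be sketched as a small helper for estimating a request's cost before sending it. The function name and constants are illustrative and not part of the client library, and whether `num_cols` counts the target column is not stated here, so treat the result as an estimate.

```python
# Default ensemble sizes and daily allowance as quoted in this section.
DEFAULT_N_ESTIMATORS = {"classification": 4, "regression": 8}
DAILY_ALLOWANCE = 5_000_000  # current value; subject to change


def estimate_api_cost(num_train_rows, num_test_rows, num_cols,
                      task="classification", n_estimators=None):
    """Apply api_cost = (train rows + test rows) * columns * n_estimators."""
    if n_estimators is None:
        n_estimators = DEFAULT_N_ESTIMATORS[task]
    return (num_train_rows + num_test_rows) * num_cols * n_estimators


# Breast cancer example from Basic Usage: 569 rows split 50/50, 30 features.
cost = estimate_api_cost(284, 285, 30, task="classification")
print(cost)                     # 68280
print(DAILY_ALLOWANCE // cost)  # 73 requests of this size fit in one day
```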

Monitoring Usage

Track your API usage through response headers:

  • X-RateLimit-Limit: Your total allowed usage
  • X-RateLimit-Remaining: Remaining usage
  • X-RateLimit-Reset: Reset timestamp (UTC)

Usage limits reset daily at 00:00:00 UTC.
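A minimal sketch of reading these headers follows. It assumes you already have the response headers as a plain mapping and that the limit and remaining values are integers; how the client exposes headers, and the exact format of the reset value, are not specified here. The helper name and the example values are hypothetical.

```python
def parse_rate_limit_headers(headers):
    """Extract usage counters from a mapping of HTTP response headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "limit": int(lowered["x-ratelimit-limit"]),
        "remaining": int(lowered["x-ratelimit-remaining"]),
        # The reset format is not documented here, so keep the raw string.
        "reset": lowered["x-ratelimit-reset"],
    }


# Hypothetical header values, for illustration only.
example_headers = {
    "X-RateLimit-Limit": "5000000",
    "X-RateLimit-Remaining": "4931720",
    "X-RateLimit-Reset": "<reset timestamp>",
}

usage = parse_rate_limit_headers(example_headers)
used_fraction = 1 - usage["remaining"] / usage["limit"]
print(f"{used_fraction:.1%} of the allowance used")
```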

Size Limitations

  1. Total cells per request must be below 100,000:

    max_cells = (num_train_rows + num_test_rows) * num_cols

  2. For regression with full output (return_full_output=True), the number of test samples must not exceed 500:

    if task == 'regression' and return_full_output and num_test_samples > 500:
        raise ValueError("Cannot return full output for regression with >500 test samples")

These limits will be increased in future releases.
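Both limits can be checked locally before a request is sent. The sketch below is a hypothetical pre-flight helper, not part of the client library; it treats 100,000 cells as an exclusive bound and mirrors the regression check from the snippet above, where more than 500 test samples raises an error.

```python
MAX_CELLS_PER_REQUEST = 100_000      # total cells must stay below this
MAX_FULL_OUTPUT_TEST_SAMPLES = 500   # applies to regression with full output


def check_request_size(num_train_rows, num_test_rows, num_cols,
                       task="classification", return_full_output=False):
    """Return the request's cell count, or raise ValueError if a limit is hit."""
    total_cells = (num_train_rows + num_test_rows) * num_cols
    if total_cells >= MAX_CELLS_PER_REQUEST:
        raise ValueError(
            f"Request has {total_cells} cells; must be below {MAX_CELLS_PER_REQUEST}"
        )
    if (task == "regression" and return_full_output
            and num_test_rows > MAX_FULL_OUTPUT_TEST_SAMPLES):
        raise ValueError(
            "Cannot return full output for regression with >500 test samples"
        )
    return total_cells


# Breast cancer example from Basic Usage: 569 rows, 30 features.
print(check_request_size(284, 285, 30))  # 17070
```

If a dataset is too large, reducing the number of test rows per call is one way to get under the cell limit, at the cost of sending the training rows with each call.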

Access/Delete Personal Information

You can use our UserDataClient to access and delete personal information.

from tabpfn_client import UserDataClient

print(UserDataClient.get_data_summary())

📚 Citation

@article{hollmann2025tabpfn,
 title={Accurate predictions on small data with a tabular foundation model},
 author={Hollmann, Noah and M{\"u}ller, Samuel and Purucker, Lennart and
         Krishnakumar, Arjun and K{\"o}rfer, Max and Hoo, Shi Bin and
         Schirrmeister, Robin Tibor and Hutter, Frank},
 journal={Nature},
 year={2025},
 month={01},
 day={09},
 doi={10.1038/s41586-024-08328-6},
 publisher={Springer Nature},
 url={https://www.nature.com/articles/s41586-024-08328-6},
}

🤝 License

This project is licensed under the Apache License 2.0 - see the LICENSE.txt file for details.

Development

To encourage better coding practices, ruff has been added to the pre-commit hooks. This ensures that code is formatted properly before it is committed. To enable pre-commit (if you haven't already), run the following command:

pre-commit install

Additionally, it is recommended that developers install the ruff extension in their preferred editor. For installation instructions, refer to the Ruff Integrations Documentation.

Build from GitHub

!git clone https://github.com/automl/tabpfn-client
%cd tabpfn-client
!git submodule update --init --recursive
!pip install -e .
%cd ..

Build for PyPI

if [ -d "dist" ]; then rm -rf dist/*; fi
python3 -m pip install --upgrade build; python3 -m build
python3 -m twine upload --repository pypi dist/*