In this post, we start from the target-aware tensor E2, pass it through column-wise and row-wise transformer blocks, and end with dataset-wise ICL, where test row representations attend to labeled training row representations.
New post: [P28] Architecture of TabICLv2: compression-then-ICL. How TabICLv2 compresses target-aware feature tokens into row representations, then uses dataset-wise in-context learning to predict test rows.
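To make the last step concrete, here is a minimal sketch of the attention pattern that dataset-wise ICL implies: keys come only from labeled training rows, so test rows read from training rows and never from each other. The function name `icl_attention_mask` is hypothetical, and letting training rows attend to one another is an assumption on my side, not something stated in the post.

```python
def icl_attention_mask(n_train, n_test):
    """Boolean mask for dataset-wise ICL (illustrative sketch only).

    mask[i][j] is True when row i (query) may attend to row j (key).
    Rows 0..n_train-1 are labeled training rows, the rest are test rows.
    Assumption: every row, train or test, may attend to training rows only.
    """
    n = n_train + n_test
    return [[j < n_train for j in range(n)] for _ in range(n)]


mask = icl_attention_mask(n_train=3, n_test=2)
for row in mask:
    # Print one line per query row: 1 = may attend, 0 = blocked.
    print("".join("1" if allowed else "0" for allowed in row))
```

With this mask, a prediction for one test row cannot depend on which other test rows happen to be in the batch.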
This post covers the next architectural step: compression-then-ICL.
The next post covers how TabICLv2 compresses these tokens into row representations, adds target information again at the row-token level for labeled rows, and then performs dataset-wise in-context learning.
[P27] Architecture of TabICLv2: target-aware embedding. How TabICLv2 uses target-aware embedding to add training labels to tabular in-context learning tokens while preventing label leakage in test rows.
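As a toy illustration of the leakage point, the sketch below adds a label embedding to the tokens of training rows and leaves test rows untouched, even when a label value is present for them. Everything here is a simplification made up for illustration: the name `add_target_info`, the plain-list tokens, and the additive lookup table are assumptions, not the TabICLv2 implementation.

```python
def add_target_info(row_tokens, labels, label_embedding, n_train):
    """Add label information to training rows only (illustrative sketch).

    row_tokens:      list of token vectors, training rows first
    labels:          one class index per row; entries for test rows are ignored
    label_embedding: hypothetical lookup table, class index -> vector
    n_train:         number of labeled training rows
    """
    out = []
    for i, token in enumerate(row_tokens):
        if i < n_train:
            emb = label_embedding[labels[i]]
            out.append([t + e for t, e in zip(token, emb)])
        else:
            # Test row: never touch the label, so nothing can leak.
            out.append(list(token))
    return out


tokens = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
labels = [0, 1, 1]  # the last label belongs to a test row and must not be used
table = {0: [0.5, 0.0], 1: [0.0, 0.5]}
print(add_target_info(tokens, labels, table, n_train=2))
```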
With this post, I am starting a six-part miniseries on the architecture of TabICLv2. The goal is to cover the architecture one subsection at a time, so each post can focus on the details needed to understand that component without making a single article too long.
[P26] Architecture of TabICLv2: repeated feature grouping. A technical guide to TabICLv2 repeated feature grouping: why similar columns confuse encoders and how circular shifts add context, with a NanoTabICL implementation.
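A small sketch of the circular-shift idea: each column is grouped with a few columns at fixed circular offsets, so two columns with identical values still end up in different groups. The function name `group_features` and the shift set `(0, 1, 2)` are assumptions chosen for illustration; see the post and the NanoTabICL code for the shifts actually used.

```python
def group_features(row, shifts=(0, 1, 2)):
    """Group each feature with circularly shifted neighbours (sketch).

    For column j, the group holds the values at columns (j + s) mod m
    for every shift s. The shift set is a hypothetical choice.
    """
    m = len(row)
    return [[row[(j + s) % m] for s in shifts] for j in range(m)]


# Columns 0 and 1 hold the same value, but their groups differ.
print(group_features([5, 5, 7]))
```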
This GitHub repository (soda-inria/nanotabicl) provides a short (~170 lines of code), self-contained implementation of the TabICLv2 architecture for educational and experimental purposes. It's a good place to start before diving into the full model's code.
The tabular foundation models TabPFN and TabICL are pretrained on synthetic data. The data-generating mechanism is called the prior. The picture shows the high-level structure of the synthetic dataset generation prior of TabICLv2. Read more: arxiv.org/pdf/2602.11139.
Are tabular foundation models the same as large language models?
Picture 1: the answer. Picture 2: adaptations of LLMs to tabular data (source: TabICLv2 paper, arxiv.org/pdf/2602.11139).