Collections Scoring with Gradient Boosting
An AR team can chase maybe 20% of overdue invoices in a given week. The question is which 20%.
Sort by days-overdue and you waste the week on invoices that were going to self-cure. Sort by amount and you miss the small accounts that genuinely won't pay.
A gradient-boosted model on payment history — days since last payment, payment velocity, dispute flags, seasonality — ranks invoices by recovery risk. Plot the cumulative gains and the value of ranking becomes obvious.
Illustrative run: working the top 20% by model score captures ~58% of at-risk value, versus 20% if you chase at random.
Same headcount. Same hours. Roughly 3× the recovered cash.
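The cumulative-gains number above can be computed in a few lines. This is a minimal sketch, not the author's pipeline: it assumes you already have one model score per invoice and an estimate of the value at risk on each, and the function name `gains_at` is hypothetical. Under random ordering the expected capture equals the fraction worked, which is where the 20% baseline comes from.

```python
import numpy as np

def gains_at(scores, at_risk_value, top_frac):
    """Share of total at-risk value captured by working the top
    `top_frac` of invoices, ranked by model score (highest first)."""
    scores = np.asarray(scores, dtype=float)
    at_risk_value = np.asarray(at_risk_value, dtype=float)
    order = np.argsort(-scores, kind="stable")       # highest score first
    n_worked = int(np.ceil(top_frac * len(scores)))  # invoices the team can chase
    captured = at_risk_value[order][:n_worked].sum()
    return float(captured / at_risk_value.sum())

# Toy data (made up): four invoices, the team can work half of them.
toy_scores = [0.9, 0.1, 0.8, 0.2]
toy_values = [500.0, 100.0, 300.0, 100.0]
model_capture = gains_at(toy_scores, toy_values, top_frac=0.5)
random_capture = 0.5  # expected share when invoices are chased at random
print(model_capture, model_capture / random_capture)
```

Dividing the model's capture by the random baseline gives the lift; with the illustrative figures in the post, 58% against 20% is about 2.9×.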
If your price elasticity estimate is around -0.4, check for confounders before you celebrate the inelastic demand.
Naive regression biases elasticity toward zero — the real number is often 2-3× larger.
Price Elasticity with Double ML
If you regress quantity on price to get elasticity, your number is almost certainly too small — biased toward zero.
Why: price moves with the things that also move demand. Promos, season, competitor actions.
The naive regression can't tell the price effect from the confounder effect, so it splits the difference.
Double ML fixes this by orthogonalization: model demand from the controls, model price from the controls, then regress the demand residuals on the price residuals.
What's left is the part of price variation the controls don't explain. The slope on that residual variation is the causal elasticity, provided the controls capture the confounders.
Illustrative run: naive OLS says -0.42. Double ML recovers -1.29, right next to the true -1.35.
That 3× gap is the difference between leaving margin on the table and pricing to the real curve.
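The residual-on-residual recipe can be sketched with NumPy alone. Everything here is an assumption for illustration: the synthetic data-generating process, the coefficient values, and the use of ordinary least squares as the nuisance model (in practice a flexible learner such as gradient boosting would usually take its place). The sketch uses cross-fitting, so each residual comes from a model that never saw that row. It will not reproduce the -0.42 and -1.29 figures quoted above; it only shows the same direction of bias.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Least squares with an intercept; a stand-in for any ML regressor."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def double_ml_elasticity(log_q, log_p, controls, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of d log(q) / d log(p)."""
    n = len(log_q)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    q_res = np.empty(n)
    p_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Residualize demand and price on the controls, out of fold.
        q_res[test_idx] = log_q[test_idx] - fit_predict(
            controls[train], log_q[train], controls[test_idx])
        p_res[test_idx] = log_p[test_idx] - fit_predict(
            controls[train], log_p[train], controls[test_idx])
    # Slope of residual demand on residual price.
    return float(p_res @ q_res / (p_res @ p_res))

def naive_elasticity(log_q, log_p):
    """Plain regression of log quantity on log price, no controls."""
    slope, _intercept = np.polyfit(log_p, log_q, 1)
    return float(slope)

def simulate(n=4000, true_elasticity=-1.35, seed=1):
    """Hypothetical data: controls (promo, season, competitor) raise
    both price and demand, which pushes the naive slope toward zero."""
    rng = np.random.default_rng(seed)
    controls = rng.normal(size=(n, 3))
    log_p = controls @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.5, size=n)
    log_q = (true_elasticity * log_p
             + controls @ np.array([1.0, 0.6, 0.4])
             + rng.normal(scale=0.5, size=n))
    return log_q, log_p, controls

log_q, log_p, controls = simulate()
print("naive:", naive_elasticity(log_q, log_p))
print("double ML:", double_ml_elasticity(log_q, log_p, controls))
```

The estimate is only as good as the control set: a confounder that is missing from `controls` stays in the residuals and biases the slope just as it does in the naive regression.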
Tail Spend, The Hidden Drain
20% of suppliers consume 80% of admin cost. Mostly invisibly.
By spend value: strategic suppliers dominate.
By admin cost: tail suppliers dominate.
Both inversions are real and simultaneous.
Auto-classification + catalog buying + PO-free thresholds: ↓ 40-60% tail processing cost.
The Spend Cube
Category × supplier × geography × time.
This is the data substrate every other procurement decision sits on.
~70% of mid-market companies don't have a clean one. Which means every procurement initiative starts with a 6-month data project — usually unbudgeted.
Supplier Risk Scoring
Four signals. One number per supplier. Refreshed weekly.
Financial — z-score, credit trend, payment behavior.
Operational — OTIF, quality, capacity.
ESG — emissions, labor, governance flags.
Concentration — % of your spend × % of their revenue.
Early warning 60-90 days before the disruption phone call.
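One way to roll the four signals into a single number is a weighted average, sketched below. The weights, the 0-to-1 scaling of each signal, and every name in the code are hypothetical; the post does not say how the score is combined. Concentration is read here as the share of your spend going to the supplier multiplied by the share of the supplier's revenue that comes from you.

```python
from dataclasses import dataclass

# Hypothetical weights; the post does not specify any.
WEIGHTS = {"financial": 0.30, "operational": 0.30, "esg": 0.15, "concentration": 0.25}

@dataclass
class SupplierSignals:
    financial: float               # 0 = healthy, 1 = distressed
    operational: float             # 0 = reliable, 1 = failing OTIF/quality/capacity
    esg: float                     # 0 = no flags, 1 = severe flags
    share_of_our_spend: float      # fraction of our spend with this supplier
    share_of_their_revenue: float  # fraction of their revenue that is us

def concentration(s: SupplierSignals) -> float:
    """Mutual dependence: high only when both sides rely on each other."""
    return s.share_of_our_spend * s.share_of_their_revenue

def risk_score(s: SupplierSignals, weights=WEIGHTS) -> float:
    """Weighted average of the four signals, scaled to 0-100."""
    parts = {
        "financial": s.financial,
        "operational": s.operational,
        "esg": s.esg,
        "concentration": concentration(s),
    }
    total_weight = sum(weights.values())
    return 100 * sum(weights[k] * parts[k] for k in weights) / total_weight

example = SupplierSignals(0.5, 0.5, 0.5, share_of_our_spend=1.0, share_of_their_revenue=0.5)
print(risk_score(example))
```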
Dynamic discounting is the highest-return idle-cash deployment most treasuries don't run.
The math is trivial. The blocker is that AP is measured on payment timing, not on net cash return.
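For reference, the math can be written as a simple annualization: paying (1 - d) today instead of 1 in t days earns d / (1 - d) over t days. The function name and the example terms (a 2% discount for paying 20 days early) are hypothetical and not taken from the post.

```python
def annualized_return(discount_rate: float, days_paid_early: int) -> float:
    """Simple (non-compounded) annualized return on cash used to pay
    an invoice early in exchange for a discount."""
    if not 0 <= discount_rate < 1:
        raise ValueError("discount_rate must be in [0, 1)")
    if days_paid_early <= 0:
        raise ValueError("days_paid_early must be positive")
    period_return = discount_rate / (1 - discount_rate)  # return over the early-pay window
    return period_return * 365 / days_paid_early

# Hypothetical terms: 2% off for paying 20 days early.
print(f"{annualized_return(0.02, 20):.1%}")
```

Comparing this figure with the yield on idle cash is the net-cash-return view that the post says AP is not measured on.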
Three-Way Match, Automated
PO. GRN. Invoice.
Match → post and pay.
Exception → route by category.
75% straight match in a typical mature rollout.
25% goes to exception queues — pricing variance, qty mismatch, missing GRN, tax error — each routed to the right team in seconds.
Exception handling time ↓ 60%.
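The match-or-route logic might look like the sketch below. The record fields, the 1% price tolerance, the order of the checks, and the team names are all assumptions for illustration; only the four exception categories come from the post.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PurchaseOrder:
    qty: int
    unit_price: float
    tax_rate: float

@dataclass
class GoodsReceipt:
    qty_received: int

@dataclass
class Invoice:
    qty: int
    unit_price: float
    tax_amount: float

PRICE_TOLERANCE = 0.01  # hypothetical: allow 1% unit-price drift
ROUTES = {              # hypothetical owning teams
    "missing_grn": "receiving",
    "qty_mismatch": "receiving",
    "pricing_variance": "procurement",
    "tax_error": "tax",
}

def three_way_match(po: PurchaseOrder, grn: Optional[GoodsReceipt], inv: Invoice) -> str:
    """Return 'match' or the first exception category that applies."""
    if grn is None:
        return "missing_grn"
    if inv.qty != grn.qty_received or inv.qty > po.qty:
        return "qty_mismatch"
    if abs(inv.unit_price - po.unit_price) > PRICE_TOLERANCE * po.unit_price:
        return "pricing_variance"
    expected_tax = round(inv.qty * inv.unit_price * po.tax_rate, 2)
    if abs(inv.tax_amount - expected_tax) > 0.01:
        return "tax_error"
    return "match"

def route(po: PurchaseOrder, grn: Optional[GoodsReceipt], inv: Invoice) -> str:
    """Clean matches go to payment; exceptions go to the owning team."""
    result = three_way_match(po, grn, inv)
    return "post_and_pay" if result == "match" else ROUTES[result]

po = PurchaseOrder(qty=10, unit_price=20.0, tax_rate=0.1)
print(route(po, GoodsReceipt(10), Invoice(10, 20.0, 20.0)))
print(route(po, None, Invoice(10, 20.0, 20.0)))
```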
The hardest part of an AI rollout in finance isn't the model.
It's the controllers learning to trust output they didn't generate. That trust takes one bad month to lose.
What if you could take three completely different model families… and distill them into one tiny model? 🤯
📜 Paper: arxiv.org/pdf/2605.21699
MOPD (Multi-Teacher On-Policy Distillation) has become a standard procedure in post-training. We already distill multiple specialized variants of the same model into a single set of weights.
But what if we could go further - and distill models from entirely different families? Turns out, it is possible.
Today we’re releasing a paper on cross-tokenizer distillation - our first steps in this exciting direction. 📄
We distilled Qwen3-4B, Phi-4-Mini, and Llama-3B into Llama-3.2-1B.
MMLU jumped from 32.05 → 46.32 when using multiple teachers. 📈
The team is now working on Nemo-RL integration so the community can try this method in their own settings. Plus, we are scaling experiments up. 🚀
Budget-Aware Agents (BAGEN) studies the failure modes in budget estimation:
1. Strong agents are not strong budget estimators.
2. Frontier models are often overoptimistic.
3. Budget awareness is actionable and trainable. SFT plus RL strengthens early stop and alert behavior, saving 28-64 percent of tokens on failed trajectories.
4. Upper and lower bound calibration remains hard.
ragen-ai.github.io/bagen/
🧵 Claude-Opus-4.8 costs you too many tokens, but is this issue general across agents? Do agents know how much they'll spend?
Introducing Budget-Aware Agents (BAGEN): We study budget awareness across 4 envs & 5 frontier agents, and find structured failures in most of them.
The General Robotics Lab at Duke University 🇺🇸 presents "Argus", a novel spherical robot designed to interact with its environment in a new way (omnidirectional, and built on the concept of "dynamic symmetry").
generalroboticslab.com/Argus
The Pricing Causal Model in Production
What the deployed pricing model actually does each week:
Sun — Feature refresh. POS, weather, competitor scrapes.
Mon — Causal recompute. Elasticity by segment × SKU.
Tue — Recommendations with confidence intervals.
Wed — Pricing committee reviews top 50 SKUs.
Thu — Approved moves push to ERP.
Margin lift compounds over quarters as the model learns and the committee comes to trust it. +3 to +7 margin points annualized by Q4.
Not because the model got smarter. Because the loop got tighter.
The Variance Attribution Agent
Before: Mon 9am — close locked, no commentary yet.
Five days of controllers chasing data from eight systems. Board pack distributed Mon+1.
After: Mon 9am — variance bridge + draft narrative on CFO's screen. Mon 11am — controller edits the why, not the what. Mon 3pm — board commentary done.
Same data. Different posture.
The agent doesn't replace controllers. It moves their work from assembly to analysis.
The procurement function reports savings to the board.
The CFO never sees those savings land.
Both are doing their job. The system between them is the problem.
Supplier risk scoring without concentration as a dimension is incomplete.
It's the one that explains the most disruptions — and the one most score cards skip.
Negotiated ≠ Realized
Where promised savings disappear before they hit the P&L.
100% negotiated → 85% after scope creep → 72% after volume shortfall → 60% after maverick spend → 55% after contract expiry gap → 50% realized.
Half of every negotiated dollar never shows up.
Realization tracking recovers most of it.
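Realization tracking starts with knowing how much leaks at each stage. The sketch below turns the retained percentages from the waterfall above into per-stage leakage. The percentages come from the post; the code structure and names are hypothetical, and the final 55% to 50% step is not named in the post, so it is labeled as unattributed.

```python
# Retained share of negotiated savings after each stage, from the waterfall above.
WATERFALL = [
    ("negotiated", 100),
    ("scope creep", 85),
    ("volume shortfall", 72),
    ("maverick spend", 60),
    ("contract expiry gap", 55),
    ("unattributed (final step not named in the post)", 50),
]

def leakage_by_stage(waterfall):
    """Percentage points of negotiated savings lost at each stage."""
    losses = {}
    for (_, before), (label, after) in zip(waterfall, waterfall[1:]):
        losses[label] = before - after
    return losses

losses = leakage_by_stage(WATERFALL)
for label, points in sorted(losses.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {points} pts")
print("total leaked:", sum(losses.values()), "pts")
```

Sorting by size gives a first-pass priority list for which leak to chase.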