A few weeks ago I posted about @Benioff’s comment that Salesforce expects to spend roughly $300M on @AnthropicAI tokens this year.
The response was bigger than I expected 😄
@TechCrunch Enterprises are going to learn that not every task needs a frontier model.
Most AI work can be done more efficiently with specialized small language and nano models.
No massive data centers, just the compute that's already in our pockets.
ZeroGPU.ai
TokenMaxxxing is out!!
"Token efficiency is going to be a big theme this year… because the spend has been ramping up way faster than enterprise customers thought." @DavidSacks said this on the latest @theallinpod
Most AI tasks don’t need frontier-model reasoning.
Small language models are bridging that gap.
That’s what we’re building at @ZeroGPU_AI.
So we stopped trying to build a data center, and started on a solution.
An edge inference network built around idle compute.
Run repeatable work on small and nano language models. Frontier models stay for reasoning.
→ zerogpu.ai
$700 billion is being spent on AI compute this year.
Today a city voted to pause that spend.
The buildout is hitting a wall — and most of what it’s being built for never needed a data center at all. 🧵
Use frontier models like Claude for orchestration and reasoning.
For the high-volume, repeatable tasks that most enterprises are tapping into AI for today, use specialized models to complete work faster, more predictably and at a lower cost.
zerogpu.ai
Claude Code processes a customer feedback export and automatically hands PII extraction and redaction to purpose-built models that generate:
→ A clean version that's safe to share
→ A complete audit log of every PII entity found and removed
👩‍🍳 Cookbook:
docs.zerogpu.ai/cookbook/claud…
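To make the two outputs concrete, here is a minimal sketch of the redact-and-log step. It is not the cookbook's code: plain regexes stand in for the purpose-built PII model, and the `redact` function, the `PATTERNS` table and the audit-log fields are all hypothetical names chosen for illustration.

```python
import re

# ASSUMPTION: regexes are a stand-in for a purpose-built PII model.
# They only catch simple emails and phone numbers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text):
    """Return (clean_text, audit_log) for one feedback record."""
    matches = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), label))
    matches.sort()  # process entities left to right

    parts, audit_log, cursor = [], [], 0
    for start, end, label in matches:
        if start < cursor:
            continue  # skip a match that overlaps one already redacted
        parts.append(text[cursor:start])
        parts.append(f"[{label}]")
        # Hypothetical audit entry; it holds raw PII, so keep the log private.
        audit_log.append(
            {"type": label, "start": start, "end": end, "value": text[start:end]}
        )
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), audit_log


clean, log = redact("Contact jane@example.com or 555-123-4567 please.")
print(clean)
for entry in log:
    print(entry)
```

The clean text is the version that is safe to share. The audit log records what was removed and where, which is why it should be stored separately from the shared file.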
Here's how to reduce costs & improve results: pair Claude Code w/ a specialized small language model.
In this example cookbook, our specialized SLM redacts PII within Claude Code.
Our router plugin lets Claude decide which tasks are pushed to our specialized, cheaper models.
Useful for customer feedback, support tickets, extraction, classification & more.
⭐️Please consider giving us a star on GitHub⭐️
github.com/zerogpu/zerogp…
With the ZeroGPU Router plugin, Claude Code can automatically route these tasks to purpose-built models.
You stay in Claude Code.
The repetitive work gets handed off to specialized models.
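As a rough illustration of the handoff idea, here is a tiny routing sketch. It is not the plugin's logic: in the plugin Claude makes the decision, while this sketch uses a fixed lookup table. The `Task` class, the `route` function, the task kinds and the tier names are all hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: a hand-written set of task kinds treated as high-volume and repeatable.
REPEATABLE_TASKS = {"classification", "extraction", "redaction", "summarization"}


@dataclass
class Task:
    kind: str    # e.g. "classification" or "architecture-review"
    prompt: str  # the text that would be sent to the chosen model


def route(task: Task) -> str:
    """Return a hypothetical model tier name for the task."""
    if task.kind in REPEATABLE_TASKS:
        return "specialized-small-model"
    return "frontier-model"  # open-ended reasoning stays with the frontier model


print(route(Task("classification", "Label this ticket: 'refund not received'")))
print(route(Task("architecture-review", "Review this service design")))
```

A static table like this is the simplest possible stand-in. It is predictable and easy to audit, but it cannot judge an unfamiliar task the way a model-driven router can.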
Our latest Claude Code cookbook is live.
It shows how to pair frontier models like Claude with specialized small and nano language models for high-volume, repeatable tasks.
In this case, we show how to redact PII with Claude Code + our SLMs. docs.zerogpu.ai/cookbook/claud…
We’ve added Llama 3.1 8B Instruct, a great fit for:
→ Summarization
→ Content transformation
→ Classification
→ Data extraction
→ Customer support workflows
→ Lightweight chat and agent experiences
With our router, let AI decide which model handles each task and save on costs.
Are your AI costs too high?
We’re giving developers access to a growing catalog of more efficient, specialized AI models through a single API—including leading open-source models like Meta’s Llama 3.1.
Not every task you run in @Claude Code needs frontier-model reasoning. But most AI coding workflows are still sending every request to the largest model available.
That's why we built a new plugin that routes lightweight workloads to specialized nano language models.
This has been our most requested feature to date, perfect for:
- data enrichment
- classification
- offline analytics
- backfills
- so much more
Get started: zerogpu.ai
It’s a cleaner way to run large AI workloads without managing queues, workers, retries, or GPU infrastructure yourself. ZeroGPU handles the execution. You focus on the data.
Read more: medium.com/zerogpu/introd…
Our Batch API is built for AI workloads that do not need to happen in real time, helping you save on costs.
Instead of sending each request one by one:
→ upload a JSONL file
→ submit it as a batch job
→ retrieve the results when processing is complete
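For the first step, here is a minimal sketch of building a JSONL file, where each line is one self-contained JSON request. The field names (`custom_id`, `task`, `input`) and the helper names are assumptions for illustration, not the Batch API's documented schema. Upload and submission are left out because the endpoints are not described here.

```python
import json


def build_batch_lines(texts, task="classification"):
    """Turn a list of input texts into JSONL lines, one request per line."""
    lines = []
    for i, text in enumerate(texts):
        # ASSUMPTION: hypothetical request shape; check the real schema before use.
        record = {"custom_id": f"req-{i}", "task": task, "input": text}
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


def write_jsonl(path, lines):
    """Write the lines to a file, newline-terminated."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


lines = build_batch_lines(["Great product!", "Shipping was slow.\nVery slow."])
for line in lines:
    print(line)
```

An ID on every line lets you match results back to inputs later, since batch results are not guaranteed here to come back in the order they were submitted.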