You are right, @badlogicgames. I copied Codex's exec_command and write_stdin into Pi Agent.
Then I compared its performance to the plain bash tool. The result surprised me. Async bash lost in almost every task.
@deedydas Would love to know if the results change using different agents. They only tried using mini-SWE-agent.
@lateinteraction - wonder if dspy.RLM could have a crack at this.
Introducing ml-intern, the agent that just automated the post-training team @huggingface
It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.
It can pull off crazy things:
We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper. Found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score 10% → 32% on GPQA in under 10h. Claude Code's best: 22.99%.
In healthcare settings it inspected available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual etc. Then upsampled 50x for training. Beat Codex on HealthBench by 60%.
For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded. All fully backed by papers, autonomously.
How does it work?
ml-intern makes full use of the HF ecosystem:
- finds papers on arxiv and hf.co/papers, reads them fully, walks citation graphs, pulls datasets referenced in methodology sections and on hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, retrains
ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like.
Releasing it today as a CLI and a web app you can use from your phone/desktop.
CLI: github.com/huggingface/ml…
Web + mobile: huggingface.co/spaces/smolage…
And the best part? We also provisioned $1k in GPU resources and Anthropic credits for the quickest among you to use.
🧐Applying world models to improve real-world policy on challenging manipulation tasks used to be considered out of reach.
😌After sustained effort, we’re now seeing encouraging progress.
🚀Thrilled to introduce RISE: Self-Improving Robot Policy with Compositional World Model
The new generation of open state-of-the-art single and multi-vector retrieval models is here
It's time, DenseOn with the LateOn 🎶
@LightOnIO releases models that leap past existing ones, and everything you need to do the same!
Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see.
@eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)
Our newest model, π0.7, has some interesting emergent capabilities: it can control a new robot to fold shirts for which we had no shirt folding data, figure out how to use an appliance with language-based coaching, and perform a wide range of dexterous tasks all in one model!
@demishassabis I'm not trying to misrepresent anyone, and perhaps my Googler friends are misinformed. But I strongly suspect that by my own notions of what constitutes advanced AI adoption--and indeed, what most of the industry would expect from Google right now--you are not doing great.
At
@DavidGFar This is awesome. How far can you take this?
Are we at a point where you could train on the Hermes agent traces (huggingface.co/datasets/lambd…) to get a lightning fast routing head for an agent to select the right tool?
@BatsouElef findsubstack.com
I built a newsfeed for Substack that shows only long-form posts from the last 24 hours.
Already discovering way better writers.
"It seems to me that there will quickly reach a point where we can treat computers in much the same manner as we treat fellow humans, without ever assuming that they are human or should be.
For instance, I think it not unreasonable to ask a computer to understand me (maybe someday in natural language), to cooperate with me, to take some initiative on its own, and to make life simpler for me. It is reasonable for the computer to not understand occasionally, and to need clarification, or even for it to screw up and do as I said, and not what I meant."
- The Mind's I - Jan 21 1983 usenet
@ThePrimeagen I'm building findsubstack.com
A newsfeed for Substack that shows only long-form posts from the last 24 hours.
Already discovering way better writers.
An interesting side effect of vibe coding is the utter fragmentation of open source libraries.
How can we convince people to consolidate their efforts into one project rather than "I can make a better version with AI", which they have no interest in maintaining?