vivis.dev @vivis_dev

AI Engineer | Tech Lead
My current projects: https://t.co/xY1SAUl0g3 | https://t.co/oLvvz6I6qo
linktr.ee/vivis_dev
Joined June 2025

Tweets

353
Followers

51
Following

263
Likes

146

KKY @evilpsycho42

4 weeks ago

You are right @badlogicgames. I copied Codex's exec_command and write_stdin into Pi Agent, then compared its performance to the plain bash tool. The result surprised me: async bash lost in almost every task.

vivis.dev @vivis_dev

a month ago

@deedydas Would love to know if the results change using different agents. They only tried using mini-SWE-agent. @lateinteraction - wonder if dspy.RLM could have a crack at this.

Aksel @akseljoonas

a month ago

Introducing ml-intern, the agent that just automated the post-training team @huggingface. It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, and it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates, and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.

It can pull off crazy things:

We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper, found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score on GPQA from 10% to 32% in under 10h. Claude Code's best: 22.99%.

In healthcare settings, it inspected the available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual, etc. Then it upsampled them 50x for training. It beat Codex on HealthBench by 60%.

For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded. All fully backed by papers, autonomously.

How does it work? ml-intern makes full use of the HF ecosystem:
- finds papers on arxiv and hf.co/papers, reads them fully, walks citation graphs, and pulls datasets referenced in methodology sections and on hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets, and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, and retrains

ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like.

Releasing it today as a CLI and a web app you can use from your phone/desktop.
CLI: github.com/huggingface/ml…
Web + mobile: huggingface.co/spaces/smolage…

And the best part? We also provisioned $1k in GPU resources and Anthropic credits for the quickest among you to use.

Jiazhi Yang @jiazhi_yang2024

a month ago

🌏 RISE is now open-sourced! github.com/OpenDriveLab/R…

Jiazhi Yang @jiazhi_yang2024

4 months ago

🧐Applying world models to improve real-world policy on challenging manipulation tasks used to be considered out of reach. 😌After sustained effort, we’re now seeing encouraging progress. 🚀Thrilled to introduce RISE: Self-Improving Robot Policy with Compositional World Model

Antoine Chaffin @antoine_chaffin

a month ago

The new generation of open state-of-the-art single and multi-vector retrieval models is here It's time, DenseOn with the LateOn 🎶 @LightOnIO releases models that leap past existing ones, and everything you need to do the same!

Zain Shah @zan2434

a month ago

Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)

vivis.dev @vivis_dev

2 months ago

@thepericulum Agreed, the browser visual editor is so handy for UI changes - cursor.com/blog/browser-v… Don't see anything similar in the Claude universe

Physical Intelligence @physical_int

2 months ago

Our newest model, π0.7, has some interesting emergent capabilities: it can control a new robot to fold shirts for which we had no shirt folding data, figure out how to use an appliance with language-based coaching, and perform a wide range of dexterous tasks all in one model!

vivis.dev @vivis_dev

2 months ago

@theo I never left

vivis.dev @vivis_dev

2 months ago

Measuring tokens/day is a good signal for separating slop factories from companies that actually ship quality products.

Steve Yegge @Steve_Yegge

2 months ago

@demishassabis I'm not trying to misrepresent anyone, and perhaps my Googler friends are misinformed. But I strongly suspect that by my own notions of what constitutes advanced AI adoption--and indeed, what most of the industry would expect from Google right now--you are not doing great. At

vivis.dev @vivis_dev

2 months ago

@jsnnsa 9/7

vivis.dev @vivis_dev

2 months ago

@NotNordgaren @grok how could many years of fuzzing miss something like this?

vivis.dev @vivis_dev

2 months ago

@marmaduke091 I think they do this to save money, but honestly not sure how it passes review

vivis.dev @vivis_dev

2 months ago

@DavidGFar This is awesome. How far can you take this? Are we at a point where you could train on the Hermes agent traces (huggingface.co/datasets/lambd…) to get a lightning fast routing head for an agent to select the right tool?

vivis.dev @vivis_dev

2 months ago

@BatsouElef findsubstack.com I built a newsfeed for Substack that shows only long-form posts from the last 24 hours. Already discovering way better writers.

vivis.dev @vivis_dev

2 months ago

"It seems to me that there will quickly reach a point where we can treat computers in much the same manner as we treat fellow humans, without ever assuming that they are human or should be. For instance, I think it not unreasonable to ask a computer to understand me (maybe someday in natural language), to cooperate with me, to take some initiative on its own, and to make life simpler for me. It is reasonable for the computer to not understand occasionally, and to need clarification, or even for it to screw up and do as I said, and not what I meant." - The Mind's I - Jan 21 1983 usenet

vivis.dev @vivis_dev

2 months ago

@aidenybai Yep, and they produce completely different results for different models.

vivis.dev @vivis_dev

2 months ago

@caprikaps @venturetwins Seriously dude? This was 100% written by AI

vivis.dev @vivis_dev

2 months ago

@ThePrimeagen I'm building findsubstack.com A newsfeed for Substack that shows only long-form posts from the last 24 hours. Already discovering way better writers.

vivis.dev @vivis_dev

2 months ago

An interesting side effect of vibe coding is the utter fragmentation of open source libraries. How can we convince people to consolidate their efforts into one project, rather than thinking "I can make a better version with AI" and shipping something they have no interest in maintaining?