In the next version of Zerolang
Agents get closer to the compiler
Instead of making agents edit source text, then recover meaning through format/check/build/test loops
Zerolang makes the compiler's semantic graph the program
Agents query it, patch it, and get checked edits
Coding is just one part of engineering. There’s also debugging, operating services, scaling up infrastructure, deciding what to optimize, setting up hardware and capacity, talking to users, product planning, etc. Coding is the easy part; everything else is not yet solved (but is also becoming increasingly automated).
As eval is downstream of everything, it determines whether you will spend your time optimizing the right metrics.
The current gap between academia and industry AI labs is the attitude toward eval.
In academia, the eval set is very hard to change since a) you need to explain why your eval is better and b) you need to benchmark against your cited works with the new eval and show that your work is superior.
Doing both a and b at the same time invites risky rebuttal, even if you are doing a good job on a. It is far easier to benchmark against the eval set that everyone has agreed upon.
In contrast, in industry AI labs, customer feedback is your eval set and it keeps changing to cover the long tail that you could never think of during years of PhD programs.
If the loss functions are not a good proxy for customer feedback, then you change them until both are aligned.
Thus, academia might train students who are very good at hill climbing but inexperienced in building eval sets that capture hard real cases. To move the needle, building the right eval set matters the most.
Seeing a number of benchmarks showing Opus is the best model for long-running work.
Five tips for running Opus autonomously for hours/days:
1. Use auto mode for permissions, so Claude doesn’t ask for approval
2. Use dynamic workflows, to have Claude orchestrate hundreds/thousands of agents to get a task done
3. Use /goal or /loop, to nudge Claude to keep going until it’s done
4. Use Claude Code in the cloud, so you can close your laptop (easiest way is the desktop or mobile app)
5. Make sure Claude has a way to self-verify its work end to end: Claude in Chrome browser extension for web, iOS/Android sim MCP for mobile, a way to start the full web server or service for backend work
Can coding agents stay coherent over a 1 billion token budget?
Can they build Slack from scratch?
Rewrite a JAX codebase in PyTorch?
Build a C compiler in Rust?
Enter SWE-Marathon: a benchmark for autonomous long-horizon software work.
📣 @SKhynix and @NVIDIA announce a multiyear technology partnership to codevelop next-generation memory for the global AI factory buildout.
SK hynix will codevelop memory for NVIDIA's platforms — from NVIDIA Vera Rubin to Jetson Thor — while advancing fab digital twins using @NVIDIAOmniverse libraries and applying NVIDIA CUDA-X and PhysicsNeMo to accelerate semiconductor design and manufacturing.
Read the press release: nvda.ws/4e43e0p
With Climate Doomerism fading, AI Doomerism will become the central organizing catastrophe on the Left. It justifies their takeover of the economy and especially the information space. And it has enough pseudoscience and Hollywood storytelling behind it to seem compelling.
Composer 2.5 is now available inside Grok Build.
Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions.
@elonmusk @aaditsh Exactly.
Software was the easy part.
Now the hard part begins: atoms, energy, and fabs.
The next decade belongs to those who can ship physical intelligence at scale
@jawwwn_ @60Minutes There is obviously no “degree” you can get from a university that actually teaches you how to make an orbital rocket, as none of the professors know how to do it!
Copilot’s biggest issue isn’t cost. It isn’t product. It isn’t even the brand dilution.
They have the insurmountable problem of having the actual stupidest user base of any AI product.
The average Lovable user understands basic software development better than Copilot users.
From when I first started using @nextjs, I always found @vercel to be the absolute best in class for dx
Building this slackbot was incredibly simple with @aisdk and @chatsdk, I didn’t read the slack docs once
Always bet on TypeScript and always bet on @vercel
zerolang is the most exciting thing I've ever built
I've been programming for nearly 30 years. Since the day I learned to code, I've been obsessed. Late nights, weekends, vacations. Software has been the thing I've thought about more than almost anything else
But this feels different
I find myself thinking about zerolang constantly. Not just how to improve the language, but what it could mean for the future of software development
In many ways, zerolang started with a question:
"What if json-render was a programming language?"
json-render was built around a simple idea: AI systems become dramatically more reliable when they operate on structured representations instead of unconstrained text and code
Instead of generating arbitrary React, models generate validated UI specs
Instead of guessing, they work within a defined semantic space
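A minimal sketch of what "validated UI specs" could look like, assuming a made-up component catalog. The names `CATALOG` and `validate_spec` and the spec shape (`type`, `props`, `children`) are hypothetical and are not json-render's actual API; the point is only that a structured spec can be checked before anything is rendered:

```python
# Hypothetical catalog: each allowed component type maps to the props it
# accepts. This is an illustration, not json-render's real schema.
CATALOG = {
    "Card": {"title"},
    "Text": {"content"},
    "Button": {"label", "action"},
}


def validate_spec(node, path="root"):
    """Return a list of error strings for a nested UI spec (empty if valid)."""
    if not isinstance(node, dict):
        return [f"{path}: expected an object, got {type(node).__name__}"]

    node_type = node.get("type")
    if node_type not in CATALOG:
        # Unknown components are rejected outright; their children are not checked.
        return [f"{path}: unknown component type {node_type!r}"]

    errors = []
    for name in node.get("props", {}):
        if name not in CATALOG[node_type]:
            errors.append(f"{path}: {node_type} has no prop {name!r}")

    # Recurse into children, tracking the path so errors point at the exact node.
    for i, child in enumerate(node.get("children", [])):
        errors.extend(validate_spec(child, f"{path}.children[{i}]"))
    return errors


good_spec = {
    "type": "Card",
    "props": {"title": "Welcome"},
    "children": [{"type": "Text", "props": {"content": "Hello"}}],
}
bad_spec = {
    "type": "Card",
    "props": {"title": "Welcome"},
    "children": [{"type": "Marquee", "props": {}}],
}

good_errors = validate_spec(good_spec)
bad_errors = validate_spec(bad_spec)
```

Here `good_errors` is empty, while `bad_errors` names the offending node by path, which is the kind of precise feedback a model can act on without parsing free-form text.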
zerolang takes that same idea and applies it to software itself
What if programming languages were designed not only for humans, but also for agents?
What if agents worked with semantic program graphs instead of source text?
What if compilers exposed structure, changes, and diagnostics directly instead of forcing models to reverse engineer them from strings?
I believe that's where software development is heading
We've made incredible progress in models, agent harnesses and dev tools
It won't happen overnight. But I believe the way we build software will fundamentally change
We've changed the trigger word from "workflow" to "ultracode".
You can still say "use a workflow for this", but when you're clearly referring to something else, Claude won't kick off a dynamic workflow. For an explicit trigger, use "ultracode". We appreciate the feedback!
New in Claude Code (research preview): dynamic workflows.
Claude writes an orchestration script on the fly, then spins up a large fleet of coordinated subagents in parallel to take on your most complex tasks.
Use the word "workflow" in a prompt to get started.
one of the quotes i find most inspiring on a hard day:
"Whatever your hand finds to do, do it with all your might, for in the realm of the dead, where you are going, there is neither working nor planning nor knowledge nor wisdom"
Ecclesiastes 9:10
recommended reading. i really like the durability aspect of dynamic workflows. looked into how it's implemented, and while there are some minor footguns, it's smart!