publication about local models, private inference, self-hosted agents, weird hardware, research sweeps, and the builders
https://t.co/WkE82CN86T · nodehome.ai · Austin, TX · Joined May 2026
Is this cap for sustained workloads, or do you keep it in place all the time? On my 3x3090 setup I've found temps stay relatively safe at peak during inference work and tend to drop pretty quickly between requests at 300W.
On gpu-burn, dropping from 300W to 250W cost ~27% aggregate GFlop/s, so I've been keeping 300W for interactive inference and dropping to 225W or 250W for any sustained runs.
So far vLLM + AWQ-Marlin and gpu-burn are all I've tested, though. I haven't tested Aphrodite yet, but I'm planning to and will report back.
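A rough way to read that gpu-burn number, as a sketch only: the inputs are just the 300W and 250W caps and the ~27% drop quoted above. The function name is made up for illustration, and it assumes each card actually draws close to its cap under gpu-burn, which wasn't measured here.

```python
def relative_perf_per_watt(old_watts, new_watts, throughput_drop):
    """Ratio of perf-per-watt at the new cap to perf-per-watt at the old cap.

    Assumes (hypothetically) that real draw tracks the cap under full load.
    """
    relative_throughput = 1.0 - throughput_drop   # 0.73 of the 300W throughput
    relative_power = new_watts / old_watts        # 250/300 of the 300W power
    return relative_throughput / relative_power


ratio = relative_perf_per_watt(300, 250, 0.27)
print(f"perf/W at 250W is about {ratio:.3f}x the 300W value")
```

Under that assumption the 250W cap comes out a bit worse per watt on gpu-burn, not just slower, so it reads as a heat and noise trade rather than an efficiency win. Inference workloads may behave differently.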
this is so easy to miss in all the launch noise, but a year ago native multimodal meant an api key and someone else's datacenter, and now it's a single 3090 sitting on your desk, text image video audio, weights you actually own, apache 2.0.
and the thing is the 12b era was never really about beating the big labs on a benchmark, it's that "capable enough" just quietly moved onto hardware you control.
i'm benchmarking it now, but if you've already run it i wanna know what you're actually seeing, tok/s, context, does the multimodal hold up? drop it below.
super useful thank you for sharing!
currently running 3x 3090s and having a lot of fun
the 220W power cap is interesting, is that due to heat buildup? I've started at 300W, but it's certainly running a bit hot at times
going to check out aphrodite, exllamav3, and fp8 kv cache on ampere as well, pretty interesting stuff
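For anyone wanting to try the 220W cap mentioned above, here's a minimal sketch that only builds and prints the `nvidia-smi` commands, it doesn't run them. The helper name and the GPU indices 0 to 2 are assumptions for a 3-card box; `-i` selects the GPU and `-pl` sets the power limit in watts.

```python
import shlex


def power_cap_commands(gpu_indices, watts):
    """Build one `nvidia-smi` power-limit command per GPU index (hypothetical helper)."""
    return [["nvidia-smi", "-i", str(i), "-pl", str(watts)] for i in gpu_indices]


# Assumed layout: three cards at indices 0, 1, 2, capped at 220W each.
commands = power_cap_commands(range(3), 220)
for cmd in commands:
    print(shlex.join(cmd))
```

Setting the limit typically needs root, the value has to sit inside the range the card reports as supported, and the cap generally has to be reapplied after a reboot.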
this is pretty wild with a relatively small local model @googlegemma
have to imagine small wearables become a regular thing in society a lot faster than we previously thought
you probably see these in regular use by late 2027/2028 i'd guess now
youtube.com/watch?v=OhaIA3…
Having so much fun building and learning in this space!
Will start to post more about my stack as time allows. In the meantime, for anyone interested, I'll also try to keep nodehome.ai updated going forward!