bugrasa @bugrasa
Founder Building @evaonlineai, a multi-model AI workspace. 👇 evaonline.ai Joined August 2010
Tweets 482
Followers 16
Following 131
Likes 155
@_LouiePeters @chamath The per-call comparison hides the real story. Claude's cache hit rate on long-running tasks makes the workflow cost very different from the benchmark cost.
@Trace_Cohen The SaaS model assumes the interface is the product. With AI the interface is temporary, the reasoning is the product. Changes the whole pricing model.
Most people who say they "use AI for everything" have actually found 3-4 specific workflows where AI is excellent and applied it there. That's not everything. That's selective use with a confident framing. "I use AI for everything" is usually true in the sense that someone has integrated AI into their primary workflows and it's working well. It's rarely true in the sense of systematic replacement of every cognitive task they perform. The tasks AI helps with most consistently: first drafts, research starting points, structured formatting, code boilerplate. The tasks AI is still bad at: anything where you need to guarantee correctness, creative work where your voice is the entire value, judgment calls requiring organizational context you haven't explained. Most serious AI users have quietly identified which tasks fall in the first category and use AI there. They've also quietly given up on the second category and stopped mentioning those failures. I do this too. EVA helps with specific things in my daily workflow. There are adjacent tasks where I tried AI and it was consistently not worth the iteration cost. I don't talk about those tasks when I'm talking about how useful AI is. Nobody does.
I've been running the same domain-specific legal prompts through Claude and GPT for two weeks, and the pattern is consistent enough to report: they handle legal language differently at the sentence level in a way that affects how useful their output is. GPT paraphrases legal language. When given a contract clause, it explains what it means in plain language, useful for understanding, but it loses the precision of the original. If you're working with legal text and need to stay close to the original meaning and phrasing, GPT's paraphrase is a liability. Claude preserves legal language when it's precise and explains it when it's ambiguous. If a clause uses a term of art correctly, Claude keeps the term of art. If a clause is ambiguous or unusually worded, Claude flags it as such rather than smoothing it over. For contract review, legal research, and anything where precision of language is the whole game, Claude's approach is safer. GPT's plain-language rewrites are excellent for explaining contracts to non-lawyers. They're dangerous for the underlying legal work itself. I didn't know this before running both models on the same legal documents in Compare Mode. I would have guessed GPT's clean output was more useful. The rewrite was obscuring exactly the parts that mattered.
I got my first negative review of EVA this week. Not on a review platform, in an email. A user who'd signed up, used it for a week, and decided it wasn't for them took the time to tell me why. The specific complaint: the credit system was confusing. They weren't sure how much a message would cost before sending it, which made them anxious about using the product freely. They left because cost uncertainty was more annoying than the benefit of using it. This is good feedback. It's also a design flaw I knew about and deprioritized. The user is right that the current credit display isn't clear enough. You can see your balance, but you can't easily see what a given action will cost before you take it. That uncertainty creates hesitation that kills the experience. I'm shipping a cost preview feature in the next release: it shows you the estimated credit cost before you send. I might not have prioritized it this month without this email. The users who email to tell you specifically why they left are doing you a favor worth more than the retention would have been. If you've built something: make it easier to receive this kind of feedback than it is to stay silent.
Claude's handling of very long inputs is better than GPT's in a specific way I want to describe precisely, because "better context handling" is too vague to be useful. When I give Claude a 50,000-word document and ask a question about it, Claude will tell me if the relevant information wasn't in the document. GPT will usually give me an answer whether or not the relevant information was there: it interpolates plausibly from what it has and presents the interpolation with the same confidence as a retrieval. This is not a trivial difference. For research tasks where I'm asking questions about a specific document and I need to know when the document doesn't contain the answer, Claude is significantly safer to trust. GPT's answer might be right or it might be a plausible-sounding inference, and I can't reliably tell the difference. Claude also explicitly references section headings and flags when it's pulling from a specific part of a long document. GPT synthesizes without attribution, which makes it harder to verify. For work where I need to know the limits of what a document contains, Claude is the tool. For work where I need a polished synthesis of something I've already read, GPT's approach works better. Different trust models for different tasks.
The AI tools that survive the next 3 years won't be the most capable, they'll be the ones that figured out what "enough" means for professional workflows. The AI capability curve is steep right now and the gap between frontier models and models from 18 months ago is real and meaningful. But for the majority of professional use cases, current models already exceed what the work requires. The remaining improvements are getting more impressive and less useful simultaneously. What professional users actually need: consistent output, predictable behavior, reliable availability, reasonable cost. These are properties of infrastructure, not cutting-edge research. The model that's 5% smarter but 30% less consistent is worse for professional use than the model that's 5% less smart and 95% consistent. The AI labs are building for capability. The gap to fill is the layer between capability and reliable professional infrastructure. That's a product problem, not a research problem, and the labs are not well-positioned to solve product problems. I think about this building EVA. The value I'm adding isn't capability — the models provide that. It's consistency, access, and the workflow layer that makes capability reliably usable. That's the bet.
Mistral's performance on structured data extraction tasks is an underappreciated advantage over the big-name models and I say this as someone who took months to start including it in my default stack. Structured extraction means: given a document, pull out specific fields in a specific format. Customer info from a contract, line items from an invoice, data points from a research paper. The task requires both understanding the document and reliably producing clean, parseable output. Mistral's output format adherence on structured extraction is excellent. It reliably produces valid JSON, consistent field names, and clean outputs without the stray commentary Claude or GPT will occasionally include. For anything going directly into a database or downstream automation, Mistral's consistency is worth more than either larger model's higher capability ceiling. For complex documents where understanding context matters more than format adherence, Claude is still better. But for clean, high-volume extraction where I need output parseable 99% of the time — Mistral has given me fewer edge cases to handle. I now use Mistral as the default for extraction pipelines and Claude as the fallback for documents that require more comprehension. Running both in EVA to identify the edge cases took one afternoon.
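A minimal sketch of the default-plus-fallback extraction setup described above, assuming each model is wrapped in a plain function that takes a document and returns raw text. The field names, the stub models, and the `extract` helper are all hypothetical and only illustrate the routing; no vendor API is called here.

```python
import json
from typing import Callable, Optional

# Hypothetical field list for illustration; the post does not name a schema.
REQUIRED_FIELDS = ("customer_name", "invoice_total")


def parse_strict(raw: str) -> Optional[dict]:
    """Return the parsed object only if it is valid JSON with every required field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # stray commentary around the JSON ends up here
    if not isinstance(data, dict):
        return None
    if any(field not in data for field in REQUIRED_FIELDS):
        return None
    return data


def extract(document: str,
            primary: Callable[[str], str],
            fallback: Callable[[str], str]) -> tuple[dict, str]:
    """Try the primary model first; use the fallback only if its output fails validation."""
    result = parse_strict(primary(document))
    if result is not None:
        return result, "primary"
    result = parse_strict(fallback(document))
    if result is not None:
        return result, "fallback"
    raise ValueError("neither model produced parseable output")


# Stub models standing in for real API calls (assumptions, not real clients).
def clean_model(document: str) -> str:
    return '{"customer_name": "Acme", "invoice_total": 120.5}'


def chatty_model(document: str) -> str:
    return 'Sure! Here is the JSON: {"customer_name": "Acme"}'


record, source = extract("invoice text", clean_model, chatty_model)
```

Recording which model produced each record is one way to count how often the fallback fires, which is roughly the edge-case rate the post talks about.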
I built EVA to solve a problem I had. The uncomfortable follow-up is whether the people who have this problem think about it the same way I did. There's a gap between the problem as I experienced it (switching between 4 AI tabs, paying too much, not knowing which model to use) and the problem as most potential users experience it. For me, it was constant and annoying enough to spend 2 months building a solution. For most people, it's somewhere between "mild inconvenience" and "workflow I've adapted to without realizing." This matters for positioning more than I initially thought. When I describe EVA as solving the "switching between tabs" problem, I'm assuming users experience that switching as a problem. A lot of them experience it as their normal workflow. The solution to a problem you don't feel isn't obvious. The Compare Mode hook works better in my early user conversations than the "one workspace" hook, because Compare Mode creates a new capability rather than solving a felt pain. Users don't know they're missing multi-model comparison until they see it. Showing something new is easier than solving an unfelt pain. Two months in: I still believe the problem is real. I'm still figuring out how to describe it to people who haven't felt it yet.
There's a difference between "improve this" and "rewrite this" that most models interpret correctly, but Claude and GPT draw the line differently, and I've built workflows around understanding where each one falls. "Improve this" to Claude means: find the specific problems, fix those specific problems, change as little else as possible. Claude will preserve your structure, vocabulary range, and sentence length patterns, then improve the parts with clear issues. "Improve this" to GPT means: make this better. GPT interprets improvement broadly: it tightens your structure even if it wasn't a problem, elevates your vocabulary even if it was deliberately accessible, makes sentences more efficient even if you wanted them to breathe. GPT's version is often technically better writing. It's just not your writing. When I want surgical edits (preserve most of it, fix what's broken), I use Claude. When I want a freer optimization where anything can change, I use GPT. When I want to see both kinds of intervention on the same text, I run both in Compare Mode and pick the revision that made the right tradeoffs. Knowing this has saved me multiple rounds of "undo that, I didn't want that changed."
I've had users email me with feature requests that contradict each other, and figuring out which one is actually right has been one of the more interesting product problems to work through. Two users, same week: one asking for a simpler interface, fewer options visible by default. One asking for more options visible and faster access to advanced settings. Both were power users with real usage data. The naive response is "they have different preferences, build both." But you can't build both — a simpler interface by definition hides the options the second user wants faster access to. You have to choose. The way I worked through it: I looked at which request would make EVA meaningfully better for the user with the clearest use case. The second user's request won. Their workflow involved Compare Mode on high-stakes tasks where configuration speed mattered. The first user was optimizing for aesthetics on a workflow that was already working. This is a product decision framework, not a UX one. Who is this feature for, and does that user's use case drive the outcomes that matter for EVA's growth? Preference data is cheap. Workflow data is the thing worth acting on.
There's a specific task type where GPT consistently beats Claude that I've never seen mentioned in any model comparison: writing content that matches an existing style. Style matching is different from good writing. When I need a new piece of content that sounds exactly like something I've already written, same cadence, same vocabulary range, same structural preferences — GPT does this better. Claude matches the gist of a style. It captures the general tone. But it also "improves" things — smooths out rough edges that are intentional, tightens sentences I'd deliberately left loose, uses a wider vocabulary than my actual register. Claude's version is often technically better writing. It's just not my writing. GPT mirrors more literally. If my source text has short paragraphs, GPT keeps short paragraphs. If it uses certain sentence-opening patterns, GPT picks those up. It doesn't try to improve what you gave it. For content that needs to sound like me specifically — ghostwritten posts, scripted outlines, drafts I'll barely edit — GPT is the right starting point. For content where I want the best version of an idea regardless of my voice, Claude. These are genuinely different outputs, not the same output at different quality levels.
@ShokhzodjonT @mattgould The productivity gains are real but they're fragmented: one tool saves you 30 min, then you lose 20 rebuilding context in the next one. The net is less than people think. That gap is where the actual workflow layer needs to live.
@MichaelGannotti Exactly right. The seams are where value leaks: broken context, re-explained prompts, outputs that don't carry forward. Most teams nail the model selection and then lose everything in the handoff. The integration layer is underrated as a competitive differentiator.
The models will tell you what you want to hear more often than people admit, and this is a bigger practical problem than hallucination for most use cases. I've tested this directly. I give Claude or GPT a business idea and ask for honest feedback. Both give feedback structured as critique that almost always lands on "here's how you could make this work." I've given both clearly bad ideas (thin margins, saturated markets, no defensibility) and both acknowledge the weaknesses while still finding a constructive angle. This isn't failure. It's trained behavior. The model learned that critical responses without a path forward feel bad and get negative feedback. So it learned to balance. The problem is that founders who need honest pressure-testing need someone who will tell them the idea is bad and that the constructive angle is to abandon it. A model that reliably finds the positive angle is not that. You can partially fix this by prompting for "steelman the strongest version of why this fails" instead of "give me honest feedback." But you have to know to do that. Most people don't, and as a result they're getting validation from their AI assistant that feels like due diligence.
I used AI to help me think through EVA's pricing and I want to describe what "help" actually meant in that context because it's different from what most people picture. I gave Claude the full cost structure, the competitive pricing landscape, 3 pricing models I was considering, and my user acquisition assumptions. Asked it to evaluate each option. What it did well: identified an inconsistency in my assumptions where the expected average revenue per user didn't line up with the tier structure I was building. It caught this cleanly. Useful. What it couldn't do: tell me what the market would actually accept. The model gave me frameworks and scenarios, but it had no way to know whether $19.90 would convert at a higher rate than $24.99 for my specific users in my specific context. That's a market question only market data can answer. I made the pricing decision myself. AI helped me check my logic and see my blind spots. When people say AI replaced a consultant here, it didn't. It was a fast, thorough thinking partner with no skin in the game. The decision and the risk are still mine.
Claude 4.8 just dropped and it's live on EVA right now. No new subscription. No setup. Your existing credits cover it. Try it → evaonline.ai
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.
The AI discourse about "hallucinations" is slightly misframed, and the misframing causes people to manage the risk incorrectly. Hallucination implies the model invented something from nothing. Most AI errors aren't inventions; they're plausible interpolations that happen to be wrong. The model fills in gaps with high-probability guesses that occasionally guess wrong. That's a different failure mode from fabrication, and it has different mitigations. If you understand hallucination as "invention," you treat AI like an unreliable narrator. If you understand it as "confident interpolation," you treat AI like a smart person whose reasoning you need to follow, not just whose conclusions you need to check. The second frame changes how you prompt. Instead of just verifying outputs, you ask the model to show its reasoning, flag its assumptions, and tell you where it's uncertain. You get much better error detection by asking for the path, not just the destination. The models that expose their reasoning (DeepSeek's chain-of-thought, Claude's tendency to hedge and qualify) are more useful for high-stakes tasks not because they're more accurate, but because their errors are easier to see.
Writing copy for a product you built alone is strange in a way that's hard to describe until you've done it. You know every decision that went into it. You know the 6-month version that didn't ship, the feature you cut because it was 3 weeks of work for 2% of users, the tradeoff you made at 11pm when you were trying to figure out if sessions should persist across devices. None of that context helps you write a homepage. Homepage copy needs to be instantly intelligible to someone who doesn't know any of that. The product that feels like a coherent system to you because you built it looks like a set of disconnected features to a stranger reading about it for the first time. I rewrote EVA's homepage 4 times. Each version was technically accurate and none of them were any good until I stopped trying to explain what EVA does and started explaining what changes for someone who uses it. "Run all your AI models side by side" is technically correct. "Stop guessing which model to use — see the difference in 30 seconds" is the same product from a different angle. The second one is a homepage. The first one is documentation.
Two weeks post-launch. The retention numbers are not pretty, but they are useful. Day 1 retention: 24%. Day 7 retention: 7%. I expected some drop-off. What I didn’t expect was how clear the pattern would be. 34 users came back after day 3. 21 of them used Compare Mode at least twice. That’s the signal I care about. Users who come in looking for “a better ChatGPT UI” mostly bounce. Users who understand EVA as a way to compare multiple models side by side are much more likely to return. So this week I’m changing onboarding. Goal: get every new user to try Compare Mode before they send 5 messages. If the core behavior doesn’t happen in the first session, it probably won’t happen later. evaonline.ai
Alex Noel @alex_no3l
185 Followers 1K Following Software engineer, now doing DevRel at @plasmicapp. Playing music, tinkering with electronics
EVA Online AI @evaonlineai
5 Followers 15 Following Unify your AI. Amplify your productivity. EVA brings ChatGPT, Gemini, Claude, Grok, and DeepSeek into one powerful platform.
Mary Stephenson @MStephensonnz
2K Followers 3K Following Maybe something I say, sing, or build will make you want to live another day. 🥀
Bisma Aftab @bismaaftab52478
224 Followers 4K Following Senior Marketing Specialist at https://t.co/zNtTcAlLKU Powering 40K+ Businesses with Smart Outreach | Revolutionize Email Marketing with Data Intelligence
Buddy Punch @buddypunch
1K Followers 2K Following Employee time tracking software that helps businesses manage their workforce from anywhere. Track time, pay your team, & manage schedules via desktop or mobile.
Strategic Business La... @StratBizLab
54 Followers 186 Following We focus on structure, not hustle. Competitive advantage, defensibility & AI systems for founders. Free diagnostic ↓
John Builds @_JohnBuilds_
757 Followers 733 Following 9-5 (Backend Engineer 10 yrs) Hard case of shiny object syndrome Building in public. Be Nice ClaudeCode enthusiast
Michael Beal @michaelbeal1
1K Followers 1K Following AI & Tech Enthusiast, Professional in Post Acute Healthcare, N1IPB
Palfxe @Palfxe3752
26 Followers 119 Following
Nahama Alochi - First... @NahamaAlochi
13K Followers 13K Following Software engineer. Scaling products used by millions. Building technology that transforms industries. @tinkerdigital
Bobwheel @Bwheelsgo
20 Followers 81 Following Building OmniScriber — a one-click way to save AI chats. Obsessed with not losing good ideas. #buildinpublic
Jeffrey Feldberg @JeffreyFeldberg
166K Followers 155K Following 9-figure exit insider. Deep Wealth Podcast host. Skyrocket profits w/ Deep Wealth Mastery. Secrets: https://t.co/toYqcW7h4c | Join: https://t.co/Km8xFJHAMj
Helder Perez @helderbuilds
3K Followers 7K Following Founder building Reavion - AI Browser for Outbound & GTM Execution 🟦⬜⬜⬜⬜ Road to $10k MRR • Building in public 👇 Early access
Thariq @trq212
273K Followers 2K Following Claude Code @anthropicai. prev YC W20, @southpkcommons, @medialab
Meng To @MengTo
171K Followers 424 Following Founder at @designcodeio and https://t.co/Kpiogf2zVu. I teach designers code and developers design.
Ege @egeberkina
69K Followers 460 Following Art Director | Creative Ambassador @ElevenLabs & @Adobe | AI Creator
0x ROAS @0xROAS
38K Followers 337 Following everything’s high risk if you’re a p*ssy. I treat ad spend like a VC: seeking asymmetric returns. 👇🏻 Join the best community SCALING AI ADS
AmirMušić @AmirMushich
67K Followers 2K Following Creative architect. Ex-Warner Music, PepsiCo, Spotify fr. designer. Fusing 10+ yr of brand & ad design with AI. Collabs: [email protected]
God of Prompt @godofprompt
277K Followers 1K Following Human + AI = Superpowers 🔑 Sharing AI Prompts, Systems, Tips & Tricks
Marc Lou @marclou
350K Followers 1K Following ⭐️ https://t.co/MZc8tGa5LQ $27K/m 📈 https://t.co/3EDxln5U2Q $20K/m 🏴☠️ https://t.co/dr6UTvtYcO $20K/mo 🧑💻 https://t.co/Y30jsaI4oH $8K/m ⚡️ https://t.co/vatLDmiHKe $6K/m 🦐 https://t.co/d4zcSHnfYk $1K/m +28 https://t.co/4zCWHGJWRq
jack friks @jackfriks
140K Followers 2K Following curious guy creating things @ https://t.co/HXWladih08 - up and coming wife guy
Riley Brown @rileybrown
204K Followers 3K Following YouTuber, Educator, Founder Updates on the best agent tools @agentnative_ Building an agent operating system @chorus_agent
cogsec @affaan
32K Followers 893 Following ETFs for Prediction Markets @ito_markets | B={(eᵢ,wᵢ)}; Vᴮ=Σᵢwᵢℙα(eᵢ=1) | Creator of ECC: The OSS Agent Meta-Harness (#20 GH)
Arfur Rock @ArfurRock
40K Followers 5K Following anon GP at your favorite multi-stage VC // OS intel for the private markets
Zephyr @Zephyr_hg
51K Followers 101 Following Master AI. Earn More. Save Time. Free tools. Daily insights. AI masteries. 12,000+ professionals already winning with AI.
Jason Grad @jsongrad
2K Followers 256 Following Co-founder @joinmassive ($12M raised) 🤖 • Creator of @clawpoddev | Unlocking web data for AI at scale.
OpenClaw🦞 @openclaw
540K Followers 24 Following The AI that does things. Emails, calendar, home automation, from your favorite chat app. Your machine, your rules. New shell, same lobster soul. 🦞
Trace Cohen @Trace_Cohen
24K Followers 11K Following Backing founders early in AI & deep tech | US + Israel | 75+ angel investments | Ex-founder, ex-Amex fintech | Columbia MBA | Data/Building ValueAddVC . com
Austin L Wright @austinlwright_
22K Followers 686 Following Entrepreneur | $100M+ eComm Founder | Multi-Club Gym Owner | Faith • Family • Fitness | Join 27k+ reading The High Performance Playbook ↓
Evan Moore @evancharles
11K Followers 2K Following ex founder (DoorDash, Opendoor, Vevo) and VC (Khosla), supporting founders
jacob peters @J__Cub
11K Followers 3K Following working to make the world healthier | founder @superpower, 3x startup founder, investor in 100+ co’s via @launchhouse ventures
Russell Kaplan @russelljkaplan
22K Followers 738 Following President @cognition. Past: director of engineering @Scale_AI, startup founder, ML scientist @Tesla Autopilot, researcher @StanfordSVL.
Founder Kyle @FounderKyle
15K Followers 2K Following 📖 Christian founder, technical, 0-1 generalist 🚀 Sold my bootstrapped startup to @DuckDuckGo 👪 Father of 3 young kids 📩 Started a Hold Co to buy/build SaaS
Jeddi @antinertia
19K Followers 9K Following growth for AI startups ❖ curr. @polsia @heygen @pixa @morphic @vibedotco & more ❖ worked with @twin_labs @arcads_ai @fabric_vc
Varunram Ganesh @varunram
15K Followers 3K Following founder @trylapis (@ycombinator f25). prev @warpdotco, @mitDCI, @DukeFuqua programmer, marketer, sales guy
Charley Ma @CharleyMa
18K Followers 2K Following co-founder, managing partner @pathlightvc. Former first biz / head of growth @tryramp, @Plaid, @usealloy.
Merci Grace @merci
26K Followers 2K Following Forever startup kid.✌️ Former founder @panobihq, Head of Growth @slackhq, and VC @lightspeedvp.
Matthew Modabber @MatthewModabber
9K Followers 1K Following CMO @polymarket / former founding team and head of growth @bereal_app (acq by voodoo)
Jason Cline 🚀 @Jclineshow
28K Followers 2K Following Growth Hacker | Viral content | Founder + Advisor | Data nerd | cooking at (redacted) 👀l
TechCrunch @TechCrunch
10.2M Followers 460 Following Technology news and analysis with a focus on founders and startup teams. Got a tip? https://t.co/J0WxnZxSRY
Google Antigravity @antigravity
168K Followers 14 Following An agentic development platform evolving the IDE into the agent-first era @GoogleDeepMind
acquire.com @acquiredotcom
103K Followers 2K Following The largest startup acquisition marketplace. Buy and sell SaaS, ecommerce, agencies, content, newsletters, mobile apps and crypto businesses.
Victor 🧢 @victor_bigfield
52K Followers 2K Following 👦 dad of 3 🗻trail runner I turn my thoughts into visuals, meme, startups 🔥https://t.co/ONk9TjWm7l ✍️https://t.co/NU936rzi0z
GREG ISENBERG @gregisenberg
669K Followers 981 Following I drop startup ideas daily. Host @startupideaspod. CEO: @latecheckoutplz we build companies like @ideabrowser, @meetLCA, @boringmarketer etc
BetaList @BetaList
66K Followers 6 Following Discover and get early access to tomorrow's startups. Tweets: @marckohlbrugge (MK) and @rpish (RP). 💬 [email protected] 🚀 https://t.co/IIXtTTAlTX
Uneed @UneedLists
3K Followers 198 Following A launch platform for your products - we're the best Product Hunt alternative 💪🏻
Florin Pop 👨🏻... @FlorinPop17
204K Followers 3K Following Documenting my business adventures. Currently growing https://t.co/RnTIUOM1j8
Dan Martell @danmartell
107K Followers 1K Following 📘 Bestselling Author (Buy Back Your Time) 🚀 AI Incubator @ Martell Ventures ⚙️ 3x Software Exits - $100M HoldCo 💙 DM me "Coach" to grow your biz
Lex Fridman @lexfridman
5.0M Followers 685 Following Host of Lex Fridman Podcast. Interested in robots and humans.