Many engineers get discouraged or give up when marketing a product they’ve been tirelessly developing for months.
They’re jumping ship halfway through the journey because progress feels much slower than it does when coding.
Balance your life and stay afloat during the push from 0.5 to 1.
Understanding and preventing misalignment generalization
Recent work has shown that a language model trained to produce insecure computer code can become broadly “misaligned.” This surprising effect is called “emergent misalignment.” We studied why this happens.
Through this research, we discovered a specific internal pattern in the model, similar to a pattern of brain activity, that becomes more active when this misaligned behavior appears. The model learned this pattern from training on data that describes bad behavior.
We found we can make a model more or less aligned, just by directly increasing or decreasing this pattern’s activity. This suggests emergent misalignment works by strengthening a misaligned persona pattern in the model.
We also showed that training the model again on correct information can push it back toward helpful behavior. Together, this means we might be able to detect misaligned activity patterns, and fix the problem before it spreads.
This work helps us understand why a model might start exhibiting misaligned behavior, and could give us a path towards an early warning system for misalignment during model training.
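The idea of raising or lowering a pattern's activity can be illustrated with a toy sketch. It assumes the pattern can be represented as a direction vector in the model's activation space, which the thread does not state. The names `activity`, `steer`, and `persona_direction` and all the numbers are hypothetical, and this is not the method used in the research.

```python
import math

def unit(v):
    """Scale a vector to length 1 so strengths are comparable."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]

def activity(hidden, direction):
    """How strongly the pattern is present: projection onto the unit direction."""
    d = unit(direction)
    return sum(h * x for h, x in zip(hidden, d))

def steer(hidden, direction, alpha):
    """Shift the hidden state by alpha along the unit direction.

    Positive alpha increases the pattern's activity, negative alpha decreases it.
    """
    d = unit(direction)
    return [h + alpha * x for h, x in zip(hidden, d)]

# Toy values only (assumption): a 3-dimensional "activation" and "pattern".
hidden = [1.0, 2.0, 0.5]
persona_direction = [0.0, 3.0, 4.0]

before = activity(hidden, persona_direction)
suppressed = steer(hidden, persona_direction, -before)  # remove the pattern
amplified = steer(hidden, persona_direction, 2.0)       # strengthen the pattern

print(before)
print(activity(suppressed, persona_direction))
print(activity(amplified, persona_direction))
```

Monitoring `activity` over the course of training is one way an early warning signal of the kind mentioned above might be built, though that is only a guess at the shape of such a system.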
openai.com/index/emergent…