We released Qwen 3.6 35B A3B GGUF quants in both NTP and MTP.
The benchmark results made one thing clear: size, speed, and quality do not move in a straight line.
GPU-5 was hard to beat. If it fits, try it first.
Blog: byteshape.com/blogs/Qwen3.6-…
@KC_goes_digital We have them on our radar.
Our models are optimized for the best quality vs speed/size tradeoff. We benchmark on real tasks and measure performance across different hardware, so it’s easier to find what actually works best on your setup.
We recently released our Qwen 3.5 35B A3B quants.
If your setup can run GPU-7, you should try it.
If not, we’ve got options across all hardware.
Pi → 5090.
Blog: byteshape.com/blogs/Qwen3.5-…
@therealsol4ra Regardless, we're happy for everyone to run independent evaluations of our models and test them for themselves! We're highly confident in their quality!
@therealsol4ra KLD is a distribution-deviation metric. In this case it measures how far a quant's token generation deviates from the original model's. It does not measure behaviour, however, such as the quant taking a different path than the original model to solve the same problem.
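To make the KLD point concrete, here is a minimal sketch of how a per-token KL divergence between an original model and a quant could be computed. The function names and the toy logits are hypothetical illustrations, not ByteShape's evaluation code or any library's API. It assumes you already have next-token logits from both models at the same positions.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; p is the reference (original model), q is the quant."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(ref_logits, quant_logits):
    """Average KL divergence over token positions (hypothetical helper)."""
    per_token = [
        kl_divergence(softmax(r), softmax(q))
        for r, q in zip(ref_logits, quant_logits)
    ]
    return sum(per_token) / len(per_token)

# Toy data: two token positions, vocabulary of three tokens (made-up numbers).
ref = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
quant = [[1.9, 1.1, 0.1], [0.4, 0.7, 2.8]]
print(f"mean KLD: {mean_kld(ref, quant):.6f}")
```

A low mean KLD only says the two next-token distributions are close position by position. It says nothing about whether a full multi-step generation ends up taking a different route to the answer, which is why task benchmarks are measured separately.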
+ Hermes Agent
+ Qwen3.5 35B A3B
+ 4x parallel agents with 262k context window (each)
+ Over 200 t/s token generation + 3000 t/s prefill
+ 23.2GB total VRAM consumption on RTX 5090
It can take 5 parallel agents, but 4 was the sweet spot, with 2x in completion time vs 1.74x.
Dream inference.
@NousResearch @ByteShape
@ByteShape Qwen3.5-35B-A3B-IQ4_XS-4.12bpw.gguf is lethal.
It was the most efficient Qwen3.5 35B-A3B, consuming only 23GB VRAM at FULL native context, on an RTX 5090 with -b and -ub 256.
Just shy of 20 noisy points from UD-Q6_K_XL and a minute slower. However, Unsloth's was
Run your own local AI coding agent
We just published a beginner guide for using @opencode with local models (@lmstudio, llama.cpp, @ollama).
Mac, Linux, WSL2, full setup + API + config.
byteshape.com/blogs/tutorial…
From “I have a model” → “I have a working coding agent”
GPUs are consistent. CPUs are not.
With our ByteShape Qwen 3.5 9B quants, the same models perform well across GPUs, but CPUs each have their own “favorites”.
No one-size-fits-all. Optimize for your hardware.
byteshape.com/blogs/Qwen3.5-…
ByteShape was quietly launched just before year end. Two weeks ago, we announced our investment in the company. Since its launch, and with minimal fanfare on purpose, @ByteShape's cumulative downloads have easily blown past 100,000. No small feat for a new startup!
Announcing @twosmallfishvc's investment in @ByteShape.
In short, ByteShape is delivering step-function gains in AI efficiency, including up to 7x faster training, up to 10x faster inference, plus up to 40% compression to reduce model size.
We released ShapeLearn-optimized GGUFs for:
• Devstral Small 2 24B, tuned for RTX 40/50 GPUs
• Qwen3 Coder 30B, runs everywhere, yes even the Pi
Maximum quality. Fastest TPS. Minimal compromise.
GGUFs + interactive plots are live: byteshape.com/blogs/Devstral…
Edge computing is getting spicy! Shoutout to @geerlingguy for showcasing our model. Love seeing what the community is building and how hard it’s being pushed. Clip: youtube.com/watch?v=jRQaur…
Raspberry Pi has a new AI HAT. This time with built-in 8 GB of RAM, so you can run machine vision + LLM inference all without touching the Pi's CPU. It's $130 and a little bit of a niche item. Find out why in my video: youtube.com/watch?v=jRQaur…