I just spent months handwriting a 200-page guide on the entirety of ML foundations and math from scratch.
The guide features:
- Neural Nets (Backprop, Adam, SGD, Batch Norm)
- ML Algorithms (SVM, Grad Boosting, K-means, PCA)
- Hardware (Tensor Cores, Systolic Arrays, CUDA)
- Transformers (Multi-Head Attn, KV Cache, LoRA)
- Vision (ViT, Convolutions, MAE, IoU, NMS, VLM)
- Agents (OpenClaw, ReAct, Memory, Orchestration)
Everything I wish I had years ago, for free.
The fallacy of this is that more creates more. More hours, more hiring, more something.
And it is true in a sense. If you put in more work, more work will happen. But I think for most startups, the leverage is really in how differently you approach the problem, how well you cultivate your team, and the strategy.
Any large company can outspend you on hours. They have thousands or tens of thousands more people, spending more hours. If hours worked were the metric, every large company and government organization would always win and do the best work. More hours, better output.
This thinking is often representative of younger founders, where the startup becomes their identity and life. They have a hard time doing anything else, and cannot see that you are not your work. Activities outside of work can grow you as a person too and help you do better work.
I’ve never worked this way. As a designer, I always saw the need to take a step back, to take a break. At times, I might work 12 hours or 16 hours, or whatever amount was needed, but it wasn’t the norm. You just can't grind design; you need inspiration. Taking that step away from the work would give me more perspective and inspiration, and I could approach the problem differently or simply see the solution.
Grinding is never good for any creative problem, and startups or creating new products are often mostly about creative problem solving. Grinding works OK for email jobs, or where you are just executing on a very clear playbook.
With Linear, we’ve never worked this way. We work reasonable hours, 5 days a week. All of us founders have families. Many of our employees have families. I personally stop every evening, spend time with the family, cook dinner for the family, eat dinner together, and focus on things outside of work. Sometimes I work in the late evenings or weekends, but to me the pride is that I don’t need to. The company should be successful without it.
My goal is to build a company that is sustainable in the long term, and doesn’t require heroics or personal sacrifices every single day.
There are times when our team is heroic. Launches, incidents, some other work that just needs to be done. They will work late into the night because they know it is the right thing. But we don’t require that every day or every week, and the more this happens, the more I think it is a failure of our company and leadership. The team and the leaders should always keep a reserve to use when something is needed.
Our thinking was also that quality, which we value, doesn’t emerge from working more or stressing people more. It emerges when you create the conditions for it to emerge. Often it is the appreciation, space, time, and how the person feels. A person who is rested will do better work.
I wouldn’t attribute much of our success to working a lot. The success came from having clear thinking, ideas, and focus to do the right things.
I sometimes wish we could move the culture more toward that of a Zen master.
Real mastery is not exerting the most effort. It is achieving the outcome with the least necessary effort.
"If you are not working 7 days per week, you are going to lose".
Corgi Insurance is the most intense workplace culture in startups.
- The company works 7 days per week.
- Founder (@nico_laqua) lives and sleeps in the office.
- He built a cafe in the office because there was
Medicine only works if it reaches people
That's why we're building agents that navigate the intricacies of healthcare end to end, accelerating medicine for those who need it, prescribe it, and create it
Proud to share Forus and $160M raised!
I added KV caching and INT8 KV quantization to our transformer inference, improving throughput by 35x.
All of this was done from scratch in Rust + CUDA, on top of a homemade ML framework.
On a 4-token prompt with 252 generated tokens:
- Original: 0.76 tok/s
- KV cache fp32: 27.21 tok/s
- KV cache int8 (quantized): 27.29 tok/s
Try it out yourself here: mni-ml.github.io/demos/kv-cache/
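To make the idea behind the speedup concrete, here is a minimal single-head sketch in plain Python. It is not the Rust + CUDA implementation from the post: the names `decode_no_cache`, `decode_with_cache`, and the weight matrices `Wq`, `Wk`, `Wv` are hypothetical, and the token embeddings are assumed to be given up front rather than sampled step by step. It shows only that caching keys and values gives the same attention outputs while projecting each token once instead of once per later step.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given positions."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

def decode_no_cache(xs, Wq, Wk, Wv):
    """At every step, re-project keys and values for the whole prefix."""
    outputs, projections = [], 0
    for t in range(1, len(xs) + 1):
        prefix = xs[:t]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        projections += 2 * t  # one K and one V projection per prefix token
        outputs.append(attend(matvec(Wq, prefix[-1]), keys, values))
    return outputs, projections

def decode_with_cache(xs, Wq, Wk, Wv):
    """Project each token's key and value once, then reuse them from the cache."""
    outputs, projections = [], 0
    k_cache, v_cache = [], []
    for x in xs:
        k_cache.append(matvec(Wk, x))
        v_cache.append(matvec(Wv, x))
        projections += 2  # only the newest token is projected
        outputs.append(attend(matvec(Wq, x), k_cache, v_cache))
    return outputs, projections

# Toy inputs (assumed values, for illustration only).
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.1], [0.2, 0.3]]
Wv = [[1.0, 2.0], [3.0, 4.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

out_a, n_a = decode_no_cache(xs, Wq, Wk, Wv)
out_b, n_b = decode_with_cache(xs, Wq, Wk, Wv)
print(out_a == out_b, n_a, n_b)  # same outputs; 20 vs 8 projections for 4 tokens
```

Without a cache the projection count grows quadratically with sequence length, while with a cache it grows linearly, which is why the gap widens on longer generations.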
In practice:
- KV caching gave us about a 35x end-to-end speedup
- INT8 KV cache kept roughly the same speed as fp32 but cut KV cache memory by 3.78x
The FP32 cache used 4.5 MB in this run, while the INT8 cache used only 1.19 MB.
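As a rough illustration of where the memory saving comes from, the sketch below quantizes one cached row to INT8 using symmetric quantization with a single scale per row. The post does not say which scheme it uses, so the per-row scale, the function names `quantize_int8`, `dequantize_int8`, and `cache_bytes`, and the row length are all assumptions made for this example.

```python
def quantize_int8(row):
    """Symmetric quantization: map floats to integers in [-127, 127] with one scale per row."""
    max_abs = max(abs(v) for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the stored integers and the row scale."""
    return [v * scale for v in q]

def cache_bytes(num_rows, row_len):
    """Bytes needed for an fp32 cache vs. an int8 cache with one fp32 scale per row."""
    fp32 = num_rows * row_len * 4        # 4 bytes per value
    int8 = num_rows * (row_len * 1 + 4)  # 1 byte per value plus a 4-byte scale
    return fp32, int8

row = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(row)
restored = dequantize_int8(q, scale)
fp32_bytes, int8_bytes = cache_bytes(num_rows=4, row_len=64)
print(q, restored, fp32_bytes / int8_bytes)
```

Under this assumed layout the saving is a little under 4x because each row also stores its scale. Whether that explains the 3.78x figure in the post is not stated there.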
This simple change to inference created a huge impact on performance. To learn more about the KV cache and other optimizations like this, check out the blog at mni.ml!
I built a neural network from scratch without using PyTorch, TensorFlow, or any libraries for that matter. Instead, I implemented the core math myself.
I'm working on making my own machine learning framework from scratch with my friend @_reesechong. He previously trained a similar neural network, but using just scalars.
The next step was to use tensors instead. The benefit of this is clear: when using scalars, each data point is looped through separately, creating its own node in a computation graph.
With tensors, these separate nodes are stored together, allowing one forward and backward pass for the whole batch, greatly improving the efficiency of training.
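The difference in graph size can be sketched with a toy autograd in plain Python. This is not the framework from the post: `Node`, `scale`, `squared_error`, and `backprop` are hypothetical names, the model is a one-weight linear fit, and the backward traversal works only for the chain-shaped graphs built here. A length-1 list stands in for a scalar and a longer list for a 1-D tensor.

```python
class Node:
    """Graph node holding a list of floats; a length-1 list plays the role of a scalar."""
    created = 0  # counts every node built, to compare graph sizes

    def __init__(self, data, parents=()):
        Node.created += 1
        self.data = list(data)
        self.grad = [0.0] * len(self.data)
        self.parents = parents
        self.backward = lambda: None  # leaf nodes have nothing to propagate

def scale(w, x):
    """Elementwise w * x, where w is a length-1 weight node broadcast over x."""
    out = Node([w.data[0] * v for v in x.data], (w, x))
    def backward():
        w.grad[0] += sum(g * v for g, v in zip(out.grad, x.data))
    out.backward = backward
    return out

def squared_error(pred, y):
    """Sum of squared errors, reduced to a length-1 node."""
    out = Node([sum((p - t) ** 2 for p, t in zip(pred.data, y.data))], (pred, y))
    def backward():
        for i, (p, t) in enumerate(zip(pred.data, y.data)):
            pred.grad[i] += 2.0 * (p - t) * out.grad[0]
    out.backward = backward
    return out

def backprop(loss):
    """Walk from the loss to the leaves (fine for these chains, not a general topological sort)."""
    loss.grad = [1.0]
    stack = [loss]
    while stack:
        node = stack.pop()
        node.backward()
        stack.extend(node.parents)

def grad_scalar(w0, xs, ys):
    """One small graph and one backward pass per data point."""
    Node.created = 0
    w = Node([w0])
    for x, y in zip(xs, ys):
        backprop(squared_error(scale(w, Node([x])), Node([y])))
    return w.grad[0], Node.created

def grad_batched(w0, xs, ys):
    """One graph and one backward pass for the whole batch."""
    Node.created = 0
    w = Node([w0])
    backprop(squared_error(scale(w, Node(xs)), Node(ys)))
    return w.grad[0], Node.created

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
print(grad_scalar(0.5, xs, ys))   # (-90.0, 17): same gradient, 17 nodes
print(grad_batched(0.5, xs, ys))  # (-90.0, 5): same gradient, 5 nodes
```

Both versions produce the same gradient, but the scalar version's node count grows with the number of data points while the batched version's stays fixed. In a real framework the batched operations would also run as vectorized kernels rather than Python loops.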
We often hear the saying "don't reinvent the wheel", but in my experience rebuilding technologies that abstract away a lot of complexity gives you a better and more thorough understanding of how the system works.
Results of the training are shown below. Feel free to check out the repo and read through the code, linked in the replies.