Haicheng Wu @asdf1234_0
https://t.co/IovvdTeNzl Joined July 2009
Tweets 78 - Followers 2K - Following 5K - Likes 69
@neurosp1ke @AuldEric Keep them coming! I’m watching the Flash Attention episode and it’s a good intro. It would be great to have a follow-up episode where an expert does a walkthrough of @tri_dao's implementation: show the optimized kernel code and talk through the decisions. github.com/Dao-AILab/flas…
CUDA-MODE 15: CUTLASS 🧮 Today @AuldEric will present CUTLASS 3.0 to us - a high-performance template linear algebra library from NVIDIA. Learn how to leverage the tensor core potential of your GPU from C++. Sat, Apr 20, 19:00 UTC discord.gg/BcKkKUPw?event…
🔥llm.c update: Our single file of 2,000 ~clean lines of C/CUDA code now trains GPT-2 (124M) on GPU at speeds ~matching PyTorch (fp32, no flash attention) github.com/karpathy/llm.c… On my A100 I'm seeing 78ms/iter for llm.c and 80ms/iter for PyTorch. Keeping in mind this is fp32,…
“NVIDIA cutlass kernels with solid compute throughput taking up a lot of the running time => nice.” 🥵❤️👀
Just pushed out the first new NATTEN release in over 10 months. It includes our new GEMM kernels for SM70 and above, forward-mode AD support, support for nested tensors (inference only), 3D NA (naive kernels only), BF16 support for compatible devices, and more. shi-labs.com/natten/
If you're interested in the new kernels in NATTEN (GEMM and fused), refer to our new preprint: arxiv.org/abs/2403.04690
Major NATTEN update adds Fused Neighborhood Attention; memory-efficient 2-D and 3-D sliding window attention with support for dilation, causal masking, and fine-grained control over different axes. github.com/SHI-Labs/NATTE…
Check out Faster Neighborhood Attention: a great effort led by @AliHassaniJr to optimize multi-dimensional local, causal, and sparse global attention at the threadblock level, paving the way for building next-generation efficient multimodal AI systems. code/paper: github.com/SHI-Labs/NATTEN
Thank you, Manish Gupta! Thank you, Google! linkedin.com/posts/mguptaii…
CUTLASS 3.4 greatly improved the CuTe documentation, which now answers most of the questions asked before. Rawn Henry did more heroic work to improve Hopper f16 x s8/s4 performance. Big thanks to Aleksandar Samardžić for contributing the Sparse Epilogue Visitor Tree. linkedin.com/posts/aniketsh…
This adds Hopper WGMMA and TMA to FA2. linkedin.com/posts/colfax_i…
Big shout-out to Manish Gupta from Google, who contributed the int8 x fp16 GEMM on Ampere in the newly released CUTLASS 3.3. CUTLASS 3.3 also allows GEMMs that are not 128-bit aligned to use WGMMA on Hopper. linkedin.com/posts/thakkarv…
Dear CUTLASS developers, we are looking for both full-time employees and interns. If you already use CUTLASS to build something nontrivial, it is easy for you to stand out from the crowd. nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAEx…
CUTLASS passed 20M downloads. The first 10M took 5 yrs. The 2nd took 5 months. BTW, we have an official CUTLASS discord channel now: discord.com/invite/Nh9hQhY7
Half-assing the epilogue of a kernel can be as bad as half-assing the finale episode of a tv show. Don't do that. Learn the art.
Zhaodong Chen is going to present his CUTLASS paper - EVT: Accelerating Deep Learning Training with Epilogue Visitor Tree - at ASPLOS'24 on May 1. EVT is a framework for fusing almost any combination of operations into the epilogue. dl.acm.org/doi/10.1145/36…
@DROP_ALL_TABLES @creed_humphrey Thanks and congrats @DROP_ALL_TABLES @asdf1234_0 and the entire CUTLASS team, you are all phenomenal! CUTLASS is a blessing for the open-source HPC and AI community! 🔥 All credit for FNA goes to @AliHassaniJr, our collaborators, and CUTLASS!👏
Built on CUTLASS! Congrats @creed_humphrey and thanks for using our stuff :)
@asdf1234_0 I will help you, Mr. Wu: one more star.
Cool project, give them stars.
I have been asked to reach 4096 GitHub stars for our CUTLASS project before GTC'24. It is at 3853 now. Please help me. github.com/NVIDIA/cutlass…
CUTLASS is your favorite CUDA library's favorite library. Go give them a star!
CUTLASS is a mind-blowing and beautiful piece of software engineering. I have been following it since its inception, and it is definitely in my top 3 of the last 5 years.
Haicheng is doing impressive work. Let's help a little.