Testing LLMs (and prompts) like we test software: towardsdatascience.com/testing-large-…
TL;DR: (1) You should. (2) How to test: pick specific properties and evaluate them with LLMs (perception is easier than generation). (3) What to test: get the LLM to help you figure it out.
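A minimal sketch of point (2), assuming nothing about the linked post's actual code: generate outputs, then ask a second model a yes/no question about one specific property of each output. The names `check_property`, `pass_rates`, `stub_generator`, and `stub_judge` are hypothetical; the stubs stand in for real model calls so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt string to a text reply.
LLM = Callable[[str], str]


def check_property(judge: LLM, output: str, prop: str) -> bool:
    """Ask the judge model whether `output` satisfies `prop` (yes/no)."""
    question = (
        f"Text:\n{output}\n\n"
        f"Does the text satisfy this property: {prop}? Answer yes or no."
    )
    return judge(question).strip().lower().startswith("yes")


def pass_rates(
    generator: LLM, judge: LLM, prompts: List[str], properties: List[str]
) -> Dict[str, float]:
    """Return the fraction of generated outputs that pass each property."""
    if not prompts:
        raise ValueError("need at least one prompt")
    outputs = [generator(p) for p in prompts]
    return {
        prop: sum(check_property(judge, out, prop) for out in outputs) / len(outputs)
        for prop in properties
    }


# Stubs standing in for real models (assumption: replace with actual API calls).
def stub_generator(prompt: str) -> str:
    return "Dear customer, thanks for writing." if "polite" in prompt else "No."


def stub_judge(question: str) -> str:
    return "yes" if "Dear" in question else "no"


rates = pass_rates(
    stub_generator,
    stub_judge,
    prompts=["Write a polite reply", "Reply curtly"],
    properties=["opens with a greeting"],
)
print(rates)
```

The judge only has to recognize a property, not produce text that has it, which is the "perception is easier than generation" bet.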
Also highly relevant: guidance, from Microsoft
"Guidance programs allow you to interleave generation, prompting, and logical control"
It also internally handles subtle but important tokenization-related issues, e.g. "token healing".
github.com/microsoft/guid…
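A toy illustration of the idea behind "token healing", not guidance's implementation: when a prompt ends mid-token, back up one token and only allow next tokens that start with the text that was removed. The function `heal` and the vocabulary below are hypothetical.

```python
from typing import List, Tuple


def heal(prompt_tokens: List[str], vocab: List[str]) -> Tuple[List[str], List[str]]:
    """Drop the last prompt token and list the vocab tokens that extend it."""
    if not prompt_tokens:
        return [], list(vocab)
    *kept, last = prompt_tokens
    # Only tokens that begin with the removed text may be generated first.
    allowed = [tok for tok in vocab if tok.startswith(last)]
    return kept, allowed


# Toy vocabulary (assumption): "://" is a single token, as in many real tokenizers.
toy_vocab = ["http", ":", "://", "/", "www"]
kept, allowed = heal(["The", " link", " is", " http", ":"], toy_vocab)
print(kept)     # prompt with the trailing ":" removed
print(allowed)  # candidates the model may pick for the first generated token
```

Here the model can choose "://" as one token instead of being forced to continue after a lone ":", which it rarely saw followed by "//" in training.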
Blog post: playing with Vicuna-13B, ChatGPT (3.5), and MPT-7B-Chat on harder tasks medium.com/@marcotcr/expl…
TL;DR: We think ChatGPT is still way ahead, but sometimes the extra control from open source models is worth it.