Cornelius Emde @CorEmde

AI Security | AI Agents | ML Robustness | PhD @UniofOxford and @OxfordTVG | ex RS @Wise

Oxford

Joined September 2020

Tweets

71
Followers

152
Following

614
Likes

422

Cornelius Emde @CorEmde

3 weeks ago

@OwainEvans_UK What do you think: where does an inductive bias towards true statements come from? Did you test this with open data models, Pythia or Olmo, where you can check negation in data mix? Might it be possible to run controlled experiments in <100M models?

1 0 0 27 0

Cornelius Emde @CorEmde

3 months ago

4. is the reason why we built github.com/parameterlab/M…

Maksym Andriushchenko @maksym_andr

3 months ago

It's interesting how the usage of LLMs has been quickly progressing to higher levels of abstraction:
1. prompt engineering
2. context engineering
3. agent scaffold engineering (we are here now)
4. multi-agent architecture engineering
5. ???
It's also curious how people don't

3 0 25 3K 4

0 0 2 196 0

Cornelius Emde @CorEmde

3 months ago

9/ Work done at @parameterlab with Alexander Rubinstein @a_rubique, Anmol Goel @anmgoel, Ahmed Heakl, Sangdoo Yun @oodgnas, Seong Joon Oh @coallaoh, and Martin Gubri @framart1

0 0 3 126 0

Cornelius Emde @CorEmde

3 months ago

8/ 🔗 Website: parameterlab.github.io/MASEval/
GitHub: github.com/parameterlab/M…
Docs: maseval.readthedocs.io/en/stable/
arXiv: arxiv.org/abs/2603.08835

1 0 0 64 0

Cornelius Emde @CorEmde

3 months ago

1/ Evaluating a single agent harness is hard. Evaluating a multi-agent system? That's a whole different problem. Most eval tools treat the model as the unit of analysis. But in multi-agent systems, the system is what matters. That's why we built MASEval 🧵 #Agents #AI #Eval

3 1 7 763 0

Cornelius Emde @CorEmde

3 months ago

Great work led by @anmgoel on how fragile contextual integrity can be in LLMs. This work shows that contextual privacy degrades easily during fine-tuning on benign data, and common safety benchmarks don't pick this up. #AISecurity #AIAgents

Anmol Goel @anmgoel

4 months ago

🚨 Fine-tuning your model to be more helpful or empathetic might be making it less private, without you noticing. In our latest work, we show that benign fine-tuning can silently break contextual privacy in language models while safety & general capabilities appear intact. ⬇️

1 2 9 4K 3

0 2 4 538 0

Cornelius Emde @CorEmde

9 months ago

@aichberger Wow. That’s rough!

0 0 0 141 0

Cornelius Emde @CorEmde

12 months ago

@ELLISforEurope @DebOishi @UniofOxford @GoogleDeepMind @SkyUK Congrats @DebOishi

1 0 2 69 0

Cornelius Emde @CorEmde

12 months ago

@oanacamb @imperialcollege @ucl @UniofOxford Congrats!

1 0 1 126 0

Esra Şengül @esra_sngl

a year ago

Excited to share our preprint! We show that sustained macrophage and B cell responses are essential for heart regeneration in Mexican cavefish, helping uncover why surface fish heal but cavefish scar 🫀🐟. Check out the full story: biorxiv.org/content/10.110…

0 7 25 1K 2

Cornelius Emde @CorEmde

a year ago

@negar_rz I am very interested in working with you and would love to connect, but I can't message you on either Twitter or LinkedIn :)

0 0 0 1K 0

Cornelius Emde @CorEmde

a year ago

Come see our poster today. 🗓️ Poster session 1 @ 10am 📍 Hall 3 + Hall 2B #239

Cornelius Emde @CorEmde

a year ago

🚨 New paper alert: Our recent work on LLM safety has been accepted to ICLR 2025 🇸🇬 We propose a new framework for LLM safety. 🧵 (1/7) #LLM #AISafety #ICLR2025 #Certification #AdversarialRobustness #NLP #Shhhhhh #DomainCertification #AI

1 2 12 2K 2

0 0 2 329 0

Cornelius Emde @CorEmde

a year ago

Read more: cemde.github.io/Domain-Certifi…
Thanks to my amazing collaborators:
- Alasdair Paren, @trojantiger88 (P. Arvind), @maximek3 (M Kayser), @tom_rainforth, @philiptorr, @Adel_Bibi at @UniofOxford
- @BernardSGhanem at @KAUST
- Thomas Lukasiewicz at @tu_wien
(7/7)

0 0 4 160 0

Cornelius Emde @CorEmde

a year ago

To obtain such certificates, we present a simple, scalable and powerful algorithm: VALID. Remarkably, for each unwanted response it provides a global bound in prompt space 🚀 (6/7)