| May 05, 2026 | Our Defense-to-Attack jailbreak study on VLMs is accepted by Pattern Recognition. |
| May 05, 2026 | Two papers accepted to ICML’26: Just Ask (curious code agents revealing system prompts in frontier LLMs) and STARE (step-wise temporal red-teaming of multi-modal toxicity). |
| Mar 31, 2026 | We release System-Prompt-Open, an open database of system prompts extracted from frontier LLMs. Check out the project website and GitHub repo. |
| Mar 28, 2026 | Our survey on embodied AI safety is now available. It reviews 400+ papers, covering risks, attacks, and defenses across perception, cognition, planning, interaction, and agentic systems. |
| Mar 09, 2026 | Joined HKAI-Sci as a Research Assistant Professor. |
| Mar 04, 2026 | Our OpenRedRL benchmark for RL-based red teaming is published in Frontiers of Computer Science. |
| Feb 20, 2026 | Our work on red teaming text-to-image generators is accepted by CVPR’26. |
| Jan 29, 2026 | Released JustAsk, a framework where curious code agents reveal system prompts in frontier LLMs. |
| Jan 15, 2026 | Our survey on large model and agent safety is published in Foundations and Trends® in Privacy and Security. |
| Jan 23, 2025 | Our work on reinforced defense for VLMs is accepted by ICLR’25. |
| Dec 14, 2024 | Our work on RL-based auditing for LLMs is accepted by AAAI’25. |
| Apr 17, 2024 | Our work on intrinsic motivation for RL is accepted by IJCAI’24. |
| Mar 22, 2024 | Our work on adversarial policy learning in RL is accepted by DSN’24. |