News

May 05, 2026 Our Defense-to-Attack jailbreak study on VLMs is accepted by Pattern Recognition.
May 05, 2026 Two papers accepted to ICML’26: Just Ask (curious code agents revealing system prompts in frontier LLMs) and STARE (step-wise temporal red-teaming of multi-modal toxicity).
Mar 31, 2026 We release System-Prompt-Open, an open database of system prompts extracted from frontier LLMs. Check out the project website and GitHub repo.
Mar 28, 2026 Our survey on embodied AI safety, reviewing 400+ papers, is now available. It covers risks, attacks, and defenses across perception, cognition, planning, interaction, and agentic systems.
Mar 09, 2026 Joined HKAI-Sci as a Research Assistant Professor.
Mar 04, 2026 Our OpenRedRL benchmark for RL-based red teaming is published in Frontiers of Computer Science.
Feb 20, 2026 Our work on red teaming text-to-image generators is accepted by CVPR’26.
Jan 29, 2026 Released JustAsk, a framework where curious code agents reveal system prompts in frontier LLMs.
Jan 15, 2026 Our survey on large model and agent safety is published in Foundations and Trends® in Privacy and Security.
Jan 23, 2025 Our work on reinforced defense for VLMs is accepted by ICLR’25.
Dec 14, 2024 Our work on RL-based auditing for LLMs is accepted by AAAI’25.
Apr 17, 2024 Our work on intrinsic motivation for RL is accepted by IJCAI’24.
Mar 22, 2024 Our work on adversarial policy learning in RL is accepted by DSN’24.