Anthony Peng

AnthonyPeng

https://shengyun-peng.github.io/

AI & ML interests

None yet

Recent Activity

upvoted a paper 10 days ago

The Alignment Waltz: Jointly Training Agents to Collaborate for Safety

upvoted a paper 10 days ago

Agent Learning via Early Experience

upvoted a paper 14 days ago

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks

authored 8 papers 15 days ago

Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion

Paper • 2305.03509 • Published May 4, 2023 • 1

RobArch: Designing Robust Architectures against Adversarial Attacks

Paper • 2301.03110 • Published Jan 8, 2023 • 1

CompCap: Improving Multimodal Large Language Models with Composite Captions

Paper • 2412.05243 • Published Dec 6, 2024 • 20

LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked

Paper • 2308.07308 • Published Aug 14, 2023

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

Paper • 2405.17374 • Published May 27, 2024 • 1

Robust Principles: Architectural Design Principles for Adversarially Robust CNNs

Paper • 2308.16258 • Published Aug 30, 2023

Large Reasoning Models Learn Better Alignment from Flawed Thinking

Paper • 2510.00938 • Published 19 days ago • 55

Shape it Up! Restoring LLM Safety during Finetuning

Paper • 2505.17196 • Published May 22, 2025 • 1