Lukas Fluri luk-s

PhD student working on AI safety at @ethz-spylab, ETH Zürich. https://lukas-fluri.com

5 followers · 11 following

ETH Zurich
Zürich

Pinned

MisleadLM Public

Forked from Jiaxin-Wen/MisleadLM

A re-implementation and extension of the code for the paper "Language Models Learn to Mislead Humans via RLHF"

Python
Targeted-Manipulation-and-Deception-in-LLMs Public

Forked from marcus-jw/Targeted-Manipulation-and-Deception-in-LLMs

A benchmark for evaluating the tendency of LLM agents to influence human preferences

Python
rl-testing-experiments Public

Python 1
ethz-spylab/superhuman-ai-consistency Public

Python · 30 stars · 2 forks