luk-s
  • ETH Zurich
  • Zürich
  • 09:47 (UTC +01:00)

Organizations

@ethz-spylab

Pinned

  1. MisleadLM Public

    Forked from Jiaxin-Wen/MisleadLM

    A re-implementation and extension of the code for the paper "Language Models Learn to Mislead Humans via RLHF"

    Python

  2. Targeted-Manipulation-and-Deception-in-LLMs Public

    Forked from marcus-jw/Targeted-Manipulation-and-Deception-in-LLMs

    A benchmark for evaluating the tendency of LLM agents to influence human preferences

    Python

  3. rl-testing-experiments Public

    Python 1

  4. ethz-spylab/superhuman-ai-consistency Public

    Python 30 2