🥯
Everything

Highlights

  • Pro

vtu81/README.md

Pinned

  1. SORRY-Bench/sorry-bench Public

    Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025)

    Jupyter Notebook · 83 stars · 7 forks

  2. LLM-Tuning-Safety/LLMs-Finetuning-Safety Public

    We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.

    Python · 356 stars · 38 forks

  3. backdoor-toolbox Public

    A compact toolbox for backdoor attacks and defenses.

    Python · 191 stars · 23 forks

  4. Unispac/Subnet-Replacement-Attack Public

    Official implementation of "Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks" (CVPR 2022 Oral).

    Jupyter Notebook · 27 stars · 7 forks

  5. Unispac/Fight-Poison-With-Poison Public

    Code repository for the paper "Towards A Proactive ML Approach for Detecting Backdoor Poison Samples" (USENIX Security 2023).

    Python · 30 stars · 2 forks

  6. ain-soph/trojanzoo Public

    TrojanZoo provides a universal PyTorch platform for conducting security research (especially backdoor attacks and defenses) on image classification in deep learning.

    Python · 303 stars · 66 forks