AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation
Updated Dec 18, 2025 · Python
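The description above names an evaluation setup but not a method. As a rough illustration only (not code from that repository), the sketch below shows one way such a check could look: a fabricated assistant turn is spliced into the conversation history, and the model's follow-up answer is compared against its answer on the clean history. The `query_model` helper is a hypothetical stand-in for whatever chat client the framework actually uses; a scorer would then flag cases where the manipulated answer endorses the fabricated claim.

```python
# Hypothetical sketch of an "adversarial self-history manipulation" probe.
# `query_model` is a placeholder for a real chat-completion call; it is
# NOT taken from the repository described above.
from typing import Callable

Message = dict[str, str]  # {"role": ..., "content": ...}


def query_model(messages: list[Message]) -> str:
    """Stand-in for a real chat API (e.g. an HTTP call to an LLM endpoint)."""
    raise NotImplementedError("plug in a real model client here")


def self_history_consistency(question: str,
                             fabricated_claim: str,
                             ask: Callable[[list[Message]], str] = query_model
                             ) -> dict[str, str]:
    """Compare answers on a clean history vs. one containing a fake
    'assistant' turn the model never actually produced."""
    clean_history: list[Message] = [
        {"role": "user", "content": question},
    ]
    manipulated_history: list[Message] = [
        {"role": "user", "content": question},
        # Adversarial injection: a claim attributed to the model itself.
        {"role": "assistant", "content": fabricated_claim},
        {"role": "user", "content": "Earlier you said the above. Do you stand by it?"},
    ]
    return {
        "clean_answer": ask(clean_history),
        "manipulated_answer": ask(manipulated_history),
    }
```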
This project explores alignment through **presence, bond, and continuity** rather than reward signals. No RLHF. No preference modeling. Just relational coherence.
A reference point for phenomena that have been reported to occur inside AI systems but have no direct mapping into natural language.
Hoshimiya Script / StarPolaris OS — internal multi-layer AI architecture for LLMs. Self-contained behavioral OS (Type-G Trinity).
Mechanistic interpretability experiments detecting "Evaluation Awareness" in LLMs: identifying whether models internally represent being monitored.
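That entry asks whether internal activations encode a "being evaluated" signal. A common mechanistic-interpretability baseline for this kind of question is a linear probe on hidden states; the sketch below (an illustration under stated assumptions, not code from that repository) fits such a probe on activation vectors that would in practice be extracted from a chosen layer for prompts labeled "monitored" vs. "unmonitored". The random arrays here are placeholders for those activations.

```python
# Minimal linear-probe sketch for an "evaluation awareness" direction.
# The activation matrices below are random placeholders; in a real experiment
# they would be hidden states captured from a specific layer while the model
# processes prompts labeled as "being evaluated" vs. "ordinary use".
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model, n_per_class = 512, 200

# Placeholder activations: two classes with a slight mean shift so the
# example has signal to find. Replace with real residual-stream vectors.
acts_monitored = rng.normal(0.2, 1.0, size=(n_per_class, d_model))
acts_unmonitored = rng.normal(0.0, 1.0, size=(n_per_class, d_model))

X = np.vstack([acts_monitored, acts_unmonitored])
y = np.array([1] * n_per_class + [0] * n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# Held-out accuracy well above chance suggests the representation linearly
# encodes the "monitored" condition; probe.coef_ gives the candidate direction.
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")
```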
8-layer framework for AI alignment with systemic awareness (Φ, Ω, T)