A‑S‑K is a practical, testable framework and implementation blueprint for building AI systems that route, verify, and synthesize meaning. It treats language as a layered, compiler‑like stack from glyph geometry to discourse, enabling more transparent research workflows, richer annotations, and graph‑grounded reasoning.
This README is a standalone synthesis of the core concepts and the MVP path toward an AI‑powered research tool informed by A Semantic Kernel.
- Structure first: Model language as layered computation rather than surface tokens alone.
- Trust by design: Bind every claim to provenance, typed structure, and uncertainty.
- Human‑AI collaboration: Keep humans in the loop for adjudication at decision boundaries.
- Scalability: Factor representations (operators vs payloads) to compress parameters while improving retrieval and analogy.
A‑S‑K proposes a four‑layer stack. Each layer constrains the next and is optimized like a compiler pipeline:
- Geometric Substrate (pre‑linguistic)
  - Glyph affordances derived from simple forms: | stroke (distinction), O loop (enclosure), V aperture (receptivity), T cross (orthogonality/instantiation).
  - These are biases, not hard meanings; they make certain encodings natural.
- Glyph Operators (sub‑lexical)
  - Consonants → operations (fold, turn, tighten, flow, route).
  - Vowels → typed payloads (a base/open; e relational; i minimal/index; o whole/container; u rooted/enduring).
  - Diphthongs → small structs (e.g., ae ≈ bound duality; ou ≈ enclosure/mapping; ui ≈ id–value unit).
- Morphemes & Words (composition)
  - Roots ≈ base functions; prefixes/suffixes ≈ decorators/coercions; inflections ≈ explicit pointers (tense/case/number).
  - Latin is unusually transparent here, serving as a semantic routing case study.
- Pragmatic Field (intent & effect)
  - Speech‑act, audience model, context, and cultural priors govern routing choices and evaluation of effect.
Latin makes the mapping concrete:
- Roots = primitive functions (manu‑ hand/handling; vert‑ turn; plic‑ fold).
- Prefixes/suffixes = wrappers/coercions (trans‑, meta‑, ‑tion, ‑itas).
- Inflectional endings = explicit pointers (tense/case/number).
- Treat consonant clusters as operators (mn → handle; vr → turn; pl/plic → fold) and vowels as typed payloads.
Examples (tendencies, not rules):
- mn/man‑ → grasp/handle (manus, manipulate)
- pl/plic‑ → fold/layer/multiply (apply, multiply)
- vr/vert‑ → turn/route (invert, convert)
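A minimal sketch of how such tendencies might be tabulated and applied (the table entries below are illustrative assumptions for this README, not the canonical maps, which live in `src/ask/glyphs.py`):

```python
# Illustrative operator/payload tables -- assumed entries for demonstration,
# not the project's canonical maps in src/ask/glyphs.py.
OPERATORS = {"man": "grasp/handle", "plic": "fold", "pl": "fold", "vert": "turn"}
PAYLOADS = {"a": "base/open", "e": "relational", "i": "minimal/index",
            "o": "whole/container", "u": "rooted/enduring"}

def factor_root(root: str) -> dict:
    """Naively split a root into operator clusters and typed vowel payloads."""
    operators = [op for op in OPERATORS if op in root]
    payloads = [(c, PAYLOADS[c]) for c in root if c in PAYLOADS]
    return {"root": root, "operators": operators, "payloads": payloads}

print(factor_root("vert"))
# {'root': 'vert', 'operators': ['vert'], 'payloads': [('e', 'relational')]}
```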
A working formula for how symbols produce effects:
E = k × A × M
- E (Effect): magnitude of impact in a target substrate (mind, org, system).
- k (Causality Constant): responsiveness of the substrate (openness, channel, modality).
- A (Alignment): structural resonance between symbol and target (feature overlap, schema fit).
- M (Meaning Density): coherent, orthogonal dimensions encoded; grows superlinearly with structure.
Practical use: increase A by mirroring target schemas; increase M by layering orthogonal evidence/features without redundancy.
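As a toy illustration (treating k and A as normalized to [0, 1] is an assumption; the framework does not fix units):

```python
def effect(k: float, alignment: float, meaning_density: float) -> float:
    """E = k * A * M, with k and A assumed normalized to [0, 1] and
    M an open-ended measure of coherent, orthogonal dimensions."""
    return k * alignment * meaning_density

# Mirroring the target's schemas raises A; layering orthogonal,
# non-redundant features raises M -- both multiply into E.
print(effect(k=0.6, alignment=0.8, meaning_density=3.0))  # 1.44
```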
- Capture: ingest web/text with provenance, timestamps, and hashing.
- Lex–Morph Analyzer: segment prefixes/roots/suffixes/inflections; tag operator vs payload features.
- Operator–Payload Factorizer: produce typed embeddings separating consonant operations from vowel payloads.
- Semantic Composer: apply decorators/coercions; emit typed AST‑like structures per utterance.
- Graph Binder: ground nodes/edges to an ontology with confidence and justifications.
- Verifier: retrieval + counterfactual checks; highlight disagreements for human review.
- Router: traverse operator paths for analogy and retrieval (e.g., fold‑family, turn→convert paths).
- UX: accordion rationales, inline provenance, diff views for claims vs sources.
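A skeletal sketch of how the first stages might chain (the names, fields, and toy vowel split are assumptions for illustration, not the repository's API):

```python
from dataclasses import dataclass, field
from hashlib import sha256

@dataclass
class Claim:
    text: str
    source_hash: str                               # Capture: provenance binding
    operators: list = field(default_factory=list)  # Factorizer: consonant operations
    payloads: list = field(default_factory=list)   # Factorizer: vowel payloads
    confidence: float = 0.0                        # Verifier: retrieval-backed score

def capture(text: str) -> Claim:
    """Bind raw text to a content hash so provenance travels with the claim."""
    return Claim(text=text, source_hash=sha256(text.encode()).hexdigest())

def factorize(claim: Claim) -> Claim:
    """Toy operator/payload split; the real stage emits typed embeddings."""
    claim.payloads = [c for c in claim.text if c in "aeiou"]
    claim.operators = [c for c in claim.text if c.isalpha() and c not in "aeiou"]
    return claim

claim = factorize(capture("manipulation"))
print(claim.operators, claim.payloads)
```

The typed records such a pipeline emits might resemble the JSON below.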
```json
{
  "Token": {
    "surface": "manipulation",
    "morph": { "prefix": null, "root": "man", "theme": "-ipul-", "suffix": "-ation" },
    "operators": ["mn", "pl"],
    "payloads": ["a", "i", "u", "o"],
    "features": { "action": "handle→fold→result", "type": "process→noun" }
  },
  "GraphNode": {
    "id": "Q123",
    "label": "Handling-With-Folding",
    "type": "Process",
    "evidence": ["doi:10.1234/abcd"],
    "confidence": 0.82,
    "facets": { "operatorPath": ["mn","pl"], "payloadSchema": {"a":"index→u:o"} }
  }
}
```

Phase 1 (6–8 weeks)
- Web capture with highlights→claims extraction; source hashing.
- Morphological tagging for English/Latin; inline chips for operators/payloads with confidence.
- Claim graph with provenance; reviewer workflow (accept/reject, notes, audit trail).
Phase 2 (6–10 weeks)
- Train operator–payload embeddings; evaluate on analogy and sense disambiguation.
- Add routing queries (e.g., “fold‑family analogs of X”).
- Instrument dashboard to track predictions/hypotheses (below).
Phase 3
- Team spaces, role‑based gates, ontology alignment, and API for external agents with rationale constraints.
Key KPIs and Guardrails
- Time‑to‑verified‑claim; reviewer agreement; retrieval precision; rationale coverage.
- Provenance mandatory; uncertainty tracked; reversible merges and diffs.
Operationalize the framework on high‑frequency words to validate utility and tune priors. A record looks like:
```json
{
  "word": "give",
  "category": "verb",
  "operators": ["g", "v"],
  "payloads": ["i", "e"],
  "gloss": "impulse along a channel → transfer",
  "provenance": "PIE *gʰebʰ- (to give)",
  "confidence": "H"
}
```

Patterns to watch
- Modal -ould: ou (enclosure/mapping) + ld (hold/fold), with the initial operator setting the mode (w → will; c → capacity; sh → ought).
- str‑ cluster: structure/constraint/tighten (string, strong, strict).
- gh: historical aspiration/clarity or force (light, might, sight, fight).
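A small scanner for these surface patterns might look like this (the regexes are rough approximations of the tendencies described, not validated rules):

```python
import re

# Rough regex approximations of the patterns above (assumptions, not rules).
PATTERNS = {
    "modal -ould": re.compile(r"^(w|c|sh)ould$"),
    "str- cluster": re.compile(r"^str"),
    "gh force/clarity": re.compile(r"gh"),
}

def watch(words):
    """Group words by which watched pattern they match."""
    return {name: [w for w in words if rx.search(w)]
            for name, rx in PATTERNS.items()}

print(watch(["would", "should", "strict", "string", "light", "fight"]))
# {'modal -ould': ['would', 'should'], 'str- cluster': ['strict', 'string'],
#  'gh force/clarity': ['light', 'fight']}
```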
- Cursive Angularity Hypothesis: pen angles/lifts correlate with semantic orthogonality across linked words.
- Capitalization Energy‑Barrier: capitals require measurably more effort, tracking conceptual primacy.
- T‑Density Hypothesis: glyph “T” frequency correlates with structural/abstract reasoning demands.
Additional tests: cross‑lingual operator invariants; vowel‑type priors in disambiguation; graph compression from factorization vs baseline LMs.
- Generative model, not universal etymology; treat phonosemantics as priors, not proof.
- Use morphology and provenance before phonetic intuition when in conflict.
- Respect language diversity; avoid Indo‑European bias where it does not fit.
- Expose alignment and meaning‑density overlays to reduce manipulative rhetoric.
- Operator–payload embeddings with ablation studies (operators‑only vs payloads‑only vs combined).
- Knowledge graph schema evolution and ontology alignment tools.
- Reviewer playbooks and inter‑annotator agreement metrics for function words.
- Cross‑family replication (Semitic root‑and‑pattern, Sinitic morphosyllabic, Uralic).
We will test whether letter position modulates operator/payload influence and whether corpus‑level distributions support the USK model.
- Positional weighting hypothesis (a code sketch follows the implementation plan below):
  - Apos (aperture): central Gaussian window (PVL/OVP) — higher processing near the word center.
  - Bpos (diagnosticity): edge boost (U‑shape) — first/last letters carry more identity.
  - Net W(i) = normalize(Apos × Bpos); used to weight operator contributions by position.
- Corpus hypotheses (falsifiable):
  - T‑density higher in academic/technical text vs fiction (structural instantiation).
  - str‑/tr‑/ct clusters enriched in legal/engineering/science vs fiction (structure/constraint).
  - sk/sc clusters enriched in expository/scientific sections vs narrative (scan→select).
  - Vowel payload distributions differ by genre: i in code/docs; a in philosophy; o in ontology/object‑oriented.
  - -ion nominalizations enriched in academic texts; -ould modals enriched in dialogue/fiction.
  - Operator‑path entropy higher in technical prose; lexical entropy lower.
- Implementation plan:
  - Add `src/ask/position_weights.py` (Apos/Bpos/W(i)).
  - Add `src/ask/metrics.py` to compute glyph/cluster/path/diphthong/nominalization features.
  - Provide a notebook `notebooks/01_preregistered_tests.ipynb` with α = 0.01, FDR correction, effect sizes, and Bayes factors.
  - CLI: `ask corpus-stats --input <dir> --out metrics.jsonl`.
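A minimal sketch of the planned position weighting, assuming a Gaussian center window for Apos and a quadratic U‑shaped edge boost for Bpos (`sigma` and `edge_boost` are hypothetical tuning parameters, not values fixed by the plan):

```python
import numpy as np

def position_weights(n: int, sigma: float = 0.35, edge_boost: float = 1.0) -> np.ndarray:
    """W(i) = normalize(Apos(i) * Bpos(i)) for a word of n letters."""
    i = np.arange(n)
    center = (n - 1) / 2.0
    # Apos: Gaussian window peaking at the word center (PVL/OVP-style).
    apos = np.exp(-((i - center) ** 2) / (2 * (sigma * n) ** 2))
    # Bpos: U-shaped boost so first/last letters carry more identity.
    bpos = 1.0 + edge_boost * ((i - center) / max(center, 1.0)) ** 2
    w = apos * bpos
    return w / w.sum()

print(position_weights(6).round(3))  # per-letter weights summing to 1.0
```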
- Propose operator mappings or counterexamples with citations.
- Add decoded words with confidence ratings and provenance.
- Open issues for experimental design or UX flows that improve adjudication.
This project is licensed under the MIT License. See LICENSE for details.
Prerequisites: Python 3.10+
- Create a virtual environment and install in editable mode (recommended for development):
```
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e .
```

- Alternatively, just install dependencies without packaging (not recommended for CLI usage):
```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

- Decode a word using the unified CLI (enhanced by default):
```
ask decode ask
ask decode manipulation --json
```

- Seed data lives in `data/decoded_words.jsonl` as JSONL records for the Test Bench.
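A minimal loader sketch for those records (assumes only that each line is one JSON object shaped like the example record above):

```python
import json
from pathlib import Path

def load_decoded_words(path: str = "data/decoded_words.jsonl") -> list[dict]:
    """Parse one JSON record per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]

records = load_decoded_words()
high_confidence = [r for r in records if r.get("confidence") == "H"]
```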
All commands assume the package is installed (e.g., `python3 -m pip install -e .`).
```
# Help and available commands
ask --help

# Decode a single word (enhanced view)
ask decode ask

# Minimal view (no morphology block or details)
ask decode manipulation --simple

# JSON output for scripting
ask decode manipulation --json

# Syntax parsing (pretty)
ask syntax patriarch

# Syntax parsing (JSON)
ask syntax heaven --json

# Verbose analysis with operator principles and payload types
ask decode manipulation --verbose

# Audit a decoded word with OpenAI (requires OPENAI_API_KEY)
OPENAI_API_KEY=sk-... ask audit "ask" --model gpt-5-mini

# Extract article content (requires FIRECRAWL_API_KEY)
FIRECRAWL_API_KEY=fc-... ask extract https://example.com/article

# Batch extract from a file of URLs (one per line)
FIRECRAWL_API_KEY=fc-... ask extract-batch --file urls.txt > results.json

# List operators or clusters with confidence
ask operators
ask operators --min-confidence 0.85
ask clusters

# Validate the ASK kernel proof and run self-tests
ask validate

# Batch decode
ask batch "ask,think,understand,manipulation" --json
```

You can expose A‑S‑K over the Model Context Protocol (MCP) to integrate with MCP‑compatible clients.
Prerequisites:
- Install the package (editable mode recommended during development):
```
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e .
```

- Install FastMCP runtime:

```
pip install fastmcp
```

Run the server over stdio:

```
ask-mcp
```

The server exposes two tools:
- `decode(word: str)` → returns the enhanced decode JSON
- `syntax(word: str, language: str = "english")` → returns USK syntax with elements and morphology
Notes:
- Set `ASK_GLYPH_PERSIST=1` to enable on‑disk learning for glyph field confidences used by the parser. This maps to the `persist` flag in the underlying services via `create_mcp_server()`.
- If `fastmcp` is not installed, `ask-mcp` will print a helpful message explaining how to install it.
Client integration:
- Configure your MCP client to launch `ask-mcp` as a stdio server. In many editors/clients, this is set in an MCP server registry/config file with a command entry of `ask-mcp` and no args.
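For example, a stdio server entry might look like the following (the `mcpServers` shape is a common client convention, shown here as an assumption; consult your client's docs for its exact schema):

```json
{
  "mcpServers": {
    "a-s-k": {
      "command": "ask-mcp",
      "args": []
    }
  }
}
```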
The MCP server exposes a simple word‑guessing game with two tools: `test_me` and `guess`. Game state is persisted under `data/guesses/` as JSON files keyed by UUID.
Tools
- `test_me(length?: number, include?: string[], num?: number = 3)`
  - Selects a secret word (not revealed).
  - Filters:
    - `length`: exact character length (optional)
    - `include`: list of characters that must appear in the word (optional)
  - Produces a left-to-right linear list of tags (operator/payload tags) of size `num` (default `3`). If `num` exceeds the number of available tags, all available tags are returned with a `warning` string.
  - Response: `{ id: string, tags: string[], warning?: string }`
- `guess(id: string, guesses: string[], reveal?: boolean = false)`
  - Appends a guess attempt to the game and marks whether the latest guess is correct.
  - When `reveal=true`, includes the secret word.
  - Response: `{ attempts: number, correct: boolean, reveal?: { word: string } }`
Storage
- Files saved at `data/guesses/<uuid>.json` with shape:
```json
{
  "id": "<uuid>",
  "word": "<secret>",
  "created_at": "<ISO8601>",
  "params": { "length": 8, "include": ["a","t"], "num": 3 },
  "tags": ["present", "base", "transform"],
  "attempts": [
    { "at": "<ISO8601>", "guesses": ["foobar"], "correct": false }
  ]
}
```

Examples (MCP client payloads)
- Start a game with three tags:
```json
{
  "tool": "test_me",
  "params": { "num": 3 }
}
```

- Start a game with filters and a larger `num` (may return `warning` if too large):
```json
{
  "tool": "test_me",
  "params": { "length": 8, "include": ["a","t"], "num": 6 }
}
```

- Submit a guess and request reveal:
```json
{
  "tool": "guess",
  "params": { "id": "<uuid>", "guesses": ["practice"], "reveal": true }
}
```

Additional MCP examples
- Fetch normalized merged lists (all sections):
```json
{ "tool": "merged_lists", "params": {} }
```

- Fetch only field entries:

```json
{ "tool": "merged_lists", "params": { "section": "field_entries" } }
```

To unify operator/payload maps (`data/glyphs.json`) with field-based glyph associations (`data/glyph_fields.json`) without losing information, use the merge script:

```
python tools/merge_glyph_datasets.py
```

This produces `data/glyphs_merged.json` with two top-level sections:

- `merged` — Exact, lossless union of the source objects (maps from both files preserved verbatim).
- `normalized` — List-based views for clients that prefer list-only interfaces. Sections include: `vowels`, `payload_entries`, `operator_entries`, `complete_operator_entries`, `typed_payload_entries`, `cluster_entries`, `enhanced_cluster_entries`, `field_entries`, `tag_association_entries`.
Programmatic access:
- Python loader that always returns lists:
```python
from ask.merged_glyphs import get_merged_glyphs

mg = get_merged_glyphs()
fields = mg.field_entries()        # list
operators = mg.operator_entries()  # list
```

- Services facade (returns dict of lists, or a single list when filtered):
```python
from ask.core.services import get_services

s = get_services()
all_lists = s.merged_lists()
only_fields = s.merged_lists('field_entries')
```

- MCP tool (FastMCP):
```json
{ "tool": "merged_lists", "params": {} }
{ "tool": "merged_lists", "params": { "section": "field_entries" } }
```

Verification: the merge script asserts that each merged subsection is identical to its source and that normalized entries preserve key sets.
Run the full test suite with pytest:
```
pytest -q
```

Run a single test file:

```
pytest -q tests/test_readme_examples.py
```

Run a specific test:

```
pytest -q tests/test_readme_examples.py::test_decode_manipulation_json
```

- Stage-1 guessing using only descriptors (requires OPENAI_API_KEY):
```
ask audit-guess ask --guesses 5 --model gpt-5-mini
```

```
# Help
ask-fields --help

# Decode with field-based operators and confidence tracking
ask-fields decode manipulation --detailed

# Inspect a glyph's field
ask-fields field k

# System statistics and high-confidence associations
ask-fields stats

# Teach the system a correct association (adjust confidence)
ask-fields learn "manipulation" p present --confidence 0.1

# Batch decode words from a file
ask-fields batch-decode words.txt --output results.json
```

Create a `.env` file at the project root with any provider keys you want to use (optional):
```
OPENAI_API_KEY=...        # required for audit command
# HUGGINGFACE_API_KEY=...
# GEMINI_API_KEY=...
```

Note: `.env` is listed in `.gitignore`. Do not commit secrets.
Use OpenAI `gpt-5-mini` to audit a decoded word (requires `OPENAI_API_KEY` in the environment):
```
PYTHONPATH=src OPENAI_API_KEY=your_key python -m ask.cli audit ask
```

Options:
- `--model` to select an OpenAI chat model (default `gpt-5-mini`).
- `--no-json-out` to print a pretty (non-JSON) report.
```
A-S-K/
├── README.md            # This file: overview and synthesis
├── GLYPHS.md            # The Standard Model of Glyphic Axioms
├── LICENSE              # MIT license
├── CONTRIBUTING.md      # How to contribute, dev setup
├── requirements.txt     # Runtime deps (typer, pydantic, rich)
├── examples/            # Notebooks and runnable scripts
├── data/                # Decoded word corpus and test sets
│   └── decoded_words.jsonl
├── tests/               # Unit tests (pytest)
└── src/                 # Implementation code
    └── ask/
        ├── __init__.py
        ├── glyphs.py        # Operator/payload maps and clusters
        ├── factorizer.py    # decode_word(), extraction helpers
        ├── compose.py       # Minimal AST-like composition
        └── cli.py           # Typer CLI (python -m ask.cli)
```
The word "ask" itself demonstrates the framework:
- A (aperture): Opens a typed unknown "?"
- S (stream): Broadcasts the query outward
- K (clamp): Resolves responses to a discrete answer
Thus "ask" literally encodes the semantic kernel operation: instantiate unknown → route request → resolve result. This is why we've named the project A-S-K.
For the complete glyph-level axioms and geometric complementarity patterns, see GLYPHS.md.