Skip to content

research-developer/A-S-K

A‑S‑K — A Semantic Kernel (ASK)

A‑S‑K is a practical, testable framework and implementation blueprint for building AI systems that route, verify, and synthesize meaning. It treats language as a layered, compiler‑like stack from glyph geometry to discourse, enabling more transparent research workflows, richer annotations, and graph‑grounded reasoning.

This README is a standalone synthesis of the core concepts and the MVP path toward an AI‑powered research tool informed by A Semantic Kernel.

Why A‑S‑K

  • Structure first: Model language as layered computation rather than surface tokens alone.
  • Trust by design: Bind every claim to provenance, typed structure, and uncertainty.
  • Human‑AI collaboration: Keep humans in the loop for adjudication at decision boundaries.
  • Scalability: Factor representations (operators vs payloads) to compress parameters while improving retrieval and analogy.

A Semantic Kernel (ASK) — Core Model

ASK proposes a four‑layer stack. Each layer constrains the next and is optimized like a compiler pipeline:

  1. Geometric Substrate (pre‑linguistic)
  • Glyph affordances derived from simple forms: | stroke (distinction), O loop (enclosure), V aperture (receptivity), T cross (orthogonality/instantiation).
  • These are biases, not hard meanings; they make certain encodings natural.
  1. Glyph Operators (sub‑lexical)
  • Consonants → operations (fold, turn, tighten, flow, route).
  • Vowels → typed payloads (a base/open; e relational; i minimal/index; o whole/container; u rooted/enduring).
  • Diphthongs → small structs (e.g., ae ≈ bound duality; ou ≈ enclosure/mapping; ui ≈ id–value unit).
  1. Morphemes & Words (composition)
  • Roots ≈ base functions; prefixes/suffixes ≈ decorators/coercions; inflections ≈ explicit pointers (tense/case/number).
  • Latin is unusually transparent here, serving as a semantic routing case study.
  1. Pragmatic Field (intent & effect)
  • Speech‑act, audience model, context, and cultural priors govern routing choices and evaluation of effect.

Latin as a Semantic Compiler (Analogy)

  • Roots = primitive functions (manu‑ hand/handling; vert‑ turn; plic‑ fold).
  • Prefixes/suffixes = wrappers/coercions (trans‑, meta‑, ‑tion, ‑itas).
  • Inflectional endings = explicit pointers (tense/case/number).
  • Treat consonant clusters as operators (mn handle; vr turn; pl/plic fold) and vowels as typed payloads.

Examples (tendencies, not rules):

  • mn/man‑ → grasp/handle (manus, manipulate)
  • pl/plic‑ → fold/layer/multiply (apply, multiply)
  • vr/vert‑ → turn/route (invert, convert)

A Calculus of Information–Causality

A working formula for how symbols produce effects:

E = k × A × M

  • E (Effect): magnitude of impact in a target substrate (mind, org, system).
  • k (Causality Constant): responsiveness of the substrate (openness, channel, modality).
  • A (Alignment): structural resonance between symbol and target (feature overlap, schema fit).
  • M (Meaning Density): coherent, orthogonal dimensions encoded; grows superlinearly with structure.

Practical use: increase A by mirroring target schemas; increase M by layering orthogonal evidence/features without redundancy.

Reference Architecture (Semantic Routing Engine)

  1. Capture: ingest web/text with provenance, timestamps, and hashing.
  2. Lex–Morph Analyzer: segment prefixes/roots/suffixes/inflections; tag operator vs payload features.
  3. Operator–Payload Factorizer: produce typed embeddings separating consonant operations from vowel payloads.
  4. Semantic Composer: apply decorators/coercions; emit typed AST‑like structures per utterance.
  5. Graph Binder: ground nodes/edges to an ontology with confidence and justifications.
  6. Verifier: retrieval + counterfactual checks; highlight disagreements for human review.
  7. Router: traverse operator paths for analogy and retrieval (e.g., fold‑family, turn→convert paths).
  8. UX: accordion rationales, inline provenance, diff views for claims vs sources.

Minimal Data Sketch

{
  "Token": {
    "surface": "manipulation",
    "morph": { "prefix": null, "root": "man", "theme": "-ipul-", "suffix": "-ation" },
    "operators": ["mn", "pl"],
    "payloads": ["a", "i", "u", "o"],
    "features": { "action": "handle→fold→result", "type": "process→noun" }
  },
  "GraphNode": {
    "id": "Q123",
    "label": "Handling-With-Folding",
    "type": "Process",
    "evidence": ["doi:10.1234/abcd"],
    "confidence": 0.82,
    "facets": { "operatorPath": ["mn","pl"], "payloadSchema": {"a":"index→u:o"} }
  }
}

MVP for the AI Research & Annotation Tool

Phase 1 (6–8 weeks)

  • Web capture with highlights→claims extraction; source hashing.
  • Morphological tagging for English/Latin; inline chips for operators/payloads with confidence.
  • Claim graph with provenance; reviewer workflow (accept/reject, notes, audit trail).

Phase 2 (6–10 weeks)

  • Train operator–payload embeddings; evaluate on analogy and sense disambiguation.
  • Add routing queries (e.g., “fold‑family analogs of X”).
  • Instrument dashboard to track predictions/hypotheses (below).

Phase 3

  • Team spaces, role‑based gates, ontology alignment, and API for external agents with rationale constraints.

Key KPIs and Guardrails

  • Time‑to‑verified‑claim; reviewer agreement; retrieval precision; rationale coverage.
  • Provenance mandatory; uncertainty tracked; reversible merges and diffs.

Test Bench: Common Word Decodings

Operationalize the framework on high‑frequency words to validate utility and tune priors. A record looks like:

{
  "word": "give",
  "category": "verb",
  "operators": ["g", "v"],
  "payloads": ["i", "e"],
  "gloss": "impulse along a channel → transfer",
  "provenance": "PIE *gʰebʰ- (to give)",
  "confidence": "H"
}

Patterns to watch

  • Modal -ould: ou (enclosure/mapping) + ld (hold/fold) with initial operator setting mode (w will; c/capacity; sh/ought).
  • str‑ cluster: structure/constraint/tighten (string, strong, strict).
  • gh: historical aspiration/clarity or force (light, might, sight, fight).

Falsifiable Predictions (for empirical grounding)

  1. Cursive Angularity Hypothesis: pen angles/lifts correlate with semantic orthogonality across linked words.
  2. Capitalization Energy‑Barrier: capitals require measurably more effort, tracking conceptual primacy.
  3. T‑Density Hypothesis: glyph “T” frequency correlates with structural/abstract reasoning demands.

Additional tests: cross‑lingual operator invariants; vowel‑type priors in disambiguation; graph compression from factorization vs baseline LMs.

Limits, Caveats, Ethics

  • Generative model, not universal etymology; treat phonosemantics as priors, not proof.
  • Use morphology and provenance before phonetic intuition when in conflict.
  • Respect language diversity; avoid Indo‑European bias where it does not fit.
  • Expose alignment and meaning‑density overlays to reduce manipulative rhetoric.

Roadmap (selected)

  • Operator–payload embeddings with ablation studies (operators‑only vs payloads‑only vs combined).
  • Knowledge graph schema evolution and ontology alignment tools.
  • Reviewer playbooks and inter‑annotator agreement metrics for function words.
  • Cross‑family replication (Semitic root‑and‑pattern, Sinitic morphosyllabic, Uralic).

R&D: Positional Weighting and Corpus Tests (preregistered heuristics)

We will test whether letter position modulates operator/payload influence and whether corpus‑level distributions support the USK model.

  • Positional weighting hypothesis:
    • Apos (aperture): central Gaussian window (PVL/OVP) — higher processing near word center.
    • Bpos (diagnosticity): edge boost (U‑shape) — first/last letters carry more identity.
    • Net W(i) = normalize(Apos×Bpos); used to weight operator contributions by position.
  • Corpus hypotheses (falsifiable):
    • T‑density higher in academic/technical vs fiction (structural instantiation).
    • str-/tr-/ct clusters enriched in legal/engineering/science vs fiction (structure/constraint).
    • sk/sc clusters enriched in expository/scientific sections vs narrative (scan→select).
    • Vowel payload distributions differ by genre: i in code/docs; a in philosophy; o in ontology/object‑oriented.
    • -ion nominalizations enriched in academic texts; -ould modals enriched in dialogue/fiction.
    • Operator‑path entropy higher in technical prose; lexical entropy lower.
  • Implementation plan:
    • Add src/ask/position_weights.py (Apos/Bpos/W(i)).
    • Add src/ask/metrics.py to compute glyph/cluster/path/diphthong/nominalization features.
    • Provide a notebook notebooks/01_preregistered_tests.ipynb with α=0.01, FDR correction, effect sizes, Bayes factors.
    • CLI: ask corpus-stats --input <dir> --out metrics.jsonl.

Contributing

  • Propose operator mappings or counterexamples with citations.
  • Add decoded words with confidence ratings and provenance.
  • Open issues for experimental design or UX flows that improve adjudication.

License

This project is licensed under the MIT License. See LICENSE for details.

Install

Prerequisites: Python 3.10+

  • Create a virtual environment and install in editable mode (recommended for development):
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e .
  • Alternatively, just install dependencies without packaging (not recommended for CLI usage):
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Quickstart

  • Decode a word using the unified CLI (enhanced by default):
ask decode ask
ask decode manipulation --json
  • Seed data lives in data/decoded_words.jsonl as JSONL records for the Test Bench.

CLI Usage

All commands assume the package is installed (e.g., python3 -m pip install -e .).

ask (unified CLI)

# Help and available commands
ask --help

# Decode a single word (enhanced view)
ask decode ask

# Minimal view (no morphology block or details)
ask decode manipulation --simple

# JSON output for scripting
ask decode manipulation --json

# Syntax parsing (pretty)
ask syntax patriarch

# Syntax parsing (JSON)
ask syntax heaven --json

# Verbose analysis with operator principles and payload types
ask decode manipulation --verbose

# Audit a decoded word with OpenAI (requires OPENAI_API_KEY)
OPENAI_API_KEY=sk-... ask audit "ask" --model gpt-5-mini

# Extract article content (requires FIRECRAWL_API_KEY)
FIRECRAWL_API_KEY=fc-... ask extract https://example.com/article

# Batch extract from a file of URLs (one per line)
FIRECRAWL_API_KEY=fc-... ask extract-batch --file urls.txt > results.json

# List operators or clusters with confidence
ask operators
ask operators --min-confidence 0.85
ask clusters

# Validate the ASK kernel proof and run self-tests
ask validate

# Batch decode
ask batch "ask,think,understand,manipulation" --json

MCP Server (FastMCP)

You can expose A‑S‑K over the Model Context Protocol (MCP) to integrate with MCP‑compatible clients.

Prerequisites:

  • Install the package (editable mode recommended during development):
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e .
  • Install FastMCP runtime:
pip install fastmcp

Run the server over stdio:

ask-mcp

The server exposes two tools:

  • decode(word: str) → returns the enhanced decode JSON
  • syntax(word: str, language: str = "english") → returns USK syntax with elements and morphology

Notes:

  • Set ASK_GLYPH_PERSIST=1 to enable on‑disk learning for glyph field confidences used by the parser. This maps to the persist flag in the underlying services via create_mcp_server().
  • If fastmcp is not installed, ask-mcp will print a helpful message explaining how to install it.

Client integration:

  • Configure your MCP client to launch ask-mcp as a stdio server. In many editors/clients, this is set in an MCP server registry/config file with a command entry of ask-mcp and no args.

MCP Game

The MCP server exposes a simple word guessing game with two tools: test_me and guess. Game state is persisted under data/guesses/ as JSON files keyed by UUID.

Tools

  • test_me(length?: number, include?: string[], num?: number = 3)

    • Selects a secret word (not revealed).
    • Filters:
      • length: exact character length (optional)
      • include: list of characters that must appear in the word (optional)
    • Produces a left-to-right linear list of tags (operators/payload tags) of size num (default 3). If num exceeds the number of available tags, all available tags are returned with a warning string.
    • Response:
      • { id: string, tags: string[], warning?: string }
  • guess(id: string, guesses: string[], reveal?: boolean = false)

    • Appends a guess attempt to the game, marks whether the latest guess is correct.
    • When reveal=true, includes the secret word.
    • Response:
      • { attempts: number, correct: boolean, reveal?: { word: string } }

Storage

  • Files saved at data/guesses/<uuid>.json with shape:
{
  "id": "<uuid>",
  "word": "<secret>",
  "created_at": "<ISO8601>",
  "params": { "length": 8, "include": ["a","t"], "num": 3 },
  "tags": ["present", "base", "transform"],
  "attempts": [
    { "at": "<ISO8601>", "guesses": ["foobar"], "correct": false }
  ]
}

Examples (MCP client payloads)

  • Start a game with three tags:
{
  "tool": "test_me",
  "params": { "num": 3 }
}
  • Start a game with filters and larger num (may return warning if too large):
{
  "tool": "test_me",
  "params": { "length": 8, "include": ["a","t"], "num": 6 }
}
  • Submit a guess and request reveal:
{
  "tool": "guess",
  "params": { "id": "<uuid>", "guesses": ["practice"], "reveal": true }
}

Additional MCP examples

  • Fetch normalized merged lists (all sections):
{ "tool": "merged_lists", "params": {} }
  • Fetch only field entries:
{ "tool": "merged_lists", "params": { "section": "field_entries" } }

Merged Glyph Dataset

To unify operator/payload maps (data/glyphs.json) with field-based glyph associations (data/glyph_fields.json) without losing information, use the merge script:

python tools/merge_glyph_datasets.py

This produces data/glyphs_merged.json with two top-level sections:

  • merged — Exact, lossless union of the source objects (maps from both files preserved verbatim).
  • normalized — List-based views for clients that prefer list-only interfaces. Sections include:
    • vowels, payload_entries, operator_entries, complete_operator_entries, typed_payload_entries, cluster_entries, enhanced_cluster_entries, field_entries, tag_association_entries.

Programmatic access:

  • Python loader that always returns lists:
from ask.merged_glyphs import get_merged_glyphs
mg = get_merged_glyphs()
fields = mg.field_entries()  # list
operators = mg.operator_entries()  # list
  • Services facade (returns dict of lists, or a single list when filtered):
from ask.core.services import get_services
s = get_services()
all_lists = s.merged_lists()
only_fields = s.merged_lists('field_entries')
  • MCP tool (FastMCP):
{ "tool": "merged_lists", "params": {} }
{ "tool": "merged_lists", "params": { "section": "field_entries" } }

Verification: the merge script asserts that each merged subsection is identical to its source and that normalized entries preserve key sets.

Testing

Run the full test suite with pytest:

pytest -q

Run a single test file:

pytest -q tests/test_readme_examples.py

Run a specific test:

pytest -q tests/test_readme_examples.py::test_decode_manipulation_json

Advanced CLI

  • Stage-1 guessing using only descriptors (requires OPENAI_API_KEY):
ask audit-guess ask --guesses 5 --model gpt-5-mini

ask-fields (field-based glyph analysis CLI)

# Help
ask-fields --help

# Decode with field-based operators and confidence tracking
ask-fields decode manipulation --detailed

# Inspect a glyph's field
ask-fields field k

# System statistics and high-confidence associations
ask-fields stats

# Teach the system a correct association (adjust confidence)
ask-fields learn "manipulation" p present --confidence 0.1

# Batch decode words from a file
ask-fields batch-decode words.txt --output results.json

Environment

Create a .env file at the project root with any provider keys you want to use (optional):

OPENAI_API_KEY=...  # required for audit command
# HUGGINGFACE_API_KEY=...
# GEMINI_API_KEY=...

Note: .env is listed in .gitignore. Do not commit secrets.

Audit (Model-assisted review)

Use OpenAI gpt-5-mini to audit a decoded word (requires OPENAI_API_KEY in environment):

PYTHONPATH=src OPENAI_API_KEY=your_key python -m ask.cli audit ask

Options:

  • --model to select an OpenAI chat model (default gpt-5-mini).
  • --no-json-out to print a pretty (non-JSON) report.

Project Structure

A-S-K/
├── README.md           # This file: overview and synthesis
├── GLYPHS.md           # The Standard Model of Glyphic Axioms
├── LICENSE             # MIT license
├── CONTRIBUTING.md     # How to contribute, dev setup
├── requirements.txt    # Runtime deps (typer, pydantic, rich)
├── examples/           # Notebooks and runnable scripts
├── data/               # Decoded word corpus and test sets
│   └── decoded_words.jsonl
├── tests/              # Unit tests (pytest)
└── src/                # Implementation code
    └── ask/
        ├── __init__.py
        ├── glyphs.py          # Operator/payload maps and clusters
        ├── factorizer.py      # decode_word(), extraction helpers
        ├── compose.py         # Minimal AST-like composition
        └── cli.py             # Typer CLI (python -m ask.cli)

The ASK Proof: Self-Evidence

The word "ask" itself demonstrates the framework:

  • A (aperture): Opens a typed unknown "?"
  • S (stream): Broadcasts the query outward
  • K (clamp): Resolves responses to discrete answer

Thus "ask" literally encodes the semantic kernel operation: instantiate unknown → route request → resolve result. This is why we've named the project A-S-K.


For the complete glyph-level axioms and geometric complementarity patterns, see GLYPHS.md

About

No description, website, or topics provided.

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE
MIT
LICENSE.MIT

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published