the dev's agentic OSI model

this is just me trying to reason about how the different layers in the modern-day AI engineering space work and interact. you're welcome to join me on my journey.

Layer 1: the inference layer (the physical "brain")


  • what it is: raw llms (Claude, GPT, Llama, etc.)
  • the engineer's job: choose the right "horsepower". do you need a 70B model for reasoning or a 3B model for speed?
  • the service: provides raw next-token prediction capabilities. it's the system's "electricity".

Layer 2: the knowledge layer (the "data link")


  • what it is: embedding models (i promise you most of us have no clue here) and vector stores (sqlite-vec, ChromaDB, LanceDB, Pinecone, etc.)
  • the engineer's job: turning unstructured data into a coordinate system. this is where you decide how to "chunk" your code and how to index it (RAG). tl;dr - there are many strategies and you don't have to pick just one. they don't have to run sequentially either: retrieval can be a DAG or a decision tree (in my opinion, the latter).
  • the service: provides semantic retrieval. it can answer questions like: "what information is relevant to this specific task?"
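
to make the "coordinate system" idea concrete, here is a minimal sketch of chunk, embed and retrieve. big assumption: `embed` is a toy bag-of-words counter standing in for a real embedding model, and the list of chunks stands in for a vector store. the function names and the sample source are made up for illustration.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_lines: int = 2) -> list[str]:
    """Split text into fixed-size line windows (just one of many chunking strategies)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks closest to the query in the toy vector space."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# hypothetical source file, for illustration only
SOURCE = """def parse_config(path):
    return json.load(open(path))
def send_email(to, body):
    smtp.send(to, body)
"""

chunks = chunk(SOURCE)
top = retrieve("how do we parse the config file", chunks)
```

with a real embedding model the shape stays the same, only `embed` and the storage behind `retrieve` change.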

Layer 3: context engineering (the "network")


  • what it is: this is where you tune the context for the llm: you build prompts, state management, memory and RAG injection.
  • the engineer's job: this is the "planning phase", where you gather the proper context. you don't just send a prompt, you assemble a lens (a frame of reference). you decide what the model "sees".
  • the service: provides a coherent environment. it ensures the model doesn't get overwhelmed by noise (or choke on the sheer context of it all) and gets the right signals to act on.
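
a rough sketch of what "assembling a lens" could look like: pack the most important signals into the prompt until a token budget is spent. everything here is an assumption for illustration: the `Signal` type, the priority numbers, and especially the 4-characters-per-token estimate, which is a crude heuristic and not a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    source: str    # where it came from, e.g. "tests", "code", "logs"
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    """Crude heuristic (assumption): roughly 4 characters per token."""
    return max(1, len(text) // 4)


def assemble_lens(task: str, signals: list[Signal], budget: int) -> str:
    """Add signals in priority order, skipping any that would exceed the budget."""
    parts = [f"TASK: {task}"]
    used = estimate_tokens(parts[0])
    for s in sorted(signals, key=lambda s: s.priority):
        section = f"[{s.source}]\n{s.text}"
        cost = estimate_tokens(section)
        if used + cost > budget:
            continue  # too big for what is left, leave it out of the lens
        parts.append(section)
        used += cost
    return "\n\n".join(parts)


signals = [
    Signal("logs", "x" * 400, priority=3),  # noisy and large
    Signal("tests", "assert add(1, 2) == 3", priority=1),
    Signal("code", "def add(a, b): return a + b", priority=2),
]
lens = assemble_lens("fix add", signals, budget=30)
```

here the oversized log dump never reaches the model, while the test and the code do, in that order.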

Layer 4: the workflow layer (the "transport")


  • what it is: the orchestration layer (chains, loops, multi-agent coordination, etc...)
  • the engineer's job: move away from single sessions with an llm to a working process. this is where you define "feedback loops": if a test fails, the workflow should go back to layer 3, update the context with the error and try again.
  • the service: provides reliability and task completion.
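
the feedback loop above can be sketched in a few lines. this assumes two hypothetical callables: `generate` (layers 1 to 3: context in, candidate out) and `check` (a test runner that returns an error message, or `None` on success). the fake versions at the bottom only exist to make the sketch runnable.

```python
from typing import Callable, List, Optional, Tuple


def run_with_feedback(
    task: str,
    generate: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retry until the check passes, feeding each error back into the context."""
    context = task
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(context)
        error = check(candidate)
        if error is None:
            return candidate, errors
        errors.append(error)
        # back to layer 3: rebuild the context with the failure included
        context = f"{task}\n\nattempt {attempt} failed with:\n{error}"
    return None, errors


def fake_generate(context: str) -> str:
    # stand-in for a model call: only gets it right after seeing an error
    return "return a + b" if "failed" in context else "return a - b"


def fake_check(candidate: str) -> Optional[str]:
    # stand-in for a test run
    return None if "+" in candidate else "AssertionError: add(1, 2) returned -1"


result, errors = run_with_feedback("implement add(a, b)", fake_generate, fake_check)
```

the `max_attempts` cap matters: a loop without a budget is just a more expensive way to fail.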

Layer 5: the action layer (the "session/presentation")


  • what it is: tool calls, MCP servers, file system access, api calls, static scripts (no, not everything is AI, nor should it be)
  • the engineer's role: map the intent of the model to a real-world function.
  • the service: provides agency: the ability to "steer", meaning to change the state in-flight rather than statically setting up dominoes and hoping they fall along the correct path.
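
one way to picture "map the intent of the model to a real world function" is a small tool registry. the JSON shape (`name` plus `arguments`) is an assumption for this sketch, not any specific vendor's or protocol's format, and the two tools are made up.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a plain function under its own name so the model can request it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def word_count(text: str) -> int:
    return len(text.split())


@tool
def to_upper(text: str) -> str:
    return text.upper()


def dispatch(raw_call: str) -> dict[str, Any]:
    """Turn a JSON tool request into a function call, reporting errors as data."""
    try:
        call = json.loads(raw_call)
        name, args = call["name"], call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        return {"ok": False, "error": f"malformed call: {exc}"}
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": TOOLS[name](**args)}
    except Exception as exc:  # the tool itself failed, hand that back to the workflow
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

returning errors as values instead of raising is a design choice: it gives layer 4 something to react to.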

Layer 6: the harness layer (the "application")


  • what it is: cli/web/mobile/desktop app, IDE extension or browser extension
  • the engineer's role: user experience. how does the human interact with this tool?
  • the service: provides utility.



Some key takeaways

layer 2 & 3: the "context knowledge" handshake


  • pitfall: many engineers (myself included) think layer 2 (RAG) is just a search engine: find some text and dump it as-is into layer 3 for context.
  • reality: layer 2 is a coordination system. if your "lens" in layer 3 is too wide, your model gets dumb (too much context and noise).
  • refinement: retrieval could be a decision tree. use many strategies, like BM25, semantic and symbol searches, as well as import/export graphs. layer 2 isn't just "find similar text", it's "find the function definition, find the consumers, find the tests". i'd even have the retrieval happen in multiple phases. why? i believe tests should tell you the story of the code, so they are step 1: if tests exist, use them first, then read the code once you get the idea. layer 3 then just assembles the distinct signals into a coherent "lens".
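
here is a minimal sketch of that tests-first, multi-phase retrieval as a tiny decision tree. assumptions: the repo is a hypothetical in-memory dict, "symbol search" is a naive regex, and a plain substring match stands in for the BM25/semantic fallback.

```python
import re

REPO = {  # hypothetical repo: path -> source
    "src/cart.py": "def total(items):\n    return sum(i.price for i in items)\n",
    "src/checkout.py": "from cart import total\n\ndef pay(items):\n    return charge(total(items))\n",
    "tests/test_cart.py": "from cart import total\n\ndef test_total_empty():\n    assert total([]) == 0\n",
}


def find_definition(symbol: str, repo: dict[str, str]) -> list[str]:
    """Files that define the symbol as a function."""
    pat = re.compile(rf"^\s*def {re.escape(symbol)}\(", re.M)
    return [p for p, src in repo.items() if pat.search(src)]


def find_references(symbol: str, repo: dict[str, str]) -> list[str]:
    """Files that call the symbol, excluding the file(s) defining it."""
    pat = re.compile(rf"\b{re.escape(symbol)}\(")
    defs = set(find_definition(symbol, repo))
    return [p for p, src in repo.items() if p not in defs and pat.search(src)]


def retrieve_for_symbol(symbol: str, repo: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Return ordered retrieval phases: tests first when they exist."""
    definition = find_definition(symbol, repo)
    refs = find_references(symbol, repo)
    if not definition and not refs:
        # symbol search found nothing: fall back to a keyword match
        return [("keyword", [p for p, src in repo.items() if symbol in src])]
    tests = [p for p in refs if p.startswith("tests/")]
    consumers = [p for p in refs if not p.startswith("tests/")]
    phases = []
    if tests:
        phases.append(("tests", tests))  # the tests tell the story, read them first
    phases.append(("definition", definition))
    phases.append(("consumers", consumers))
    return phases


phases = retrieve_for_symbol("total", REPO)
```

layer 3 would then take these phases in order and decide how much of each one actually fits into the lens.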

layer 4 & 5: the "agency loop"


this is currently one of the most unstable and unstandardized parts of the industry's stack, in my opinion.

  • the tension: layer 4 (workflow) is the logic, but layer 5 (action) is the reality.
  • the "vibe" vs "code" moment: in vibe coding you hope the action layer works. in engineering, layer 4 expects layer 5 to fail. that's why it exists: to steer properly and handle the failure.
  • the "steering": like i mentioned earlier, the agency is not static dominoes. a true agentic OSI would mean layer 4 can re-route mid-flight based on layer 5's feedback. if a tool on layer 5 fails on a permission issue, layer 4 shouldn't crash, it should go back to layer 3 and ask for a different approach.
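
a small sketch of that re-routing. assumptions: both tools are fake, and `plan` is a hard-coded stand-in for "go back to layer 3 and ask for a different approach"; in a real system that decision would come from the model plus the updated context.

```python
from typing import Callable, List, Optional, Tuple

Approach = Tuple[str, Callable[[str], str]]


def write_to_system_dir(content: str) -> str:
    # hypothetical tool that fails the way a real file write might
    raise PermissionError("cannot write to /etc/app.conf")


def write_to_workspace(content: str) -> str:
    # hypothetical tool that succeeds
    return f"wrote {len(content)} bytes to ./app.conf"


def plan(feedback: List[str]) -> Optional[Approach]:
    """Stand-in for layer 3: pick the next approach based on what failed so far."""
    if not feedback:
        return "system_dir", write_to_system_dir
    if len(feedback) == 1:
        return "workspace", write_to_workspace
    return None  # out of ideas


def run_step(
    content: str,
    plan: Callable[[List[str]], Optional[Approach]],
    max_reroutes: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Layer 4: on a permission failure, record it and ask for another approach."""
    feedback: List[str] = []
    for _ in range(max_reroutes):
        choice = plan(feedback)
        if choice is None:
            break
        name, action = choice
        try:
            return action(content), feedback
        except PermissionError as exc:
            feedback.append(f"{name} failed: {exc}")
    return None, feedback


result, feedback = run_step("debug=true", plan)
```

the first approach fails, the failure becomes feedback, and the second approach finishes the job without the workflow crashing.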

layer 1: the "electricity" problem


  • the realization: we often treat layer 1 as a god-like entity. be honest, you know you do. but in a layered model, layer 1 is just a commodity. as an example, in my agentic workflows i'll run some bits as static code and some bits as agents/sub-agents, and once they've served their purpose they "die".
  • the strategy: if your layer 2 (knowledge) & layer 3 (context/lens) are good enough, you can actually downgrade layer 1 (not the goal here, but an observation). you could potentially run a cheaper and faster model to get the job done, because the environment you built around your model is of such high quality.


In conclusion

we need to stop thinking of AI as a magic pill, and start treating it as just another node in our workflow. the agentic OSI isn't just theory to me, it's an agnostic tool.

when your agent fails, you don't get mad. you loop back and ask yourself: did your data-link layer (2) fail to find the proper signals? did your network layer (3) drown the model in noise? or did the transport layer (4) fail to handle the reality of the action layer (5)?

the next generation of software isn't built by the best prompt engineers, nor by prompt-based workflows (100% llms/ai). it's built by engineers who build the best environments around their models to complement them. they tend to their llms with love and care, they know their shortcomings and help them overcome them.

so invest more time in layers 2, 3 & 4, and don't skip them to just write code.

if you don't invest in:

  • layer 2: remember, garbage in, garbage out.
  • layer 3: an unfocused llm is like a Millennium Falcon pilot at hyperdrive who's high on space cookies.
  • layer 4: reliability happens in loops and layers, not by hoping you prompted the correct linear outcome into an agent/skill.

i set out on this path because, even though i'm building incredible tooling (my opinion only, and maybe that of some peers in our R&D), it still felt like vibe coding in the beginning: structured vibe coding.

and vibe coding is a horrible way to move forward and scale. this is why we have the term "slop" and skills like "caveman" and others.

i've based this on the teachings of Dexter Horthy (from HumanLayer) as well as a lot of talks by Matt Pocock, and even though i may have strayed way off, this was my basis.

don't go complaining to them, these are my own thoughts. hopefully i'll have more to talk about in the future.

if you made it this far, please drop a comment with your thoughts. let's have a healthy discussion so we can all progress.
