Description
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Medium
Please provide a clear description of problem this feature solves
Developing and testing the workflows within the toolkit often involves interactions between LLM agents and various tools (Google search, enterprise APIs, etc). While they are essential for production, these interactions can introduce some challenges during development, such as
- Higher operational cost (e.g. frequent LLM queries)
- Slower iterations (e.g. network latency from API calls or I/O latency from database queries)
- Edge cases (e.g. triggering rate limits, API timeouts, or API server unavailable).
etc.
Describe your ideal solution
A "dry run" mode for the aiq run
command, that can be activated via a --dry-run
flag and accompanied by a --mock-file
option that points to the user-defined workflow YAML file. This file can contain a pre-defined mock responses for some external components.
For example, the following YAML file demonstrates three possible situations on how an agent (or a tool) can respond.
llms:
nim_llm:
_type: nim
model_name: meta/llama-3.1-70b-instruct
temperature: 0.0
max_tokens: 1024
mocked: true
# Situation 1: An agent (or a tool) can return responses from a list in order for successive calls within a single dry run
mocked_responses_sequence:
- "<thinking>I need to search for AIQ Toolkit</thinking><action>wikipedia_search</action><action_input>AIQ Toolkit</action_input>"
- "<thinking>The search returned good info</thinking><answer>AIQ Toolkit is great (mocked).</answer>"
summary_llm:
_type: openai
mocked: true
conditional_responses:
# Situation 2: An agent (or a tool) can respond conditionally based on simple matching conditions against its input (e.g., "if input contains 'X', return Y").
- if_input_contains: "Please summarize:"
response: "This is a short mocked summary."
- default_response: "Default summary mocked."
functions:
wikipedia_search
_type: wikipedia_search
mocked: true
# Situation 3: An agent (or a tool) can always return the default response
mock_response: "Mocked search result: AIQ Toolkit is a framework for building AI agents."
query_long_term_memory:
_type: query_long_term_memory
mocked: true
mock_response: '{"id": "123", "data": "mocked_record_value", "error_code": null}'
Benefits
- Cost reduction. There will be no charges during mocked LLM agent/tool API calls.
- Speed. Instantaneous results.
- Determinism. Fewer edge cases. LLM agents or tools' behaviors will be expected.
- Easier Testing. Simulate error states more easily.
- Offline development. You can now develop and test core agent logic without having to have internet access
Additional context
Relation to Existing Features:
- This feature should natually integrate with the existing
aiq run
command and workflow configuration (workflow.yaml). - It complements the Profiler by allowing developers to isolate and measure the performance of the agent's internal logic separately from external latencies.
- It differs from aiq eval --skip_workflow, which reuses actual pre-computed workflow outputs, whereas dry run simulates them.
Inspirations
- LangChain's
FakeListLLM
- Terraform's
plan
command - API mocking tools like the Postman Mock Servers.
- Python's
unittest.mock
(as already used in this project's unit testings)
I think overall, this user-facing mode would bring benefits by making the development of complex and reliable AI agents faster and more cost-efficient.
Code of Conduct
- I agree to follow this project's Code of Conduct
- I have searched the open feature requests and have found no duplicates for this feature request