[FEA]: Dry Run Mode with Mocked Outputs #336

Open
@ZhongxuanWang

Description

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Medium

Please provide a clear description of problem this feature solves

Developing and testing workflows within the toolkit often involves interactions between LLM agents and various external tools (Google Search, enterprise APIs, etc.). While these interactions are essential in production, they introduce challenges during development, such as:

  1. Higher operational cost (e.g., frequent LLM queries)
  2. Slower iteration (e.g., network latency from API calls or I/O latency from database queries)
  3. Edge cases (e.g., rate limits, API timeouts, or server unavailability)

Describe your ideal solution

A "dry run" mode for the aiq run command, activated via a --dry-run flag and accompanied by a --mock-file option that points to a user-defined mock YAML file. This file can contain pre-defined mock responses for selected external components.

For example, the following YAML file demonstrates three ways an agent (or a tool) can respond.

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
    max_tokens: 1024
    mocked: true
    # Situation 1: An agent (or a tool) can return responses from a list in order for successive calls within a single dry run
    mocked_responses_sequence:  
      - "<thinking>I need to search for AIQ Toolkit</thinking><action>wikipedia_search</action><action_input>AIQ Toolkit</action_input>"
      - "<thinking>The search returned good info</thinking><answer>AIQ Toolkit is great (mocked).</answer>"
  summary_llm:
    _type: openai
    mocked: true
    conditional_responses:
      # Situation 2: An agent (or a tool) can respond conditionally based on simple matching conditions against its input (e.g., "if input contains 'X', return Y").
      - if_input_contains: "Please summarize:"
        response: "This is a short mocked summary."
      - default_response: "Default summary mocked."
functions:
  wikipedia_search:
    _type: wikipedia_search
    mocked: true
    # Situation 3: An agent (or a tool) can always return the default response
    mock_response: "Mocked search result: AIQ Toolkit is a framework for building AI agents."
  query_long_term_memory:
    _type: query_long_term_memory
    mocked: true
    mock_response: '{"id": "123", "data": "mocked_record_value", "error_code": null}'
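The resolution order implied by the config above (sequence first, then conditional matching, then a fixed response) could be sketched roughly as follows. The MockedComponent class, its field handling, and the repeat-last-entry policy for exhausted sequences are illustrative assumptions, not part of the toolkit:

```python
from typing import Any


class MockedComponent:
    """Illustrative resolver for the proposed mock config fields
    (mocked_responses_sequence, conditional_responses, mock_response)."""

    def __init__(self, config: dict[str, Any]):
        self._sequence = list(config.get("mocked_responses_sequence", []))
        self._conditionals = config.get("conditional_responses", [])
        self._static = config.get("mock_response")
        self._calls = 0

    def respond(self, prompt: str) -> str:
        # Situation 1: successive calls walk the sequence in order.
        # (Repeating the last entry once exhausted is one possible
        # policy; the issue does not specify this case.)
        if self._sequence:
            idx = min(self._calls, len(self._sequence) - 1)
            self._calls += 1
            return self._sequence[idx]

        # Situation 2: simple substring matching against the input,
        # falling back to a declared default_response if present.
        default = None
        for rule in self._conditionals:
            if "if_input_contains" in rule and rule["if_input_contains"] in prompt:
                return rule["response"]
            if "default_response" in rule:
                default = rule["default_response"]
        if default is not None:
            return default

        # Situation 3: always return the fixed response.
        if self._static is not None:
            return self._static
        raise ValueError("mocked component has no mock responses configured")
```

For example, a component configured only with mock_response returns that string on every call, while one configured with conditional_responses matches rules top to bottom.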

Benefits

  1. Cost reduction. Mocked LLM and tool calls incur no API charges.
  2. Speed. Mocked responses return instantly.
  3. Determinism. Mocked agents and tools behave predictably, with fewer edge cases.
  4. Easier testing. Error states can be simulated on demand.
  5. Offline development. Core agent logic can be developed and tested without internet access.

Additional context

Relation to Existing Features:

  • This feature should naturally integrate with the existing aiq run command and workflow configuration (workflow.yaml).
  • It complements the Profiler by allowing developers to isolate and measure the performance of the agent's internal logic separately from external latencies.
  • It differs from aiq eval --skip_workflow: that option reuses actual pre-computed workflow outputs, whereas dry run simulates them.

Inspirations

  • LangChain's FakeListLLM
  • Terraform's plan command
  • API mocking tools such as Postman Mock Servers
  • Python's unittest.mock (already used in this project's unit tests)
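For comparison, the sequence and conditional behaviors proposed above map directly onto unittest.mock's side_effect mechanism, which the project already uses in its tests. The mock objects and canned strings here are illustrative:

```python
from unittest.mock import MagicMock

# Situation 1: successive calls walk a list of canned responses,
# analogous to mocked_responses_sequence.
llm = MagicMock()
llm.generate.side_effect = [
    "<action>wikipedia_search</action><action_input>AIQ Toolkit</action_input>",
    "<answer>AIQ Toolkit is great (mocked).</answer>",
]
first = llm.generate("Find AIQ Toolkit")
second = llm.generate("Summarize the result")

# Situation 2: a callable side_effect matches on the input,
# analogous to conditional_responses with a default.
def summary_mock(prompt: str) -> str:
    if "Please summarize:" in prompt:
        return "This is a short mocked summary."
    return "Default summary mocked."

summary_llm = MagicMock(side_effect=summary_mock)
```

A declarative YAML file would offer the same behaviors without writing test code, which is the main value the proposed mode adds over using unittest.mock directly.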

Overall, I think this user-facing mode would make developing complex, reliable AI agents faster and more cost-efficient.

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
