[FEA]: Dry Run Mode with Mocked Outputs #336

Open
@ZhongxuanWang

Description

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Medium

Please provide a clear description of problem this feature solves

Developing and testing workflows within the toolkit often involves interactions between LLM agents and various external tools (Google Search, enterprise APIs, etc.). While these interactions are essential in production, they introduce challenges during development, such as:

  1. Higher operational cost (e.g., frequent LLM queries)
  2. Slower iteration (e.g., network latency from API calls or I/O latency from database queries)
  3. Edge cases (e.g., rate limits, API timeouts, or server unavailability)

Describe your ideal solution

A "dry run" mode for the aiq run command, activated via a --dry-run flag and accompanied by a --mock-file option that points to a user-defined mock YAML file. This file can contain pre-defined mock responses for selected external components.

For example, the following YAML file demonstrates three ways an agent (or a tool) can respond.

llms:
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0
    max_tokens: 1024
    mocked: true
    # Situation 1: An agent (or a tool) can return responses from a list in order for successive calls within a single dry run
    mocked_responses_sequence:  
      - "<thinking>I need to search for AIQ Toolkit</thinking><action>wikipedia_search</action><action_input>AIQ Toolkit</action_input>"
      - "<thinking>The search returned good info</thinking><answer>AIQ Toolkit is great (mocked).</answer>"
  summary_llm:
    _type: openai
    mocked: true
    conditional_responses:
      # Situation 2: An agent (or a tool) can respond conditionally based on simple matching conditions against its input (e.g., "if input contains 'X', return Y").
      - if_input_contains: "Please summarize:"
        response: "This is a short mocked summary."
      - default_response: "Default summary mocked."
functions:
  wikipedia_search:
    _type: wikipedia_search
    mocked: true
    # Situation 3: An agent (or a tool) can always return the default response
    mock_response: "Mocked search result: AIQ Toolkit is a framework for building AI agents."
  query_long_term_memory:
    _type: query_long_term_memory
    mocked: true
    mock_response: '{"id": "123", "data": "mocked_record_value", "error_code": null}'
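The resolution order implied by the config above (sequence first, then conditional matching, then a fixed response) could be sketched roughly as follows. The MockedComponent class, its field handling, and the repeat-last-entry policy for exhausted sequences are illustrative assumptions, not part of the toolkit:

```python
from typing import Any


class MockedComponent:
    """Illustrative resolver for the proposed mock config fields
    (mocked_responses_sequence, conditional_responses, mock_response)."""

    def __init__(self, config: dict[str, Any]):
        self._sequence = list(config.get("mocked_responses_sequence", []))
        self._conditionals = config.get("conditional_responses", [])
        self._static = config.get("mock_response")
        self._calls = 0

    def respond(self, prompt: str) -> str:
        # Situation 1: successive calls walk the sequence in order.
        # (Repeating the last entry once exhausted is one possible
        # policy; the issue does not specify this case.)
        if self._sequence:
            idx = min(self._calls, len(self._sequence) - 1)
            self._calls += 1
            return self._sequence[idx]

        # Situation 2: simple substring matching against the input,
        # falling back to a declared default_response if present.
        default = None
        for rule in self._conditionals:
            if "if_input_contains" in rule and rule["if_input_contains"] in prompt:
                return rule["response"]
            if "default_response" in rule:
                default = rule["default_response"]
        if default is not None:
            return default

        # Situation 3: always return the fixed response.
        if self._static is not None:
            return self._static
        raise ValueError("mocked component has no mock responses configured")
```

For example, a component configured only with mock_response returns that string on every call, while one configured with conditional_responses matches rules top to bottom.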

Benefits

  1. Cost reduction. Mocked LLM and tool calls incur no API charges.
  2. Speed. Mocked responses return instantly.
  3. Determinism. Mocked agents and tools behave predictably, with fewer edge cases.
  4. Easier testing. Error states can be simulated on demand.
  5. Offline development. Core agent logic can be developed and tested without internet access.

Additional context

Relation to Existing Features:

  • This feature should naturally integrate with the existing aiq run command and workflow configuration (workflow.yaml).
  • It complements the Profiler by allowing developers to isolate and measure the performance of the agent's internal logic separately from external latencies.
  • It differs from aiq eval --skip_workflow: that option reuses actual pre-computed workflow outputs, whereas dry run simulates them.

Inspirations

  • LangChain's FakeListLLM
  • Terraform's plan command
  • API mocking tools such as Postman Mock Servers
  • Python's unittest.mock (already used in this project's unit tests)
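For comparison, the sequence and conditional behaviors proposed above map directly onto unittest.mock's side_effect mechanism, which the project already uses in its tests. The mock objects and canned strings here are illustrative:

```python
from unittest.mock import MagicMock

# Situation 1: successive calls walk a list of canned responses,
# analogous to mocked_responses_sequence.
llm = MagicMock()
llm.generate.side_effect = [
    "<action>wikipedia_search</action><action_input>AIQ Toolkit</action_input>",
    "<answer>AIQ Toolkit is great (mocked).</answer>",
]
first = llm.generate("Find AIQ Toolkit")
second = llm.generate("Summarize the result")

# Situation 2: a callable side_effect matches on the input,
# analogous to conditional_responses with a default.
def summary_mock(prompt: str) -> str:
    if "Please summarize:" in prompt:
        return "This is a short mocked summary."
    return "Default summary mocked."

summary_llm = MagicMock(side_effect=summary_mock)
```

A declarative YAML file would offer the same behaviors without writing test code, which is the main value the proposed mode adds over using unittest.mock directly.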

Overall, I think this user-facing mode would make developing complex, reliable AI agents faster and more cost-efficient.

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
