- Table of Contents
- Overview
- Software components
- Target audience
- Prerequisites
- Hardware requirements
- API definition
- Use case description
- Getting started
- Running the workflow
- Customizing the Workflow
- Troubleshooting
- Testing and validation
- License
- Terms of Use
This repository powers the build experience, showcasing vulnerability analysis for container security using NVIDIA NIM microservices and the NVIDIA NeMo Agent Toolkit.
The NVIDIA AI Blueprint demonstrates accelerated analysis of common vulnerabilities and exposures (CVEs) at an enterprise scale, reducing mitigation time from days or hours to just seconds. While traditional methods require substantial manual effort to pinpoint solutions for vulnerabilities, these technologies enable quick, automatic, and actionable CVE risk analysis using large language models (LLMs) and retrieval-augmented generation (RAG). With this blueprint, security analysts can expedite the process of determining whether a software package includes exploitable and vulnerable components, using LLMs and event-driven RAG triggered by the creation of a new software package or the detection of a CVE.
The following are used by this blueprint:
This blueprint is for:
- Security analysts and IT engineers: People analyzing vulnerabilities and ensuring the security of containerized environments.
- AI practitioners in cybersecurity: People applying AI to enhance cybersecurity, particularly those interested in using the NeMo Agent Toolkit and NIMs for faster vulnerability detection and analysis.
- NVAIE developer license
- API keys for vulnerability databases, search engines, and LLM model service(s).
- Details can be found in this later section: Obtain API keys
Below are the hardware requirements for each component of the vulnerability analysis workflow.
The overall hardware requirements depend on the selected workflow configuration. At a minimum, the hardware requirements for workflow operation must be met. The LLM NIM and Embedding NIM hardware requirements only need to be met if self-hosting these components. See the Using self-hosted NIMs, Customizing the LLM models, and Customizing the embedding model sections for more information.
- (Optional) LLM NIM: Meta Llama 3.1 70B Instruct Support Matrix
- This workflow makes heavy use of parallel LLM calls to accelerate processing. For improved parallel performance (for example, in production workloads), we recommend 8x or more H100s for LLM inference.
- (Optional) Embedding NIM: NV-EmbedQA-E5-v5 Support Matrix
Determining the impact of a documented CVE on a specific project or container is a labor-intensive and manual task, especially as the rate of new reports into the CVE database accelerates. This process involves the collection, comprehension, and synthesis of various pieces of information to ascertain whether immediate remediation is necessary upon the identification of a new CVE.
Current challenges in CVE analysis:
- Information collection: The process involves significant manual labor to collect and synthesize relevant information.
- Decision complexity: Decisions on whether to update a library impacted by a CVE often hinge on various considerations, including:
- Scan false positives: Occasionally, vulnerability scans may incorrectly flag a library as vulnerable, leading to a false alarm.
- Mitigating factors: In some cases, existing safeguards within the environment may reduce or negate the risk posed by a CVE.
- Lack of required environments or dependencies: For an exploit to succeed, specific conditions must be met. The absence of these necessary elements can render a vulnerability irrelevant.
- Manual documentation: Once an analyst has determined the library is not affected, a Vulnerability Exploitability eXchange (VEX) document must be created to standardize and distribute the results.
The efficiency of this process can be significantly enhanced through the deployment of an automated LLM agent workflow, leveraging generative AI to improve vulnerability defense while decreasing the load on security teams.
The workflow operates using a Plan-and-Execute-style LLM pipeline for CVE impact analysis. The process begins with an LLM planner that generates a context-sensitive task checklist. This checklist is then executed by an LLM agent equipped with Retrieval-Augmented Generation (RAG) capabilities. The gathered information and the agent's findings are subsequently summarized and categorized by additional LLM nodes to provide a final verdict.
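For orientation only, the sketch below mirrors that plan-and-execute flow in a few lines of Python. It is illustrative pseudocode, not the blueprint's implementation: the `llm` object and its methods (`plan_checklist`, `answer_item`, `summarize`, `justify`) are hypothetical stand-ins for the NeMo Agent toolkit functions described later.

```python
from dataclasses import dataclass, field


@dataclass
class CveAnalysis:
    cve_id: str
    checklist: list[str] = field(default_factory=list)
    findings: dict[str, str] = field(default_factory=dict)
    summary: str = ""
    justification: str = ""


def analyze_cve(cve_id: str, intel: dict, llm, rag_tools) -> CveAnalysis:
    """Illustrative plan-and-execute loop: plan a checklist, resolve each item
    with a RAG-equipped agent, then summarize and categorize the verdict."""
    result = CveAnalysis(cve_id)

    # 1. A planner LLM turns the gathered intel into a context-sensitive checklist.
    result.checklist = llm.plan_checklist(cve_id, intel)          # hypothetical call

    # 2. A RAG-equipped agent answers each checklist item using the available tools
    #    (vector DB retrievers, lexical search, internet search, ...).
    for item in result.checklist:
        result.findings[item] = llm.answer_item(item, rag_tools)  # hypothetical call

    # 3. Additional LLM nodes condense the findings and assign the final verdict.
    result.summary = llm.summarize(result.findings)               # hypothetical call
    result.justification = llm.justify(result.summary)            # hypothetical call
    return result
```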
Tip
The workflow is adaptable, with support for NIM and OpenAI LLM APIs. NIM models can be hosted on build.nvidia.com or self-hosted.
The detailed architecture consists of the following components:
- Security scan result: The workflow takes as input the CVEs identified by a container security scan. This scan can be generated from a container image scanner of your choosing, such as Anchore.
- Preprocessing: All of the following actions are encapsulated by multiple NeMo Agent toolkit functions that prepare the data for use with the LLM engine.
- Code repository and documentation: The blueprint pulls code repositories and documentation provided by the user. These repositories are processed through an embedding model, and the resulting embeddings are stored in vector databases (VDBs) for the agent's reference.
- Vector database: Various vector databases can be used for the embedding. We currently utilize FAISS for the VDB because it does not require an external service and is simple to use. Any vector store can be used, such as NVIDIA cuVS, which would provide accelerated indexing and search.
- Lexical search: As an alternative, a lexical search is available for use cases where creating an embedding is impractical due to a large number of source files in the target container.
- Software Bill of Materials (SBOM): A Software Bill of Materials (SBOM) is a machine-readable manifest of all the dependencies of a software package or container. The blueprint cross-references every entry in the SBOM for known vulnerabilities and looks at the code implementation to see whether the implementation puts users at risk—just as a security analyst would do. For this reason, starting with an accurate SBOM is an important first step. SBOMs can be generated for any container using the open-source tool Syft. For more information on generating SBOMs for your containers, see the SBOM documentation.
- Web vulnerability intel: The system collects detailed information about each CVE through web scraping and data retrieval from various public security databases, including GHSA, Redhat, Ubuntu, and NIST CVE records, as well as tailored threat intelligence feeds.
- Core LLM engine: The following actions comprise the core LLM engine and are each implemented as NeMo Agent toolkit functions within the workflow.
- Checklist generation: Leveraging the gathered information about each vulnerability, the checklist generation node creates a tailored, context-sensitive task checklist designed to guide the impact analysis. (See `src/vuln_analysis/tools/cve_checklist.py`.)
- Task agent: At the core of the process is an LLM agent iterating through each item in the checklist. For each item, the agent answers the question using a set of tools which provide information about the target container. The tools tap into various data sources (web intel, vector DB, search, etc.), retrieving relevant information to address each checklist item. The loop continues until the agent resolves each checklist item satisfactorily. (See `src/vuln_analysis/tools/cve_agent.py`.)
- Summarization: Once the agent has compiled findings for each checklist item, these results are condensed by the summarization node into a concise, human-readable paragraph. (See `src/vuln_analysis/tools/cve_summarize.py`.)
- Justification assignment: Given the summary, the justification status categorization node assigns a resulting VEX (Vulnerability Exploitability eXchange) status to the CVE, choosing from a set of predefined categories. (See `src/vuln_analysis/tools/cve_justify.py`.) If the CVE is deemed exploitable, the reasoning category is `vulnerable`. If no vulnerable packages are detected from the SBOM, or insufficient intel is gathered, the agent is bypassed and an appropriate label is provided. If the CVE is not exploitable, there are 10 reasoning categories to explain why the vulnerability is not exploitable in the given environment (a simple enumeration of these labels is sketched after this list):
  - `false_positive`
  - `code_not_present`
  - `code_not_reachable`
  - `requires_configuration`
  - `requires_dependency`
  - `requires_environment`
  - `protected_by_compiler`
  - `protected_at_runtime`
  - `protected_by_perimeter`
  - `protected_by_mitigating_control`
- Output: At the end of the workflow run, an output file including all the gathered and generated information is prepared for security analysts for a final review. (See `src/vuln_analysis/tools/cve_file_output.py`.)
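For reference, the label set above could be represented as a simple enumeration. This is an illustrative sketch, not code from the blueprint:

```python
from enum import Enum


class VexJustification(str, Enum):
    """Labels assigned by the justification step, as listed above."""
    VULNERABLE = "vulnerable"
    FALSE_POSITIVE = "false_positive"
    CODE_NOT_PRESENT = "code_not_present"
    CODE_NOT_REACHABLE = "code_not_reachable"
    REQUIRES_CONFIGURATION = "requires_configuration"
    REQUIRES_DEPENDENCY = "requires_dependency"
    REQUIRES_ENVIRONMENT = "requires_environment"
    PROTECTED_BY_COMPILER = "protected_by_compiler"
    PROTECTED_AT_RUNTIME = "protected_at_runtime"
    PROTECTED_BY_PERIMETER = "protected_by_perimeter"
    PROTECTED_BY_MITIGATING_CONTROL = "protected_by_mitigating_control"
```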
Warning
All output should be vetted by a security analyst before being used in a cybersecurity application.
The NeMo Agent toolkit can utilize various embedding and LLM model endpoints, and is optimized to use NVIDIA NIM microservices (NIMs). NIMs are pre-built containers for the latest AI models that provide industry-standard APIs and optimized inference for the given model and hardware. Using NIMs enables easy deployment and scaling for self-hosted model inference.
The current default embedding NIM model is `nv-embedqa-e5-v5`, which was selected to balance speed and overall workflow accuracy. The current default LLM model is the `llama-3.1-70b-instruct` NIM, with specifically tailored prompt engineering and edge case handling. Other models can be substituted for either the embedding or LLM model, such as smaller, fine-tuned NIM LLM models or other external LLM inference services. Subsequent updates will provide more details about fine-tuning and data flywheel techniques.
Note
Within the NeMo Agent toolkit workflow, the LangChain framework is employed to deploy all LLMs and agents, and the LangGraph framework is used for orchestration, streamlining efficiency and reducing the need for duplicative efforts.
Tip
Routinely checked validation datasets are critical to ensuring proper and consistent outputs. Learn more about our test-driven development approach in the section on testing and validation.
- git
- git-lfs
- Since the workflow uses the NVIDIA NeMo Agent Toolkit, the NeMo Agent toolkit requirements also need to be installed.
To run the workflow, you need to obtain API keys for the following services. These will be needed in the later Set up the environment file step.
- Required API Keys: These APIs are required by the workflow to retrieve vulnerability information from databases, perform online searches, and execute LLM queries.
  - GitHub Security Advisory (GHSA) Database
    - Follow these instructions to create a personal access token. No repository access or permissions are required for this API.
    - This will be used in the `GHSA_API_KEY` environment variable.
  - National Vulnerability Database (NVD)
    - Follow these instructions to create an API key.
    - This will be used in the `NVD_API_KEY` environment variable.
  - SerpApi
    - Go to https://serpapi.com/ and create a SerpApi account. Once signed in, navigate to Your Account > Api Key.
    - This will be used in the `SERPAPI_API_KEY` environment variable.
  - NVIDIA Inference Microservices (NIM)
    - There are two possible methods to generate an API key for NIM:
      - Sign in to the NVIDIA Build portal with your email.
        - Click on any model, then click "Get API Key", and finally click "Generate Key".
      - Sign in to the NVIDIA NGC portal with your email.
        - Select your organization from the dropdown menu after logging in. You must select an organization that has NVIDIA AI Enterprise (NVAIE) enabled.
        - Click on your account in the top right and select "Setup" from the dropdown.
        - Click the "Generate Personal Key" option and then the "+ Generate Personal Key" button to create your API key.
    - This will be used in the `NVIDIA_API_KEY` environment variable.
The workflow can be configured to use other LLM services as well; see the Customizing the LLM models section for more information.
Clone the repository and set an environment variable for the path to the repository root.
export REPO_ROOT=$(git rev-parse --show-toplevel)
All commands are run from the repository root unless otherwise specified.
First, we need to create a `.env` file in the `REPO_ROOT` and add the API keys you created in the earlier Obtain API keys step.
cd $REPO_ROOT
cat <<EOF > .env
GHSA_API_KEY="your GitHub personal access token"
NVD_API_KEY="your National Vulnerability Database API key"
NVIDIA_API_KEY="your NVIDIA Inference Microservices API key"
SERPAPI_API_KEY="your SerpApi API key"
EOF
These variables need to be exported to the environment:
export $(cat .env | xargs)
In order to pull images required by the workflow from NGC, you must first authenticate Docker with NGC. You can use the same NVIDIA API key obtained in the Obtain API keys section (saved as `NVIDIA_API_KEY` in the `.env` file).
echo "${NVIDIA_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin
Next, build the `vuln-analysis` container from source using the following command. This ensures that the container includes all the latest changes from the repository.
cd $REPO_ROOT
# Build the vuln-analysis container
docker compose build vuln-analysis
There are two supported configurations for starting the Docker containers. Both configurations utilize `docker compose` to start the service:
- NVIDIA-hosted NIMs: The workflow is run with all computation being performed by NIMs hosted in NVIDIA GPU Cloud. This is the default configuration and is recommended for most users getting started with the workflow.
  - When using NVIDIA-hosted NIMs, only the `docker-compose.yml` configuration file is required.
- Self-hosted NIMs: The workflow is run using self-hosted LLM NIM services. This configuration is more advanced and requires additional setup to run the NIM services locally.
  - When using self-hosted NIMs, both the `docker-compose.yml` and `docker-compose.nim.yml` configuration files are required.
These two configurations are illustrated by the following diagram:
Before beginning, ensure that the environment variables are set correctly. Both configurations require the same environment variables to be set. More information on setting these variables can be found in the Obtain API keys section.
Tip
The container binds to port 8080 by default. If you encounter a port collision error (for example, `Bind for 0.0.0.0:8080 failed: port is already allocated`), you can set the environment variable `NGINX_HOST_HTTP_PORT` to specify a custom port before launching `docker compose`. For example:
export NGINX_HOST_HTTP_PORT=8081
#... docker compose commands...
When running the workflow in this configuration, only the `vuln-analysis` service needs to be started since we will utilize NIMs hosted by NVIDIA. The `vuln-analysis` container can be started using the following command:
cd ${REPO_ROOT}
docker compose up -d
The command above starts the container in the background using detached mode (`-d`). We can confirm the container is running via the following command:
docker compose ps
Next, we need to attach to the `vuln-analysis` container to access the environment where the workflow command line tool and dependencies are installed.
docker compose exec -it vuln-analysis bash
Continue to the Running the workflow section to run the workflow.
To run the workflow using self-hosted NIMs, we use a second `docker compose` configuration file, `docker-compose.nim.yml`, which adds the self-hosted NIM services to the workflow. Utilizing a second configuration file allows for easy switching between the two configurations while keeping the base configuration file the same.
Note
The self-hosted NIM services require additional GPU resources to run. With this configuration, the LLM NIM, embedding model NIM, and the `vuln-analysis` service will all be launched on the same machine. Ensure that you have the necessary hardware requirements for all three services before proceeding (multiple services can share the same GPU).
To use multiple configuration files, you need to specify both configuration files for every `docker compose` command. For example:
docker compose -f docker-compose.yml -f docker-compose.nim.yml [NORMAL DOCKER COMPOSE COMMAND]
For example, to start the `vuln-analysis` service with the self-hosted NIMs, you would run:
cd ${REPO_ROOT}
docker compose -f docker-compose.yml -f docker-compose.nim.yml up -d
Next, we need to attach to the `vuln-analysis` container to access the environment where the workflow command line tool and dependencies are installed.
docker compose -f docker-compose.yml -f docker-compose.nim.yml exec -it vuln-analysis bash
Continue to the Running the workflow section to run the workflow.
Once the services have been started, the workflow can be run using either the Quick start user guide notebook for an interactive step-by-step process, or directly from the command line.
To run the workflow in an interactive notebook, connect to the Jupyter notebook at http://localhost:8000/lab. Once connected, navigate to the notebook located at `quick_start/quick_start_guide.ipynb` and follow the instructions.
Tip
If you are running the workflow on a remote machine, you can forward the port to your local machine using SSH. For example, to forward port 8000 from the remote machine to your local machine, you can run the following command from your local machine:
ssh -L 8000:127.0.0.1:8000 <remote_host_name>
The vulnerability analysis workflow is designed to be run using the `aiq` command line tool installed within the `vuln-analysis` container. This section describes how to get started using the command line tool. For more detailed information about the command line interface, see the NeMo Agent toolkit Command Line Interface (CLI) documentation.
The workflow settings are controlled using configuration files. These are YAML files that define the functions, tools, and models to use in the workflow. Example configuration files are located in the `configs/` folder.
Note
The `configs/` and `data/` directories are symlinks pointing to the actual file locations in the `src/vuln_analysis/configs/` and `src/vuln_analysis/data/` directories, respectively. The symlinks are available for convenience.
A brief description of each configuration file is as follows:
- `config.yml`: This configuration file defines the functions, tools, and models used by the vulnerability analysis workflow, as described above in the Key Components section.
- `config-tracing.yml`: This configuration file is identical to `config.yml` but adds configuration for observing traces of this workflow in Phoenix.
There are three main modalities that the workflow can be run in using the following commands:
- `aiq run`: The workflow processes the input data, then shuts down after it is completed. This modality is suitable for rapid iteration during testing and development.
- `aiq serve`: The workflow is turned into a microservice which runs indefinitely, which is suitable for use in production.
- `aiq eval`: Similar to the `aiq run` command; however, in addition to running the workflow, it is also used for profiling and evaluating the accuracy of the workflow.
For a breakdown of the configuration file and available options, see the Configuration file reference section. To customize the configuration files for your use case, see Customizing the workflow.
The workflow can be started using the following command:
aiq run --config_file=${CONFIG_FILE} --input_file=data/input_messages/morpheus:23.11-runtime.json
In the command, `${CONFIG_FILE}` is the path to the configuration file you want to use. For example, to run the workflow with the `config.yml` configuration file, you would run:
aiq run --config_file=configs/config.yml --input_file=data/input_messages/morpheus:23.11-runtime.json
When the workflow runs to completion, you should see logs similar to the following:
Vulnerability 'GHSA-3f63-hfp8-52jq' affected status: FALSE. Label: code_not_reachable
Vulnerability 'CVE-2023-50782' affected status: FALSE. Label: requires_configuration
Vulnerability 'CVE-2023-36632' affected status: FALSE. Label: code_not_present
Vulnerability 'CVE-2023-43804' affected status: TRUE. Label: vulnerable
Vulnerability 'GHSA-cxfr-5q3r-2rc2' affected status: TRUE. Label: vulnerable
Vulnerability 'GHSA-554w-xh4j-8w64' affected status: TRUE. Label: vulnerable
Vulnerability 'GHSA-3ww4-gg4f-jr7f' affected status: FALSE. Label: requires_configuration
Vulnerability 'CVE-2023-31147' affected status: FALSE. Label: code_not_present
--------------------------------------------------
Workflow Result:
{"input":{"scan":{"id":"8351fd75-4798-42c9-81d8-a43d7df838fd","type":null,"started_at":"2025-06-25T20:21:19.698253","completed_at":"2025-06-25T20:30:09.667598","vulns":[{"vuln_id":"GHSA-3f63-hfp8-52jq","description":null,"score":null,"severity":null,"published_date":null,"last_modified_date":null,"url":null,"feed_group":null,"package":null,"package_version":null,"package_name":null,"package_type":null},{"vuln_id":"CVE-2023-50782","description":null,"score":null,"severity":null,"published_date":null,"last_modified_date":null,"url":null,"feed_group":null,"package":null,"package_version":null,"package_name":null,"package_type":null},{"vuln_id":"CVE-2023-36632","description":null,"score":null,"severity":null,"published_date":null,"last_modified_date":null,"url":null,"feed_group":null,"package":null,"package_version":null,"package_name":null,
...
Warning
The output you receive from the workflow may not be identical to the output in the example above. The output may vary due to the non-deterministic nature of the LLM models.
The full workflow JSON output is also stashed by default at `.tmp/output.json`. The output JSON includes the following top-level fields:
- `input`: contains the inputs that were provided to the workflow, such as the container and repo source information, the list of vulnerabilities to scan, etc.
- `info`: contains additional information collected by the workflow for decision making. This includes paths to the generated VDB files, intelligence from various vulnerability databases, the list of SBOM packages, and any vulnerable dependencies that were identified.
- `output`: contains the output from the core LLM engine, including the generated checklist, analysis summary, and justification assignment.
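As a quick illustration, the output file can be inspected with a few lines of Python. The nesting used below (`input.scan.vulns[].vuln_id`) follows the sample workflow result shown above; treat it as an example rather than a schema reference.

```python
import json

# Load the workflow output written to the default location.
with open(".tmp/output.json") as f:
    result = json.load(f)

# Top-level sections of the output document.
print("Top-level fields:", list(result.keys()))  # e.g. ['input', 'info', 'output']

# List the CVE IDs that were analyzed, following the structure of the sample above.
vulns = result.get("input", {}).get("scan", {}).get("vulns", [])
print(f"{len(vulns)} vulnerabilities analyzed:")
for vuln in vulns:
    print(" -", vuln.get("vuln_id"))
```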
In addition to the raw JSON output, you can also view a Markdown-formatted report for each CVE in the `.tmp/vulnerability_markdown_reports` directory. This view is helpful for human analysts reviewing the results.
Tip
To return detailed steps taken by the LLM agent in the output, set `return_intermediate_steps` to `true` for the `cve_agent_executor` function in the configuration file. This can be helpful for explaining the output, and for troubleshooting unexpected results.
Similarly, to run the workflow with the `aiq serve` command, you would run:
aiq serve --config_file=configs/config.yml --host 0.0.0.0 --port 26466
This command starts an HTTP server that listens on port `26466` and runs the workflow indefinitely, waiting for incoming data to process. This is useful if you want to trigger the workflow on demand via HTTP requests.
Once the server is running, you can send a `POST` request to the `/generate` endpoint with the input parameters in the request body. The workflow will process the input data and return the output in the terminal and at the output path specified in the config file.
Here's an example using `curl` to send a `POST` request. From a new terminal outside of the container, go to the root of the cloned git repository and run:
curl -X POST --url http://localhost:26466/generate --header 'Content-Type: application/json' --data @data/input_messages/morpheus:23.11-runtime.json
In this command:
- `http://localhost:26466/generate` is the URL of the server and endpoint.
- The `--data` (`-d`) option specifies the data file being sent in the request body. In this case, it points to the input file `morpheus:23.11-runtime.json` under the `data/input_messages/` directory. You can refer to this file as an example of the expected data format.
- Since it uses a relative path, it's important to run the `curl` command from the root of the git repository. Alternatively, you can modify the relative path in the command to directly reference the example JSON file.
Note that the results of the workflow are not returned to the curl request. After processing the request, the server will save the results to the output path specified in the configuration file. The server will also display log and summary results from the workflow as it's running. Additional submissions to the server will append the results to the specified output file.
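If you prefer Python to `curl`, an equivalent request can be sent with the `requests` library. This is a minimal sketch that assumes the server is reachable on port 26466 and that the script is run from the repository root:

```python
import json

import requests

# Load the same example input used in the curl command above.
with open("data/input_messages/morpheus:23.11-runtime.json") as f:
    payload = json.load(f)

# Submit the input to the running `aiq serve` instance; the workflow can take a
# while to process, so use a generous timeout.
response = requests.post("http://localhost:26466/generate", json=payload, timeout=3600)
response.raise_for_status()
print("Request accepted, HTTP status:", response.status_code)

# As with curl, the full results are written to the output path set in the config file.
```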
The workflow configuration file, `config.yml`, includes settings in the `eval` section for use with the NeMo Agent toolkit profiler. The profiler can be started using the following command:
aiq eval --config_file=configs/config.yml
The profiler collects usage statistics and stores them in the `.tmp/eval/cve_agent` directory, as configured in `config.yml`. In this directory, you will find the following files:
- all_requests_profiler_traces.json
- gantt_chart.png
- inference_optimization.json
- standardized_data_all.csv
- workflow_output.json
- workflow_profiling_metrics.json
- workflow_profiling_report.txt
More information about analyzing the profiling results can be found here in the NeMo Agent toolkit documentation.
A separate workflow configuration file, `config-tracing.yml`, is provided that enables tracing in the workflow using Phoenix.
First, run the following command in a separate terminal within the container to start a Phoenix server on port 6006:
phoenix serve
You can now use `config-tracing.yml` to run the workflow with tracing enabled:
aiq run --config_file=configs/config-tracing.yml --input_file=data/input_messages/morpheus:23.11-runtime.json
Tracing will also be enabled with `config-tracing.yml` when running the workflow with `aiq serve` and `aiq eval`.
Open your browser and navigate to http://localhost:6006 to view the traces.
The main entrypoint is the `aiq` CLI tool, which includes built-in documentation via the `--help` option. For example, to see what commands are available, you can run:
(.venv) root@04a3a12a687d:/workspace# aiq --help
Usage: aiq [OPTIONS] COMMAND [ARGS]...
Main entrypoint for the AIQ Toolkit CLI
Options:
--version Show the version and exit.
--log-level [debug|info|warning|error|critical]
Set the logging level [default: INFO]
--help Show this message and exit.
Commands:
configure Configure AIQ Toolkit developer preferences.
eval Evaluate a workflow with the specified dataset.
info Provide information about the local AIQ Toolkit environment.
mcp Run an AIQ Toolkit workflow using the mcp front end.
registry Utility to configure AIQ Toolkit remote registry channels.
run Run an AIQ Toolkit workflow using the console front end.
serve Run an AIQ Toolkit workflow using the fastapi front end.
start Run an AIQ Toolkit workflow using a front end configuration.
uninstall Uninstall an AIQ Toolkit plugin packages from the local...
validate Validate a configuration file
workflow Interact with templated workflows.
The configuration defines how the workflow operates, including functions, LLMs, and embedders along with general configuration settings. More details about the NeMo Agent toolkit workflow configuration file can be found here.
- General configuration (`general`): The `general` section contains general configuration settings for the NeMo Agent toolkit that are not specific to any workflow.
  - `use_uvloop`: Specifies whether to use the `uvloop` event loop, which can provide a significant speedup. For debugging purposes, it is recommended to set this to `false`.
  - `telemetry.logging`: Sets the log level for logging.
  - `telemetry.tracing`: This is used in `config-tracing.yml`, where `endpoint` is set to a Phoenix server. Traces of the workflow can then be viewed in the Phoenix UI.
- Functions (`functions`): The `functions` section contains the tools used in the workflow.
  - Preprocessing functions:
    - `cve_generate_vdbs`: Generates vector databases from code repositories and documentation.
      - `agent_name`: Name of the agent executor (`cve_agent_executor`). Used to determine which tools are enabled in the agent to conditionally generate vector databases or indexes.
      - `embedder_name`: Name of the embedder (`nim-embedder`) configured in the `embedders` section.
      - `base_vdb_dir`: The directory used for storing vector database files. Default is `.cache/am_cache/vdb`.
      - `base_git_dir`: The directory for storing pulled git repositories used for code analysis. Default is `.cache/am_cache/git`.
      - `base_code_index_dir`: The directory used for storing code index files. Default is `./cache/am_cache/code_index`.
    - `cve_fetch_intel`: Fetches details about CVEs from NIST and CVE Details websites.
    - `cve_process_sbom`: Prepares and validates the input SBOM.
    - `cve_check_vuln_deps`: Cross-references every entry in the SBOM for known vulnerabilities.
  - Core LLM engine functions:
    - `cve_checklist`: Generates a tailored, context-sensitive task checklist for impact analysis.
    - `Container Image Code QA System`: Retriever tool used by `cve_agent_executor` to query the source code vector database.
    - `Container Image Developer Guide QA System`: Retriever tool used by `cve_agent_executor` to query the documentation vector database.
    - `Lexical Search Container Image Code QA System`: Lexical search tool used by `cve_agent_executor` to search source code. This tool is an alternative to `Container Image Code QA System` and can be useful for very large code bases that take a long time to embed as a vector database. Disabled by default; enable by uncommenting the tool in `cve_agent_executor`.
    - `Internet Search`: SerpApi Google search tool used by `cve_agent_executor`.
    - `cve_agent_executor`: Iterates through checklist items using the provided tools and gathered intel.
      - `llm_name`: Name of the LLM (`cve_agent_executor_llm`) configured in the `llms` section.
      - `tool_names`: Container Image Code QA System, Container Image Developer Guide QA System, (Optional) Lexical Search Container Image Code QA System, Internet Search
      - `max_concurrency`: Controls the maximum number of concurrent requests to the LLM. Default is `null`, which doesn't limit concurrency.
      - `max_iterations`: The maximum number of iterations for the agent. Default is 10.
      - `prompt_examples`: Whether to include examples in the agent prompt. Default is `false`.
      - `replace_exceptions`: Whether to replace the exception message with a custom message. Default is `true`.
      - `replace_exceptions_value`: If `replace_exceptions` is `true`, use this message. Default is "I do not have a definitive answer for this checklist item."
      - `return_intermediate_steps`: Controls whether to return intermediate steps taken by the agent and include them in the output file. Helpful for troubleshooting agent responses. Default is `false`.
      - `verbose`: Set to `true` for verbose output. Default is `false`.
    - `cve_summarize`: Generates a concise, human-readable summarization paragraph from the agent results.
      - `llm_name`: Name of the LLM (`summarize_llm`) configured in the `llms` section.
    - `cve_justify`: Assigns a justification label and reason to each CVE based on the summary.
      - `llm_name`: Name of the LLM (`justify_llm`) configured in the `llms` section.
  - Postprocessing/Output functions:
    - `cve_file_output`: Outputs workflow results to a file.
      - `file_path`: Defines the path to the file where the output will be saved.
      - `markdown_dir`: Defines the path to the directory where the output will be saved as individual navigable Markdown files per CVE-ID.
      - `overwrite`: Indicates whether the output file should be overwritten when the workflow starts if it already exists. Will throw an error if set to `False` and the file already exists. Note that the overwrite behavior only occurs on workflow initialization. For pipelines started in HTTP mode, each new request will append to the existing file until the workflow is restarted.
- LLMs (`llms`): The `llms` section contains the LLMs used by the workflow. Functions can reference LLMs in this section to use. The supported LLM API types in NeMo Agent toolkit are `nim` and `openai`. The models in this workflow use `nim`.
  - Configured models in this workflow: `checklist_llm`, `code_vdb_retriever_llm`, `doc_vdb_retriever_llm`, `cve_agent_executor_llm`, `summarize_llm`, `justify_llm`
  - Each `nim` model is configured with the following attributes defined in the NeMo Agent toolkit's NimModelConfig. Use OpenAIModelConfig for `openai` models.
    - `base_url`: Optional attribute to override `https://integrate.api.nvidia.com/v1`.
    - `model_name`: The name of the LLM model used by the node.
    - `temperature`: Controls randomness in the output. A lower temperature produces more deterministic results.
    - `max_tokens`: Defines the maximum number of tokens that can be generated in one output step.
    - `top_p`: Limits the diversity of token sampling based on cumulative probability.
- Embedding models (`embedders`): The `embedders` section contains the embedding models used by the workflow. Functions can reference embedding models in this section to use. The supported embedding model API types in NeMo Agent toolkit are `nim` and `openai`.
  - The workflow uses the `nim` model `nvidia/nv-embedqa-e5-v5`.
  - Each `nim` embedding model is configured with the following attributes defined in the NeMo Agent Toolkit's NimEmbedderModelConfig. Use OpenAIEmbedderModelConfig for `openai` embedding models.
    - `base_url`: Optional attribute to override `https://integrate.api.nvidia.com/v1`.
    - `model_name`: The name of the embedding model used by the node.
    - `truncate`: Specifies how inputs longer than the maximum token length of the model are handled. Passing `START` discards the start of the input. `END` discards the end of the input. In both cases, input is discarded until the remaining input is exactly the maximum input token length for the model. If `NONE` is selected, an error will be returned when the input exceeds the maximum input token length.
    - `max_batch_size`: Specifies the batch size to use when generating embeddings. We recommend setting this to 128 (default) or lower when using the cloud-hosted embedding NIM. When using a local NIM, this value can be tuned based on throughput/memory performance on your hardware.
- Workflow (`workflow`): The `workflow` section ties the previous sections together by defining the tools and LLM models to use in the workflow.
  - `_type`: This is set to `cve_agent`, indicating to NeMo Agent toolkit to use the function defined in register.py for the workflow.
  - The remaining configuration items correspond to attributes in CVEWorkflowConfig to specify the registered tools to use in the workflow.
- Evaluations and Profiling (`eval`): The `eval` section contains the evaluation settings for the workflow. Refer to Evaluating NVIDIA Agent Intelligence Toolkit Workflows for more information about NeMo Agent toolkit built-in evaluators as well as the plugin system to add custom evaluators. The CVE workflow uses the `eval` section to configure a profiler that uses the NeMo Agent toolkit evaluation system to collect usage statistics and store them to the local file system. You can find more information about NeMo Agent toolkit profiling and performance monitoring here.
  - `general.output_dir`: Defines the path to the directory where profiling results will be saved.
  - `general.dataset`: Defines the file path and format of the dataset used to run profiling.
  - `profiler`: The profiler for this workflow is configured with the following options.
    - `token_uniqueness_forecast`: Compute inter-query token uniqueness.
    - `workflow_runtime_forecast`: Compute expected workflow runtime.
    - `compute_llm_metrics`: Compute inference optimization metrics.
    - `csv_exclude_io_text`: Avoid dumping large text into the output CSV (helpful to not break structure).
    - `prompt_caching_prefixes`: Identify common prompt prefixes.
    - `bottleneck_analysis`: Enable bottleneck analysis.
    - `concurrency_spike_analysis`: Enable concurrency spike analysis. Set the `spike_threshold` to 7, meaning that any concurrency spike above 7 will be raised to the user specifically.
The docker compose file includes an `nginx-cache` proxy server container that enables caching for API requests made by the workflow. It is highly recommended to route API requests through the proxy server to reduce API calls for duplicate requests and improve workflow speed. This is especially useful when running the workflow multiple times with the same configuration (for example, for debugging) and can help keep costs down when using paid APIs.
The NGINX proxy server is started by default when running the `vuln-analysis` service. However, it can be started separately using the following command:
cd ${REPO_ROOT}
docker compose up --detach nginx-cache
To use the proxy server for API calls in the workflow, you can set environment variables for each base URL used by the workflow to point to `http://localhost:${NGINX_HOST_HTTP_PORT}/`. These are set automatically when running the `vuln-analysis` service, but can be set manually in the `.env` file as follows:
CVE_DETAILS_BASE_URL="http://localhost:8080/cve-details"
CWE_DETAILS_BASE_URL="http://localhost:8080/cwe-details"
DEPSDEV_BASE_URL="http://localhost:8080/depsdev"
FIRST_BASE_URL="http://localhost:8080/first"
GHSA_BASE_URL="http://localhost:8080/ghsa"
NGC_API_BASE="http://localhost:8080/nemo/v1"
NIM_EMBED_BASE_URL="http://localhost:8080/nim_embed/v1"
NVD_BASE_URL="http://localhost:8080/nvd"
NVIDIA_API_BASE="http://localhost:8080/nim_llm/v1"
OPENAI_API_BASE="http://localhost:8080/openai/v1"
OPENAI_BASE_URL="http://localhost:8080/openai/v1"
RHSA_BASE_URL="http://localhost:8080/rhsa"
SERPAPI_BASE_URL="http://localhost:8080/serpapi"
UBUNTU_BASE_URL="http://localhost:8080/ubuntu"
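To illustrate the mechanism (not the workflow's actual request code): a client reads the base URL from the environment and falls back to the upstream service when the override is unset, so the same code transparently flows through the cache whenever the proxy is running. The NVD example below mirrors the request URLs shown later in the troubleshooting section; the exact path handling in the blueprint may differ.

```python
import os

import requests

# Generic pattern: read the (proxy) base URL from the environment and fall back to
# the upstream service when the override is unset. The exact path layout used by
# the workflow may differ; this only illustrates the override mechanism.
NVD_UPSTREAM = "https://services.nvd.nist.gov/rest/json/cves/2.0"
base_url = os.environ.get("NVD_BASE_URL", NVD_UPSTREAM)

# The cveId query parameter mirrors the NVD request URLs shown in the troubleshooting section.
response = requests.get(base_url, params={"cveId": "CVE-2023-50782"}, timeout=30)
print(response.status_code)
```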
The primary method for customizing the workflow is to generate a new configuration file with new options. The configuration file defines the workflow settings, such as the functions, the LLM models used, and the output format. The configuration file is a YAML file that can be modified to suit your needs.
To use an SBOM from a URL (example), update the SBOM info configuration to:
"sbom_info": {
"_type": "http",
"url": "https://raw.githubusercontent.com/NVIDIA-AI-Blueprints/vulnerability-analysis/refs/heads/main/src/vuln_analysis/data/sboms/nvcr.io/nvidia/morpheus/morpheus%3Av23.11.01-runtime.sbom"
}
The workflow configuration file also allows customizing the LLM model and parameters for each component of the workflow, as well as which LLM API is used when invoking the model.
In any configuration file, locate the `llms` section to see the current settings. For example, the following snippet defines the LLM used for the checklist model:
checklist_llm:
_type: nim
base_url: ${NVIDIA_API_BASE:-https://integrate.api.nvidia.com/v1}
model_name: ${CHECKLIST_MODEL_NAME:-meta/llama-3.1-70b-instruct}
temperature: 0.0
max_tokens: 2000
top_p: 0.01
- `_type`: specifies the LLM API type. Refer to the Supported LLM APIs table for available options.
- `base_url`: Base URL for the LLM. Set to the value of the `NVIDIA_API_BASE` environment variable if set; otherwise, set to the default NIM base URL. `base_url` can also be omitted, in which case the default base URL is used for type `nim`. We use an environment variable here so that we can easily set it to the proxy server URL when running with `docker compose`.
- `model_name`: specifies the model name within the LLM API. Set to the value of the `CHECKLIST_MODEL_NAME` environment variable if set; otherwise, set to the default checklist model, `meta/llama-3.1-70b-instruct`. This is also applicable to the other LLM models in the workflow, each having its own environment variable for setting the model name. Refer to the LLM API documentation to determine the available models.
- `temperature`, `max_tokens`, `top_p`, ...: specify the model parameters. The available parameters can be found in the NeMo Agent Toolkit NIMModelConfig. Any non-supported parameters provided in the configuration will be ignored.
| Name | `_type` | Auth Env Var(s) | Base URL Env Var(s) | Proxy Server Route |
|---|---|---|---|---|
| NVIDIA Inference Microservices (NIMs) (Default) | `nim` | `NVIDIA_API_KEY` | `NVIDIA_API_BASE` | `/nim_llm/v1` |
| OpenAI | `openai` | `OPENAI_API_KEY` | `OPENAI_API_BASE` (used by `langchain`), `OPENAI_BASE_URL` (used by `openai`) | `/openai/v1` |
- Obtain an API key and any other required auth info for the selected service.
- Update the `.env` file with the auth and base URL environment variables for the service as indicated in the Supported LLM APIs table. If you choose not to use the default LLM models in your workflow (`meta/llama-3.1-70b-instruct`), you can also add environment variables to override the model names to your `.env` file. In addition to `CHECKLIST_MODEL_NAME`, you can also set `model_name` for the other LLM models using `CODE_VDB_RETRIEVER_MODEL_NAME`, `DOC_VDB_RETRIEVER_MODEL_NAME`, `CVE_AGENT_EXECUTOR_MODEL_NAME`, `SUMMARIZE_MODEL_NAME`, and `JUSTIFY_MODEL_NAME`.
- Update the config file as described above. For example, if you want to use OpenAI's `gpt-4o` model for checklist generation, update `checklist_llm` in the `llms` section to:
checklist_llm:
_type: openai
model_name: ${CHECKLIST_MODEL_NAME:-gpt-4o}
temperature: 0.0
seed: 0
top_p: 0.01
max_retries: 5
Please note that the prompts have been tuned to work best with the Llama 3.1 70B NIM and that when using other LLM models it may be necessary to adjust the prompting.
Vector databases are used by the agent to fetch relevant information for impact analysis investigations. The embedding model used to vectorize your documents can significantly affect the agent's performance. The default embedding model used by the workflow is the NIM nvidia/nv-embedqa-e5-v5 model, but you can experiment with different embedding models of your choice.
To test a custom embedding model, modify the workflow configuration file (for example, `config.yml`) in the `embedders` section. For example, the following snippet defines the settings for the default embedding model. The full set of available parameters for a NIM embedder can be found here.
nim_embedder:
_type: nim
base_url: ${NVIDIA_API_BASE:-https://integrate.api.nvidia.com/v1}
model_name: ${EMBEDDER_MODEL_NAME:-nvidia/nv-embedqa-e5-v5}
truncate: END
max_batch_size: 128
- `_type`: specifies the LLM API type. Refer to the Supported LLM APIs table for available options.
- `base_url`: Base URL for the embedding model. Set to the value of the `NVIDIA_API_BASE` environment variable if set; otherwise, set to the default NIM base URL. `base_url` can also be omitted, in which case the default base URL is used.
- `model_name`: specifies the model name for the embedding provider. Set to the value of the `EMBEDDER_MODEL_NAME` environment variable if set; otherwise, set to the default embedding model, `nvidia/nv-embedqa-e5-v5`. Refer to the embedding provider's documentation to determine the available models.
- `truncate`: specifies how inputs longer than the maximum token length of the model are handled. Passing `START` discards the start of the input. `END` discards the end of the input. In both cases, input is discarded until the remaining input is exactly the maximum input token length for the model. If `NONE` is selected, an error will be returned when the input exceeds the maximum input token length.
- `max_batch_size`: specifies the batch size to use when generating embeddings. We recommend setting this to 128 (default) or lower when using the cloud-hosted embedding NIM. When using a local NIM, this value can be tuned based on throughput/memory performance on your hardware.
1. If using OpenAI embeddings, first obtain an API key, then update the `.env` file with the auth and base URL environment variables for the service as indicated in the Supported LLM APIs table. Otherwise, proceed to step 2. If you choose not to use the default embedding model (`nvidia/nv-embedqa-e5-v5`), you can also add `EMBEDDER_MODEL_NAME` to your `.env` file to override the default.
2. Update the `embedders` section of the config file as described above. Example OpenAI embedding configuration:
nim_embedder:
  _type: openai
  model_name: ${EMBEDDER_MODEL_NAME:-text-embedding-3-small}
  max_retries: 5
For OpenAI models, only a subset of parameters are supported. The full set of available parameters can be found in the config definitions here. Any non-supported parameters provided in the configuration will be ignored.
The current workflow uses FAISS to create the vector databases. Interested users can customize the source code to use other vector databases such as cuVS.
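As a rough illustration of this approach (not the blueprint's actual implementation), a FAISS index can be built from repository text with LangChain and the default embedding NIM. This assumes the `langchain-community`, `faiss-cpu`, and `langchain-nvidia-ai-endpoints` packages are installed and `NVIDIA_API_KEY` is set:

```python
from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

# Embed text with the workflow's default embedding NIM (requires NVIDIA_API_KEY).
embedder = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5", truncate="END")

# A couple of toy "source file" snippets standing in for the cloned repository contents.
docs = [
    "def parse_request(data): ...",
    "The service disables TLS verification when DEBUG is set.",
]

# Build the FAISS vector store and query it the way an agent retriever tool would.
vdb = FAISS.from_texts(docs, embedder)
hits = vdb.similarity_search("Is TLS verification ever disabled?", k=1)
print(hits[0].page_content)
```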
Currently, there are 3 types of outputs supported by the workflow:
- File output: The output data is written to a file in JSON format.
- HTTP output: The output data is posted to an HTTP endpoint.
- Print output: The output data is printed to the console.
To customize the output, modify the workflow configuration file accordingly. Locate the `workflow` section to see the output destination used by the workflow. For example, in the configuration file `configs/config.yml`, the following snippet from the `functions` section defines the function that writes the workflow output as a single JSON file and individual Markdown files per CVE-ID:
cve_file_output:
_type: cve_file_output
file_path: .tmp/output.json
markdown_dir: .tmp/vulnerability_markdown_reports
overwrite: True
The following snippet from the `workflow` section then configures the workflow to use the above function for output:
workflow:
_type: cve_agent
...
cve_output_config_name: cve_file_output
To post the output to an HTTP endpoint, you can add the following to the `functions` section of the config file, replacing the domain, port, and endpoint with the desired destination (note the trailing slash in the "url" field). The output will be sent as JSON data.
cve_http_output:
_type: cve_http_output
url: http://<domain>:<port>/
endpoint: "<endpoint>"
The workflow can then be updated to use the new function:
workflow:
_type: cve_agent
...
cve_output_config_name: cve_http_output
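For local testing of the HTTP output, you can stand up a small receiver to accept the posted JSON. The sketch below uses FastAPI with a placeholder endpoint name (`results`); replace it with whatever `endpoint` value you configure above:

```python
# Minimal FastAPI receiver for the JSON posted by cve_http_output.
# Run with: uvicorn receiver:app --host 0.0.0.0 --port 9000
from fastapi import FastAPI, Request

app = FastAPI()


@app.post("/results")  # placeholder: match the "endpoint" value in your config
async def receive_results(request: Request) -> dict:
    payload = await request.json()
    # Do something useful with the workflow output; here we just log the top-level keys.
    print("Received workflow output with fields:", list(payload.keys()))
    return {"status": "ok"}
```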
Additional output options will be added in the future.
Several issues can arise when running the workflow. Here are some of the most common ones and their solutions.
If you encounter issues with Git LFS, ensure that you have Git LFS installed and that it is enabled for the repository. You can check if Git LFS is enabled by running the following command:
git lfs install
Verifying that all files are being tracked by Git LFS can be done by running the following command:
git lfs ls-files
Files which are missing will show a `-` next to their name. To ensure all LFS files have been pulled correctly, you can run the following commands:
git lfs fetch --all
git lfs checkout *
When building containers for self-hosted NIMs, certain issues may occur. Below are common troubleshooting steps to help resolve them.
If you encounter an error resembling the following during the container build process for self-hosted NIMs:
nvidia-container-cli: device error: {n}: unknown device: unknown
This error typically indicates that the container is attempting to access GPUs that are either unavailable or non-existent on the host. To resolve this, verify the GPU count specified in the docker-compose.nim.yml configuration file:
- Navigate to the `deploy.resources.reservations.devices` section and check the `count` parameter.
- Set the environment variable `NIM_LLM_GPU_COUNT` to the actual number of GPUs available on the host machine before building the container. Note that the default value is set to 4.
This adjustment ensures the container accurately matches the available GPU resources, preventing access errors during deployment.
If you encounter an error resembling the following during the container build process for self-hosted NIMs:
1 error(s) decoding:
* error decoding 'Deploy.Resources.Reservations.devices[0]': invalid string value for 'count' (the only value allowed is 'all')
This is likely caused by an outdated Docker Compose version. Please upgrade Docker Compose to at least `v2.21.0`.
Because the workflow makes such heavy use of the caching server to speed up API requests, it is important to ensure that the server is running correctly. If you encounter issues with the caching server, you can reset the cache.
To reset the entire cache, you can run the following command:
docker compose down -v
This will delete all the volumes associated with the containers, including the cache.
If you want to reset just the LLM cache or the services cache, you can run the following commands:
docker compose down
# To remove the LLM cache
docker volume rm ${COMPOSE_PROJECT_NAME:-vuln_analysis}_llm-cache
# To remove the services cache
docker volume rm ${COMPOSE_PROJECT_NAME:-vuln_analysis}_service-cache
We've integrated VDB and embedding creation directly into the workflow with caching included for expediency. However, in a production environment, it's better to use a separately managed VDB service.
NVIDIA offers optimized models and tools like NIMs (build.nvidia.com/explore/retrieval) and cuVS (github.com/rapidsai/cuvs).
These typically resolve on their own. Please wait and try running the workflow again later. Example errors:
404
Error requesting [1/10]: (Retry 0.1 sec) https://services.nvd.nist.gov/rest/json/cves/2.0: 404, message='', url=URL('https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=CVE-2023-6709')
503
Error requesting [1/10]: (Retry 0.1 sec) https://services.nvd.nist.gov/rest/json/cves/2.0: 503, message='Service Unavailable', url=URL('https://services.nvd.nist.gov/rest/json/cves/2.0?cveId=CVE-2023-50447')
429 errors can occur when your requests exceed the rate limit for the model. Try setting `cve_agent_executor.max_concurrency` in `config.yml` to a low value, such as 5, to reduce the rate of requests.
Exception: [429] Too Many Requests
Authentication errors will occur if required API key(s) are invalid or have not been set as environment variables as described in Set up the environment file. For example, the following error will occur if `NVIDIA_API_KEY` is not properly set:
Error: [401] Unauthorized
Authentication failed
Note that exporting the required environment variables in a container shell will not persist outside of that shell. Instead, we recommend shutting down the containers (`docker compose down`), setting the required environment variables, and then starting the containers again.
If you run out of credits for the NVIDIA API Catalog, you will need to obtain more credits to continue using the API. Please contact your NVIDIA representative to get more credits added.
Test-driven development is essential for building reliable LLM-based agentic systems, especially when deploying or scaling them in production environments.
In our development process, we use the Morpheus public container as a case study. We perform security scans and collaborate with developers and security analysts to assess the exploitability of identified CVEs. Each CVE is labeled as either vulnerable or not vulnerable. For non-vulnerable CVEs, we provide a justification based on one of the ten VEX statuses. Team members document their investigative steps and findings to validate and compare results at different stages of the system.
We have collected labels for 38 CVEs, which serve several purposes:
- Human-generated checklists, findings, and summaries are used as ground truth during various stages of prompt engineering to refine LLM output.
- The justification status for each CVE is used as a label to measure end-to-end workflow accuracy. Every time there is a change to the system, such as adding a new agent tool, modifying a prompt, or introducing an engineering optimization, we run the labeled dataset through the updated workflow to detect performance regressions.
As a next step, we plan to integrate this process into our CI/CD pipeline to automate testing. While LLMs' non-deterministic nature makes it difficult to assert exact results for each test case, we can adopt a statistical approach, where we run the workflow multiple times and ensure that the average accuracy stays within an acceptable range.
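As a sketch of what such a statistical check might look like, assuming a hypothetical `run_workflow` helper that returns the predicted justification label per CVE and a ground-truth mapping curated as described above:

```python
import statistics


def run_regression_check(run_workflow, ground_truth: dict[str, str],
                         n_runs: int = 5, min_accuracy: float = 0.85) -> None:
    """Run the labeled CVE set several times and check that the average accuracy
    stays within an acceptable range (run_workflow is a hypothetical helper that
    returns a {cve_id: predicted_label} mapping; min_accuracy is a placeholder)."""
    accuracies = []
    for _ in range(n_runs):
        predictions = run_workflow(list(ground_truth))
        correct = sum(predictions.get(cve) == label for cve, label in ground_truth.items())
        accuracies.append(correct / len(ground_truth))

    mean_accuracy = statistics.mean(accuracies)
    print(f"Mean accuracy over {n_runs} runs: {mean_accuracy:.2%}")
    assert mean_accuracy >= min_accuracy, "Accuracy regression detected"
```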
We recommend that teams looking to test or optimize their CVE analysis system curate a similar dataset for testing and validation. Note that in test-driven development, it's important that the model has not achieved perfect accuracy on the test set, as this may indicate overfitting or that the set lacks sufficient complexity to expose areas for improvement. The test set should be representative of the problem space, covering both scenarios where the model performs well and where further refinement is needed. Investing in a robust dataset ensures long-term reliability and drives continued performance improvements.
Please consider citing our paper when using this code in a project. You can use the following BibTeX citation:
@inproceedings{zemicheal2024llm,
title={LLM agents for vulnerability identification and verification of CVEs},
author={ZeMicheal, Tadesse and Chen, Hsin and Davis, Shawn and Allen, Rachel and Demoret, Michael and Song, Ashley},
booktitle={Proceedings of the Conference on Applied Machine Learning in Information Security (CAMLIS 2024)},
pages={161--173},
year={2024},
publisher={CEUR Workshop Proceedings},
volume={3920},
url={https://ceur-ws.org/Vol-3920/}
}
By using this software or microservice, you are agreeing to the terms and conditions of the license and acceptable use policy.
GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for AI Products; and use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement.
ADDITIONAL Terms: Meta Llama 3.1 Community License, Built with Meta Llama 3.1.