GraphQA: Natural Language Graph Analysis Framework

Ask questions about any graph in natural language - no SQL, no graph query languages, just plain English


โš ๏ธ Software Disclaimer: GraphQA is provided "as is" without warranty of any kind. This is research/experimental software designed for graph analysis and exploration. Use at your own discretion and always validate results for production use cases.


🎯 What is GraphQA?

GraphQA is a comprehensive graph analysis framework that transforms complex graph datasets into natural language conversations. It's essentially a graph algorithm engine with an AI interface that understands what you're asking and automatically selects the right NetworkX algorithms to answer your questions.

🧮 The Graph Algorithm Problem

Traditional graph analysis requires:

  • Deep expertise in graph theory and algorithms
  • Programming skills to implement NetworkX code
  • Domain knowledge to know which algorithms apply when
  • Trial and error to find the right analysis approach

GraphQA solves this by providing an intelligent algorithm selection layer that:

  • Understands natural language questions about graph data
  • Automatically chooses appropriate graph algorithms
  • Executes analysis using proven NetworkX implementations
  • Presents results in human-readable format

๐Ÿ—๏ธ Core Architecture: NetworkX + AI

Foundation: GraphQA is built on NetworkX, the gold standard for graph analysis in Python. Your entire dataset lives in memory as a NetworkX MultiDiGraph, providing:

  • ⚡ Instant access to 500+ NetworkX algorithms
  • 🔬 Research-grade implementations (PageRank, Louvain, Dijkstra, etc.)
  • 📊 Rich data structures supporting any graph topology
  • 🔄 Interactive analysis with sub-second response times
  • 🧮 Mathematical precision for complex computations

Intelligence Layer: AI agents intelligently orchestrate graph algorithms by:

  1. Schema Discovery: Embedding-based analysis of your graph structure
  2. Question Understanding: LangChain ReAct agents parse natural language
  3. Algorithm Selection: Smart routing to optimal NetworkX algorithms
  4. Execution: Running analysis with appropriate parameters
  5. Result Synthesis: Converting algorithm outputs to insights
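
The middle steps of this pipeline can be caricatured as keyword routing. Purely illustrative: `select_algorithm` is not a GraphQA function, and the real system uses a LangChain ReAct agent rather than keyword matching.

```python
def select_algorithm(question: str) -> str:
    """Toy stand-in for steps 2-3: map question keywords to an algorithm name."""
    q = question.lower()
    if "communit" in q or "cluster" in q:
        return "greedy_modularity_communities"
    if "path" in q:
        return "shortest_path"
    if "important" in q or "influential" in q:
        return "pagerank"
    return "degree_centrality"  # default for "connected"-style questions

print(select_algorithm("Find communities and influential nodes"))
# -> greedy_modularity_communities (first matching rule wins)
```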

🎯 Graph Analysis Made Accessible

Traditional NetworkX requires expertise:

# Manual algorithm selection and interpretation
communities = nx.community.greedy_modularity_communities(G)
centrality = nx.pagerank(G)
# ... complex result analysis ...

GraphQA makes it conversational:

🤔 Ask GraphQA: "Find communities and influential nodes"
📊 Found 1,247 communities. Top nodes: B001, B002, B003...

🧠 Schema-Agnostic Intelligence

GraphQA makes zero assumptions about your data structure:

  • ๐Ÿ” Automatic Discovery: embeddings map attributes to semantic concepts
  • ๐Ÿค– Contextual Understanding: "expensive" automatically maps to price/cost attributes
  • ๐Ÿ“Š Algorithm Adaptation: Community detection parameters adjust to graph size/density

The Result: Whether you're analyzing protein interactions, supply chains, or social media networks, GraphQA provides the same powerful natural language interface backed by rigorous graph algorithms.

🧠 Graph Algorithm Intelligence

GraphQA's power comes from its intelligent algorithm selection system that automatically chooses and configures the right NetworkX algorithms for your questions. Here's what's under the hood:

🔧 The Five Universal Tools

GraphQA's ReAct agent uses exactly 5 specialized tools:

1. ๐Ÿ” Graph Explorer - Schema discovery and search operations

  • Embedding-based attribute discovery
  • Multi-attribute filtering and pattern matching
  • Sample data retrieval

2. 🎯 Graph Query - Targeted data retrieval and neighborhood analysis

  • Attribute-based node/edge filtering
  • Multi-hop neighborhood exploration (up to depth N)
  • Similarity searches and metric-based queries
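
Multi-hop exploration of this kind maps directly onto NetworkX's `ego_graph`; a small sketch of the underlying call (not the tool's actual internals):

```python
import networkx as nx

# Toy directed chain: 0 -> 1 -> 2 -> 3 -> 4 -> 5
G = nx.path_graph(6, create_using=nx.MultiDiGraph)

# Everything reachable from node 0 within 2 hops
neighborhood = nx.ego_graph(G, 0, radius=2)
print(sorted(neighborhood.nodes()))  # [0, 1, 2]
```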

3. 📊 Graph Statistics - Network measurements and distributions

  • Centrality measures (PageRank, betweenness, degree)
  • Connectivity analysis and topology metrics
  • Attribute distributions and summary statistics

4. 🔬 Node Analyzer - Deep analysis of individual nodes

  • Node-specific centrality and importance metrics
  • Local neighborhood analysis and clustering
  • Similarity computation with other nodes

5. 🧠 Algorithm Selector - Intelligent algorithm selection

  • Community detection (Louvain, greedy modularity, label propagation)
  • Pathfinding algorithms (Dijkstra, shortest paths)
  • Network topology analysis and connectivity patterns
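
These wrap standard NetworkX calls that you can also run directly, for example (toy graph; `louvain_communities` requires a reasonably recent NetworkX):

```python
import networkx as nx

G = nx.karate_club_graph()  # classic 34-node social network

# Community detection (Louvain)
communities = nx.community.louvain_communities(G, seed=42)

# Connectivity pattern: how many disconnected pieces?
components = nx.number_connected_components(G)

print(f"{len(communities)} communities across {components} component(s)")
```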

๐ŸŽ›๏ธ Intelligent Algorithm Selection

GraphQA doesn't pick algorithms at random; it applies deliberate decision logic:

Dataset-Aware Selection:

# GraphQA automatically chooses based on your graph characteristics:
if graph_size == "small" and preference == "comprehensive":
    use_betweenness_centrality()  # O(V*E) - thorough but slow
elif graph_size == "large" and preference == "fast":
    use_degree_centrality()       # O(V) - fast approximation
else:
    use_pagerank()                # O(V+E) per iteration - balanced choice

Performance-Optimized Routing:

  • Small graphs (< 1K nodes): Use comprehensive algorithms (betweenness centrality, all-pairs shortest paths)
  • Medium graphs (1K-100K nodes): Balance quality vs speed (PageRank, greedy modularity)
  • Large graphs (100K+ nodes): Prioritize scalable algorithms (degree centrality, label propagation)
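
These tiers amount to a small dispatch function. A sketch of the idea, with thresholds taken from the list above (not GraphQA's actual source):

```python
import networkx as nx

def centrality_for(G: nx.Graph) -> dict:
    """Pick a centrality measure based on graph size."""
    n = G.number_of_nodes()
    if n < 1_000:
        return nx.betweenness_centrality(G)  # exact but expensive
    if n < 100_000:
        return nx.pagerank(G)                # good quality/speed balance
    return nx.degree_centrality(G)           # cheapest, scales linearly

scores = centrality_for(nx.karate_club_graph())  # 34 nodes -> betweenness
print("Top node:", max(scores, key=scores.get))
```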

Context-Aware Configuration:

  • Community Detection: Louvain for undirected graphs, greedy modularity for directed
  • Centrality: Adapts to graph density and connectivity patterns
  • Pathfinding: Chooses between Dijkstra, A*, or bidirectional search

🎯 Question-to-Algorithm Mapping

Here's how natural language maps to specific graph algorithms:

| Question Type | Selected Algorithms | NetworkX Functions |
|---|---|---|
| "Most connected nodes" | Degree, PageRank centrality | nx.degree_centrality(), nx.pagerank() |
| "Find communities" | Louvain, greedy modularity | nx.community.louvain_communities() |
| "Shortest path" | Dijkstra, A* pathfinding | nx.shortest_path(), nx.astar_path() |
| "Important nodes" | Betweenness, closeness centrality | nx.betweenness_centrality() |
| "Network structure" | Clustering, connectivity analysis | nx.average_clustering(), nx.connected_components() |
| "Similar products" | Structural similarity, attribute matching | nx.jaccard_coefficient(), custom similarity |

🔬 Algorithm Quality & Validation

GraphQA uses research-grade implementations:

  • NetworkX algorithms: Peer-reviewed, mathematically validated
  • Performance tested: Benchmarked on real-world datasets
  • Parameter tuned: Automatically optimized for different graph types
  • Error handling: Graceful fallbacks for edge cases

Example Output Quality:

🤔 Ask GraphQA: "Find communities in this network"

🧠 Algorithm Selection:
   • Graph: 29,091 nodes, 60,168 edges (medium, directed)
   • Chosen: Greedy modularity optimization
   • Rationale: Best balance of quality vs performance for this size

📊 Results:
   • Found 2,743 communities (modularity = 0.892)
   • Largest community: 150 nodes
   • Average community size: 10.6 nodes
   • Execution time: 49.8 seconds

This intelligent algorithm selection is what makes GraphQA special - it brings expert-level graph analysis to anyone who can ask a question in English.

🚀 Quick Start

Prerequisites

  • Python 3.8+ (recommended: Python 3.10+)
  • OpenAI API Key for LLM functionality

Installation

Option 1: Interactive Setup (Recommended for New Users)

# Clone the repository
git clone https://github.com/catio-tech/graphqa.git
cd graphqa

# Run interactive setup - guides you through everything
python quickstart.py

Option 2: Automated Setup (For Experienced Users)

# Clone the repository
git clone https://github.com/catio-tech/graphqa.git
cd graphqa

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Automated setup
python quickstart.py --auto

# Or use the wrapper script
./setup

Option 3: Manual Install

# Clone and install manually
git clone https://github.com/catio-tech/graphqa.git
cd graphqa

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install GraphQA (core dependencies)
pip install -e .

# Or install with development tools
pip install -e ".[dev]"

# Or install with observability
pip install -e ".[observability]"

# Or install everything
pip install -e ".[all]"

First Run

Set your OpenAI API key:

Option 1 (Recommended): Use .env file

# Copy the example environment file
cp env.example .env

# Edit .env and add your API key
# OPENAI_API_KEY=your-api-key-here

Option 2: Set environment variable

export OPENAI_API_KEY="your-api-key-here"

Test the installation:

from graphqa import GraphQA

# Initialize GraphQA
agent = GraphQA(dataset_name="amazon")
print("✅ GraphQA ready!")

⚡ Key Technical Features

  • 🔗 NetworkX Integration: Full compatibility with NetworkX ecosystem and algorithms
  • 🧠 Zero Configuration: Automatic schema discovery - no manual setup required
  • 🔍 Embedding-Based Search: 384-dimensional semantic vectors for intelligent attribute matching
  • 🤖 LangChain ReAct Agents: Multi-step reasoning with tool selection and execution
  • 📊 Universal Graph Algorithms: Community detection, centrality, clustering, pathfinding
  • 💾 In-Memory Performance: Sub-second analysis on graphs with 100K+ nodes
  • 🔧 Production Ready: Built-in observability, error handling, and memory management
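
The embedding-based attribute matching can be pictured with cosine similarity. The vectors below are made-up 3-d toys; the real system reportedly uses 384-dimensional sentence embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings for attribute names and for the query word "expensive"
attributes = {
    "price":    [0.9, 0.1, 0.0],
    "cost":     [0.8, 0.2, 0.1],
    "category": [0.0, 0.1, 0.9],
}
expensive = [0.88, 0.12, 0.02]

best = max(attributes, key=lambda k: cosine(expensive, attributes[k]))
print(best)  # "expensive" lands on a price-like attribute
```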

๐Ÿ—ฃ๏ธ Interactive Chat

Start the Interactive Demo

The easiest way to use GraphQA is through the interactive chat interface:

# Start interactive GraphQA chat
python -m graphqa.demo

# Or with specific dataset
python -m graphqa.demo --dataset amazon

This launches an interactive session where you can:

  • Ask questions in natural language
  • Get immediate responses about your graph data
  • Explore data without writing code

Example Interactive Session:

๐Ÿ” GraphQA Interactive Demo - Amazon Products Dataset
๐Ÿ’ก Ask questions about your graph data in plain English!

๐Ÿค” Ask GraphQA: What are the highest rated products?

๐Ÿ“Š I found the highest rated products in the Amazon dataset...
[Results with top-rated products including ASINs like B0001234...]

๐Ÿค” Ask GraphQA: What categories do these products belong to?

๐Ÿ“‚ Based on the previous results, here are the categories...
[Category information for the highly rated products]

๐Ÿค” Ask GraphQA: Find products under $50 with good ratings

๐Ÿ›๏ธ I found products under $50 with high ratings...
[Budget-friendly alternatives with ratings above 4.0]

CLI Usage

# Quick interactive demo
python -m graphqa.cli

📖 Programmatic Usage

Basic Natural Language Interface

from graphqa import GraphQA

# Load any graph dataset into memory as NetworkX graph
agent = GraphQA(dataset_name="amazon")
agent.load_dataset()  # Loads into NetworkX MultiDiGraph

# Ask questions in natural language  
response = agent.ask("What are the most connected nodes?")
print(response)

# Follow-up questions use conversation context
response = agent.ask("Show me their categories")
print(response)

agent.shutdown()

Direct NetworkX Access

from graphqa import GraphQA

# Load dataset
agent = GraphQA(dataset_name="amazon")
agent.load_dataset()

# Access the underlying NetworkX graph directly
graph = agent.graph  # NetworkX MultiDiGraph object
print(f"Nodes: {graph.number_of_nodes()}")
print(f"Edges: {graph.number_of_edges()}")

# Use any NetworkX algorithms directly
import networkx as nx
centrality = nx.degree_centrality(graph)
communities = nx.community.greedy_modularity_communities(graph)

# Or let GraphQA's AI choose the right algorithms
response = agent.ask("Find the most central nodes and detect communities")
print(response)

Custom Data Loading

import networkx as nx
from graphqa import GraphQA

# Create your own NetworkX graph
graph = nx.MultiDiGraph()
graph.add_node("A", type="server", region="us-west-2")
graph.add_node("B", type="database", region="us-east-1")  
graph.add_edge("A", "B", relationship="connects_to", weight=1.0)

# Load into GraphQA for natural language analysis
agent = GraphQA()
agent.graph = graph
agent._discover_and_setup_schema()  # Auto-discover schema

# Now ask questions about your custom graph
response = agent.ask("What nodes are in the us-west-2 region?")
print(response)

🔧 Custom Data Loaders

GraphQA is designed to work with any graph dataset. You can create custom data loaders to import your specific data format into the GraphQA ecosystem.

๐Ÿ—๏ธ Data Loader Architecture

All data loaders inherit from BaseGraphLoader, which provides:

  • Automatic schema discovery using embedding-based analysis
  • Validation and error handling for robust data loading
  • Metadata generation for intelligent tool configuration
  • Universal interface that works with all GraphQA tools

📖 BaseGraphLoader Interface

from typing import Any, Dict, List

from graphqa.loaders import BaseGraphLoader
import networkx as nx

class MyCustomLoader(BaseGraphLoader):
    """Custom loader for your specific data format"""
    
    def __init__(self, config: Dict[str, Any] = None):
        super().__init__(config)
        self.data_source = self.config.get('data_source', 'my_data.json')
    
    def load_graph(self) -> nx.MultiDiGraph:
        """
        REQUIRED: Load your data and return a NetworkX MultiDiGraph
        This is the core method you must implement
        """
        graph = nx.MultiDiGraph()
        
        # Your custom data loading logic here
        # Example: loading from JSON, CSV, database, API, etc.
        
        return graph
    
    def get_dataset_description(self) -> str:
        """REQUIRED: Describe what this dataset contains"""
        return "My custom dataset containing..."
    
    def get_sample_queries(self) -> List[str]:
        """REQUIRED: Provide example questions for this dataset"""
        return [
            "What are the most connected entities?",
            "Find clusters in the data",
            "Show me the network structure"
        ]
    
    # Optional: Override these methods for custom behavior
    def get_dataset_name(self) -> str:
        return "MyCustomDataset"
    
    def get_data_sources(self) -> List[str]:
        return [self.data_source]

🎯 Detailed Implementation Guide

Step 1: Set Up Your Loader Class

import json
from pathlib import Path
from typing import Any, Dict

import pandas as pd
import networkx as nx
from graphqa.loaders import BaseGraphLoader, LoaderError

class SocialNetworkLoader(BaseGraphLoader):
    """Example: Load social network data from CSV files"""
    
    def __init__(self, config: Dict[str, Any] = None):
        super().__init__(config)
        
        # Configure data sources
        self.users_file = self.config.get('users_file', 'users.csv')
        self.connections_file = self.config.get('connections_file', 'connections.csv')
        self.max_users = self.config.get('max_users', None)
        
        self.logger.info("Social network loader initialized")

Step 2: Implement the Core Loading Logic

    def load_graph(self) -> nx.MultiDiGraph:
        """Load social network data into NetworkX graph"""
        graph = nx.MultiDiGraph()
        
        try:
            # Load user data (nodes)
            users_df = pd.read_csv(self.users_file)
            if self.max_users:
                users_df = users_df.head(self.max_users)
            
            # Add nodes with attributes
            for _, user in users_df.iterrows():
                graph.add_node(
                    user['user_id'],
                    name=user.get('name', ''),
                    age=user.get('age', 0),
                    location=user.get('location', ''),
                    followers=user.get('followers', 0),
                    # Add any other attributes from your data
                )
            
            # Load connections (edges)
            connections_df = pd.read_csv(self.connections_file)
            
            for _, conn in connections_df.iterrows():
                source = conn['user_id']
                target = conn['friend_id']
                
                # Only add edge if both nodes exist
                if graph.has_node(source) and graph.has_node(target):
                    graph.add_edge(
                        source, target,
                        relationship_type='friendship',
                        weight=conn.get('strength', 1.0),
                        created_date=conn.get('date', '')
                    )
            
            self.logger.info(f"Loaded {graph.number_of_nodes()} users, {graph.number_of_edges()} connections")
            return graph
            
        except Exception as e:
            raise LoaderError(f"Failed to load social network data: {e}")

Step 3: Provide Metadata and Examples

    def get_dataset_description(self) -> str:
        return (
            "Social Network dataset containing user profiles and friendship connections. "
            "Includes user demographics, follower counts, and relationship data for "
            "analyzing social influence and community structures."
        )
    
    def get_sample_queries(self) -> List[str]:
        return [
            "Who are the most popular users?",
            "Find friend groups and communities",
            "What's the average number of connections?",
            "Show me users in the same location",
            "Which users have the most influence?",
            "Find the shortest path between two users",
            "Analyze age distribution in the network",
            "Identify users with unusual connection patterns"
        ]
    
    def get_dataset_name(self) -> str:
        return "Social Network"

🔧 Advanced Data Loader Features

Handle Multiple Data Sources

class MultiSourceLoader(BaseGraphLoader):
    """Example: Combine data from multiple sources"""
    
    def load_graph(self) -> nx.MultiDiGraph:
        graph = nx.MultiDiGraph()
        
        # Load from database
        graph = self._load_from_database(graph)
        
        # Enrich with API data
        graph = self._enrich_with_api(graph)
        
        # Add computed features
        graph = self._add_computed_features(graph)
        
        return graph
    
    def _load_from_database(self, graph):
        # Your database loading logic
        return graph
    
    def _enrich_with_api(self, graph):
        # API enrichment logic
        return graph
    
    def _add_computed_features(self, graph):
        # Add computed node/edge attributes
        for node in graph.nodes():
            graph.nodes[node]['degree'] = graph.degree(node)
        return graph

Data Validation and Error Handling

    def load_graph(self) -> nx.MultiDiGraph:
        graph = nx.MultiDiGraph()
        
        # Validate data sources exist
        for source in self.get_data_sources():
            if not Path(source).exists():
                raise LoaderError(f"Data source not found: {source}")
        
        try:
            # Load with progress tracking
            total_steps = 3
            
            self.logger.info("Step 1/3: Loading nodes...")
            self._load_nodes(graph)
            
            self.logger.info("Step 2/3: Loading edges...")
            self._load_edges(graph)
            
            self.logger.info("Step 3/3: Validating graph...")
            self._validate_graph(graph)
            
            return graph
            
        except Exception as e:
            self.logger.error(f"Data loading failed: {e}")
            raise LoaderError(f"Cannot load dataset: {e}")
    
    def _validate_graph(self, graph):
        """Custom validation logic"""
        if graph.number_of_nodes() == 0:
            raise LoaderError("No nodes loaded - check your data")
        
        # Add your specific validation rules
        isolated_nodes = [n for n in graph.nodes() if graph.degree(n) == 0]
        if len(isolated_nodes) > graph.number_of_nodes() * 0.5:
            self.logger.warning(f"Many isolated nodes: {len(isolated_nodes)}")

🚀 Using Your Custom Loader

Option 1: Direct Usage

from graphqa import GraphQA
from my_loaders import SocialNetworkLoader

# Create custom config
config = {
    'users_file': 'my_social_data/users.csv',
    'connections_file': 'my_social_data/connections.csv',
    'max_users': 10000  # Limit for testing
}

# Initialize GraphQA with custom loader
loader = SocialNetworkLoader(config)
agent = GraphQA()
agent.loader = loader
agent.load_dataset()

# Now use natural language on your data!
response = agent.ask("Find the most influential users in the network")
print(response)

Option 2: Register Your Loader

# Register your loader with GraphQA
from graphqa.loaders import register_loader

register_loader("social_network", SocialNetworkLoader)

# Now use it like built-in loaders
agent = GraphQA(dataset_name="social_network", config=config)
agent.load_dataset()

📋 Data Format Requirements

Your loader can handle any data format, but the output must be:

✅ Required:

  • nx.MultiDiGraph object (supports any graph topology)
  • Nodes with string IDs (can be any string: "user123", "product_B001", etc.)
  • Basic error handling for missing files/invalid data
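
A minimal graph that satisfies these requirements:

```python
import networkx as nx

graph = nx.MultiDiGraph()

# String node IDs, plus descriptive attributes (recommended)
graph.add_node("user_123", name="Alice", followers=120)
graph.add_node("product_B001", product_title="Widget", price=19.99)

# Typed, weighted edge (recommended)
graph.add_edge("user_123", "product_B001", relationship_type="reviewed", weight=4.5)

print(graph.number_of_nodes(), graph.number_of_edges())  # 2 1
```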

🎯 Recommended:

  • Node attributes: Add meaningful properties (name, category, price, etc.)
  • Edge attributes: Include relationship types, weights, timestamps
  • Data validation: Check for duplicates, missing values, format issues
  • Progress logging: Use self.logger.info() for status updates
  • Memory management: Handle large datasets with chunking/streaming

🔧 Best Practices:

  • Attribute naming: Use descriptive names (product_title vs title)
  • Data types: Keep consistent types (all prices as float, all dates as strings)
  • Missing data: Handle nulls gracefully (empty string vs None vs 0)
  • Performance: Use pandas/numpy for large data processing
  • Documentation: Provide clear dataset descriptions and sample queries

💡 Example Loaders

GraphQA includes reference implementations you can learn from:

  • AmazonProductLoader: JSON files, product relationships, review data

Check src/graphqa/loaders/ for complete working examples!

🔋 Memory & Performance

In-Memory Processing Benefits

GraphQA loads your entire graph into system memory as a NetworkX object, providing:

  • ⚡ Sub-second queries: No database round-trips or network latency
  • 🧮 Rich algorithms: Full NetworkX algorithm library available
  • 🔄 Interactive analysis: Immediate follow-up questions and exploration
  • 📊 Complex analytics: Multi-step analysis without data movement

Memory Requirements

| Dataset Size | Memory Usage | Load Time | Recommended For |
|---|---|---|---|
| Small (< 1K nodes) | < 50MB | < 5s | Development, testing |
| Medium (1K-10K nodes) | 50-200MB | 5-30s | Demos, prototypes |
| Large (10K-100K nodes) | 200MB-2GB | 30s-5min | Research, production |
| Very Large (100K+ nodes) | 2GB+ | 5+ min | High-memory servers |

Performance Characteristics

  • Amazon Dataset (Full): ~200K products, ~1M reviews → 2GB RAM, 2-5min load
  • Amazon Dataset (Demo): ~5K products, ~10K reviews → 200MB RAM, 10-30s load
  • Query Performance: Most questions answered in < 5 seconds
  • Memory Management: Automatic cleanup, configurable limits

Scaling Considerations

GraphQA is optimized for interactive analysis rather than big data processing:

✅ Perfect for: Research, prototyping, dashboard analytics, graph exploration
✅ Good for: Production analytics on medium-large graphs (< 1M nodes)
⚠️ Consider alternatives for: Massive graphs (> 10M nodes), streaming data, production OLTP

For larger datasets, consider:

  • Sampling strategies (built-in test modes)
  • Graph databases (Neo4j, Amazon Neptune) for storage + GraphQA for analysis
  • Distributed processing (Apache Spark GraphX) for preprocessing
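
As a rough illustration of the sampling idea, here is a hypothetical helper (not a GraphQA API; the built-in test modes do this through loader config instead):

```python
import random
import networkx as nx

def sample_subgraph(G: nx.Graph, k: int, seed: int = 42) -> nx.Graph:
    """Keep k randomly chosen nodes and the edges among them."""
    rng = random.Random(seed)
    nodes = rng.sample(list(G.nodes()), min(k, G.number_of_nodes()))
    return G.subgraph(nodes).copy()

big = nx.gnm_random_graph(10_000, 50_000, seed=1)
small = sample_subgraph(big, 1_000)
print(small.number_of_nodes())  # 1000
```

Note that an induced subgraph keeps only edges between sampled nodes, so edge counts drop much faster than node counts.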

โš™๏ธ Configuration

Default Settings (Optimized for Demos)

GraphQA ships with test mode enabled by default to ensure fast loading and prevent memory issues:

  • Amazon dataset: Loads 5,000 products (vs 200,000+ full dataset)
  • Loading time: 10-30 seconds (vs 2-5 minutes for full dataset)
  • Memory usage: 100-200MB (vs 1-2GB for full dataset)

Configuration File

GraphQA automatically loads config.yaml from the current directory. Edit this file to customize behavior:

datasets:
  amazon_products:
    config:
      # DEMO MODE (default - fast & safe)
      test_mode: true
      max_products: 5000
      max_reviews: 10000
      
      # FULL DATASET (requires more resources)
      # test_mode: false
      # max_products: null
      # max_reviews: null

Configuration Templates

Choose the right template for your use case:

# For demos and quick testing (recommended)
cp config-templates/demo.yaml config.yaml

# For research and full analysis
cp config-templates/full-analysis.yaml config.yaml

| Template | Products | Load Time | Memory | Use Case |
|---|---|---|---|---|
| demo.yaml | 5,000 | 10-30s | 200MB | Demos, tutorials, quick tests |
| full-analysis.yaml | 200,000+ | 2-5min | 2GB | Research, production analysis |

Custom Configuration

from graphqa import GraphQA

# Option 1: Use custom config file
agent = GraphQA(dataset_name="amazon", config_file="my_config.yaml")

# Option 2: Programmatic configuration
from graphqa.config import UniversalRetrieverConfig
config = UniversalRetrieverConfig()
config.datasets['amazon_products'].config['test_mode'] = False  # Full dataset
agent = GraphQA(dataset_name="amazon", config=config)

agent.load_dataset()

๐Ÿ› ๏ธ Troubleshooting

Common Issues

Import Error: "No module named 'graphqa'"

# Make sure you're in the virtual environment
source venv/bin/activate
pip install -e .

OpenAI API Error

# Option 1: Check your .env file
cat .env  # Should contain: OPENAI_API_KEY=your-key-here

# Option 2: Set environment variable
export OPENAI_API_KEY="your-key-here"

Memory Issues with Large Graphs

# Option 1: Edit config.yaml to reduce limits
# datasets.amazon_products.config.max_products: 2000

# Option 2: Use smaller test mode
python -c "
from graphqa import GraphQA
from graphqa.config import UniversalRetrieverConfig
config = UniversalRetrieverConfig()
config.datasets['amazon_products'].config['max_products'] = 1000
agent = GraphQA(dataset_name='amazon', config=config)
"

Slow Loading Performance

# Check your current settings
grep -A 5 "test_mode" config.yaml

# For faster loading, ensure test mode is enabled
# test_mode: true (default)
# max_products: 5000 (vs 200,000+ full)

Missing Dependencies

# Install optional dependencies
pip install langfuse  # For observability

Environment Variables Not Loading

# Check if .env file exists and has correct format
cat .env
# Should contain: OPENAI_API_KEY=your-actual-key-here

# Reload .env in current session
python -c "from dotenv import load_dotenv; load_dotenv(override=True); import os; print('Loaded:', bool(os.getenv('OPENAI_API_KEY')))"

Performance Issues / Out of Memory

# Switch to demo mode (lighter load)
cp config-templates/demo.yaml config.yaml

# Or manually reduce limits in config.yaml
# test_mode: true
# max_products: 1000  # Even smaller

Getting Help

  1. Check the logs: GraphQA provides detailed logging
  2. Try the examples: Run files in examples/ directory
  3. Open an issue: GitHub Issues

📚 Documentation

๐Ÿค Contributing

# Development setup
git clone https://github.com/catio-tech/graphqa.git
cd graphqa

# Interactive setup (recommended)
python quickstart.py

# Or manual setup
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"  # Installs dev dependencies from pyproject.toml

# Run tests
pytest

# Format code
black src/
isort src/

๐Ÿ“ Project Files Explained

Setup Files

| File | Purpose | When to Use |
|---|---|---|
| quickstart.py | Interactive guided setup with options | New users, need help with API keys/config |
| setup | Automated setup script | CI/CD, experienced users |
| setup.py | Python package definition (legacy) | Used by pip (don't run directly) |

Dependencies

| File | Purpose | Usage |
|---|---|---|
| pyproject.toml | Modern Python dependencies & project config | pip install -e ".[dev]" |

Why no requirements.txt? We use pyproject.toml (modern Python standard) instead of legacy requirements.txt files. This eliminates redundancy and follows current best practices.

📄 License

Apache License 2.0. See LICENSE for details.


Need help? Open an issue or check our troubleshooting guide.
