[REQUEST] Kotaemon MCP Server #792

@PATAKAMURIVENKATAGANESH

Feature Request: Model Context Protocol (MCP) Server Implementation

Summary

This feature request proposes a Model Context Protocol (MCP) server for kotaemon that exposes its RAG capabilities as MCP tools. AI assistants and applications could then use kotaemon's document processing, question answering, and knowledge management features through a standardized protocol.

Technical Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   MCP Client    │────│   MCP Server    │────│   Kotaemon      │
│  (Claude, etc.) │    │                 │    │   Core Library  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │
                       ┌─────────────────┐
                       │   Tool Layer    │
                       │ • Document Mgmt │
                       │ • Question QA   │
                       │ • GraphRAG      │
                       │ • Config Mgmt   │
                       └─────────────────┘
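
To make the Tool Layer in the diagram concrete, here is a minimal sketch of how a registry could route tool calls to handlers. It uses only the standard library and does not use the MCP SDK; the `TOOLS` registry, the `tool` decorator, `dispatch`, and the placeholder return values are all hypothetical names for illustration, not code from the proposed implementation.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: maps a tool name to the function that implements it.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str):
    """Register a function under a tool name."""
    def decorator(func):
        TOOLS[name] = func
        return func
    return decorator


@tool("health_check")
def health_check() -> dict:
    # Placeholder: a real handler would probe kotaemon's stores and models.
    return {"status": "ok"}


@tool("list_collections")
def list_collections() -> dict:
    # Placeholder data standing in for a call into the kotaemon core library.
    return {"collections": ["research_papers"]}


def dispatch(request_json: str) -> dict:
    """Route a {"tool": ..., "arguments": {...}} payload to its handler."""
    request = json.loads(request_json)
    name = request.get("tool")
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return {"result": handler(**request.get("arguments", {}))}
    except TypeError as exc:
        # Typically raised when the arguments do not match the handler's signature.
        return {"error": f"invalid arguments: {exc}"}
```

In this sketch, `dispatch('{"tool": "health_check"}')` returns the handler's result wrapped in a `result` key, and an unregistered tool name produces an `error` entry instead of an exception.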

Proposed Implementation

Core Features

🔧 Document Management Tools

  • index_documents - Index various document types with advanced options
  • retrieve_documents - Hybrid search with vector and keyword capabilities
  • list_files - Manage files within collections
  • delete_file - Remove specific documents

🤖 Question Answering Tools

  • answer_question - RAG-based QA with multiple reasoning strategies
  • graphrag_query - Advanced GraphRAG queries for complex reasoning
  • Support for simple, decomposed, ReAct, and ReWOO reasoning modes

📁 Collection Management Tools

  • create_collection - Create and configure document collections
  • list_collections - View all available collections with statistics
  • get_collection_info - Detailed collection information
  • delete_collection - Remove collections safely
  • export_collection - Export to JSON, CSV, Markdown, or PDF

🕸️ GraphRAG Tools

  • graphrag_query - Local, global, and community-based GraphRAG queries
  • analyze_document_graph - Knowledge graph structure analysis
  • Support for Microsoft GraphRAG, NanoGraphRAG, and LightRAG

⚙️ Configuration Management Tools

  • configure_llm - Set up LLM providers (OpenAI, Anthropic, Google, etc.)
  • configure_embeddings - Configure embedding models
  • list_available_models - View supported models and providers

🔍 System Management Tools

  • get_server_status - Comprehensive server status and configuration
  • health_check - System health validation
  • Runtime model switching and configuration updates

Example Usage

Document Indexing

{
  "tool": "index_documents",
  "arguments": {
    "file_paths": ["/path/to/document.pdf", "/path/to/folder/"],
    "collection_name": "research_papers",
    "index_type": "graphrag",
    "chunk_size": 1000,
    "chunk_overlap": 200
  }
}
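
For readers unfamiliar with `chunk_size` and `chunk_overlap`, the sketch below shows one common interpretation: fixed-size character windows where each chunk repeats the tail of the previous one. This is an assumption for illustration only; kotaemon's actual splitter may count tokens rather than characters, and `chunk_text` is a hypothetical helper.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the values from the example above (1000 and 200), each window starts 800 characters after the previous one, so consecutive chunks share 200 characters.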

Question Answering

{
  "tool": "answer_question",
  "arguments": {
    "question": "What are the key findings in the research papers?",
    "collection_name": "research_papers",
    "reasoning_type": "graphrag",
    "include_sources": true,
    "language": "en"
  }
}

GraphRAG Analysis

{
  "tool": "graphrag_query",
  "arguments": {
    "query": "How do machine learning and neural networks relate?",
    "collection_name": "research_papers",
    "query_type": "global",
    "community_level": 2
  }
}

Implementation Details

File Structure

mcp_server/
├── __init__.py
├── server.py                 # Basic MCP server implementation
├── enhanced_server.py        # Full-featured MCP server
├── requirements.txt          # Python dependencies
├── pyproject.toml           # Package configuration
├── README.md                # Comprehensive documentation
├── .env.example             # Environment configuration template
└── examples/
    ├── client_demo.py       # Usage demonstrations
    ├── test_server.py       # Test suite
    ├── claude_config_helper.py  # Configuration helper
    └── README.md            # Examples documentation

Key Components

  1. MCP Server Core (enhanced_server.py)

    • Implements MCP protocol
    • Tool registration and handling
    • Error handling and validation
    • Async support for non-blocking operations

  2. Kotaemon Integration

    • Imports kotaemon libraries
    • Initializes reasoning pipelines
    • Manages index and retrieval systems
    • Handles configuration management

  3. Tool Implementations

    • Document indexing with multiple backends
    • Question answering with various strategies
    • GraphRAG integration and analysis
    • Collection and file management
    • System configuration and monitoring
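
The combination of validation, error handling, and async support listed above could look roughly like the following sketch. It assumes Python 3.9+ for `asyncio.to_thread`; `run_pipeline` is a stand-in for a blocking kotaemon pipeline call, and the accepted `reasoning_type` strings are assumed from the modes and examples in this issue rather than taken from the actual code.

```python
import asyncio
from typing import Any, Dict

# Reasoning modes named in this proposal; the exact string values are assumed.
REASONING_TYPES = {"simple", "decomposed", "react", "rewoo", "graphrag"}


def run_pipeline(question: str, collection_name: str, reasoning_type: str) -> str:
    # Stand-in for a blocking call into a kotaemon reasoning pipeline.
    return f"[{reasoning_type}] answer to {question!r} from {collection_name}"


async def answer_question(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Validate arguments, run the pipeline off the event loop, report errors."""
    question = arguments.get("question")
    if not isinstance(question, str) or not question.strip():
        return {"ok": False, "error": "'question' must be a non-empty string"}

    reasoning_type = arguments.get("reasoning_type", "simple")
    if reasoning_type not in REASONING_TYPES:
        return {"ok": False, "error": f"unsupported reasoning_type: {reasoning_type}"}

    collection = arguments.get("collection_name", "default")
    try:
        # Run the blocking pipeline in a worker thread so the event loop stays free.
        answer = await asyncio.to_thread(run_pipeline, question, collection, reasoning_type)
    except Exception as exc:
        # Return the failure as data so the client receives a structured error.
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "answer": answer}
```

Returning errors as structured results, rather than letting exceptions escape, keeps a single failing tool call from taking down the server process.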

Configuration

Environment Variables

  • LLM provider configurations (OpenAI, Anthropic, etc.)
  • Local model support (Ollama)
  • GraphRAG feature flags
  • Storage backend selection
  • Security and performance settings
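
As an illustration of how these settings could be read at startup, here is a minimal sketch. Only `KOTAEMON_DATA_DIR` and `OPENAI_API_KEY` appear elsewhere in this issue; `KOTAEMON_USE_GRAPHRAG`, `KOTAEMON_OLLAMA_URL`, the default data directory, and the `ServerConfig` class are hypothetical names chosen for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    data_dir: str
    openai_api_key: Optional[str]
    use_graphrag: bool
    ollama_base_url: Optional[str]


def _as_bool(value: Optional[str], default: bool = False) -> bool:
    """Interpret common truthy strings; fall back to default when unset."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Build the server configuration from environment variables."""
    return ServerConfig(
        # Default path is an assumption for this sketch.
        data_dir=env.get("KOTAEMON_DATA_DIR", "./ktem_app_data"),
        openai_api_key=env.get("OPENAI_API_KEY"),
        # Hypothetical feature flag for GraphRAG tools.
        use_graphrag=_as_bool(env.get("KOTAEMON_USE_GRAPHRAG")),
        # Hypothetical variable for a local Ollama endpoint.
        ollama_base_url=env.get("KOTAEMON_OLLAMA_URL"),
    )
```

Passing the environment as a parameter makes the loader easy to test with a plain dictionary instead of the real process environment.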

Client Configuration (Claude Desktop)

{
  "mcpServers": {
    "kotaemon": {
      "command": "python",
      "args": ["/path/to/kotaemon/mcp_server/enhanced_server.py"],
      "env": {
        "KOTAEMON_DATA_DIR": "/path/to/kotaemon/ktem_app_data",
        "OPENAI_API_KEY": "your_api_key"
      }
    }
  }
}
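
A configuration helper in the spirit of `claude_config_helper.py` could generate this entry and merge it into an existing client configuration without overwriting other servers. The sketch below is a guess at that behavior, not the helper's actual code; `build_claude_config` and `merge_into` are hypothetical names.

```python
import json
from typing import Any, Dict


def build_claude_config(server_script: str, data_dir: str,
                        api_key: str = "your_api_key") -> Dict[str, Any]:
    """Return the mcpServers entry shown above for the given paths."""
    return {
        "mcpServers": {
            "kotaemon": {
                "command": "python",
                "args": [server_script],
                "env": {
                    "KOTAEMON_DATA_DIR": data_dir,
                    "OPENAI_API_KEY": api_key,
                },
            }
        }
    }


def merge_into(existing: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Add the new servers to an existing config, keeping unrelated entries."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(new["mcpServers"])
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    config = build_claude_config(
        "/path/to/kotaemon/mcp_server/enhanced_server.py",
        "/path/to/kotaemon/ktem_app_data",
    )
    print(json.dumps(config, indent=2))
```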

Benefits

For End Users

  • AI Assistant Integration: Use kotaemon features directly through Claude Desktop or other MCP clients
  • Standardized Interface: Familiar MCP tool paradigm
  • Rich Functionality: Access to advanced RAG, GraphRAG, and document management

For Developers

  • Programmatic Access: Clean API for building applications
  • Language Agnostic: Any language with MCP client support
  • Extensible: Easy to add custom tools and functionality
  • Well Documented: Comprehensive examples and documentation

For the Kotaemon Ecosystem

  • Broader Adoption: Reach new user bases through MCP clients
  • Integration Opportunities: Enable kotaemon in diverse workflows
  • Community Growth: Attract developers from the MCP ecosystem to kotaemon
  • Standards Compliance: Align with emerging AI tool standards

Implementation Plan

Phase 1: Core Infrastructure ✅

  • Basic MCP server structure
  • Tool registration framework
  • Kotaemon integration layer
  • Error handling and validation

Phase 2: Essential Tools ✅

  • Document indexing tools
  • Question answering tools
  • Basic collection management
  • Configuration management

Phase 3: Advanced Features ✅

  • GraphRAG integration
  • Advanced search and retrieval
  • System monitoring tools
  • Export capabilities

Phase 4: Documentation & Examples ✅

  • Comprehensive README
  • Example scripts and demonstrations
  • Configuration helpers
  • Test suites

Phase 5: Testing & Refinement

  • Integration testing with real MCP clients
  • Performance optimization
  • Security review
  • Community feedback incorporation

Testing Strategy

Automated Testing

  • Unit tests for individual tools
  • Integration tests with mock kotaemon components
  • Performance tests with larger datasets
  • Error handling validation
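
An integration-style unit test with a mocked kotaemon component might look like the sketch below. The `retrieve_documents` handler and the `retriever.search` interface are hypothetical stand-ins; the real retrieval API in kotaemon may differ.

```python
import unittest
from unittest.mock import MagicMock


def retrieve_documents(retriever, query: str, top_k: int = 5) -> dict:
    """Hypothetical tool handler; `retriever` stands in for a kotaemon component."""
    if not query.strip():
        return {"error": "query must not be empty"}
    hits = retriever.search(query, top_k=top_k)
    # Expose only the fields the client needs.
    return {"documents": [{"text": h["text"], "score": h["score"]} for h in hits]}


class RetrieveDocumentsTest(unittest.TestCase):
    def test_returns_hits_from_retriever(self):
        retriever = MagicMock()
        retriever.search.return_value = [{"text": "a", "score": 0.9, "id": 1}]

        result = retrieve_documents(retriever, "neural networks", top_k=1)

        retriever.search.assert_called_once_with("neural networks", top_k=1)
        self.assertEqual(result, {"documents": [{"text": "a", "score": 0.9}]})

    def test_empty_query_skips_retriever(self):
        retriever = MagicMock()

        result = retrieve_documents(retriever, "   ")

        self.assertIn("error", result)
        retriever.search.assert_not_called()
```

Saved as a module, tests like these can be run with `python -m unittest` and need no real index, model, or API key.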

Manual Testing

  • Claude Desktop integration
  • VS Code MCP extension compatibility
  • Real-world document processing workflows
  • GraphRAG analysis scenarios

Example Test Script

# Run the test suite
python mcp_server/examples/test_server.py

# Validate the Claude Desktop configuration
python mcp_server/examples/claude_config_helper.py

# Run the client demonstrations
python mcp_server/examples/client_demo.py

Future Enhancements

Advanced Features

  • Streaming Responses: Real-time processing feedback
  • Batch Operations: Efficient bulk document processing
  • Custom Tool Framework: Easy addition of domain-specific tools
  • Multi-User Support: User isolation and permission management

Integration Opportunities

  • Web Search Integration: Combine with web search tools
  • Database Connectors: Direct database query capabilities
  • API Integrations: Connect with external services
  • Workflow Automation: Multi-step document processing

Performance Optimizations

  • Caching Layer: Intelligent result caching
  • Parallel Processing: Concurrent document indexing
  • Resource Management: Memory and CPU optimization
  • Load Balancing: Horizontal scaling support

Conclusion

This MCP server would expand kotaemon's reach and usability by providing standardized programmatic access to its RAG capabilities. The proposed implementation is documented and designed for both immediate use and future extension.

Phases 1 through 4 are implemented and ready for review for integration into the main kotaemon repository; Phase 5 (testing and refinement) is still pending.

Files Included

The complete implementation includes:

  • Core Server: mcp_server/enhanced_server.py (600+ lines)
  • Documentation: mcp_server/README.md (comprehensive guide)
  • Dependencies: mcp_server/requirements.txt & pyproject.toml
  • Examples: Multiple demonstration and test scripts
  • Configuration: Environment templates and helpers
  • Package Structure: Proper Python package setup

All files follow common practices for MCP server development.


Labels: enhancement (New feature or request)
