@liqiongyu (Contributor) commented Dec 26, 2025

Summary

Adds first-class Excel (.xlsx/.xls) file ingestion for Knowledge by routing spreadsheets to the CSV reader and parsing workbooks per-sheet.

Rationale

Excel is treated as a spreadsheet/tabular source similar to CSV for Knowledge ingestion. CSVReader already provides row-oriented text extraction and integrates with existing chunking strategies (e.g. RowChunking), so routing .xlsx/.xls through it keeps behavior consistent and avoids introducing a new reader key / API surface. If we later need richer spreadsheet semantics (formulas, formatting, table region detection), we can extract a dedicated ExcelReader.
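To illustrate the routing idea, here is a minimal sketch of resolving spreadsheet files to the existing `csv` reader key. It is not the actual `ReaderFactory` code: the table names and `pick_reader_key` are hypothetical, and the MIME strings are the standard ones for .xlsx/.xls, which may or may not match the exact set the PR registers.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup tables: Excel extensions reuse the "csv" reader key,
# so no new reader key is introduced.
EXTENSION_TO_READER = {".csv": "csv", ".xlsx": "csv", ".xls": "csv"}

MIME_TO_READER = {
    "text/csv": "csv",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "csv",  # .xlsx
    "application/vnd.ms-excel": "csv",  # .xls
}


def pick_reader_key(filename: str, mime_type: Optional[str] = None) -> Optional[str]:
    """Return a reader key for a file, or None if nothing matches."""
    # Prefer the file extension, compared case-insensitively.
    suffix = Path(filename).suffix.lower()
    if suffix in EXTENSION_TO_READER:
        return EXTENSION_TO_READER[suffix]
    # Fall back to the MIME type when the name has no known extension.
    if mime_type is not None:
        return MIME_TO_READER.get(mime_type.lower())
    return None
```

Because Excel files map to an existing key, callers that already select the `csv` reader need no changes.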

Changes

  • Parse .xlsx via openpyxl and .xls via xlrd in CSVReader (per-sheet Document with sheet metadata; rows become content lines).
  • Route .xlsx/.xls (+ common MIME types) to the csv reader in ReaderFactory.
  • Add openpyxl/xlrd to the existing agno[csv] extra.
  • Fix agno[memori] extra to depend on memori==3.0.5 (instead of memorisdk==3.0.5) so agno[tests] dependencies are resolvable in CI.
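The per-sheet parsing described above could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: `SheetDocument`, `iter_xlsx_sheets`, and `sheets_to_documents` are hypothetical names, the metadata keys and the `", "` cell separator are guesses, and only the .xlsx path via openpyxl is shown.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List, Sequence, Tuple


@dataclass
class SheetDocument:
    """Hypothetical stand-in for the project's Document type."""
    content: str
    meta_data: Dict[str, Any] = field(default_factory=dict)


def iter_xlsx_sheets(path: str) -> Iterator[Tuple[str, List[Sequence[Any]]]]:
    """Yield (sheet_name, rows) for every worksheet in an .xlsx file."""
    import openpyxl  # third-party; imported lazily so the helper below works without it

    workbook = openpyxl.load_workbook(path, read_only=True, data_only=True)
    try:
        for sheet in workbook.worksheets:
            yield sheet.title, list(sheet.iter_rows(values_only=True))
    finally:
        workbook.close()


def sheets_to_documents(
    sheets: Iterable[Tuple[str, Iterable[Sequence[Any]]]], source: str
) -> List[SheetDocument]:
    """Build one document per non-empty sheet; each row becomes one content line."""
    documents = []
    for index, (name, rows) in enumerate(sheets, start=1):
        lines = []
        for row in rows:
            cells = ["" if value is None else str(value) for value in row]
            if any(cells):  # skip rows in which every cell is empty
                lines.append(", ".join(cells))
        if not lines:
            continue  # a sheet with no data produces no document
        documents.append(
            SheetDocument(
                content="\n".join(lines),
                meta_data={"source": source, "sheet_name": name, "sheet_index": index},
            )
        )
    return documents
```

A call such as `sheets_to_documents(iter_xlsx_sheets("book.xlsx"), source="book.xlsx")` would then yield one document per populated sheet, which row-oriented chunking can split line by line.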

Tests

  • ./scripts/format.sh
  • ./scripts/validate.sh
  • pytest libs/agno/tests/unit/knowledge/test_excel_reader.py -q

Fixes #4872

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@liqiongyu (Contributor, Author) commented:

Follow-up: addressed whitespace preservation for Excel ingestion.

  • Removed the final .strip() in _row_values_to_csv_line() so leading/trailing spaces in the first/last cell are preserved when chunk=False (closer to the CSV path).
  • Added a unit test to cover this behavior.

Note: we still trim trailing empty cells to avoid producing long ", , ," tails when Excel sheets have a large max_column with mostly empty cells.
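A minimal sketch of the behavior described in this follow-up, for readers who want to see it concretely. `row_values_to_line` is a hypothetical stand-in, not the PR's `_row_values_to_csv_line`; it assumes a `", "` separator and treats only `None` and the empty string as empty cells.

```python
from typing import Any, Sequence


def row_values_to_line(values: Sequence[Any], delimiter: str = ", ") -> str:
    """Join one row's cells into a single line of text."""
    cells = ["" if value is None else str(value) for value in values]
    # Drop trailing empty cells so a wide, mostly empty sheet does not
    # produce a long run of delimiters at the end of every line.
    while cells and cells[-1] == "":
        cells.pop()
    # No final .strip(): leading spaces in the first cell and trailing
    # spaces in the last remaining cell are kept as-is.
    return delimiter.join(cells)
```

Only the tail is trimmed, so an empty cell in the middle or at the start of a row still gets its delimiter and column positions stay aligned across rows.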
