Since joining LlamaIndex, my focus has shifted from 'everything agents' to 'document agents': agents that can handle work over all manner of complex documents. So, I tried out the latest chart parsing capabilities of LlamaParse.

Charts in PDFs are notoriously painful to work with. You can see the data (bars, axes, labels), but actually getting it into a format you can analyze is a different matter.

I tried parsing a U.S. Treasury executive summary PDF, pulling out a grouped bar chart showing Budget Deficit vs. Net Operating Cost for fiscal years 2020–2024, and turning it into a pandas DataFrame you can run analysis on (although really you can do whatever you like with it from there, such as passing it to an agent for downstream tasks).

Once parsed, the chart's underlying data comes back as a table in the items tree for that page. From there: grab the rows, construct a DataFrame, etc. In the example, I'm computing year-over-year changes in both metrics, measuring the gap between them across the five-year window, and, just to be sure, reproducing a bar chart that mirrors the original PDF visualization.

You can try it out here: https://lnkd.in/eNVQn-DZ
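For a rough idea of the "grab the rows, construct a DataFrame" step, here is a minimal sketch in pandas. It does not call LlamaParse at all: it assumes the chart table has already been pulled out of the page's items as a plain list of rows with the header first. The `rows` variable, the column names, the `rows_to_frame` helper, and every number in it are made-up placeholders, not the actual Treasury figures or the real parse output.

```python
import pandas as pd

# Hypothetical stand-in for the rows of the parsed chart table.
# The values are placeholders, NOT the real Treasury numbers.
rows = [
    ["Fiscal Year", "Budget Deficit", "Net Operating Cost"],
    ["2020", "10.0", "12.0"],
    ["2021", "8.0", "11.0"],
    ["2022", "5.0", "9.0"],
    ["2023", "6.0", "8.0"],
    ["2024", "7.0", "10.0"],
]


def rows_to_frame(rows):
    """Turn header-first rows of strings into a numeric DataFrame indexed by year."""
    header, *body = rows
    df = pd.DataFrame(body, columns=header)
    df["Fiscal Year"] = df["Fiscal Year"].astype(int)
    # Parsed cells arrive as strings here, so convert the metric columns to numbers.
    metrics = [c for c in df.columns if c != "Fiscal Year"]
    df[metrics] = df[metrics].apply(pd.to_numeric)
    return df.set_index("Fiscal Year").sort_index()


df = rows_to_frame(rows)

# Year-over-year change in each metric (the first year has no prior year, so NaN).
yoy = df.diff()

# Gap between the two metrics for each fiscal year.
gap = df["Net Operating Cost"] - df["Budget Deficit"]

print(yoy)
print(gap)
```

With matplotlib installed, `df.plot.bar()` draws the two columns as grouped bars per year, which is one way to sanity-check the parsed values against the original chart.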