|
1 | 1 | --- |
2 | 2 | title: "Agent" |
3 | | -description: "Add few-shot learning to your PandasAI agent" |
| 3 | +description: "Build multi-turn PandasAI agents with clarifications, explanations, query rephrasing, optional sandboxed execution, and enterprise training via local vector stores." |
4 | 4 | --- |
5 | 5 |
|
6 | | -<Note title="Beta Notice"> |
7 | | - Release v3 is currently in beta. This documentation reflects the features and |
8 | | - functionality in progress and may change before the final release. |
9 | | -</Note> |
10 | | - |
11 | | -It is possible also to use PandasAI with a few-shot learning agent, thanks to the "train with local vector store" enterprise feature (requiring an enterprise license). The agent can also be used in a sandbox. This guide shows you both how to train the agent and how to use it in a sandbox. |
| 6 | +## PandasAI Agent Overview |
12 | 7 |
|
13 | | -## Training the Agent with local Vector stores |
| 8 | +While the `pai.chat()` method is meant to be used in a single session and for exploratory data analysis, an agent can be used for multi-turn conversations. |
14 | 9 |
|
15 | | -If you want to train the agent with a local vector store, you can use the local `ChromaDB`, `Qdrant` or `Pinecone` vector stores. Here's how to do it: |
16 | | -An enterprise license is required for using the vector stores locally, ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)). |
17 | | -If you plan to use it in production, [contact us](https://pandas-ai.com/pricing). |
| 10 | +To instantiate an agent, you can use the following code: |
18 | 11 |
|
19 | 12 | ```python |
20 | 14 | from pandasai import Agent |
21 | | -from pandasai.ee.vectorstores import ChromaDB |
22 | | -from pandasai.ee.vectorstores import Qdrant |
23 | | -from pandasai.ee.vectorstores import Pinecone |
24 | | -from pandasai.ee.vector_stores import LanceDB |
| 15 | +import pandas as pd |
25 | 16 |
|
26 | | -# Instantiate the vector store |
27 | | -vector_store = ChromaDB() |
28 | | -# or with Qdrant |
29 | | -# vector_store = Qdrant() |
30 | | -# or with LanceDB |
31 | | -vector_store = LanceDB() |
32 | | -# or with Pinecone |
33 | | -# vector_store = Pinecone( |
34 | | -# api_key="*****", |
35 | | -# embedding_function=embedding_function, |
36 | | -# dimensions=384, # dimension of your embedding model |
37 | | -# ) |
| 17 | +# Sample DataFrames |
| 18 | +sales_by_country = pd.DataFrame({ |
| 19 | + "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"], |
| 20 | + "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000], |
| 21 | + "deals_opened": [142, 80, 70, 90, 60, 50, 40, 30, 110, 120], |
| 22 | + "deals_closed": [120, 70, 60, 80, 50, 40, 30, 20, 100, 110] |
| 23 | +}) |
38 | 24 |
|
39 | | -# Instantiate the agent with the custom vector store |
40 | | -agent = Agent("data.csv", vectorstore=vector_store) |
| 25 | +agent = Agent(sales_by_country) |
| 26 | +agent.chat('Which are the top 5 countries by sales?') |
| 27 | +# Output: China, United States, Japan, Germany, United Kingdom
| 28 | +``` |
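The ranking for the sample data can be reproduced deterministically, without an LLM. The following is a minimal sketch in plain Python (no PandasAI involved; the variable names are illustrative) that sorts the same sample values by sales. With pandas, `sales_by_country.nlargest(5, "sales")` should give the same ranking.

```python
# Same sample data as in the example, kept as plain lists so the sketch has no dependencies.
countries = ["United States", "United Kingdom", "France", "Germany", "Italy",
             "Spain", "Canada", "Australia", "Japan", "China"]
sales = [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]

# Pair each country with its sales figure and sort in descending order of sales.
ranked = sorted(zip(countries, sales), key=lambda pair: pair[1], reverse=True)
top_5 = [country for country, _ in ranked[:5]]

print(top_5)
# ['China', 'United States', 'Japan', 'Germany', 'United Kingdom']
```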
41 | 29 |
|
42 | | -# Train the model |
43 | | -query = "What is the total sales for the current fiscal year?" |
44 | | -# The following code is passed as a string to the response variable |
45 | | -response = '\n'.join([ |
46 | | - 'import pandas as pd', |
47 | | - '', |
48 | | - 'df = dfs[0]', |
49 | | - '', |
50 | | - '# Calculate the total sales for the current fiscal year', |
51 | | - 'total_sales = df[df[\'date\'] >= pd.to_datetime(\'today\').replace(month=4, day=1)][\'sales\'].sum()', |
52 | | - 'result = { "type": "number", "value": total_sales }' |
53 | | -]) |
| 30 | +Unlike the `pai.chat()` method, an agent keeps track of the conversation state, so it can answer follow-up questions that depend on earlier turns. For example:
54 | 31 |
|
55 | | -agent.train(queries=[query], codes=[response]) |
| 32 | +```python |
| 33 | +agent.chat('And which one has the most deals?') |
| 34 | +# Output: United States has the most deals |
| 35 | +``` |
| 36 | + |
| 37 | +### Clarification questions |
| 38 | + |
| 39 | +An agent can also ask clarification questions when it does not have enough information to answer the query. For example:
| 40 | + |
| 41 | +```python |
| 42 | +agent.clarification_questions('What is the GDP of the United States?') |
| 43 | +``` |
| 44 | + |
| 45 | +This will return up to 3 clarification questions that the agent can ask the user to get more information to answer the query. |
| 46 | + |
| 47 | +### Explanation |
| 48 | + |
| 49 | +An agent can also explain the answer it gave to the user. For example:
| 50 | + |
| 51 | +```python |
| 52 | +response = agent.chat('What is the GDP of the United States?') |
| 53 | +explanation = agent.explain() |
| 54 | + |
| 55 | +print("The answer is", response) |
| 56 | +print("The explanation is", explanation) |
| 57 | +``` |
| 58 | + |
| 59 | +### Rephrase Question |
| 60 | + |
| 61 | +The agent can rephrase a question to help the model return a more accurate and comprehensive response. For example:
| 62 | + |
| 63 | +```python |
| 64 | +rephrased_query = agent.rephrase_query('What is the GDP of the United States?') |
| 65 | + |
| 66 | +print("The rephrased query is", rephrased_query) |
56 | 67 |
|
57 | | -response = agent.chat("What is the total sales for the last fiscal year?") |
58 | | -print(response) |
59 | | -# The model will use the information provided in the training to generate a response |
60 | 68 | ``` |
61 | 69 |
|
62 | 70 | ## Using the Agent in a Sandbox Environment |
63 | 71 |
|
| 72 | +<Note> |
| 73 | + The sandbox works offline and provides an additional layer of security for |
| 74 | + code execution. It's particularly useful when working with untrusted data or |
| 75 | + when you need to ensure that code execution is isolated from your main system. |
| 76 | +</Note> |
| 77 | + |
64 | 78 | To enhance security and protect against malicious code through prompt injection, PandasAI provides a sandbox environment for code execution. The sandbox runs your code in an isolated Docker container, ensuring that potentially harmful operations are contained. |
65 | 79 |
|
66 | 80 | ### Installation |
@@ -107,45 +121,57 @@ sandbox = DockerSandbox( |
107 | 121 | ) |
108 | 122 | ``` |
109 | 123 |
|
| 124 | +## Training the Agent with local Vector stores |
| 125 | + |
110 | 126 | <Note> |
111 | | - The sandbox works offline and provides an additional layer of security for |
112 | | - code execution. It's particularly useful when working with untrusted data or |
113 | | - when you need to ensure that code execution is isolated from your main system. |
| 127 | + Training agents with local vector stores requires a PandasAI Enterprise license. See [Enterprise Features](/v3/enterprise-features) for more details or [contact us](https://pandas-ai.com/) for production use. |
114 | 128 | </Note> |
115 | 129 |
|
| 130 | +PandasAI can also be used as a few-shot learning agent through the "train with local vector store" enterprise feature, which requires an enterprise license.
116 | 131 |
|
117 | | -## Custom Head |
118 | | - |
119 | | -In some cases, you might want to provide custom data samples to the conversational agent to improve its understanding and responses. For example, you might want to: |
120 | | - |
121 | | -- Provide better examples that represent your data patterns |
122 | | -- Avoid sharing sensitive information |
123 | | -- Guide the agent with specific data scenarios |
124 | | - |
125 | | -You can do this by passing a custom head to the agent: |
| 132 | +To train the agent with a local vector store, you can use the `ChromaDB`, `Qdrant`, `LanceDB` or `Pinecone` vector stores.
| 133 | +Using the vector stores locally requires an enterprise license. See [Enterprise Features](/v3/enterprise-features) for licensing information.
| 134 | +If you plan to use it in production, [contact us](https://pandas-ai.com). Here's how to do it:
126 | 135 |
|
127 | 136 | ```python |
128 | | -import pandas as pd |
129 | | -import pandasai as pai |
| 137 | +from pandasai import Agent |
| 138 | +from pandasai.ee.vectorstores import ChromaDB |
| 139 | +from pandasai.ee.vectorstores import Qdrant |
| 140 | +from pandasai.ee.vectorstores import Pinecone |
| 141 | +from pandasai.ee.vectorstores import LanceDB
130 | 142 |
|
131 | | -# Your original dataframe |
132 | | -df = pd.DataFrame({ |
133 | | - 'sensitive_id': [1001, 1002, 1003, 1004, 1005], |
134 | | - 'amount': [150, 200, 300, 400, 500], |
135 | | - 'category': ['A', 'B', 'A', 'C', 'B'] |
136 | | -}) |
| 143 | +# Instantiate the vector store |
| 144 | +vector_store = ChromaDB() |
| 145 | +# or with Qdrant |
| 146 | +# vector_store = Qdrant() |
| 147 | +# or with LanceDB |
| 148 | +# vector_store = LanceDB()
| 149 | +# or with Pinecone |
| 150 | +# vector_store = Pinecone( |
| 151 | +# api_key="*****", |
| 152 | +# embedding_function=embedding_function, |
| 153 | +# dimensions=384, # dimension of your embedding model |
| 154 | +# ) |
137 | 155 |
|
138 | | -# Create a custom head with anonymized data |
139 | | -head_df = pd.DataFrame({ |
140 | | - 'sensitive_id': [1, 2, 3, 4, 5], |
141 | | - 'amount': [100, 200, 300, 400, 500], |
142 | | - 'category': ['A', 'B', 'C', 'A', 'B'] |
143 | | -}) |
| 156 | +# Instantiate the agent with the custom vector store |
| 157 | +agent = Agent("data.csv", vectorstore=vector_store) |
144 | 158 |
|
145 | | -# Use the custom head |
146 | | -smart_df = pai.SmartDataframe(df, config={ |
147 | | - "custom_head": head_df |
148 | | -}) |
149 | | -``` |
| 159 | +# Train the model |
| 160 | +query = "What is the total sales for the current fiscal year?" |
| 161 | +# The following code is passed as a string to the response variable |
| 162 | +response = '\n'.join([ |
| 163 | + 'import pandas as pd', |
| 164 | + '', |
| 165 | + 'df = dfs[0]', |
| 166 | + '', |
| 167 | + '# Calculate the total sales for the current fiscal year', |
| 168 | + 'total_sales = df[df[\'date\'] >= pd.to_datetime(\'today\').replace(month=4, day=1)][\'sales\'].sum()', |
| 169 | + 'result = { "type": "number", "value": total_sales }' |
| 170 | +]) |
| 171 | + |
| 172 | +agent.train(queries=[query], codes=[response]) |
150 | 173 |
|
151 | | -The agent will use your custom head instead of the default first 5 rows of the dataframe when analyzing and responding to queries. |
| 174 | +response = agent.chat("What is the total sales for the last fiscal year?") |
| 175 | +print(response) |
| 176 | +# The model will use the information provided in the training to generate a response |
| 177 | +``` |
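The training snippet assumes a fiscal year that starts on April 1. Replacing the month and day of today's date yields a date in the future for any day between January and March, so the filter would match no past rows during those months. The following is a minimal sketch of a helper that handles this case using only the standard library. The name `fiscal_year_start` and the April 1 default are illustrative assumptions, not part of PandasAI.

```python
from datetime import date


def fiscal_year_start(today: date, start_month: int = 4, start_day: int = 1) -> date:
    """Return the first day of the fiscal year that contains `today`."""
    candidate = date(today.year, start_month, start_day)
    # Before the start date, the current fiscal year began in the previous calendar year.
    if today < candidate:
        return date(today.year - 1, start_month, start_day)
    return candidate


print(fiscal_year_start(date(2024, 2, 15)))  # 2023-04-01
print(fiscal_year_start(date(2024, 6, 30)))  # 2024-04-01
```

A training example could then filter with something like `df['date'] >= pd.Timestamp(fiscal_year_start(date.today()))`, assuming the `date` column holds datetime values.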