
Commit f076a24

Merge pull request #1832 from sinaptik-ai/docs/v2migratev3
Updated docs and added migration guide
2 parents 09a56cf + 5846f07 commit f076a24

26 files changed (+1427, -875 lines)

README.md

Lines changed: 5 additions & 6 deletions
@@ -8,7 +8,7 @@
 [![Downloads](https://static.pepy.tech/badge/pandasai)](https://pepy.tech/project/pandasai) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ZnO-njhL7TBOYPZaqvMvGtsjckZKrv2E?usp=sharing)

-PandasAI is a Python platform that makes it easy to ask questions to your data in natural language. It helps non-technical users to interact with their data in a more natural way, and it helps technical users to save time, and effort when working with data.
+PandasAI is a Python library that makes it easy to ask questions of your data in natural language. It helps non-technical users interact with their data in a more natural way, and it helps technical users save time and effort when working with data.

 # 🔧 Getting started

@@ -28,13 +28,15 @@ You can install the PandasAI library using pip or poetry.
 With pip:

 ```bash
-pip install "pandasai>=3.0.0b2"
+pip install pandasai
+pip install pandasai-litellm
 ```

 With poetry:

 ```bash
-poetry add "pandasai>=3.0.0b2"
+poetry add pandasai
+poetry add pandasai-litellm
 ```

 ### 💻 Usage
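Because the install commands above no longer pin `>=3.0.0b2`, it may be worth confirming which major version actually got installed, since the v3 docs assume the 3.x line. A minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_major_version` is hypothetical and not part of PandasAI:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major_version(dist_name):
    """Return the major version of an installed distribution, or None if it is missing.

    Hypothetical helper for illustration; it only reads installed package metadata.
    """
    try:
        full = version(dist_name)
    except PackageNotFoundError:
        return None
    # "3.0.0", "3.0.0b2" and "3.1" all carry the major number before the first dot.
    return int(full.split(".")[0])


if __name__ == "__main__":
    major = installed_major_version("pandasai")
    if major is None:
        print("pandasai is not installed")
    elif major < 3:
        print(f"pandasai {major}.x found; the v3 docs assume 3.x")
    else:
        print(f"pandasai {major}.x found")
```

The same check works for `pandasai-litellm` by passing that distribution name instead.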
@@ -187,9 +189,6 @@ If you are interested in managed PandasAI Cloud or self-hosted Enterprise Offeri

 ## Resources

-> **Beta Notice**
-> Release v3 is currently in beta. The following documentation and examples reflect the features and functionality in progress and may change before the final release.
-
 - [Docs](https://docs.pandas-ai.com/) for comprehensive documentation
 - [Examples](examples) for example notebooks
 - [Discord](https://discord.gg/KYKj9F2FRH) for discussion with the community and PandasAI team

docs/mint.json

Lines changed: 8 additions & 8 deletions
@@ -58,28 +58,28 @@
 "version": "v3"
 },
 {
-"group": "Data layer",
-"pages": ["v3/semantic-layer", "v3/semantic-layer/new", "v3/semantic-layer/views", "v3/data-ingestion", "v3/transformations"],
+"group": "Natural Language",
+"pages": ["v3/overview-nl", "v3/large-language-models", "v3/chat-and-output"],
 "version": "v3"
 },
 {
-"group": "Natural Language",
-"pages": ["v3/overview-nl", "v3/large-language-models", "v3/chat-and-output"],
+"group": "Data layer",
+"pages": ["v3/semantic-layer/semantic-layer", "v3/semantic-layer/new", "v3/semantic-layer/data-ingestion"],
 "version": "v3"
 },
 {
 "group": "Advanced Usage",
-"pages": ["v3/agent", "v3/skills"],
+"pages": ["v3/agent", "v3/skills", "v3/semantic-layer/views","v3/semantic-layer/transformations"],
 "version": "v3"
 },
 {
-"group": "Backwards Compatibility",
-"pages": ["v3/migration-v2-to-v3", "v3/smart-dataframes", "v3/smart-datalakes"],
+"group": "PandasAI v2 to v3",
+"pages": ["v3/migration-guide", "v3/migration-backwards-compatibility", "v3/migration-troubleshooting"],
 "version": "v3"
 },
 {
 "group": "About",
-"pages": ["v3/contributing", "v3/license"],
+"pages": ["v3/contributing", "v3/license", "v3/enterprise-features"],
 "version": "v3"
 },
 {
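When pages move between navigation groups, as the views and transformations pages do here (from "Data layer" to "Advanced Usage"), a quick sanity check is that no page ends up listed in two groups. A minimal Python sketch over the groups visible in this hunk; the page lists are transcribed from the new side of the diff, and `duplicate_pages` is a hypothetical helper, not part of Mintlify's tooling:

```python
from collections import Counter

# Transcribed from the new side of the docs/mint.json hunk (only the groups shown there).
navigation = [
    {"group": "Natural Language",
     "pages": ["v3/overview-nl", "v3/large-language-models", "v3/chat-and-output"]},
    {"group": "Data layer",
     "pages": ["v3/semantic-layer/semantic-layer", "v3/semantic-layer/new",
               "v3/semantic-layer/data-ingestion"]},
    {"group": "Advanced Usage",
     "pages": ["v3/agent", "v3/skills", "v3/semantic-layer/views",
               "v3/semantic-layer/transformations"]},
    {"group": "PandasAI v2 to v3",
     "pages": ["v3/migration-guide", "v3/migration-backwards-compatibility",
               "v3/migration-troubleshooting"]},
    {"group": "About",
     "pages": ["v3/contributing", "v3/license", "v3/enterprise-features"]},
]


def duplicate_pages(groups):
    """Map each page listed more than once to the names of the groups that list it."""
    counts = Counter(page for g in groups for page in g["pages"])
    return {
        page: [g["group"] for g in groups if page in g["pages"]]
        for page, n in counts.items()
        if n > 1
    }


print(duplicate_pages(navigation))  # {} means no page is listed twice
```

Running the same check over the full `navigation` array of the real file would also catch pages duplicated in groups this hunk does not show.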

docs/v2/connectors.mdx

Lines changed: 3 additions & 3 deletions
@@ -147,7 +147,7 @@ sql_connector = SQLConnector(
 ## Snowflake connector

 The Snowflake connector allows you to connect to Snowflake. It is very similar to the SQL connectors, but it is tailored for Snowflake.
-The usage of this connector in production is subject to a license ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)). If you plan to use it in production, [contact us](https://pandas-ai.com/pricing).
+The usage of this connector in production is subject to a license ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)). If you plan to use it in production, [contact us](https://pandas-ai.com).

 To use the Snowflake connector, you only need to import it into your Python code and pass it to a `SmartDataframe` or `SmartDatalake` object:

@@ -179,7 +179,7 @@ df.chat("How many records has status 'F'?")
 ## DataBricks connector

 The DataBricks connector allows you to connect to Databricks. It is very similar to the SQL connectors, but it is tailored for Databricks.
-The usage of this connector in production is subject to a license ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)). If you plan to use it in production, [contact us](https://pandas-ai.com/pricing).
+The usage of this connector in production is subject to a license ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)). If you plan to use it in production, [contact us](https://pandas-ai.com).

 To use the DataBricks connector, you only need to import it into your Python code and pass it to an `Agent`, `SmartDataframe` or `SmartDatalake` object:

@@ -206,7 +206,7 @@ databricks_connector = DatabricksConnector(
 ## GoogleBigQuery connector

 The GoogleBigQuery connector allows you to connect to GoogleBigQuery datasets. It is very similar to the SQL connectors, but it is tailored for Google BigQuery.
-The usage of this connector in production is subject to a license ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)). If you plan to use it in production, [contact us](https://pandas-ai.com/pricing).
+The usage of this connector in production is subject to a license ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)). If you plan to use it in production, [contact us](https://pandas-ai.com).

 To use the GoogleBigQuery connector, you only need to import it into your Python code and pass it to an `Agent`, `SmartDataframe` or `SmartDatalake` object:

docs/v2/intro.mdx

Lines changed: 2 additions & 2 deletions
@@ -25,7 +25,7 @@ PandasAI is designed for data scientists, analysts, and engineers who want to in

 ## How to get started with PandasAI?

-PandasAI is available as a Python library and a web-based platform. You can install the library using pip or poetry and use it in your Python code. You can also use the web-based platform to interact with your data in a more visual way.
+PandasAI is available as a Python library. You can install the library using pip or poetry and use it in your Python code.

 ### 📚 Using the library

@@ -60,7 +60,7 @@ If you have any questions or need help, please join our **[discord server](https

 PandasAI is available under the MIT expat license, except for the `pandasai/ee` directory, which has its [license here](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE) if applicable.

-If you are interested in managed PandasAI Cloud or self-hosted Enterprise Offering, [contact us](https://pandas-ai.com/pricing).
+If you are interested in managed PandasAI Cloud or self-hosted Enterprise Offering, [contact us](https://pandas-ai.com).

 ## Analytics

docs/v2/semantic-agent.mdx

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ description: "Enhance the PandasAI library with the Semantic Agent for more accu
 The `SemanticAgent` (currently in beta) extends the capabilities of the PandasAI library by adding a semantic layer to its results. Unlike the standard `Agent`, the `SemanticAgent` generates a JSON query, which can then be used to produce Python or SQL code. This approach ensures more accurate and interpretable outputs.

 > **Note:** Usage of the Semantic Agent in production is subject to a license. For more details, refer to the [license documentation](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE).
-> If you plan to use it in production, [contact us](https://pandas-ai.com/pricing).
+> If you plan to use it in production, [contact us](https://pandas-ai.com).

 ## Instantiating the Semantic Agent

docs/v2/train.mdx

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ You can train PandasAI to understand your data better and to improve its perform

 If you want to train the model with a local vector store, you can use the local `ChromaDB`, `Qdrant` or `Pinecone` vector stores. Here's how to do it:
 An enterprise license is required to use the vector stores locally ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)).
-If you plan to use it in production, [contact us](https://pandas-ai.com/pricing).
+If you plan to use it in production, [contact us](https://pandas-ai.com).

 ```python
 from pandasai import Agent

docs/v3/agent.mdx

Lines changed: 103 additions & 77 deletions
@@ -1,66 +1,80 @@
 ---
 title: "Agent"
-description: "Add few-shot learning to your PandasAI agent"
+description: "Build multi-turn PandasAI agents with clarifications, explanations, query rephrasing, optional sandboxed execution, and enterprise training via local vector stores."
 ---

-<Note title="Beta Notice">
-Release v3 is currently in beta. This documentation reflects the features and
-functionality in progress and may change before the final release.
-</Note>
-
-It is possible also to use PandasAI with a few-shot learning agent, thanks to the "train with local vector store" enterprise feature (requiring an enterprise license). The agent can also be used in a sandbox. This guide shows you both how to train the agent and how to use it in a sandbox.
+## PandasAI Agent Overview

-## Training the Agent with local Vector stores
+While `pai.chat()` is meant for single-session, exploratory data analysis, an agent supports multi-turn conversations.

-If you want to train the agent with a local vector store, you can use the local `ChromaDB`, `Qdrant` or `Pinecone` vector stores. Here's how to do it:
-An enterprise license is required for using the vector stores locally, ([check it out](https://github.com/Sinaptik-AI/pandas-ai/blob/master/pandasai/ee/LICENSE)).
-If you plan to use it in production, [contact us](https://pandas-ai.com/pricing).
+To instantiate an agent, you can use the following code:

 ```python
+import os
 from pandasai import Agent
-from pandasai.ee.vectorstores import ChromaDB
-from pandasai.ee.vectorstores import Qdrant
-from pandasai.ee.vectorstores import Pinecone
-from pandasai.ee.vector_stores import LanceDB
+import pandas as pd

-# Instantiate the vector store
-vector_store = ChromaDB()
-# or with Qdrant
-# vector_store = Qdrant()
-# or with LanceDB
-vector_store = LanceDB()
-# or with Pinecone
-# vector_store = Pinecone(
-#     api_key="*****",
-#     embedding_function=embedding_function,
-#     dimensions=384, # dimension of your embedding model
-# )
+# Sample DataFrames
+sales_by_country = pd.DataFrame({
+    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
+    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
+    "deals_opened": [142, 80, 70, 90, 60, 50, 40, 30, 110, 120],
+    "deals_closed": [120, 70, 60, 80, 50, 40, 30, 20, 100, 110]
+})

-# Instantiate the agent with the custom vector store
-agent = Agent("data.csv", vectorstore=vector_store)
+agent = Agent(sales_by_country)
+agent.chat('Which are the top 5 countries by sales?')
+# Output: China, United States, Japan, Germany, United Kingdom
+```

-# Train the model
-query = "What is the total sales for the current fiscal year?"
-# The following code is passed as a string to the response variable
-response = '\n'.join([
-    'import pandas as pd',
-    '',
-    'df = dfs[0]',
-    '',
-    '# Calculate the total sales for the current fiscal year',
-    'total_sales = df[df[\'date\'] >= pd.to_datetime(\'today\').replace(month=4, day=1)][\'sales\'].sum()',
-    'result = { "type": "number", "value": total_sales }'
-])
+Unlike `pai.chat()`, an agent keeps track of the conversation state, so it can answer follow-up questions. For example:

-agent.train(queries=[query], codes=[response])
+```python
+agent.chat('And which one has the most deals?')
+# Output: United States has the most deals
+```
+
+### Clarification questions
+
+An agent can also ask clarification questions when it does not have enough information to answer a query. For example:
+
+```python
+agent.clarification_questions('What is the GDP of the United States?')
+```
+
+This returns up to 3 clarification questions that the agent can ask the user to get the information it needs to answer the query.
+
+### Explanation
+
+An agent can also explain the answer it gave to the user. For example:
+
+```python
+response = agent.chat('What is the GDP of the United States?')
+explanation = agent.explain()
+
+print("The answer is", response)
+print("The explanation is", explanation)
+```
+
+### Rephrase Question
+
+An agent can also rephrase a question to get a more accurate and comprehensive response from the model. For example:
+
+```python
+rephrased_query = agent.rephrase_query('What is the GDP of the United States?')
+
+print("The rephrased query is", rephrased_query)

-response = agent.chat("What is the total sales for the last fiscal year?")
-print(response)
-# The model will use the information provided in the training to generate a response
 ```

 ## Using the Agent in a Sandbox Environment

+<Note>
+The sandbox works offline and provides an additional layer of security for
+code execution. It's particularly useful when working with untrusted data or
+when you need to ensure that code execution is isolated from your main system.
+</Note>
+
 To enhance security and protect against malicious code through prompt injection, PandasAI provides a sandbox environment for code execution. The sandbox runs your code in an isolated Docker container, ensuring that potentially harmful operations are contained.

 ### Installation
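The expected answers in the multi-turn example above can be checked deterministically with plain pandas on the same `sales_by_country` frame, without calling an LLM. This is only a sanity check of the sample data, not how the agent computes its answer; treating "most deals" as either opened or closed deals is an interpretation made here:

```python
import pandas as pd

# Same sample data as the agent example in docs/v3/agent.mdx.
sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
    "deals_opened": [142, 80, 70, 90, 60, 50, 40, 30, 110, 120],
    "deals_closed": [120, 70, 60, 80, 50, 40, 30, 20, 100, 110],
})

# First question: the five countries with the highest sales, in descending order.
top5 = sales_by_country.nlargest(5, "sales")["country"].tolist()
print(top5)  # ['China', 'United States', 'Japan', 'Germany', 'United Kingdom']

# Follow-up: "most deals" is ambiguous, so check both deal counts.
most_opened = sales_by_country.loc[sales_by_country["deals_opened"].idxmax(), "country"]
most_closed = sales_by_country.loc[sales_by_country["deals_closed"].idxmax(), "country"]
print(most_opened, most_closed)  # United States United States
```

Checks like this are a cheap way to keep the `# Output:` comments in the docs consistent with the sample data they describe.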
@@ -107,45 +121,57 @@ sandbox = DockerSandbox(
 )
 ```

+## Training the Agent with local Vector stores
+
 <Note>
-The sandbox works offline and provides an additional layer of security for
-code execution. It's particularly useful when working with untrusted data or
-when you need to ensure that code execution is isolated from your main system.
+Training agents with local vector stores requires a PandasAI Enterprise license. See [Enterprise Features](/v3/enterprise-features) for more details or [contact us](https://pandas-ai.com/) for production use.
 </Note>

+You can also use PandasAI as a few-shot learning agent through the "train with local vector store" enterprise feature.

-## Custom Head
-
-In some cases, you might want to provide custom data samples to the conversational agent to improve its understanding and responses. For example, you might want to:
-
-- Provide better examples that represent your data patterns
-- Avoid sharing sensitive information
-- Guide the agent with specific data scenarios
-
-You can do this by passing a custom head to the agent:
+To train the agent with a local vector store, you can use the `ChromaDB`, `Qdrant`, `Pinecone` or `LanceDB` vector stores. Here's how to do it:
+An enterprise license is required to use the vector stores locally. See [Enterprise Features](/v3/enterprise-features) for licensing information.
+If you plan to use it in production, [contact us](https://pandas-ai.com).

 ```python
-import pandas as pd
-import pandasai as pai
+from pandasai import Agent
+from pandasai.ee.vectorstores import ChromaDB
+from pandasai.ee.vectorstores import Qdrant
+from pandasai.ee.vectorstores import Pinecone
+from pandasai.ee.vectorstores import LanceDB

-# Your original dataframe
-df = pd.DataFrame({
-    'sensitive_id': [1001, 1002, 1003, 1004, 1005],
-    'amount': [150, 200, 300, 400, 500],
-    'category': ['A', 'B', 'A', 'C', 'B']
-})
+# Instantiate the vector store
+vector_store = ChromaDB()
+# or with Qdrant
+# vector_store = Qdrant()
+# or with LanceDB
+# vector_store = LanceDB()
+# or with Pinecone
+# vector_store = Pinecone(
+#     api_key="*****",
+#     embedding_function=embedding_function,
+#     dimensions=384, # dimension of your embedding model
+# )

-# Create a custom head with anonymized data
-head_df = pd.DataFrame({
-    'sensitive_id': [1, 2, 3, 4, 5],
-    'amount': [100, 200, 300, 400, 500],
-    'category': ['A', 'B', 'C', 'A', 'B']
-})
+# Instantiate the agent with the custom vector store
+agent = Agent("data.csv", vectorstore=vector_store)

-# Use the custom head
-smart_df = pai.SmartDataframe(df, config={
-    "custom_head": head_df
-})
-```
+# Train the model
+query = "What is the total sales for the current fiscal year?"
+# The following code is passed as a string to the response variable
+response = '\n'.join([
+    'import pandas as pd',
+    '',
+    'df = dfs[0]',
+    '',
+    '# Calculate the total sales for the current fiscal year',
+    'total_sales = df[df[\'date\'] >= pd.to_datetime(\'today\').replace(month=4, day=1)][\'sales\'].sum()',
+    'result = { "type": "number", "value": total_sales }'
+])
+
+agent.train(queries=[query], codes=[response])

-The agent will use your custom head instead of the default first 5 rows of the dataframe when analyzing and responding to queries.
+response = agent.chat("What is the total sales for the last fiscal year?")
+print(response)
+# The model will use the information provided in the training to generate a response
+```
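The trained snippet computes the fiscal-year start as `pd.to_datetime('today').replace(month=4, day=1)`. Between January and March that date lies in the future, so the filter would typically match no rows. A minimal pandas sketch of a date-safe alternative, assuming an April 1 fiscal-year start as in the snippet; the helper names `fiscal_year_start` and `total_sales_current_fiscal_year` and the sample data are hypothetical:

```python
import pandas as pd


def fiscal_year_start(today, start_month=4):
    """First day of the fiscal year that contains `today` (fixed start month assumed)."""
    year = today.year if today.month >= start_month else today.year - 1
    return pd.Timestamp(year=year, month=start_month, day=1)


def total_sales_current_fiscal_year(df, today):
    """Sum `sales` for rows dated on or after the current fiscal-year start."""
    return df.loc[df["date"] >= fiscal_year_start(today), "sales"].sum()


# Hypothetical sample with the `date` and `sales` columns the trained snippet expects.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-15", "2023-05-01", "2024-01-10"]),
    "sales": [100, 200, 300],
})

today = pd.Timestamp("2024-02-15")
print(today.replace(month=4, day=1))  # 2024-04-01 00:00:00, after "today"
print(fiscal_year_start(today))       # 2023-04-01 00:00:00
print(total_sales_current_fiscal_year(df, today))  # 500
```

If you train the agent on a query like this, the corrected start-date logic would replace the `.replace(month=4, day=1)` call inside the `response` string.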

docs/v3/chat-and-output.mdx

Lines changed: 6 additions & 10 deletions
@@ -1,12 +1,8 @@
 ---
-title: "Chat and output formats"
+title: "Chat and Output Formats"
 description: "Learn how to use PandasAI's powerful chat functionality and the output formats for natural language data analysis"
 ---

-<Note title="Beta Notice">
-Release v3 is currently in beta. This documentation reflects the features and functionality in progress and may change before the final release.
-</Note>
-
 ## Chat

 The `.chat()` method is PandasAI's core feature that enables natural language interaction with your data. It allows you to:
@@ -19,7 +15,7 @@ The `.chat()` method is PandasAI's core feature that enables natural language in
 ```python
 import pandasai as pai

-df_customers = pai.load("company/customers")
+df_customers = pai.read_csv("customers.csv")

 response = df_customers.chat("Which are our top 5 customers?")
 ```
@@ -29,9 +25,9 @@ response = df_customers.chat("Which are our top 5 customers?")
 ```python
 import pandasai as pai

-df_customers = pai.load("company/customers")
-df_orders = pai.load("company/orders")
-df_products = pai.load("company/products")
+df_customers = pai.read_csv("customers.csv")
+df_orders = pai.read_csv("orders.csv")
+df_products = pai.read_csv("products.csv")

 response = pai.chat('Who are our top 5 customers and what products do they buy most frequently?', df_customers, df_orders, df_products)
 ```
@@ -64,7 +60,7 @@ Example:
 ```python
 import pandasai as pai

-df = pai.load("my-org/users")
+df = pai.read_csv("users.csv")

 response = df.chat("Who is the user with the highest age?") # Returns a String response
 response = df.chat("How many users in total?") # Returns a Number response
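The updated examples read local CSV files instead of datasets loaded with `pai.load("company/customers")`. If you only want to try them, a small pandas script can write placeholder files first. The column names and values below are made up for illustration and are not a schema PandasAI expects; `write_placeholder_csvs` is a hypothetical helper:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical placeholder data; real column names depend on your own files.
samples = {
    "customers.csv": pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ada", "Ben", "Cy"]}),
    "orders.csv": pd.DataFrame({"order_id": [10, 11, 12],
                                "customer_id": [1, 1, 2],
                                "product_id": [100, 101, 100]}),
    "products.csv": pd.DataFrame({"product_id": [100, 101], "product": ["Widget", "Gadget"]}),
    "users.csv": pd.DataFrame({"user": ["Ada", "Ben"], "age": [36, 29]}),
}


def write_placeholder_csvs(directory):
    """Write each sample frame to `directory` as CSV and return the created paths."""
    directory = Path(directory)
    paths = []
    for filename, frame in samples.items():
        path = directory / filename
        frame.to_csv(path, index=False)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for path in write_placeholder_csvs(tmp):
        # Round-trip with plain pandas to confirm each file is readable.
        print(path.name, pd.read_csv(path).shape)
```

To create the files next to your script instead, call `write_placeholder_csvs(".")`.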
