6. Create a Glue database named "imdb_stg". Create a Glue crawler, set its database to "imdb_stg", and start it to crawl the S3 location KB-<ACCOUNT_ID>/input. It should create two tables in the Glue catalog.
If you use a database name other than "imdb_stg", update the "database_name" field in the file "imdb_schema.jsonl" to the exact name of the new Glue database.
7. Query the two tables via Athena to verify that the data exists.
8. Create another folder, "/metadata", in the S3 bucket KB-<ACCOUNT_ID>.
- Upload the file "imdb_schema.jsonl" into the metadata folder.
9. From the Bedrock console,
- Create a data source named 'knowledge-base-movie-details-data-source', with type 'Amazon S3', pointing to the S3 folder created in step #8. Retain the 'Default chunking and parsing configuration'.
- Sync the 'knowledge-base-movie-details-data-source'.
Anytime new database changes are applied, don't forget to upload the revised "imdb_schema.jsonl" file to the S3 folder created in step #8 and sync the data source again.
10. Run the Jupyter notebook with the following caveats:
- In the file athena_execution.py, replace 'ATHENA-OUTPUT-BUCKET' with the name of a bucket to which Athena has write permissions.
- In step 2 of this walkthrough, if the values for the index name, vector field, or metadata field are different, substitute the new values in step "4.1 Update the variables" of the Jupyter notebook.
- If you are running the Jupyter notebook using [Amazon SageMaker - option 1](https://studiolab.sagemaker.aws/), [Amazon SageMaker - option 2](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-prepare.html), or VSCode, ensure the role or user has the right set of permissions.
11. Continue with the rest of the steps through Step 6. At this stage, the process is ready to receive a query in natural language.
17. [Correction loop, if applicable] The new prompt now includes Athena's response.
18. [Correction loop, if applicable] Generate the corrected SQL and continue the process. This iteration can be performed multiple times.
19. Finally, execute the SQL using Athena and generate the output, which is presented to the user. For the sake of architectural simplicity, this step is not shown.
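The correction loop in steps 17 and 18 can be sketched as a retry wrapper. Here `generate_sql` and `run_athena` are placeholders standing in for the Bedrock prompt call and the Athena execution step, not functions from this repo:

```python
def text_to_sql_with_correction(question, generate_sql, run_athena, max_retries=3):
    """Retry SQL generation, feeding Athena's error back into the prompt (step 17)."""
    error = None
    for _ in range(max_retries):
        sql = generate_sql(question, error)  # error is None on the first attempt
        ok, result = run_athena(sql)         # (True, rows) or (False, error message)
        if ok:
            return result                    # step 19: final output for the user
        error = result                       # step 18: iterate with corrected SQL
    raise RuntimeError(f"query could not be corrected after {max_retries} attempts")
```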
Since the title file contains more than 10 million records and there is no Athena partitioning, queries can take up to 1-2 minutes to execute. This can be optimized in many ways that are not described here.
## Using the repo
Please start with [the notebook](https://github.com/aws-samples/text-to-sql-for-athena/blob/main/BedrockTextToSql_for_Athena.ipynb)