
Commit 3720614

added some extra logs to help folks understand the flow
1 parent cb1b487 · commit 3720614

9 files changed: +384 −115 lines

BedrockTextToSql_for_Athena.ipynb

Lines changed: 374 additions & 106 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 3 additions & 1 deletion
@@ -30,13 +30,15 @@ This post will address those challenges. First, we will include the meta-data of
 6. Create a glue database "imdb_stg". Create a glue crawler, set its database name to "imdb_stg", and start it to crawl the S3 bucket KB-<ACCOUNT_ID>/input location. It should create 2 tables in the Glue catalog.
 If you use another database name instead of "imdb_stg", update the "database_name" field in the file "imdb_schema.jsonl" to the exact name of the new glue database.
 7. Query the 2 tables via Athena to verify that the data exists.
+
 8. Create another folder "/metadata" in the S3 bucket KB-<ACCOUNT_ID>.
 - Upload the file "imdb_schema.jsonl" into the metadata folder.
 9. From the Bedrock console,
 - Create a datasource with name = 'knowledge-base-movie-details-data-source', type = 'Amazon S3', pointing to the S3 folder created in step #8. Retain the 'Default chunking and parsing configuration'.
 - Sync the 'knowledge-base-movie-details-data-source'.
 Anytime new database changes are applied, don't forget to upload the revised "imdb_schema.jsonl" file to the S3 folder created in step #8 and do a sync.
 10. Run the jupyter notebook with the following caveats:
+- In the file athena_execution.py, replace 'ATHENA-OUTPUT-BUCKET' with the name of the bucket where Athena has actual write permissions.
 - In step 2 of this walkthrough, if the values for the index name, vector field, or metadata field are different, substitute the new values in step "4.1 Update the variables" of the jupyter notebook.
 - If you are running the jupyter notebook using [Amazon Sagemaker - option 1](https://studiolab.sagemaker.aws/) or [Amazon Sagemaker - option 2](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-prepare.html) or VSCode, ensure the role or the user has the right set of permissions.
 11. Continue with the rest of the steps till Step 6. At this stage, the process is ready to receive the query in natural language.
@@ -48,7 +50,7 @@ This post will address those challenges. First, we will include the meta-data of
 17. [Correction loop, if applicable] The new prompt now adds Athena's response.
 18. [Correction loop, if applicable] Create the corrected SQL and continue the process. This iteration can be performed multiple times.
 19. Finally, execute the SQL using Athena and generate the output. Here, the output is presented to the user. For the sake of architectural simplicity, we did not show this step.
-Since the # of records in the movie file are large and there is no athena partitioning , the queries can take upto 2 mins to execute. This can be optimized in many ways and its not described here.
+Since the # of records in the title file is > 10M and there is no Athena partitioning, the queries can take up to 1-2 minutes to execute. This can be optimized in many ways; the optimizations are not described here.

 ## Using the repo
 Please start with [the notebook](https://github.com/aws-samples/text-to-sql-for-athena/blob/main/BedrockTextToSql_for_Athena.ipynb)
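
As a quick sanity check for steps 6 and 7 in the README hunk above, here is a minimal sketch that lists the tables the Glue crawler should have created. It assumes boto3 with configured AWS credentials, the default "imdb_stg" database name, and the us-east-1 region; adjust if your setup differs.

```python
# Minimal sketch: confirm the Glue crawler created the expected tables.
# Assumes boto3 is installed, AWS credentials are configured, and the
# database is named 'imdb_stg' as in the README (adjust if you renamed it).
import boto3

glue = boto3.client("glue", region_name="us-east-1")

tables = glue.get_tables(DatabaseName="imdb_stg")
for table in tables["TableList"]:
    # Expect the 2 tables crawled from the KB-<ACCOUNT_ID>/input location.
    print(table["Name"])
```

If the two tables appear here, the Athena query in step 7 should succeed against the same database.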
4 binary files changed (4.09 KB, 2.64 KB, 1.24 KB, and one without a reported size); binary contents are not shown.

athena_execution.py

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@


 class AthenaQueryExecute:
     def __init__(self):
-        self.glue_databucket_name='vishal-bucket103'
+        self.glue_databucket_name='ATHENA-OUTPUT-BUCKET'
         self.athena_client=Clientmodules.createAthenaClient()
         self.s3_client=Clientmodules.createS3Client()

@@ -47,7 +47,7 @@ def execute_query(self, query_string):
         return df

     def syntax_checker(self,query_string):
-        print("Inside yntax_checker", query_string)
+        print("Inside syntax_checker", query_string)
         query_result_folder='athena_query_output/'
         query_config = {"OutputLocation": f"s3://{self.glue_databucket_name}/{query_result_folder}"}
         query_execution_context = {
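
The hunk above cuts off inside syntax_checker. One plausible shape for the rest of the routine, sketched with plain boto3, is to run the candidate SQL through Athena's EXPLAIN and poll for the verdict. This is a hedged sketch, not the repo's exact code: the bucket name, database, and polling loop are assumptions.

```python
# Hedged sketch of an Athena-backed syntax check. Names mirror the snippet
# above but are illustrative assumptions, not the repo's actual values.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")
query_config = {"OutputLocation": "s3://ATHENA-OUTPUT-BUCKET/athena_query_output/"}
query_execution_context = {"Catalog": "AwsDataCatalog", "Database": "imdb_stg"}

def check_syntax(query_string: str) -> str:
    """Return 'Passed' if Athena accepts the SQL, else Athena's error message."""
    # EXPLAIN computes only the query plan, so bad SQL fails fast and cheaply.
    response = athena.start_query_execution(
        QueryString="EXPLAIN " + query_string,
        QueryExecutionContext=query_execution_context,
        ResultConfiguration=query_config,
    )
    execution_id = response["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=execution_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state == "SUCCEEDED":
        return "Passed"
    # The failure reason is what the correction loop feeds back into the prompt.
    return status["QueryExecution"]["Status"].get("StateChangeReason", state)
```

A failure message returned this way is exactly the kind of Athena response that steps 17-18 of the README feed back into the correction-loop prompt.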

llm_basemodel.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ def __init__(self,client):
         # Anthropic Claude
         # Bedrock LLM
         inference_modifier = {
-            "max_tokens_to_sample": 3000,
+            ### "max_tokens_to_sample": 3000,
             "temperature": 0,
             "top_k": 20,
             "top_p": 1,

openSearchVCEmbedding.py

Lines changed: 4 additions & 5 deletions
@@ -186,23 +186,22 @@ def get_data(self,metadata):
 def main():
     print('main() executed')
     index_name1 = 'bedrock-knowledge-base-default-index'
-    ##index_name1 = 'bedrock-knowledge-base-zttfoy'
-    domain = 'https://wi3kkhxignse60pcjop5.us-east-1.aoss.amazonaws.com'
+    domain = 'https://SAMPLE.us-east-1.aoss.amazonaws.com'
     vector_field = 'bedrock-knowledge-base-default-vector'
     fieldname = 'id'
     try:
         ebropen = EmbeddingBedrockOpenSearch(domain, vector_field, fieldname)
         ebropen.check_if_index_exists(index_name=index_name1, region='us-east-1', host=domain, http_auth=awsauth)
-        logger.info("now trying getdocument*************")
+
         vcindxdoc = ebropen.getDocumentfromIndex(index_name=index_name1)
-        logger.info("now getting the title**************")
+
         user_query = 'show me all the titles in US region'
         document = ebropen.getSimilaritySearch(user_query, vcindex=vcindxdoc)
         ##print(document)

         #result = ebropen.format_metadata(document)
         result = ebropen.get_data(document)
-        print("\n\n****************888888888************")
+
         print(result)
     except Exception as e:
         print(e)
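
For readers tracing main(), here is a minimal sketch of what a similarity search like getSimilaritySearch() could boil down to: embed the user query with a Bedrock embeddings model, then run a k-NN query against the OpenSearch Serverless index. The endpoint, index, and vector field names mirror main() above; the Titan model id and the use of opensearch-py and requests-aws4auth are assumptions, and the repo's actual implementation may differ.

```python
# Hedged sketch of a k-NN similarity search against the knowledge base index.
# Endpoint/index/field names mirror main(); the embeddings model id is an
# assumption, not the repo's confirmed choice.
import json
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

region = "us-east-1"
credentials = boto3.Session().get_credentials()
# 'aoss' is the service name for OpenSearch Serverless request signing.
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
                   region, "aoss", session_token=credentials.token)

client = OpenSearch(
    hosts=[{"host": "SAMPLE.us-east-1.aoss.amazonaws.com", "port": 443}],
    http_auth=awsauth, use_ssl=True, verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# Embed the query text with Bedrock (Titan model id is an assumption).
bedrock = boto3.client("bedrock-runtime", region_name=region)
resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "show me all the titles in US region"}),
)
embedding = json.loads(resp["body"].read())["embedding"]

# k-NN search on the knowledge base's default vector field.
results = client.search(index="bedrock-knowledge-base-default-index", body={
    "size": 3,
    "query": {"knn": {"bedrock-knowledge-base-default-vector":
                      {"vector": embedding, "k": 3}}},
})
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"].keys())
```

The hits' metadata (the schema JSONL chunks synced in step 9 of the README) is what get_data() extracts to build the SQL-generation prompt.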
