-
Notifications
You must be signed in to change notification settings - Fork 57
load_csv/feat: support multi csv field #147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Avelino <31996+avelino@users.noreply.github.com>
src/load_csv.py
Outdated
|
|
||
|
|
||
| def load_csv_and_generate_embeddings(path, cleardb=False): | ||
| def load_csv_and_generate_embeddings(path, cleardb=False, columns=["content"]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def load_csv_and_generate_embeddings(path, cleardb=False, columns=["content"]): | |
| def load_csv_and_generate_embeddings(path, cleardb=False, columns=("content",)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@walison17 why are we fallbacking to a tuple here?
|
Actually this part needs to be changed to support this possibility of embedding multiple columns at once (typical case for "question+answer" in the same column (aka a document). dialog/src/dialog/llm/default.py Line 41 in f97728a
|
Co-authored-by: Walison Filipe <walisonfilipe@hotmail.com>
Co-authored-by: Walison Filipe <walisonfilipe@hotmail.com>
Co-authored-by: Walison Filipe <walisonfilipe@hotmail.com>
@lgabs the goal of this PR is to generate embeddings with more fields, and not to change the way, we work with prompt generation for OpenAI.
|
I've said that because we'll probably create confusion: when we allow the embedding of multiple fields, the typical motivation case will be question+answer together, which usually gives much better results in RAG from user queries, and keeping this behavior in |
This is still served as multiple-fields embedding, so we do not need to address to a specific solution, if a user decides the question is not relevant to the context, he should be able to change it easily without editing our source code.
The user can set this into the prompt settings. |
|
@avelino @walison17 LGTM, but i'd rather have the columns tuple as a list instead of a tuple. |
vmesel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
fixed: #146