
Conversation

@avelino avelino (Member) commented Mar 7, 2024

fixed: #146

Signed-off-by: Avelino <31996+avelino@users.noreply.github.com>
@avelino avelino requested review from vmesel and walison17 March 7, 2024 22:24
src/load_csv.py (outdated)

-def load_csv_and_generate_embeddings(path, cleardb=False):
+def load_csv_and_generate_embeddings(path, cleardb=False, columns=["content"]):
Collaborator suggested change:

-def load_csv_and_generate_embeddings(path, cleardb=False, columns=["content"]):
+def load_csv_and_generate_embeddings(path, cleardb=False, columns=("content",)):
Member replied:

@walison17 why are we falling back to a tuple here?
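
For context, the tuple suggestion likely guards against Python's mutable default argument pitfall: a list default is created once at function definition and shared across calls, so in-place modifications leak between calls. If columns is only ever read, a list default is harmless, but an immutable tuple makes that intent explicit. A minimal standalone sketch (not the project's code) of the pitfall:

def append_bad(item, items=[]):
    # the default list is created once at definition time and shared across calls
    items.append(item)
    return items

def append_good(item, items=("content",)):
    # an immutable tuple default cannot be mutated in place, so no state leaks between calls
    return list(items) + [item]

print(append_bad("a"))   # ['a']
print(append_bad("b"))   # ['a', 'b']  <- leftover state from the previous call
print(append_good("a"))  # ['content', 'a']
print(append_good("b"))  # ['content', 'b']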

@lgabs lgabs (Collaborator) commented Mar 8, 2024

Actually, this part needs to be changed to support embedding multiple columns at once (the typical case being "question+answer" in the same column, aka a document):

[f"{c.question}\n{c.content}\n" for c in self.relevant_contents]

avelino and others added 4 commits March 8, 2024 11:49
Co-authored-by: Walison Filipe <walisonfilipe@hotmail.com>
Co-authored-by: Walison Filipe <walisonfilipe@hotmail.com>
Co-authored-by: Walison Filipe <walisonfilipe@hotmail.com>
Co-authored-by: Walison Filipe <walisonfilipe@hotmail.com>
@avelino avelino (Member, Author) commented Mar 8, 2024

> Actually, this part needs to be changed to support embedding multiple columns at once (the typical case being "question+answer" in the same column, aka a document):
>
> [f"{c.question}\n{c.content}\n" for c in self.relevant_contents]

@lgabs the goal of this PR is to generate embeddings with more fields, not to change the way we work with prompt generation for OpenAI.

We can discuss this matter in an issue and understand why this behavior should be changed, but this is not the place for that discussion.

@avelino avelino requested a review from walison17 March 8, 2024 16:40
@lgabs lgabs (Collaborator) commented Mar 8, 2024

> > Actually, this part needs to be changed to support embedding multiple columns at once (the typical case being "question+answer" in the same column, aka a document):
> >
> > [f"{c.question}\n{c.content}\n" for c in self.relevant_contents]
>
> @lgabs the goal of this PR is to generate embeddings with more fields, not to change the way we work with prompt generation for OpenAI.
>
> We can discuss this matter in an issue and understand why this behavior should be changed, but this is not the place for that discussion.

I've said that because we'll probably create confusion: when we allow the embedding of multiple fields, the typical motivating case will be question+answer together, which usually gives much better results in RAG from user queries, and keeping this behavior in default.py will make the prompt generation duplicate the question. We could avoid this confusion by explaining that content so far means "answer to a question" and later discuss a more concise solution.

@vmesel vmesel (Member) commented Mar 9, 2024

> we'll probably create confusion: when we allow the embedding of multiple fields, the typical motivating case will be question+answer together

This is still served by multiple-field embedding, so we do not need to commit to a specific solution; if a user decides the question is not relevant to the context, they should be able to change that easily without editing our source code.

> We could avoid this confusion by explaining that content so far means "answer to a question" and later discuss a more concise solution.

The user can set this in the prompt settings.

@vmesel vmesel (Member) commented Mar 9, 2024

@avelino @walison17 LGTM, but I'd rather have the columns parameter be a list instead of a tuple.
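
If a list default is preferred, the usual safe pattern is a None sentinel so a fresh list is created on every call. A sketch of the signature only; the body is elided and this is not the merged code:

def load_csv_and_generate_embeddings(path, cleardb=False, columns=None):
    # a None sentinel avoids sharing one default object across calls
    columns = list(columns) if columns is not None else ["content"]
    ...  # rest of the CSV loading / embedding generation elided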

@vmesel vmesel (Member) left a review comment:
LGTM!

@avelino avelino merged commit 83ad1f0 into main Mar 9, 2024
@avelino avelino deleted the avelino/issue-146 branch March 9, 2024 22:11
