feat: add sentiment classification task #1346 #1480
base: dev
Conversation
Please make sure all the checkboxes are checked:
**Important:** Review skipped — auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI. You can disable this status message by setting the …

**Walkthrough:** Search now optionally runs sentiment analysis using the last user interaction (when save_interaction=true) before dataset validation and search delegation. A new …

**Changes**
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant API as Search API
participant Graph as Graph Engine
participant SA as Sentiment Task
participant LLM as LLM Gateway
participant Store as Data Store
User->>API: search(query_text, save_interaction=True)
API->>Graph: init() + get_last_user_interaction(limit=1)
alt last interaction exists AND save_interaction
API->>SA: run_sentiment_analysis(prev_q, prev_a, current_q, user)
SA->>LLM: acreate_structured_output(conversation context)
LLM-->>SA: { "sentiment": "...", "score": ... }
SA->>Store: add_data_points(CogneeSearchSentiment)
SA-->>API: { prev_q, prev_a, current_q, sentiment, score }
else No prior interaction or save_off
API-->>API: skip sentiment analysis
end
API->>API: validate/normalize datasets
API->>API: delegate to existing search_function(...)
API-->>User: search results
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels
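For orientation, the flow in the diagram could look roughly like the sketch below. This is not the PR's actual diff: the import paths are taken from the file and symbol lists in this review, the `properties` unwrapping follows the adapter note further down, and the rest of `search()` is elided.

```python
# Rough sketch of the sentiment hook described in the diagram (assumed imports and signatures).
from cognee.infrastructure.databases.graph import get_graph_engine
from cognee.modules.users.methods import get_default_user  # assumed import path
from cognee.tasks.sentiment_analysis import run_sentiment_analysis


async def search_with_sentiment(query_text: str, user=None, save_interaction: bool = False):
    graph_engine = await get_graph_engine()

    if user is None:
        user = await get_default_user()  # resolve a real User instance before the task runs

    if save_interaction:
        interaction_ids = await graph_engine.get_last_user_interaction_ids(limit=1)
        if interaction_ids:
            node = await graph_engine.get_node(interaction_ids[0])
            props = node.get("properties", node)  # adapters differ; see the normalization note below
            await run_sentiment_analysis(
                prev_question=props["question"],
                prev_answer=props["answer"],
                current_question=query_text,
                user=user,
            )

    # ...dataset validation and delegation to the existing search function would follow here.
```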
Suggested reviewers
Poem
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cognee/api/v1/search/search.py (1)
175-213: Pass a resolved user instance into sentiment analysis

Right now we call `run_sentiment_analysis` before `user` is resolved, and we pass the `User` class object instead of an instance. With the default path (`user` arg is `None`), this raises on `str(user.id)` in the task every time sentiment analysis runs. Please move the `if user is None: user = await get_default_user()` block above this section and pass the actual `user` variable. Also drop the unused `results` assignment.

Suggested diff:
```diff
-    graph_engine = await get_graph_engine()
+    graph_engine = await get_graph_engine()
@@
-    # if last interaction is present and save interaction is enabled call sentiment analysis
-    if len(last_interactions)>0 and save_interaction:
-        results = await run_sentiment_analysis(
-            prev_question=last_interactions[0]['question'],
-            prev_answer=last_interactions[0]['answer'],
-            current_question=query_text,
-            user=User)
-        # print(results)
-        # print('here i am and this is interaction',last_interactions[0]['question'],last_interactions[0]['answer'],query_text)
+    if user is None:
+        user = await get_default_user()
+
+    if last_interactions and save_interaction:
+        await run_sentiment_analysis(
+            prev_question=last_interactions[0]["question"],
+            prev_answer=last_interactions[0]["answer"],
+            current_question=query_text,
+            user=user,
+        )
@@
-    if user is None:
-        user = await get_default_user()
-
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- cognee/api/v1/search/search.py (2 hunks)
- cognee/demo.py (1 hunks)
- cognee/modules/retrieval/utils/models.py (1 hunks)
- cognee/tasks/sentiment_analysis/__init__.py (1 hunks)
- cognee/tasks/sentiment_analysis/sentiment_analysis.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
cognee/modules/retrieval/utils/models.py (2)
- cognee/infrastructure/engine/models/DataPoint.py (1)
  - DataPoint (20-220)
- cognee/modules/engine/models/node_set.py (1)
  - NodeSet (4-7)

cognee/tasks/sentiment_analysis/__init__.py (1)
- cognee/tasks/sentiment_analysis/sentiment_analysis.py (1)
  - run_sentiment_analysis (11-43)

cognee/tasks/sentiment_analysis/sentiment_analysis.py (3)
- cognee/infrastructure/llm/LLMGateway.py (1)
  - LLMGateway (6-164)
- cognee/modules/engine/models/node_set.py (1)
  - NodeSet (4-7)
- cognee/modules/retrieval/utils/models.py (1)
  - CogneeSearchSentiment (44-52)

cognee/api/v1/search/search.py (5)
- cognee/modules/retrieval/utils/models.py (1)
  - CogneeUserInteraction (8-14)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1)
  - get_graph_engine (10-24)
- cognee/tasks/sentiment_analysis/sentiment_analysis.py (1)
  - run_sentiment_analysis (11-43)
- cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (2)
  - get_last_user_interaction_ids (1328-1349)
  - get_node (685-705)
- cognee/infrastructure/databases/graph/graph_db_interface.py (1)
  - get_node (229-238)

cognee/demo.py (2)
- cognee/tasks/sentiment_analysis/sentiment_analysis.py (1)
  - run_sentiment_analysis (11-43)
- cognee/api/v1/search/search.py (1)
  - search (15-223)
🪛 Ruff (0.13.1)
cognee/api/v1/search/search.py
186-186: Local variable results is assigned to but never used
Remove assignment to unused variable results
(F841)
Hey @kartik-raj7, thanks so much for your contribution! I really appreciate the time and effort you’ve put in, and I’m looking forward to reviewing this one. A couple of quick notes:

Once you’ve had a chance to address these points, I’ll be happy to take another look. Thanks again for your work on this! Really glad to have your contribution here and looking forward to supporting and collaborating.
Hey @lxobr, I’ve closed the irrelevant PR. Actually, this was a draft PR, and I mistakenly clicked the review option. I’ve integrated the sentiment analysis with the search. You can run it using sentiment_analysis_demo.py in python/examples. I also have a couple of questions:
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cognee/api/v1/search/search.py (1)
198-201: Logic bug: stray brackets force the datasets branch to always execute
`if datasets is not None and [all(...)]` is always truthy because `[False]` is truthy. This will incorrectly treat UUID lists as string names.

Apply:

```diff
-    if datasets is not None and [all(isinstance(dataset, str) for dataset in datasets)]:
+    if datasets is not None and all(isinstance(dataset, str) for dataset in datasets):
```
🧹 Nitpick comments (2)
cognee/tasks/sentiment_analysis/sentiment_analysis.py (2)
5-6: Align `NodeSet` import path with existing modules

Other modules import `NodeSet` from `...models.node_set`. Align to avoid package export assumptions.

```diff
-from cognee.modules.engine.models import NodeSet
+from cognee.modules.engine.models.node_set import NodeSet
```
25-33: Make sentiment data point IDs collision-resistant

Using only `user_id + current_question` will overwrite prior entries when the same user asks the same question again with different prior context. Include prior Q/A (or a timestamp/UUID4).

```diff
-        id=uuid5(NAMESPACE_OID, name=user_id + current_question),
+        id=uuid5(NAMESPACE_OID, name=f"{user_id}|{prev_question}|{prev_answer}|{current_question}"),
```

Optionally clamp/cast score before persist:

```diff
-        score=sentiment_result.score,
+        score=int(max(-5, min(5, sentiment_result.score))),
```
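To make the collision concern concrete: `uuid5` is deterministic, so any two entries seeded with the same `user_id + current_question` get the same ID and the later write replaces the earlier one. A self-contained illustration (the user ID and questions are made up):

```python
from uuid import NAMESPACE_OID, uuid5

user_id = "user-123"
question = "What is cognee?"

# Same (user, question) pair, different prior context -> identical IDs, so the
# second sentiment data point overwrites the first.
first = uuid5(NAMESPACE_OID, name=user_id + question)
second = uuid5(NAMESPACE_OID, name=user_id + question)
assert first == second

# Seeding with the prior Q/A as well keeps the IDs distinct.
richer = uuid5(NAMESPACE_OID, name=f"{user_id}|prev question|prev answer|{question}")
assert richer != first
```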
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- cognee/api/v1/search/search.py (2 hunks)
- cognee/tasks/sentiment_analysis/sentiment_analysis.py (1 hunks)
- examples/python/sentiment_analysis_demo.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
cognee/tasks/sentiment_analysis/sentiment_analysis.py (3)
- cognee/infrastructure/llm/LLMGateway.py (1)
  - LLMGateway (6-164)
- cognee/modules/engine/models/node_set.py (1)
  - NodeSet (4-7)
- cognee/modules/retrieval/utils/models.py (1)
  - CogneeSearchSentiment (44-52)

examples/python/sentiment_analysis_demo.py (2)
- cognee/api/v1/search/search.py (1)
  - search (15-221)
- cognee-mcp/src/server.py (1)
  - save_interaction (286-338)

cognee/api/v1/search/search.py (6)
- cognee/modules/retrieval/utils/models.py (1)
  - CogneeUserInteraction (8-14)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1)
  - get_graph_engine (10-24)
- cognee/tasks/sentiment_analysis/sentiment_analysis.py (1)
  - run_sentiment_analysis (11-41)
- cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (2)
  - get_last_user_interaction_ids (1328-1349)
  - get_node (685-705)
- cognee/infrastructure/databases/graph/kuzu/adapter.py (2)
  - get_last_user_interaction_ids (1650-1671)
  - get_node (787-822)
- cognee/infrastructure/databases/graph/graph_db_interface.py (1)
  - get_node (229-238)
🪛 Ruff (0.13.1)
cognee/api/v1/search/search.py
186-186: Local variable results is assigned to but never used
Remove assignment to unused variable results
(F841)
🔇 Additional comments (3)
cognee/tasks/sentiment_analysis/sentiment_analysis.py (1)
19-23: Fix structured-output schema: don’t pass DataPoint model to the LLM response

Use a minimal Pydantic model for the LLM output, then map into `CogneeSearchSentiment`. This prevents validation failure on required fields.

Apply:

```diff
-from cognee.modules.retrieval.utils.models import CogneeSearchSentiment
+from cognee.modules.retrieval.utils.models import CogneeSearchSentiment
+from pydantic import BaseModel, Field
@@
-    sentiment_result = await LLMGateway.acreate_structured_output(
+    class SentimentClassification(BaseModel):
+        sentiment: str
+        score: int = Field(ge=-5, le=5)
+
+    sentiment_result = await LLMGateway.acreate_structured_output(
         text_input=text_input,
         system_prompt="""Classify the user's reaction as Positive, Neutral, or Negative with a score (-5 to 5). Return the result as valid JSON like: {"sentiment": "Positive", "score": 3}""",
-        response_model= CogneeSearchSentiment
+        response_model=SentimentClassification
     )
```

cognee/api/v1/search/search.py (2)
12-12: Import path is correct, no change needed
`get_graph_engine` is re-exported in `cognee/infrastructure/databases/graph/__init__.py`, so the existing import works as intended.
172-183: Guard graph client initialization & normalize adapter response shapesFile: cognee/api/v1/search/search.py:172-183
- Only call
get_graph_engine()whensave_interactionis True to avoid unnecessary overhead.- Normalize the output of
graph_engine.get_node()across adapters (e.g. unwrap Kuzu’s{properties: …}) soquestion/answerfields can be accessed uniformly.
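A small sketch of the normalization idea from the second bullet; the exact adapter shapes (Kuzu nesting fields under `properties`, Neo4j returning them flat) are assumptions based on this thread rather than verified behavior:

```python
from typing import Any, Dict, Optional


def normalize_node(node: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a flat dict of node properties regardless of which adapter produced it."""
    if not node:
        return {}
    if isinstance(node.get("properties"), dict):  # assumed Kuzu-style wrapper
        return node["properties"]
    return node  # assumed Neo4j-style flat dict


props = normalize_node({"properties": {"question": "Q?", "answer": "A."}})
assert props["question"] == "Q?"
```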
Hey @kartik-raj7, thanks for the updates! I will go over your code and questions tomorrow and will be in touch. In the meantime, I would kindly ask you to address and resolve the comments raised by CodeRabbit. Thanks so much for your work on this; I will get back to you shortly.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cognee/api/v1/search/search.py (1)
208-214: Bug: square brackets force the condition to always be truthy
`[all(...)]` creates a non-empty list, so the branch runs for any non-`None` `datasets`. Fix by removing the brackets.

```diff
-    if datasets is not None and [all(isinstance(dataset, str) for dataset in datasets)]:
+    if datasets is not None and all(isinstance(dataset, str) for dataset in datasets):
         datasets = await get_authorized_existing_datasets(datasets, "read", user)
         datasets = [dataset.id for dataset in datasets]

         if not datasets:
             raise DatasetNotFoundError(message="No datasets found.")
```
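A tiny illustration of why the bracketed version always passes, runnable on its own:

```python
from uuid import uuid4

datasets = [uuid4()]  # a list of UUIDs, not strings

broken = datasets is not None and [all(isinstance(d, str) for d in datasets)]
fixed = datasets is not None and all(isinstance(d, str) for d in datasets)

print(broken)  # [False] -> a non-empty list, which is truthy, so the branch runs anyway
print(fixed)   # False   -> the branch is skipped, as intended
```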
🧹 Nitpick comments (6)
examples/python/sentiment_analysis_demo.py (3)
1-3: Add a brief module docstring noting data reset and requirements

Clarify that the demo wipes local data and needs LLM_API_KEY to run.

```diff
+"""Sentiment Analysis demo.
+
+- WARNING: prunes local data and system metadata.
+- Requires LLM_API_KEY for LLM-backed search.
+"""
 import cognee
 import asyncio
```
4-12: Fail fast if LLM_API_KEY is missing

Provide a clear message rather than failing later during search.

```diff
 async def main():
+    import os
+    if not os.getenv("LLM_API_KEY"):
+        print("Set LLM_API_KEY to run LLM-backed search.")
+        return
     await cognee.prune.prune_data()
     await cognee.prune.prune_system(metadata=True)
```
27-31: Improve output readability for demo users

Pretty-print the results to make the demo more informative.

```diff
-    for query, res in all_results.items():
-        print(f"\nQuery: {query}")
-        for r in res:
-            print(f"  - {r}")
+    import pprint
+    pp = pprint.PrettyPrinter(indent=2, width=100)
+    for query, res in all_results.items():
+        print(f"\nQuery: {query}")
+        pp.pprint(res)
+    print("\nNote: Sentiment runs starting with the second query (uses prior interaction).")
```

cognee/api/v1/search/search.py (3)
175-176: Minor: merge isinstance calls

Use the tuple form to simplify.

```diff
-    if isinstance(datasets, UUID) or isinstance(datasets, str):
+    if isinstance(datasets, (UUID, str)):
         datasets = [datasets]
```
205-206: Log the stack trace and avoid a generic “error” log

Use logger.exception and drop the unused exception variable.

```diff
-    except Exception as e:
-        logger.error(f"Sentiment Analysis Failed: {e}")
+    except Exception:
+        logger.exception("Sentiment analysis failed")
```

This also addresses lints BLE001/TRY400. [Based on learnings]
181-204: Direction check: sentiment integration looks good; a few product decisions
- Approach: Using (prev_question, prev_answer, current_question) to infer reaction to the previous answer is sound.
- Storage/linking: Also link each CogneeSearchSentiment to the corresponding CogneeUserInteraction (DERIVED_FROM) and to the user or user set. Adds traceability over time.
- Enum vs string: Prefer an Enum for sentiment (align with UserFeedbackSentiment) for consistency and validation; keep a numeric score for sorting. See the sketch after this list.
- Resilience: Current try/except + non-blocking is appropriate; consider sampling/rate limiting if save_interaction is high-traffic.
- Memify pipeline: Run sentiment as part of the save_interaction flow (post-ingest) or via a small background task triggered there; keep search path fast and optional.
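As a follow-up to the Enum-vs-string bullet (sketch referenced there): one way the response model could look. `UserFeedbackSentiment` is only mentioned in this thread, so the names below are illustrative rather than the project's actual definitions.

```python
from enum import Enum

from pydantic import BaseModel, Field


class Sentiment(str, Enum):
    POSITIVE = "Positive"
    NEUTRAL = "Neutral"
    NEGATIVE = "Negative"


class SentimentClassification(BaseModel):
    sentiment: Sentiment  # validated against the enum
    score: int = Field(ge=-5, le=5)  # numeric score kept for sorting/aggregation


print(SentimentClassification(sentiment="Positive", score=3))
```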
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- cognee/api/v1/search/search.py (2 hunks)
- examples/python/sentiment_analysis_demo.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
examples/python/sentiment_analysis_demo.py (2)
- cognee/api/v1/search/search.py (1)
  - search (16-231)
- cognee-mcp/src/server.py (1)
  - save_interaction (286-338)

cognee/api/v1/search/search.py (9)
- cognee/modules/retrieval/utils/models.py (1)
  - CogneeUserInteraction (8-14)
- cognee/infrastructure/databases/graph/get_graph_engine.py (1)
  - get_graph_engine (10-24)
- cognee/tasks/sentiment_analysis/sentiment_analysis.py (1)
  - run_sentiment_analysis (11-41)
- cognee/shared/logging_utils.py (1)
  - get_logger (182-194)
- cognee/infrastructure/databases/graph/kuzu/adapter.py (2)
  - get_last_user_interaction_ids (1650-1671)
  - get_node (787-822)
- cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (2)
  - get_last_user_interaction_ids (1328-1349)
  - get_node (685-705)
- cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
  - get_node (46-47)
- cognee/infrastructure/databases/graph/graph_db_interface.py (1)
  - get_node (229-238)
- cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (1)
  - get_node (649-656)
🪛 Ruff (0.13.1)
cognee/api/v1/search/search.py
205-205: Do not catch blind exception: Exception
(BLE001)
206-206: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🪛 Pylint (3.3.8)
cognee/api/v1/search/search.py
[refactor] 175-175: Consider merging these isinstance calls to isinstance(datasets, (UUID, str))
(R1701)
Hey @lxobr, were you able to have a look at it? Please let me know if you have suggestions or anything that I need to improve.
Hey @kartik-raj7! Thanks so much for your work on this so far. And sorry for the late reply, I was off for a couple of days.

Let me clarify the scope of this issue/PR since it’s a bit ambiguously worded. There are two main parts to implement.

First, we need to create a task that processes all saved interactions. This is the task you’ve already started working on. Here are some additional details about what the task should do:
Second, we need to create a custom memify pipeline that uses the task. This essentially means creating your own version of this example. For this issue, you should be able to keep it minimal by setting:
This can all live in your demo, which you’ve already started working on. Just make sure to first run regular cognify on some text, perform a few searches (some neutral, some not), and then run your memify pipeline demonstrating that the sentiment analysis works. Thanks again for all the time and effort you are investing in this. I am looking forward to hearing from you and supporting you to get this PR through!
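For what it's worth, the two-task shape described above could be sketched as below. The cognee-specific wiring (`Task` objects, `memify`, the query that loads saved interactions, the real `add_data_points`) is only hinted at in this thread, so the stubs here are placeholders rather than the actual API:

```python
import asyncio
from typing import Any, Dict, List


async def classify_sentiment(prev_question: str, prev_answer: str, current_question: str) -> Dict[str, Any]:
    """Placeholder for the LLM-backed classification; the real task would call LLMGateway."""
    return {"sentiment": "Neutral", "score": 0, "current_question": current_question}


async def add_data_points(data_points: List[Any]) -> None:
    """Placeholder for cognee's add_data_points; prints instead of writing to the graph."""
    print(f"persisting {len(data_points)} sentiment data points")


async def extract_interaction_sentiments(interactions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Extraction task: classify how each follow-up question reacts to the previous answer."""
    return [
        await classify_sentiment(prev["question"], prev["answer"], current["question"])
        for prev, current in zip(interactions, interactions[1:])
    ]


async def enrich_with_sentiments(sentiments: List[Dict[str, Any]]) -> None:
    """Enrichment task: persist whatever the extraction task produced."""
    await add_data_points(sentiments)


async def main() -> None:
    saved = [
        {"question": "What is cognee?", "answer": "A memory layer for AI apps."},
        {"question": "That didn't help, can you be more concrete?", "answer": "Sure..."},
    ]
    sentiments = await extract_interaction_sentiments(saved)
    await enrich_with_sentiments(sentiments)


asyncio.run(main())
```

In a memify-style pipeline the extraction task's return value would simply be handed to the enrichment task as its input; chaining the two coroutines directly, as in `main()`, keeps a demo minimal.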
## Description

Closes topoteretes#1558

## Type of Change

- [x] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist

- [x] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the issue/feature**
- [x] My code follows the project's coding standards and style guidelines
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added necessary documentation (if applicable)
- [x] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [x] I have linked any relevant issues in the description
- [x] My commits have clear and descriptive messages

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
```python
        user_id=user_id,
        belongs_to_set=NodeSet(id=uuid5(NAMESPACE_OID, "CogneeSearchSentiment"), name="CogneeSearchSentiment")
    )
    await add_data_points(data_points=[sentiment_data_point], update_edge_collection=True)
```
update_edge_collection is removed from add_data_points, only data_points should be sent.
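So, under that API change, the persisting call quoted above would presumably shrink to something like:

```python
await add_data_points(data_points=[sentiment_data_point])
```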
cognee/api/v1/search/search.py (outdated)

```python
                last = props
                break
        if last:
            await run_sentiment_analysis(
```
Running sentiment analysis on search request will slow down the search. We are currently trying to speed it up, so would be good to have this running somehow independently.
When/where do you need that sentiment analysis? What is the use case?
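One lightweight way to keep the search path fast, sketched under the assumption that fire-and-forget scheduling is acceptable here (the alternative discussed in this thread is moving the work into the memify pipeline entirely):

```python
import asyncio


async def run_sentiment_analysis(prev_question: str, prev_answer: str, current_question: str) -> None:
    """Stand-in for the PR's task; the real one calls the LLM and stores a data point."""
    await asyncio.sleep(1)  # simulate a slow LLM call


async def search(query_text: str) -> list:
    # Schedule the analysis in the background so the search response is not delayed by it.
    background = asyncio.create_task(run_sentiment_analysis("prev q", "prev a", query_text))
    background.add_done_callback(
        lambda t: t.cancelled() or t.exception()  # retrieve errors so they are not silently dropped
    )
    return ["...search results..."]


async def main() -> None:
    print(await search("example query"))  # results come back immediately
    await asyncio.sleep(1.1)  # in a real server the event loop keeps running anyway


asyncio.run(main())
```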
## Description

## Type of Change

- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [x] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Screenshots/Videos (if applicable)

## Pre-submission Checklist

- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the issue/feature**
- [ ] My code follows the project's coding standards and style guidelines
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Description

Add scorecard CI/CD for GitHub Secure Open Source Fund

## Type of Change

- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [ ] Code refactoring
- [ ] Performance improvement
- [ ] Other (please specify):

## Pre-submission Checklist

- [ ] **I have tested my changes thoroughly before submitting this PR**
- [ ] **This PR contains minimal changes necessary to address the issue/feature**
- [ ] My code follows the project's coding standards and style guidelines
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added necessary documentation (if applicable)
- [ ] All new and existing tests pass
- [ ] I have searched existing PRs to ensure this change hasn't been submitted already
- [ ] I have linked any relevant issues in the description
- [ ] My commits have clear and descriptive messages

## DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
Hey @lxobr, here’s an update on what I’ve done so far:

Where I need help: I’m unsure about how to pass the list of sentiment data points from the sentiment analysis task to add_data_points in the enrichment task within the memify pipeline. Could you guide me on how to correctly pass this data between tasks in memify? Also, please let me know if my work so far aligns with your expectations and how I should proceed.
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 9573981 | Triggered | Generic Password | f240035 | .github/workflows/e2e_tests.yml | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Hey @kartik-raj7, thanks so much for all the work you've done so far! Great to see this shaping up 🚀 Currently, the demo isn’t working for me. Could you please confirm whether it’s running on your end? I won’t be able to proceed until this is resolved. I also took a look at the code you submitted and had a few general remarks that I hope you’ll find helpful:
Thanks again for all the time and effort you’re putting into this. I’m looking forward to seeing the improvements and reviewing the next iteration. 💪
Description
Type of Change
Screenshots/Videos (if applicable)
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.