20,692 questions
0
votes
1
answer
66
views
R Code to recognize speaker turns in verbatim records with inconsistent formatting
I am trying to perform text analysis on large character strings that include multiple different speakers. I need to create a dataframe of 2 columns, speaker and turn, which have the person speaking ...
1
vote
4
answers
175
views
Split in Half Long PDF Texts with Poor OCR-ing to Generate Speaker Turns in the Correct Order (R)
I am trying to process text for quantitative text analysis. I need to read in pdfs of transcripts from WHO plenary meetings and process the text into a speaker turn dataframe, identifying the speaker ...
1
vote
0
answers
66
views
Cannot stream the response token-by-token when an agent is part of a workflow in LangGraph
I am designing a complex workflow in LangGraph with multiple agents. Some of the agents will be intermediary and others will be user-facing (that is, I will show the output of those ...
-2
votes
0
answers
70
views
Facing Issue while developing natural language T-SQL generation [closed]
I am developing a NL2SQL model, and for this I am using defog/sqlcoder-7b. While working with this model, I am facing issues where it is not able to generate complex SQL queries, especially queries ...
1
vote
1
answer
152
views
Making an interactive wordcloud in Dash + 2 different wordclouds for different clicks
I'm trying to make an interactive wordcloud in Dash. Is it even possible?
Also, I'm trying to make two different wordclouds, with each click being a different wordcloud:
1 click -> shows one ...
1
vote
2
answers
155
views
ImportError: cannot import name 'MistralCommonBackend' from 'transformers'
Given:
$ pip install transformers mistral-common -Uq
$ pip show transformers mistral-common torch # to verify
Name: transformers
Version: 4.57.3
Summary: State-of-the-art Machine Learning for JAX, ...
0
votes
0
answers
57
views
Huggingface caching stops during download without any error
I am trying to run an example from z-image-turbo
(https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
import os
os.environ["HF_HOME"] = "E:/MyCustomHFCache"
os.environ["...
Best practices
0
votes
2
replies
61
views
Recommended way to create abstracted text embeddings from large text data?
I would like to use an LLM encoder model to create vector embeddings for certain texts in my dataset. The texts are written as technical problem descriptions by experts who are trying to repair a ...
Advice
0
votes
1
replies
44
views
Organisation/Person tagging using Spacy
We’re working on a problem where our master dataset contains names of organizations and individuals, but some entries are untagged. We only have the names (no additional details such as email or ...
1
vote
1
answer
149
views
Memory usage keeps increasing when extracting embeddings via sentence-transformers
I have a set of about 100M paragraph-sized strings (multilingual) I am extracting embeddings for, but the memory usage keeps increasing until I start overflowing into disk swap:
model = ...
0
votes
0
answers
104
views
Torch example transformer with TransformerDecoder
In the torch example provided here https://github.com/pytorch/examples/tree/main/word_language_model, the transformer only uses torch.TransformerEncoder, and torch.TransformerDecoder is overwritten with a ...
0
votes
0
answers
43
views
T5-small generates only padding tokens during validation/test in PyTorch Lightning
I'm fine-tuning T5-small using PyTorch Lightning and encountering a strange issue during validation and test steps.
The Problem:
During validation_step and test_step, model.generate() consistently ...
0
votes
0
answers
65
views
Utilizing the GPU with RNN models that take their own output as input [torch]
I have a machine-translation model. In this model, I calculate a vector for a given sentence, aggregate this vector with each generated output of the RNN, and feed it into the RNN again for ...
0
votes
0
answers
77
views
Streamlit app throwing "NotFittedError: idf vector is not fitted" even though TF-IDF pipeline is fitted and works locally
I trained a sentiment classification model using a scikit-learn Pipeline that includes a TfidfVectorizer and LogisticRegression classifier.
Everything works perfectly on my local machine, but when I ...
2
votes
1
answer
73
views
Angle Embedder in Python Messing Up Logging Config
I posted another question on this earlier but could not pinpoint the issue on my side. Here I am giving a minimal reproducible example.
System
Angle version 0.5.6
UV 0.8.22
Python 3.12
Ubuntu 24.04
I ...