Newest 'nlp' Questions - Stack Overflow

0 votes

1 answer

66 views

R Code to recognize speaker turns in verbatim records with inconsistent formatting

I am trying to perform text analysis on large character strings that include multiple different speakers. I need to create a dataframe of 2 columns, speaker and turn, which have the person speaking ...

flâneur

353

asked 17 hours ago

1 vote

4 answers

175 views

Split in Half Long PDF Texts with Poor OCR-ing to Generate Speaker Turns in the Correct Order (R)

I am trying to process text for quantitative text analysis. I need to read in pdfs of transcripts from WHO plenary meetings and process the text into a speaker turn dataframe, identifying the speaker ...

flâneur

353

asked Dec 24, 2025 at 16:47

1 vote

0 answers

66 views

Cannot stream response token-by-token when an agent is part of a workflow in LangGraph

I am designing a complex workflow in LangGraph with multiple agents. Some of the agents will be intermediary and others will be user-facing (that is, I will show the output of those ...

hafiz031

2,964

asked Dec 23, 2025 at 5:33

-2 votes

0 answers

70 views

Facing Issue while developing natural language T-SQL generation [closed]

I am developing an NL2SQL model, and for this I am using defog/sqlcoder-7b. While working with this model, I am facing issues where it is not able to generate complex SQL queries, especially queries ...

Mihir Kansara

1

asked Dec 22, 2025 at 6:34

1 vote

1 answer

152 views

Making an interactive wordcloud in Dash + 2 different wordclouds for different clicks

I'm trying to make an interactive wordcloud in Dash. Is it even possible? Also, I'm trying to make two different wordclouds, with each click being a different wordcloud: 1 click -> shows one ...

Rasmus Svendsen

11

asked Dec 18, 2025 at 9:24

1 vote

2 answers

155 views

ImportError: cannot import name 'MistralCommonBackend' from 'transformers'

Given: $ pip install transformers mistral-common -Uq $ pip show transformers mistral-common torch # to verify Name: transformers Version: 4.57.3 Summary: State-of-the-art Machine Learning for JAX, ...

farid

1,641

asked Dec 17, 2025 at 13:09

0 votes

0 answers

57 views

Huggingface caching stops during download without any error

I am trying to run an example from z-image-turbo (https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) import os os.environ["HF_HOME"] = "E:/MyCustomHFCache" os.environ["...

Recently_Created_User

171

asked Dec 9, 2025 at 11:08

Best practices

0 votes

2 replies

61 views

Recommended way to create abstracted text embeddings from large text data?

I would like to use an LLM encoder model to create vector embeddings for certain texts in my dataset. The texts are written as technical problem descriptions by experts who are trying to repair a ...

Alles Klar

31

asked Dec 8, 2025 at 12:32

Advice

0 votes

1 reply

44 views

Organisation/Person tagging using Spacy

We’re working on a problem where our master dataset contains names of organizations and individuals, but some entries are untagged. We only have the names (no additional details such as email or ...

MJ17

109

asked Nov 12, 2025 at 17:03

1 vote

1 answer

149 views

Memory usage keeps increasing when extracting embeddings via sentence-transformers

I have a set of about 100M paragraph-sized strings (multilingual) I am extracting embeddings for, but the memory usage keeps increasing until I start overflowing into disk swap: model = ...

Layman

1,076

asked Oct 29, 2025 at 18:09

0 votes

0 answers

104 views

Torch example transformer with TransformerDecoder

In the torch example provided here https://github.com/pytorch/examples/tree/main/word_language_model, the transformer only uses torch.TransformerEncoder and torch.TransformerDecoder is overwritten with a ...

cuneyttyler

1,395

asked Oct 21, 2025 at 8:48

0 votes

0 answers

43 views

T5-small generates only padding tokens during validation/test in PyTorch Lightning

I'm fine-tuning T5-small using PyTorch Lightning and encountering a strange issue during validation and test steps. The Problem: During validation_step and test_step, model.generate() consistently ...

GeraniumCat

21

asked Oct 20, 2025 at 20:11

0 votes

0 answers

65 views

Utilizing GPU with RNN models which take their output as input [torch]

I have a machine-translation model. In this model, I calculate a vector for a given sentence, then take this vector, aggregate it with each generated output of the RNN, and feed it into the RNN again for ...

cuneyttyler

1,395

asked Oct 15, 2025 at 14:20

0 votes

0 answers

77 views

Streamlit app throwing "NotFittedError: idf vector is not fitted" even though TF-IDF pipeline is fitted and works locally

I trained a sentiment classification model using a scikit-learn Pipeline that includes a TfidfVectorizer and LogisticRegression classifier. Everything works perfectly on my local machine, but when I ...

MOSAB FATAH

1

asked Oct 13, 2025 at 4:55

2 votes

1 answer

73 views

Angle Embedder in Python Messing Up Logging Config

I wrote another question on this earlier, but could not pinpoint the issue on my side. Here, I am giving a minimal reproducible example. System: Angle version 0.5.6, UV 0.8.22, Python 3.12, Ubuntu 24.04. I ...

Della

1,720

asked Oct 12, 2025 at 1:03
