Day 21 Complete: Multi-Modal Analysis Done ✓

Friday evening. 5 days. Done. Buffer #BufferAPI #BuildInPublic

What shipped:

Image analysis ✓
• Object detection, OCR, colors, composition
• 180ms average (target 200ms)
• 34 tests, 93% coverage

Video pipeline ✓
• Metadata, thumbnails, keyframes, scenes, audio
• 450ms average (target 500ms)
• 48 tests, 92% coverage

Link analysis ✓
• Preview optimization, quality scoring, CTR prediction
• 280ms average (target 300ms)
• 38 tests, 94% coverage

Feature engineering ✓
• Multi-modal fusion (text + image + video + links)
• 24 tests, 93% coverage

Prediction ✓
• R² = 0.83 (target 0.80)
• 18 tests

Intelligence service ✓
• End-to-end pipeline working
• 12 tests

Final numbers:
Files: 89 of 95 (6 docs deferred to morning)
Tests: 1,847 passing (100%)
Coverage: 93%
Type safety: 100%
All performance targets: beaten ✓

Key findings:

Multi-modal genuinely improves predictions
• Text only: R² = 0.71
• Text + image + video + links: R² = 0.83
• Visual features add real signal

Title quality = biggest CTR driver
• Predicts 68% of CTR variance
• Image quality adds 18% more
• Description adds 9%

Audio in videos matters
• Posts with audio: +23% engagement
• Worth detecting and recommending

Tomorrow: Buffer API integration
21 days of ML meets real data.
Predict → Post via Buffer API → Measure actual → Compare → Improve
The feedback loop finally closes. This is what all of it was building toward.

Week reflection:
Mon: Ambitious start
Wed: Buffer API launched, celebrated + built
Fri: Done

Building in phases, sustainable pace, no all-nighters. Quality maintained throughout.

Buffer 20wks: https://buff.ly/VkbM93f
📖 https://buff.ly/lSVraLc
💻 https://buff.ly/WQtvbul

#BufferAPI #BufferIQ #BuildInPublic #Day21Complete
Multi-Modal Analysis Done: 93% Coverage, 0.83 R²
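The "Compare" step in the Day 21 feedback loop comes down to scoring predictions against measured results, and R² is the score quoted throughout the post (0.71 text-only, 0.83 multi-modal). A minimal TypeScript sketch of R² follows; the function name and the sample numbers are hypothetical, not taken from the author's code or from the Buffer API.

```typescript
// Coefficient of determination: R² = 1 - SS_res / SS_tot.
// Scores predicted engagement against measured engagement.
function rSquared(actual: number[], predicted: number[]): number {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new Error("actual and predicted must be non-empty and the same length");
  }
  const mean = actual.reduce((sum, y) => sum + y, 0) / actual.length;
  let ssRes = 0; // squared prediction errors
  let ssTot = 0; // squared deviations from the mean ("always predict the average" baseline)
  for (let i = 0; i < actual.length; i++) {
    ssRes += (actual[i] - predicted[i]) ** 2;
    ssTot += (actual[i] - mean) ** 2;
  }
  if (ssTot === 0) {
    throw new Error("R² is undefined when every actual value is identical");
  }
  return 1 - ssRes / ssTot;
}

// Hypothetical numbers: predicted vs. measured engagement for three posts.
const predictedEngagement = [3, 4, 5];
const measuredEngagement = [2, 4, 6];
console.log(rSquared(measuredEngagement, predictedEngagement)); // 0.75
```

An R² of 0 means the model does no better than predicting the average; 1 means it explains all of the variance, so moving from 0.71 to 0.83 is a meaningful share of previously unexplained variance.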
-
I recently built ContextLens, a platform designed to benchmark RAG pipelines and detect hallucinations in long-context retrieval.

Most RAG systems lack rigorous evaluation. ContextLens addresses this with a "Triple-Pipeline" comparison engine that benchmarks models in three concurrent states:
- Baseline: no context (testing parametric knowledge).
- Full Context: the entire document injected (testing long-context recall).
- RAG: semantic chunks (testing retrieval accuracy).

The Tech Stack:
Streaming answers in real time alongside positional bias heatmaps and "noise-slope" metrics requires a highly reactive UI. I chose Svelte 5 for this: runes like $derived.by and $state let me keep the complex analytics logic decoupled and clean. The performance gains when paired with Bun and Convex on the backend were significant.

Key Features:
- Positional Bias Heatmap: real-time visualization of the "lost-in-the-middle" effect.
- Noise Injection Lab: stress-tests model robustness against irrelevant context.
- Triple Answer Engine: side-by-side evaluations streamed via SSE.
- Automated Research Probes: adversarial query generation to find hallucination boundaries.

Demo: https://lnkd.in/gtn8xi59
Repo: https://lnkd.in/gJ_4uDda

I'm curious to hear how others are handling hallucination detection in production. Let's discuss in the comments.
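The three states differ only in what context accompanies the same question, which is what makes the comparison fair. A minimal sketch of building those three contexts, assuming fixed-size word chunks; it is not ContextLens's code, and a simple word-overlap score stands in for the semantic retrieval the post describes so the example runs with no dependencies.

```typescript
// Hypothetical sketch, not ContextLens's code. Word-overlap scoring stands in
// for semantic (embedding) retrieval.
type Mode = "baseline" | "full" | "rag";

// Split a document into fixed-size word chunks (assumed chunking scheme).
function chunkWords(doc: string, size: number): string[] {
  const words = doc.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push(words.slice(i, i + size).join(" "));
  }
  return chunks;
}

// Lowercased set of word tokens.
function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Number of distinct query tokens that also appear in the chunk.
function overlap(query: string, chunk: string): number {
  const chunkTokens = tokens(chunk);
  return Array.from(tokens(query)).filter((t) => chunkTokens.has(t)).length;
}

// Context each pipeline sends to the model for the same question.
function buildContexts(question: string, doc: string, chunkSize = 50, topK = 3): Record<Mode, string> {
  const ranked = chunkWords(doc, chunkSize)
    .map((chunk, index) => ({ chunk, index, score: overlap(question, chunk) }))
    .sort((a, b) => b.score - a.score || a.index - b.index) // best first; ties keep document order
    .slice(0, topK);
  return {
    baseline: "",                               // parametric knowledge only
    full: doc,                                  // whole document (long-context recall)
    rag: ranked.map((r) => r.chunk).join("\n"), // top-k chunks (retrieval accuracy)
  };
}

const sampleDoc = "alpha beta gamma delta epsilon zeta";
console.log(buildContexts("what is gamma", sampleDoc, 2, 1).rag); // gamma delta
```

Sending the same question through all three contexts means any accuracy difference between the answers can be attributed to the context rather than the prompt.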
-
Shipped loOom v0.8.

Something I noticed about myself and other analysts I work with: most of our work is recurring. Monthly financial plan review. OKR check-in at the start of the quarter. Investigating why some metric drifted. Same sources, same algorithm, same edge cases we already argued about and agreed on last time.

But when I ask an AI agent to help, it starts from zero every single time. It doesn't remember where the data lives, how we count, or that one weird line we decided to bucket differently. So you sit there and re-explain the whole context for the tenth time.

v0.8 is my attempt to fix this. I added a thing called a task. It's really just a folder at `.looom/tasks/NNN/` holding five text files: the goal, the algorithm, accumulated knowledge, a short journal of sessions, and a list of related entities from the project graph. All plain text. It sits next to your project and diffs nicely in git. No database, no parallel state inside the binary.

When a task is fresh and the algorithm file is empty, the agent asks you the questions on its first run (what are the sources, how do we count, is this one-off or recurring?) and writes the answers into `instruction.md`. Next time you press A, it starts with that instruction plus the accumulated state plus the last entries from the journal. Not from scratch.

The full chat doesn't go into the journal, only a short "done / open" note, 3-5 lines per session.

Open a task and the graph auto-filters to its scope, so you're not staring at the whole project. If new entities come up during a session, the agent appends them to `scope.tsv`.

macOS, ~640KB, no network calls, no dependencies. Comes with a Demo folder so you can try it without wiring up your own data.

Link in comments.

#cli #terminal #devtools #analytics #aiagents
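The session-start behavior described in the post (interview the user when the algorithm file is empty, otherwise start from instruction + accumulated state + recent journal entries) can be sketched as pure functions over the task's file contents. This is a hypothetical sketch, not loOom's implementation: only `instruction.md` and `scope.tsv` are named in the post, and the other field names, the blank-line journal format, and the ID-in-first-column layout of `scope.tsv` are assumptions.

```typescript
// Hypothetical sketch, not loOom's implementation.
interface TaskFiles {
  goal: string;
  instruction: string; // contents of instruction.md: the agreed algorithm
  knowledge: string;   // accumulated knowledge (file name not given in the post)
  journal: string;     // short "done / open" notes; entries separated by blank lines (assumed)
}

// Keep only the last n journal entries.
function lastEntries(journal: string, n: number): string[] {
  if (n <= 0) return [];
  const entries = journal.split(/\n\s*\n/).map((e) => e.trim()).filter(Boolean);
  return entries.slice(-n);
}

// Build the context an agent starts a session with.
function sessionContext(task: TaskFiles, recent = 3): string {
  if (task.instruction.trim() === "") {
    // Fresh task: no algorithm yet, so the first run interviews the user.
    return [
      `GOAL:\n${task.goal}`,
      "ASK: What are the sources? How do we count? Is this one-off or recurring?",
    ].join("\n\n");
  }
  return [
    `GOAL:\n${task.goal}`,
    `INSTRUCTION:\n${task.instruction}`,
    `KNOWLEDGE:\n${task.knowledge}`,
    `RECENT SESSIONS:\n${lastEntries(task.journal, recent).join("\n---\n")}`,
  ].join("\n\n");
}

// Append entity IDs to scope.tsv content, skipping IDs already listed
// (the first tab-separated column is assumed to be the ID).
function appendScope(scopeTsv: string, ids: string[]): string {
  const lines = scopeTsv.split("\n").filter((l) => l.trim() !== "");
  const known = new Set(lines.map((l) => l.split("\t")[0]));
  for (const id of ids) {
    if (!known.has(id)) {
      lines.push(id);
      known.add(id);
    }
  }
  return lines.join("\n");
}

const freshTask: TaskFiles = { goal: "Monthly plan review", instruction: "", knowledge: "", journal: "" };
console.log(sessionContext(freshTask).includes("ASK:")); // true
```

Keeping these as functions over plain strings mirrors the design choice in the post: all state lives in text files that diff in git, and nothing needs to be held inside the binary.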
-
Multi-hop RAG with knowledge graphs costs more at inference than most teams budget for. GRASP has a number on that.

The pattern I keep seeing: teams bolt a knowledge graph onto their RAG pipeline for better multi-hop reasoning, ship it, then get surprised when token costs compound across iterative retrieval steps. Each hop adds context. Context adds tokens. Tokens add cost. By step 4 or 5, you are paying 3-5x what a flat vector search would have cost for the same question.

GRASP (Graph Agentic Search over Propositions) is an arXiv paper that addresses this directly. The core idea is to decompose documents into atomic propositions before indexing, then run agentic graph traversal over those propositions rather than full chunks. The result is more precise retrieval with less token bleed at each hop.

What this paper surfaces for production teams:
🔹 Index-time cost is the hidden tax nobody talks about in knowledge graph RAG, and GRASP's proposition extraction step shifts where you pay.
🔹 Teams running LlamaIndex or LangGraph pipelines with multi-hop retrieval should expect token usage to compound past step 3, not just at the edges.
🔹 Proposition-level granularity reduces context window pressure at inference, which matters when you are running 50,000+ queries a day against a 128k-token model.

GRASP production takeaway: Over the next 6-12 months, teams building serious multi-hop QA systems will need to choose between graph complexity at index time and token cost at inference time. The mainstream assumption is that richer graphs mean better answers. My read is that the granularity of the retrieval unit matters more than graph topology, and proposition-level indexing deserves a real benchmark against your current chunking strategy before you commit to a full knowledge graph build.

If you have run knowledge graph retrieval in production, at what hop count did inference token costs become the design constraint rather than retrieval quality? Drop your take below.
#AgenticAI #LLMAgents #RAG #MultiAgentSystems #AIEngineering
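Why costs compound, and why a smaller retrieval unit helps, is easy to see with back-of-the-envelope arithmetic. The sketch below assumes the worst case, where every hop re-sends the question plus all context gathered so far; that carry-forward assumption and all token counts are illustrative, not figures from the GRASP paper.

```typescript
// Hypothetical cost model, not from the GRASP paper: each hop re-sends the
// question plus everything retrieved on earlier hops, and retrieves
// `tokensPerHop` new tokens of context.
function iterativeInputTokens(hops: number, questionTokens: number, tokensPerHop: number): number {
  let total = 0;
  for (let h = 1; h <= hops; h++) {
    total += questionTokens + h * tokensPerHop; // prompt at hop h carries h hops of context
  }
  return total; // equals hops*q + c*hops*(hops+1)/2
}

// A flat vector search is a single retrieval plus a single model call.
function flatInputTokens(questionTokens: number, tokensPerHop: number): number {
  return questionTokens + tokensPerHop;
}

// Illustrative numbers only: 100-token question, 1,000 tokens of chunks per hop,
// versus a hypothetical 300 tokens per hop if the retrieval unit were propositions.
const chunkedTokens = iterativeInputTokens(4, 100, 1000);   // 10,400
const propositionTokens = iterativeInputTokens(4, 100, 300); // 3,400
console.log(chunkedTokens / flatInputTokens(100, 1000));     // about 9.45
console.log(chunkedTokens * 50000);                          // input tokens/day at 50,000 queries
```

Under full carry-forward the context term grows quadratically with hop count, so where a real pipeline lands relative to the 3-5x cited in the post depends on how much context each hop carries forward. Shrinking the per-hop unit (the proposition-level idea) scales that quadratic term down directly.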
-
last time i noticed a 9-point gap between train (97.95%) and test accuracy (88.69%), a clear sign of overfitting.

today i fixed it. same dataset. same architecture. added BatchNorm + Dropout + weight decay.

test accuracy: 88.69% → 88.32% (held steady)
train accuracy: 97.95% → 93.25%

the gap closed from 9 points to ~5. the model is generalizing better.

here's what i added and why each one matters:

BatchNorm1d after every Linear layer → normalizes activations between layers, stabilizes training, and lets you use higher learning rates without diverging. it also acts as mild regularization on its own.

Dropout(p=0.3) after every ReLU → during training, zeros each activation with probability 0.3 (about 30% of them) and scales the survivors by 1/(1-0.3), so the expected activation stays the same. this forces the network to learn redundant representations instead of memorizing specific paths. disabled automatically during model.eval().

weight_decay=1e-4 in the SGD optimizer → adds L2 regularization, which penalizes large weights. nudges the optimizer to prefer simpler solutions over ones that overfit to training patterns.

one thing that's easy to miss: Dropout behaves differently in train vs eval mode. during training it randomly drops neurons. during evaluation it does nothing, but only if you've called model.eval(). if you forget that call, your test accuracy will be artificially lower than it should be (BatchNorm also keeps using per-batch statistics instead of its running averages).

the loss curve also looked different: much smoother and more stable than before. BatchNorm and Dropout together slow the climb in training accuracy, but what you lose in speed you gain in robustness.

this is the regularization toolkit every PyTorch model needs before you call it production-ready.

notebook here 👇 https://lnkd.in/g7MhfEV5

#PyTorch #DeepLearning #MachineLearning #AIEngineering #LearningInPublic #Python
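the three regularizers above are small formulas, so here they are on plain arrays. this is an illustrative sketch of the semantics PyTorch documents for nn.Dropout, nn.BatchNorm1d (training mode), and torch.optim.SGD with weight_decay and no momentum; it is not PyTorch code, and the function names are made up.

```typescript
// Inverted dropout (nn.Dropout semantics): in training, zero each value with
// probability p and scale survivors by 1/(1-p) so the expected activation is
// unchanged; in eval mode it is the identity.
function dropout(x: number[], p: number, training: boolean, rand: () => number = Math.random): number[] {
  if (!training || p === 0) return x.slice();
  const scale = 1 / (1 - p);
  return x.map((v) => (rand() < p ? 0 : v * scale));
}

// BatchNorm1d forward in training mode for a [batch][features] matrix:
// per-feature batch mean and biased variance, then learnable scale/shift.
// (In eval mode PyTorch uses running averages instead of batch statistics.)
function batchNorm1d(x: number[][], gamma: number[], beta: number[], eps = 1e-5): number[][] {
  const n = x.length;
  const features = x[0].length;
  const out = x.map((row) => row.slice());
  for (let j = 0; j < features; j++) {
    let mean = 0;
    for (let i = 0; i < n; i++) mean += x[i][j];
    mean /= n;
    let variance = 0;
    for (let i = 0; i < n; i++) variance += (x[i][j] - mean) ** 2;
    variance /= n;
    for (let i = 0; i < n; i++) {
      out[i][j] = (gamma[j] * (x[i][j] - mean)) / Math.sqrt(variance + eps) + beta[j];
    }
  }
  return out;
}

// One SGD step with weight_decay and no momentum: the decay term is added to
// the gradient (g' = g + weight_decay * w), then w <- w - lr * g'.
function sgdStep(w: number[], grad: number[], lr: number, weightDecay: number): number[] {
  return w.map((wi, i) => wi - lr * (grad[i] + weightDecay * wi));
}

// Deterministic "random" draws so the dropout output is reproducible.
const draws = [0.1, 0.9, 0.2, 0.8];
let drawIndex = 0;
const fakeRand = () => draws[drawIndex++];
console.log(dropout([1, 2, 3, 4], 0.5, true, fakeRand)); // [ 0, 4, 0, 8 ]
console.log(dropout([1, 2, 3, 4], 0.5, false));          // [ 1, 2, 3, 4 ]
```

the 1/(1-p) scaling during training is why eval mode can simply pass values through unchanged, and why forgetting model.eval() leaves random zeroing switched on at test time.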
-
One fixed RAG pipeline for every query type is not a neutral default. It is a hidden performance tax.

Most teams building retrieval-augmented systems treat the retrieval step as solved once they have a vector store connected. It is not. The research coming out around Experience-RAG Skill makes this concrete:
* Factoid lookups, multi-hop reasoning, and scientific verification each need different retrieval strategies
* A single fixed pipeline forces every query through the same path regardless of what the task actually requires
* The cost shows up as quality degradation that is hard to attribute and easy to miss

What the Experience-RAG Skill approach does is insert an orchestration layer between the agent and the retriever pool. It reads the current task, consults an experience memory of past retrieval outcomes, and routes accordingly. Not static routing rules. Adaptive selection based on what has actually worked before.

The underappreciated piece here is the experience memory component. Most teams using LangGraph or similar orchestration frameworks spend their energy on which retriever to use. Almost none are logging which retrieval strategy worked for which query class and feeding that back into future decisions. That feedback loop is the part worth building.

My takeaway: In 6-12 months, teams running production RAG at scale will start treating retrieval strategy selection as a first-class problem, not an implementation detail. The ones who instrument this now will have a real quality advantage over teams still using one-size-fits-all pipelines.

For anyone running a multi-step RAG pipeline in production:
A. Fixed pipeline, no routing logic
B. Rule-based routing by query type
C. Dynamic routing with some feedback mechanism
D. Still figuring out what is breaking and why

Curious where you stand.

#AgenticAI #AIAgents #RAG #LLMAgents #ArtificialIntelligence
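The feedback loop the post calls worth building (log which strategy worked for which query class, then use that log to route) can be sketched in a few dozen lines. This is a hypothetical sketch, not the Experience-RAG Skill algorithm: it uses a Laplace-smoothed success rate per (query class, strategy) pair and a greedy choice, and the strategy and class names are made up for illustration.

```typescript
// Hypothetical sketch of experience-based retrieval routing; not the
// Experience-RAG Skill paper's algorithm.
class ExperienceRouter {
  private readonly strategies: string[];
  private readonly stats = new Map<string, { successes: number; trials: number }>();

  constructor(strategies: string[]) {
    if (strategies.length === 0) throw new Error("need at least one strategy");
    this.strategies = strategies;
  }

  private key(queryClass: string, strategy: string): string {
    return `${queryClass}::${strategy}`;
  }

  // Log the outcome of one retrieval (e.g. judged by answer correctness).
  record(queryClass: string, strategy: string, success: boolean): void {
    const k = this.key(queryClass, strategy);
    const s = this.stats.get(k) ?? { successes: 0, trials: 0 };
    s.trials += 1;
    if (success) s.successes += 1;
    this.stats.set(k, s);
  }

  // Laplace-smoothed success rate; an untried strategy scores 0.5.
  score(queryClass: string, strategy: string): number {
    const s = this.stats.get(this.key(queryClass, strategy)) ?? { successes: 0, trials: 0 };
    return (s.successes + 1) / (s.trials + 2);
  }

  // Greedy choice: highest score wins; ties go to the earlier strategy.
  choose(queryClass: string): string {
    let best = this.strategies[0];
    for (const strategy of this.strategies) {
      if (this.score(queryClass, strategy) > this.score(queryClass, best)) best = strategy;
    }
    return best;
  }
}

const router = new ExperienceRouter(["dense", "graph_multi_hop"]);
router.record("multi-hop", "dense", false);
router.record("multi-hop", "graph_multi_hop", true);
console.log(router.choose("multi-hop")); // graph_multi_hop
console.log(router.choose("factoid"));   // dense (no experience yet, so the first strategy)
```

A purely greedy router can lock onto whichever strategy got lucky early; in practice you would add some exploration, such as epsilon-greedy or Thompson sampling, so weaker-looking strategies still get occasional trials.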
-
Just shipped LivePack into Knolo Cortex: the live KB overlay that actually lets you mutate knowledge without turning your deterministic core into a pile of eventual-consistency headaches.

You mount your immutable base .knolo pack once. Then drop a LivePack on top. Add, update, and remove docs by stable ID. The base stays untouched, your live changes shadow or tombstone cleanly, and every query merges delta-over-base with the exact same Hit[] shape you already use. When you're ready, serialize() spits out a clean snapshot you can save, ship, or reload anywhere.

No vector DB. No background jobs. No drift. Still pure lexical-first, graph-aware, fully offline.

Perfect for agents that need to learn on the fly, note apps that users edit in real time, or any workflow where the knowledge base can't wait for the next rebuild.

// mount the immutable base pack once, then layer live edits on top
const base = await mountPack({ src: './knowledge.knolo' });
const live = await createLivePack(base, [
  { id: 'notes.alpha', text: 'initial draft', namespace: 'notes' }
]);

// updates shadow base docs by ID; adds extend the live layer; the base pack is not modified
await live.updateDocument({ id: 'notes.alpha', text: 'updated after user feedback v2' });
await live.addDocument({ id: 'notes.beta', text: 'new live entry' });

// queries merge delta-over-base and return the usual Hit[] shape
const hits = live.query('feedback', { topK: 8 });
const snapshot = await live.serialize(); // ready to persist

I've built enough RAG systems to know the pain: static packs are great until reality hits and everything needs to change. LivePack gives you that mutability layer without selling out the determinism that actually matters.

This is phase one: lexical/graph only, rock solid, easy to reason about. Cortex stays your append-only memory layer on the side. Together they feel like a real local-first knowledge runtime instead of another fragile vector wrapper.

If you're running small models on-device, shipping to edge, or just tired of rebuilding packs every time someone adds a note, go mount this. https://lnkd.in/dnuXzdwh

Tell me what you're wiring it into. The trenches are loud; let's make the next layer quieter.
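The delta-over-base idea (base never written, updates shadow by ID, removals become tombstones, reads merge the two) is worth seeing in isolation. The following is a hypothetical sketch of those semantics only; it is not Knolo's code or API, and the class and method names are made up. A naive substring filter stands in for real lexical ranking.

```typescript
// Hypothetical sketch of delta-over-base overlay semantics; not Knolo's code or API.
interface Doc {
  id: string;
  text: string;
  namespace?: string;
}

class OverlayStore {
  private readonly base: ReadonlyMap<string, Doc>;             // never written after construction
  private readonly delta = new Map<string, Doc | null>();      // null = tombstone

  constructor(baseDocs: Doc[]) {
    this.base = new Map(baseDocs.map((d): [string, Doc] => [d.id, d]));
  }

  upsert(doc: Doc): void {
    this.delta.set(doc.id, doc); // shadows any base doc with the same ID
  }

  remove(id: string): void {
    this.delta.set(id, null); // tombstone hides the base doc without deleting it
  }

  get(id: string): Doc | undefined {
    if (this.delta.has(id)) return this.delta.get(id) ?? undefined;
    return this.base.get(id);
  }

  // Merged view: base docs in order (overridden or hidden by the delta), then new docs.
  all(): Doc[] {
    const merged = new Map<string, Doc>(this.base);
    this.delta.forEach((doc, id) => {
      if (doc === null) merged.delete(id);
      else merged.set(id, doc);
    });
    return Array.from(merged.values());
  }

  // Naive case-insensitive substring match as a stand-in for real lexical ranking.
  query(term: string, topK: number): Doc[] {
    const needle = term.toLowerCase();
    return this.all().filter((d) => d.text.toLowerCase().includes(needle)).slice(0, topK);
  }

  // Snapshot of the merged view, e.g. for writing to disk.
  serialize(): string {
    return JSON.stringify(this.all());
  }
}

const store = new OverlayStore([{ id: "notes.alpha", text: "initial draft", namespace: "notes" }]);
store.upsert({ id: "notes.alpha", text: "updated after user feedback v2" });
store.upsert({ id: "notes.beta", text: "new live entry" });
console.log(store.query("feedback", 8).map((d) => d.id)); // [ 'notes.alpha' ]
```

Because the base is only ever read, discarding the delta map restores the original pack exactly, which is what keeps an overlay like this deterministic.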
-
Stop memorizing Claude Code commands. I turned every one into a single cheat sheet you can save and reference forever. 🎯

Updated for v2.1.142 + Opus 4.7.

Organized into 6 categories:
1. Setup → install, doctor, auth in 60 seconds
2. Session flags → -p, -c, --model opus, --permission-mode
3. Slash commands (session) → /plan, /effort, /rewind, /branch
4. Context → /compact, /context, @file, @folder, @url
5. Agents & MCP → /agents, /mcp, /goal, /voice, /loop
6. Keyboard shortcuts → Shift+Tab, Esc Esc, Cmd+Enter

(full cheat sheet in the image 👆, save it)

Here's the shift most people miss: You don't need more prompts. You need better context + tools.

Think in workflows, not prompts. Design the system. Let Claude execute.

Most people use 10% of Claude Code. Master the commands above and you're in the top 1%.

I built the full printable reference: every command, every flag, with the "when to actually use it" notes the cheat sheet doesn't have room for.

If you want it FREE:
1️⃣ Like this post 👍
2️⃣ Comment "CODE" 💬
3️⃣ Connect with me and add "CODE" in the connection note

I'll DM the full reference this weekend. MUST BE CONNECTED.

♻️ Repost so your team stops Googling commands.

Which category will you use most? 👇

#ClaudeCode #AI #DevTools #buildinpublic #Marketing
-
Reverse engineer your competition's entities with http://schemawriter.ai

We added an entity report in schemawriter that lets you see which entities the competition is using, and where they use them.

I know from Primeindexer that named entities in the title and headers increase the chance of indexing. I know from schemawriter that using named entities in the title and headers improves rankings. Now we can see exactly which entities the competition is using, and where.

#schemawriter #jespernissenseo
-
replacing a fully-connected ANN with a CNN on the same dataset bumped test accuracy from 88.69% → 92.69%.
same data. same GPU. same training loop. different architecture.

that 4-point jump is the CNN doing what a fully-connected ANN isn't built to do: exploiting spatial structure in image data.

here's the engineering breakdown.

the architecture decision:
an ANN flattens a 28×28 image into 784 numbers and discards the 2D layout: nothing in the architecture knows which pixels are neighbors. a pixel next to another pixel carries information, and the ANN throws that away. a CNN processes the image as a 2D grid and extracts local patterns (edges, textures, shapes) through learned filters.

the feature extraction block:
→ Conv2d(1→32, kernel=3, padding='same') → ReLU → BatchNorm2d → MaxPool2d(2×2)
→ Conv2d(32→64, kernel=3, padding='same') → ReLU → BatchNorm2d → MaxPool2d(2×2)

28×28 input → after two MaxPool layers → 7×7 feature maps × 64 channels = 3,136 values passed to the classifier.

the classifier block:
→ Flatten → Linear(3136, 128) → ReLU → Dropout(0.4) → Linear(128, 64) → ReLU → Dropout(0.4) → Linear(64, 10)

the engineering choices that matter:

padding='same': keeps spatial dimensions unchanged after convolution. without it you'd need to track shrinking dimensions manually across layers.

BatchNorm2d in each conv block: normalizes feature maps, stabilizes training, and lets higher learning rates work without diverging.

MaxPool2d: halves the spatial dimensions each time, cutting compute while preserving dominant features. it also adds a degree of translation invariance.

Dropout(0.4) only in the classifier, not in the conv layers. conv layers already generalize well through weight sharing, and dropout there tends to hurt more than it helps.

results:
train accuracy: 99.99%
test accuracy: 92.69%
loss: 0.64 → 0.015 over 100 epochs

the 7-point train/test gap is worth watching. the model is strong but can go further. next step is data augmentation or a learning rate scheduler to push test accuracy closer to 94–95%.
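the 3,136 figure and the parameter budget both fall out of standard formulas, so they're easy to check. below is an illustrative calculator, not PyTorch: the layer list is transcribed from the architecture above, conv/pool output size is floor((in + 2*pad - kernel) / stride) + 1, and padding='same' with kernel 3 and stride 1 means pad = 1. parameter counts are learnable parameters only (BatchNorm running statistics are buffers, not parameters).

```typescript
// Output size of a conv or pool layer along one spatial dimension.
function convOut(size: number, kernel: number, pad: number, stride = 1): number {
  return Math.floor((size + 2 * pad - kernel) / stride) + 1;
}

const convParams = (cin: number, cout: number, k: number) => cout * cin * k * k + cout; // weights + bias
const linearParams = (fin: number, fout: number) => fin * fout + fout;                  // weights + bias
const batchNormParams = (channels: number) => 2 * channels;                             // learnable gamma and beta

let size = 28;
size = convOut(size, 3, 1);    // Conv2d(1→32), 'same' padding: 28
size = convOut(size, 2, 0, 2); // MaxPool2d(2×2): 14
size = convOut(size, 3, 1);    // Conv2d(32→64), 'same' padding: 14
size = convOut(size, 2, 0, 2); // MaxPool2d(2×2): 7
const flattened = size * size * 64; // values passed to the classifier

const featureParams =
  convParams(1, 32, 3) + batchNormParams(32) + convParams(32, 64, 3) + batchNormParams(64);
const classifierParams =
  linearParams(flattened, 128) + linearParams(128, 64) + linearParams(64, 10);

console.log(flattened);                        // 3136
console.log(featureParams, classifierParams);  // 19008 410442
console.log(featureParams + classifierParams); // 429450
```

by these formulas Linear(3136, 128) alone holds 401,536 of the 429,450 learnable parameters (about 93%), which is one way to see why the classifier is where Dropout earns its place.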
progression across this series:
CPU ANN (5k samples) → 83.25%
GPU ANN (60k samples) → 88.69%
GPU ANN + regularization → 88.32%
GPU ANN + Optuna → 89.1%
GPU CNN → 92.69% ✓

every improvement came from an architectural or engineering decision, not luck.

notebook here 👇 https://lnkd.in/gUf765Ps

#PyTorch #CNN #DeepLearning #ComputerVision #AIEngineering #MachineLearning #LearningInPublic
-
I am extremely impressed with #ClaudeCode's "/insights".

Claude scanned my last 30 days of code, identified bottlenecks, suggested skill documents for specific repeatable workflows, and produced genuine insights into how I approach my work going forward!

Claude flagged /resume as a chokepoint in my workflow and suggested front-loading context, and it created a 5-point summary to use before running anything. The reason: "framing drift" in Claude's reasoning keeps showing up at pivot/refresh moments.

What I find most interesting about "framing drift" is that Claude keeps reaching for ML and probabilistic paradigms because it treats them as the frontier, which means the real frontier is exactly where it stops reaching.