⭐️ We are thrilled to share our recent publication, Concurrent Balanced Augmented Trees, accepted at the Symposium on Principles and Practice of Parallel Programming (PPoPP 2026). 📄 The paper, written by Evan Wrench, Ajay Singh, Younghun Roh, Panagiota Fatourou, Siddhartha Jayanti, Eric Ruppert and Yuanhao Wei, explores a new way to build lock-free augmented balanced search trees. 📈 This work presents the first lock-free augmented balanced search tree that supports generic augmentation functions. In our experiments, the proposed augmented balanced tree completes updates 2.2 to 30 times faster than the unbalanced augmented tree and outperforms unaugmented trees by several orders of magnitude on 120 threads. https://lnkd.in/deMqun3m The project is implemented under the National Recovery and Resilience Plan “Greece 2.0”, with funding from the European Union – NextGenerationEU
HAR.S.H. (Hardware-Aware extReme-scale Similarity search)’s Post
𝐑𝐞𝐭𝐡𝐢𝐧𝐤𝐢𝐧𝐠 𝐌𝐞𝐦𝐨𝐫𝐲 𝐌𝐞𝐜𝐡𝐚𝐧𝐢𝐬𝐦𝐬 𝐨𝐟 𝐅𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧 𝐀𝐠𝐞𝐧𝐭𝐬 𝐢𝐧 𝐭𝐡𝐞 𝐒𝐞𝐜𝐨𝐧𝐝 𝐇𝐚𝐥𝐟: 𝐀 𝐒𝐮𝐫𝐯𝐞𝐲
This 83-page survey maps how “memory” is being designed and implemented in foundation-model agents, covering architectures, memory mechanisms, and evaluation. It reviews 218 papers (2023 Q1 → 2025 Q4) and shows research accelerating, especially in 2025. The timing also matches a broader shift toward context engineering.
Suggested memory-systems taxonomy:
1️⃣ 𝐌𝐞𝐦𝐨𝐫𝐲 𝐬𝐮𝐛𝐬𝐭𝐫𝐚𝐭𝐞 (𝐰𝐡𝐞𝐫𝐞 𝐦𝐞𝐦𝐨𝐫𝐲 𝐥𝐢𝐯𝐞𝐬)
╰┈➤ Internal (weights + inference-time state).
╰┈➤ External (stores / DBs / logs).
╰┈➤ Hybrid (multi-store).
2️⃣ 𝐌𝐞𝐦𝐨𝐫𝐲 𝐜𝐨𝐠𝐧𝐢𝐭𝐢𝐯𝐞 𝐦𝐞𝐜𝐡𝐚𝐧𝐢𝐬𝐦 (𝐰𝐡𝐚𝐭 𝐦𝐞𝐦𝐨𝐫𝐲 𝐝𝐨𝐞𝐬)
╰┈➤ Episodic: past interactions / trajectories.
╰┈➤ Working: short-term state for ongoing reasoning.
╰┈➤ Semantic: facts / knowledge.
╰┈➤ Procedural: skills / routines (“how to do it”).
╰┈➤ Sensory: perception streams.
3️⃣ 𝐌𝐞𝐦𝐨𝐫𝐲 𝐬𝐮𝐛𝐣𝐞𝐜𝐭 (𝐰𝐡𝐨 𝐢𝐭’𝐬 𝐟𝐨𝐫)
╰┈➤ User-centric: preferences, identity, long-term goals, personalization.
╰┈➤ Agent-centric: skills, policies, task strategies, tool use.
𝐈𝐟 𝐲𝐨𝐮’𝐫𝐞 𝐛𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐚𝐠𝐞𝐧𝐭𝐬:
╰┈➤ Design memory deliberately (not “just add RAG”).
╰┈➤ Apply memory techniques such as compression, selective retention, provenance, and abstention.
╰┈➤ Evaluate latency, drift, updateability, and failure modes.
Paper: https://lnkd.in/gCXfJZrf
Github: https://lnkd.in/gsdaBu7C
I’ve been watching the recurring “which LLM platform is best?” discussions. Most of them fall into three categories:
- Benchmark comparisons
- Brand loyalty
- Ideological positioning
What is usually missing is a more basic question: what kind of task are you asking the model to perform, and under what amplification tolerance?
I’ve published a preprint examining this question under a declared boundary calculus: Constraint-Relative Platform Selection Under a Hardened Boundary Calculus https://lnkd.in/eKwm-6rz
Instead of ranking models, the study fixes:
- A constraint grammar (K0G)
- A high-stress artifact
- A predefined experiment contract
Under identical declared conditions, platforms diverge in amplification–containment behaviour. The comparative ordering is weight-dependent: a relatively modest shift in declared operational emphasis is sufficient to invert it.
The point is not superiority. The point is that platform selection functions as part of the reasoning act. In high-stress contexts, choosing a model without declaring the intended constraint profile embeds an implicit weighting into the outcome. That is not a political claim; it is an epistemic one.
Tool selection should not be treated as brand preference or benchmark reflex. It should be treated as a declared methodological decision aligned to task class and amplification tolerance.
Right tool. Right job. Explicitly declared.
Open to critique and replication.
I am very happy to announce that the paper "Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm" has been accepted for publication at ICLR 2026! Big thanks to Alexander Dockhorn and B Rosenhahn for the close collaboration.
A core challenge of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be improved by grouping state-action pairs and using their aggregate statistics instead of single-node statistics. Our work builds directly on the ASAP framework, which detects states and state-action pairs with the same value under optimal play by analysing the search graph. ASAP, however, requires two state-action pairs to have the same immediate reward, a rigid condition that limits the number of abstractions that can be found and thereby the sample efficiency.
In the paper, we break with the paradigm of grouping value-equivalent states or state-action pairs and instead group states and state-action pairs with possibly different values, as long as the difference between their values can be inferred. We describe how this novel abstraction framework can be combined with MCTS to yield KVDA-UCT. https://lnkd.in/eCuMCX2a
Multimodal documents demand more than vector search. An open-source framework is rethinking RAG as a graph-native, multimodal system.
Today’s knowledge artifacts combine text, tables, figures, equations, and visuals in a single document. Traditional RAG pipelines flatten everything into embeddings and lose structural relationships.
RAG-Anything is a graph-centric, end-to-end multimodal RAG framework built on LightRAG. It unifies the entire workflow in one architecture:
• Structured document parsing
• Cross-modal content understanding
• Multimodal knowledge graph construction
• Context-aware, modality-sensitive retrieval
Developed by the team at HKU Data Intelligence Lab. If you are building RAG systems on top of research papers, technical PDFs, or enterprise documents, this approach is worth examining.
Repo link in the comments. Check my profile for more such resources!
Journal article about ranking Skylines in databases, with experiments on the video game Pokémon Showdown! My latest paper, published in Knowledge and Information Systems, is about ranking Skylines via several approaches: dp-idp, which uses a dominance hierarchy; RankSky, an adaptation of Google’s well-known PageRank algorithm; CoSky, a TOPSIS-based method; and DeepSky, which couples these with multilevel Skylines. My research is available on @ResearchGate: https://lnkd.in/evPyaE6P
Everyone picked a side in the attention efficiency debate. Quadratic is accurate but slow. Linear is fast but lossy. Sparse is somewhere in between. MiniCPM-SALA just ended the debate by asking a different question: what if you don't have to pick? 9B parameters. 1M token context. 75% less training compute than dense attention. The architecture is a 1:3 mix of sparse and linear attention — ratio chosen by a principled layer selection algorithm, not hand-tuned. Sparse layers handle high-fidelity long-range recall. Linear layers handle global efficiency. A Hybrid Positional Encoding (HyPE) reconciles both positional assumptions in a single model. The key insight isn't the components. It's that different layers need different mechanisms. We've been arguing about what to replace attention with. SALA suggests the answer was: nothing. Just allocate it correctly. The open question: does the 1:3 ratio generalize across scales and tasks — or did they find the right ratio for 9B on specific benchmarks? That's the paper worth reading next.
✅ Day 37/100 – Data Structures & Algorithms Consistency Challenge
Not every problem searches inside an array. Sometimes the search happens in the range of possible answers. That’s exactly what today’s problems explored.
📌 Today’s Topic:
• Binary Search on the answer
🧩 Problems I Solved:
1. Find the Square Root of a Number
2. Find the Nth Root of a Number
📘 What I Learned:
• Monotonic functions allow binary search even without arrays
• Instead of iterating over values, we search within a valid numerical range
• Precision and termination conditions are critical in numerical problems
🌍 Real-World / Industry Perspective:
This technique is widely used in systems that involve numerical approximation, such as optimization engines, scientific computing, and performance tuning, where exact answers are impractical and efficient estimation is required.
⏱️ Efficiency Note:
• Square Root: O(log n)
• Nth Root: O(log n × log m), depending on precision
• Space Complexity: O(1)
💡 Key Insight: Binary search is not limited to data — it’s a strategy for narrowing uncertainty.
Day 37 completed. Consistency continues.
#DSA #100DaysOfCode #BinarySearch #LearningInPublic #ProblemSolving #SoftwareEngineering
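The square-root problem above can be sketched as a binary search over the range of possible answers rather than over an array. A minimal Python sketch, assuming a floating-point answer and an illustrative precision threshold `eps` (not taken from any specific problem statement):

```python
def sqrt_binary_search(n: float, eps: float = 1e-7) -> float:
    """Approximate sqrt(n) by binary-searching the answer range.

    f(x) = x*x is monotonic for x >= 0, so half the range can be
    discarded at each step even though there is no array to search.
    """
    lo, hi = 0.0, max(1.0, n)   # hi = 1.0 covers inputs below 1 (sqrt grows above n there)
    while hi - lo > eps:        # termination: the range shrinks by half each iteration
        mid = (lo + hi) / 2
        if mid * mid < n:
            lo = mid            # answer lies in the upper half
        else:
            hi = mid            # answer lies in the lower half
    return (lo + hi) / 2
```

The same skeleton handles the Nth-root problem by replacing `mid * mid` with `mid ** k`.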
🔷 Algo Series #1 – Heap Sort & Heapify
Sorting algorithms are fundamental to computer science, and understanding them deeply builds strong problem-solving foundations. In this post, I’ve broken down the logic behind Heap Sort and the crucial operation that powers it — Heapify.
🔹 1️⃣ Understanding Heaps
A Heap is a complete binary tree that satisfies a structural property and an ordering property.
✔ Structural Property: the tree must be complete (filled left to right).
✔ Heap Property:
Max Heap: every parent node ≥ its children
Min Heap: every parent node ≤ its children
In array representation (0-based indexing):
Left child → 2i + 1
Right child → 2i + 2
Parent → (i - 1) / 2 (integer division)
This array mapping eliminates the need for explicit tree pointers.
🔹 2️⃣ Heapify – The Core Operation
Heapify restores the heap property at a node. If a node violates the heap property:
1. Compare it with its children
2. Swap it with the larger child (Max Heap) or smaller child (Min Heap)
3. Recursively fix the affected subtree
⏱ Time complexity of Heapify: O(log n), because the height of a complete binary tree is log n.
🔹 3️⃣ Heap Sort – Step by Step
Heap Sort uses a Max Heap to sort elements in ascending order.
Step 1: Build a Max Heap. Convert the entire array into a max heap. ⏱ Time: O(n) (not O(n log n) — an important insight).
Step 2: Extract the maximum repeatedly:
Swap the root with the last element
Reduce the heap size
Heapify the root
Repeat until the heap size becomes 1. Each extraction takes O(log n), and there are n extractions.
⏱ Overall time complexity: O(n log n)
🔹 4️⃣ Complexity Analysis
Best case: O(n log n)
Average case: O(n log n)
Worst case: O(n log n)
Note: unlike Quick Sort, Heap Sort does not depend on input order. It guarantees consistent performance.
📦 Space Complexity: O(1) (in-place)
⚠ Stability: not stable
🔹 5️⃣ Why Heap Sort Matters
Guarantees worst-case performance
Useful when predictable time complexity is required
Foundation for priority queues and scheduling systems
Heap structures are heavily used in:
Dijkstra’s Algorithm
Prim’s Algorithm
Task scheduling
Memory management systems
#AlgoSeries #HeapSort #DataStructures
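The steps above can be sketched directly in Python (function names are my own; a minimal in-place max-heap sort, not a production implementation):

```python
def heapify(a, n, i):
    """Sift the value at index i down until the max-heap property holds
    within the first n elements of a."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2   # 0-based child indices
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, n, largest)           # fix the affected subtree

def heap_sort(a):
    n = len(a)
    # Step 1: build the max heap bottom-up, starting from the last
    # non-leaf node — O(n) overall.
    for i in range(n // 2 - 1, -1, -1):
        heapify(a, n, i)
    # Step 2: repeatedly move the max to the end and re-heapify the
    # shrunken prefix — n extractions of O(log n) each.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        heapify(a, end, 0)
    return a
```

For example, `heap_sort([5, 1, 4, 2, 8])` returns `[1, 2, 4, 5, 8]`.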
The industry is still stuck on "normal" Knowledge Graphs.
Everyone is treating GraphRAG like the endgame for memory. But let’s be real: for most production agents, a knowledge graph is just a slightly better lookup table. It gives you relationships, but it doesn’t give you reasoning over time.
I’ve been diving deep into what comes next, and two papers just killed my weekend: H-MEM vs. A-MEM. These aren’t just "better retrieval." They represent two completely different philosophies for building a synthetic brain.
The "File System" Approach: H-MEM (Hierarchical Memory)
The Architecture: A strict 4-layer vertical hierarchy (Domain → Category → Trace → Episode).
The Alpha: It uses positional index routing. Instead of a global vector search (expensive and noisy), the agent drills down layer by layer. It prunes 90% of the search space before it even looks for a match.
Why I like it: It’s deterministic. It feels like engineering, not magic.
The "Zettelkasten" Approach: A-MEM (Agentic Memory)
The Architecture: A self-organizing mesh of "atomic notes" that mimics human memory plasticity.
The Alpha: Memory Evolution. When new info comes in, it doesn’t just append a row; it updates old notes and rewires connections based on the new context. The graph gets smarter the longer it runs.
The Risk: The compute cost of "evolving" memory on every turn sounds painful for latency.
The Verdict? I’m starting implementation on H-MEM this week. We need structure before we need evolution. GraphRAG is cool, but if you can’t route through 1M tokens efficiently, your graph is just a map to nowhere.
Has anyone successfully shipped a hierarchical memory system yet? Or are we all just waiting for the next RAG wrapper?
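For intuition only, here is a toy Python sketch of the layer-by-layer drill-down idea. This is NOT H-MEM’s actual API: `MemoryNode`, `score`, and `route` are hypothetical names, and the "embeddings" are toy tuples — the point is just that each layer prunes sibling subtrees before any fine-grained matching happens.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    label: str
    key: tuple                 # toy routing "embedding"
    children: list = field(default_factory=list)
    episode: str = ""          # payload stored at the leaf (Episode) layer

def score(query, key):
    # Toy similarity: negative squared distance (higher is better).
    return -sum((q - k) ** 2 for q, k in zip(query, key))

def route(root, query):
    """Drill down Domain -> Category -> Trace -> Episode.

    At each layer, only the best-scoring child is descended into,
    so every other subtree is pruned without being searched.
    """
    node = root
    while node.children:
        node = max(node.children, key=lambda c: score(query, c.key))
    return node

# Tiny two-layer example: two domains, leaves are episodes.
work = MemoryNode("work", (1.0, 0.0), children=[
    MemoryNode("t1", (1.0, 0.1), episode="standup notes"),
    MemoryNode("t2", (0.9, 0.4), episode="code review"),
])
life = MemoryNode("life", (0.0, 1.0), children=[
    MemoryNode("t3", (0.1, 1.0), episode="grocery list"),
])
root = MemoryNode("root", (0.0, 0.0), children=[work, life])
```

With this toy tree, `route(root, (1.0, 0.1)).episode` resolves to `"standup notes"` after visiting only the "work" branch.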
POTD #3379 - Transformed Array (05 February 2026)
Question: Given a circular array, for each index move left or right by nums[i] steps and record the value you land on.
Approaches:
- Brute-force simulation, stepping one move at a time
- Naive index shifting without proper wrap-around
- Direct index computation with modular arithmetic
Why the naive ideas fail:
- Step-by-step simulation is unnecessary and noisy
- Ignoring negative modulo leads to invalid indices
- Manual wrap logic breaks on edge cases
Constraints guide the solution: n <= 100 and steps are bounded; the circular structure suggests modulo, not iteration.
Intuition: Each index maps independently. Movement is deterministic: i → i + nums[i]. Circularity is pure math, not traversal.
Exact approach:
1. For each i, compute target = i + nums[i]
2. Normalize the index with modulo to stay in bounds
3. Read the value at the final index
4. Zero maps to itself
Why it works: Modulo correctly models circular movement. No state, no dependencies, no revisits.
Pattern used: Circular array + modular index normalization.
Complexity: Time O(n), Space O(n).
#Algorithms #ProblemSolving #Coding
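The modular-arithmetic approach above fits in a few lines of Python. One detail worth noting: Python’s `%` already returns a non-negative result for a negative left operand, so `(i + nums[i]) % n` handles left moves and wrap-around in a single step (languages like C or Java would need an extra `+ n` adjustment).

```python
def transformed_array(nums):
    """For each index i of a circular array, move nums[i] steps
    (right if positive, left if negative) and record the landing value."""
    n = len(nums)
    # Each index maps independently; modulo normalizes the target
    # into [0, n), covering both directions and zero (which maps to itself).
    return [nums[(i + nums[i]) % n] for i in range(n)]
```

For example, `transformed_array([3, -2, 1, 1])` returns `[1, 1, 1, 3]`.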