Hazem Abdelazim’s Post

10mo

I'm currently working on a challenging project, "AI - Judge Assistant", focused on Arabic AI reasoning. Whenever there is new "buzz" around the release of a new LLM, I don't let the public benchmarks sway me. I compared DeepSeek R1 with the new OpenAI o3-mini on some complex Arabic judicial agentic tasks. Both are equally good at reasoning and analysis, with a slight plus for o3-mini on cultural alignment; o3-mini is also more verbose. o3-mini was much faster at inference than DeepSeek R1 on the NVIDIA API platform. https://lnkd.in/d-9Bn3-s https://lnkd.in/dg9EdzyS

deepseek-r1 Model by Deepseek-ai | NVIDIA NIM build.nvidia.com

5 Comments

Hanan Tabak

10mo

Thanks, Dr. Hazem, for this very helpful note. Any idea how you were able to create a DeepSeek R1 key? The service is currently down for me.

1 Reaction

Kirolos Farouk

10mo

I think the "buzz" behind DeepSeek is because it introduced a new path for the AI game. Before, as we discussed in your lectures, the problem was that ChatGPT had been trained on almost all of the data in the world, so we thought progress might hit a plateau. What I find truly fascinating about DeepSeek is that they introduced a whole new approach: offering a high-performance model at substantially lower cost, which encourages more commercial use of AI models. I don't think it would have made that buzz if it weren't substantially cheaper.

2 Reactions


More Relevant Posts

Walter Lee
1mo
Exciting news from PyTorch Conference 2025! The latest launches are set to redefine AI development:
• Monarch: A single-controller orchestration engine for streamlined distributed workflows.
• Forge: Native PyTorch reinforcement learning for seamless RL integration.
• TorchComms: A modern comms API to supercharge PyTorch communication layers.
• Helion: A higher-level kernel authoring DSL for simplified GPU programming.
• ExecuTorch GA: General availability of PyTorch at the Edge, powering efficient on-device AI.
These innovations promise enhanced scalability, performance, and ease of use, marking a big leap for the PyTorch ecosystem! #PyTorch #AI2025 #TechLaunch
Rawan Mortada
1mo Edited
For years, people have relied on "fine-tuning" as the primary way to improve the performance of large language models. That means feeding the model fresh data, adjusting some of its weights, and listening to the GPU fans roar, all in the hope that it performs better next time. However, Stanford recently released a paper that essentially claims, "There's no need for retraining whatsoever." The idea is called Agentic Context Engineering, or just ACE, and it’s ingenious. Rather than modifying model weights, ACE improves the context surrounding the model, that is, the prompt. So instead of being retrained, the model updates its own playbook as it goes.
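
To make the "playbook" idea concrete, here is a minimal sketch of a context that grows from feedback while the model itself stays frozen. It is an illustration only, not the paper's implementation: the `Playbook` class, the `reflect` function, and the sample tasks are all hypothetical, and the reflection step is a stub where a real system would query the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Playbook:
    """Hypothetical store of lessons that is prepended to every prompt."""
    lessons: List[str] = field(default_factory=list)

    def add(self, lesson: str) -> None:
        # Append only unseen lessons so the context grows without duplicates.
        if lesson not in self.lessons:
            self.lessons.append(lesson)

    def render(self, task: str) -> str:
        # Build the prompt: accumulated lessons first, then the task itself.
        notes = "\n".join(f"- {item}" for item in self.lessons)
        return f"Playbook:\n{notes}\n\nTask: {task}"


def reflect(task: str, succeeded: bool) -> Optional[str]:
    """Stub reflection step (assumption): turn a failure into a reusable lesson."""
    if succeeded:
        return None
    return f"When handling '{task}', double-check the result before answering."


playbook = Playbook()
history = [("parse dates", False), ("sum a column", True), ("parse dates", False)]
for task, succeeded in history:
    prompt = playbook.render(task)  # what would be sent to the unchanged model
    lesson = reflect(task, succeeded)
    if lesson is not None:
        playbook.add(lesson)  # only the context changes, never the weights

print(playbook.render("parse dates"))
```

The only state that changes between tasks is the list of lessons, which is the contrast with fine-tuning that the post describes.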

1 Comment
汪志鹏
2mo
Very impressive blog post. The non-determinism seen in online LLM inference is not a flaw in the inference engine; it stems from how GPU kernels are implemented for performance. It's impressive that such simple experiments led to such significant findings. I've noticed that the blog posts from thinkingmachines.ai are of very high quality. https://lnkd.in/gdqe-3NB

Defeating Nondeterminism in LLM Inference thinkingmachines.ai
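
The post does not spell out the mechanism, but one well-known reason kernel implementation details can change numerical results is that floating-point addition is not associative, so the order in which a reduction adds its terms affects the answer. The snippet below is a minimal CPU-only illustration of that property; the function name and sample values are assumptions chosen for the demonstration, and it does not reproduce any GPU kernel.

```python
from typing import Sequence


def sum_in_order(values: Sequence[float], order: Sequence[int]) -> float:
    """Add the values one by one in the given index order."""
    total = 0.0
    for index in order:
        total += values[index]
    return total


values = [1e16, 1.0, -1e16]

# Same numbers, two different accumulation orders.
big_first = sum_in_order(values, [0, 1, 2])    # (1e16 + 1.0) - 1e16
cancel_first = sum_in_order(values, [0, 2, 1])  # (1e16 - 1e16) + 1.0

print(big_first, cancel_first)
print(big_first == cancel_first)

# A smaller everyday case of the same effect.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

In the first order the `1.0` is lost to rounding before the large terms cancel, so the two sums differ even though the inputs are identical.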
Nabeel Ali
1mo Edited
More compute doesn't always mean smarter AI. We're seeing models hit performance plateaus because they often don't explore enough different ways to solve a problem during training. A new paper on a framework called DeepSearch offers a fascinating fix: instead of just learning from one answer at a time, it teaches the model to explore a whole tree of possibilities while it's still learning. The result is the real eye-opener. It not only set a new state of the art for models of its size but did so using almost six times less compute. It’s a powerful reminder that smarter algorithms can beat brute-force scaling. This is a hopeful sign that we can build better AI by being more creative, not just by using more power. A cool read for anyone in the AI space. You can find the model on Hugging Face: https://lnkd.in/gQp87fdA. #AI #MachineLearning #LLM #Efficiency

fangwu97/DeepSearch-1.5B · Hugging Face huggingface.co
Mohammad Amin Dadgar
2mo
🚀 In the 34th AI Talks, we dived into the challenges of deploying Large Language Models (LLMs) with Farbod Bijary. From GPU optimizations and parallelism to navigating Persian LLM leaderboards, the session shared practical insights on making LLMs faster, smarter, and ready for real-world use. 💡✨
Key takeaways:
⚡️ Deploying LLMs – CPU vs GPU trade-offs, costs, and setup choices
📊 Persian LLM benchmarks – when to rely on them (and when not to)
🏎️ Speed & optimization – quantization, Torch Compile, 5D parallelism
🎯 Practical deployment – balancing quality, speed, and user needs
4 Comments
Farbod Bijary
2mo
This week I had the opportunity to give a presentation on the challenges of deploying LLMs at the 34th AI Talks event. I demonstrated how the performance of LLMs is evaluated on the Persian language and how the latency and throughput of these models can be optimized. Glad to have connected with this fun and technical community. 🫡 P.S. Here's the link to my presentation if you're interested: https://lnkd.in/ddarP33S
Byte Goose AI

58 followers
1mo
GGML, a C tensor library for machine learning, plus GGUF and quantization for edge LLM inference. The tutorial provides an extensive overview of GGML, a C library designed for efficient inference of large language models (LLMs) on consumer-grade hardware, particularly through CPU optimization and broad hardware compatibility. A core feature of GGML is its support for quantization, a process that reduces the numerical precision of a model to significantly decrease its size and memory consumption, thereby enabling "inference at the edge." The tutorial explains that GGML uses tensors and a computational graph to define and execute operations, and it introduces GGUF, a successor binary format optimized for storing models for GGML-based executors. Finally, it highlights the minimalism, portability, and community-driven development of GGML, demonstrating its use through low-level C code examples for matrix multiplication and neural network inference. https://lnkd.in/geDYDxZC

GGML - machine learning Tensor Library. GGUF and Quantization for Edge LLM model Inference.

https://www.youtube.com/
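
As a rough illustration of what "reducing model precision" means, the sketch below quantizes a small block of weights to 8-bit integers with a single shared scale and then restores them. This is a generic symmetric scheme written for explanation; it is not GGML's actual quantization format, and the function names and sample weights are assumptions.

```python
from typing import List, Tuple


def quantize_block(values: List[float], bits: int = 8) -> Tuple[float, List[int]]:
    """Map floats to signed integers that share one scale factor."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    # Fall back to a scale of 1.0 when every value in the block is zero.
    scale = max(abs(v) for v in values) / qmax or 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quantized


def dequantize_block(scale: float, quantized: List[int]) -> List[float]:
    """Recover approximate floats from the integers and the shared scale."""
    return [scale * q for q in quantized]


weights = [0.12, -0.5, 0.33, 0.0, 0.49, -0.07]
scale, q_weights = quantize_block(weights)
restored = dequantize_block(scale, q_weights)

# Each weight now needs one byte plus a shared scale, at the cost of a small error.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q_weights)
print(max_error <= scale / 2 + 1e-12)
```

The rounding error per weight is bounded by half the scale, which is the trade-off that quantized formats accept in exchange for smaller models.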
Vansh Agrawal
1mo
A 300M-parameter language model just learned to predict GPU latency, memory usage, and model accuracy by reading code as plain text. No feature engineering. No graph encoders. Just pure text-to-number prediction with 0.93 correlation on memory benchmarks. Performance optimization is getting a major upgrade. Read how: https://lnkd.in/gM24AWRb #MachineLearning #ArtificialIntelligence #DeepLearning #MLOps #GPUComputing #CodeOptimization #NeuralNetworks #PerformanceEngineering #TechInnovation #AIResearch

Can a Small Language Model Predict Kernel Latency, Memory, and Model Accuracy from Code? medium.com
Michael Listrom
1mo Edited
Libraries such as Eigen and Boost, or custom GPU kernels, support quaternion math. For large systems, storage scales linearly with the number of quaternions per site, but computations scale slightly worse due to non-commutativity. When simulating quaternionic Schrödinger equations, careful attention must be paid to the order of multiplication in updates. Drawing: https://lnkd.in/dhdHSKqn

ChatGPT chatgpt.com
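
The point about multiplication order can be checked directly with the Hamilton product. The sketch below is a plain-Python illustration, not code from Eigen or Boost; quaternions are represented as `(w, x, y, z)` tuples, and the helper name `qmul` is an assumption.

```python
from typing import Tuple

Quaternion = Tuple[float, float, float, float]  # (w, x, y, z)


def qmul(a: Quaternion, b: Quaternion) -> Quaternion:
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


i: Quaternion = (0, 1, 0, 0)
j: Quaternion = (0, 0, 1, 0)

ij = qmul(i, j)  # equals k
ji = qmul(j, i)  # equals -k

print(ij, ji)
print(ij == ji)
```

Because `i * j` and `j * i` differ in sign, an update rule that swaps the operand order computes a different state, which is why the order has to be fixed explicitly in a simulation.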
Roland Okeke
1mo Edited
Ten years ago, you’d need an entire research team and a large budget to build an AI that could recognize different types of cats. Today, we have AI systems that can generate new realities within seconds. It's really exciting what one can do with just a few lines of code. For example, I fine-tuned the small ResNet18 image model on CPU and achieved 0.000013% validation loss (tiny dataset) with 100% prediction accuracy so far. The core model was trained in a Jupyter Notebook using fastai, pickled, and deployed to a Hugging Face Space with Gradio, all in under 60 lines of code. #fastai #finetuning #AI

3 Comments


More from this author

Enterprise Voice Agents: Technology, Market, and Middle East Focus (2023–2025)

Developments in Text-to-Speech Technology (2020–2025)- Special Focus on Arabic language

AI-Generated Podcast Technology: Pipeline and State-of-the-Art (2018–2025)
