Designing reward models for domain-specific AI in finance


I recently worked on designing reward models for a finance-domain client with a lot of internal, domain-specific tool calling that general LLMs are not aware of. I fine-tuned a model for that tool calling, and the most critical part was designing the reward model used for GRPO training. Their internal tools for risk assessment, compliance checks, and portfolio analysis weren't something GPT-4.1 or Sonnet could just magically understand, and precision was a key KPI.

What I learned the hard way:
1. Format rewards (0/1) for API compliance were non-negotiable.
2. Correctness rewards (-3 to +3) needed heavy weighting for critical financial tools.
3. Real trader feedback mattered more than theoretical metrics.

The major learning came when I stopped treating all tools equally. A typo in a market data call? Minor penalty. Wrong parameters in a regulatory filing tool? Maximum negative reward.

After GRPO training, our model went from 60% accuracy to 94% on domain-specific tool calls. More importantly, the compliance team actually trusts it now.

I've written a blog post on how to approach designing a reward model for such use cases 👇

#FinTech #AI #MachineLearning #RewardModeling #GRPO #DomainSpecificAI #FinanceAI
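The scheme above can be sketched in a few lines. This is a minimal illustration, not the author's actual implementation: the tool names and weights are hypothetical, and the scoring combines a binary format gate (0/1) with a correctness term scaled by per-tool criticality so that high-stakes tools span the full -3 to +3 range while low-stakes tools only reach -1 to +1.

```python
import json

# Hypothetical criticality tiers -- illustrative names, not the post's
# actual internal tools. Higher weight = larger reward/penalty range.
TOOL_WEIGHTS = {
    "market_data_lookup": 1.0,   # low stakes: a typo gets a minor penalty
    "portfolio_analysis": 2.0,
    "regulatory_filing": 3.0,    # high stakes: errors hit the full -3/+3 range
}

def format_reward(call_str: str) -> float:
    """Binary (0/1) reward: does the output parse as valid JSON with the
    required fields? The non-negotiable API-compliance gate."""
    try:
        call = json.loads(call_str)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if {"tool", "params"} <= call.keys() else 0.0

def correctness_reward(call: dict, expected: dict) -> float:
    """Graded correctness, scaled by tool criticality.
    Maps the fraction of matching parameters from [0, 1] onto
    [-weight, +weight], so a critical tool spans [-3, +3]."""
    weight = TOOL_WEIGHTS.get(expected["tool"], 1.0)
    if call.get("tool") != expected["tool"]:
        return -weight  # wrong tool: maximum penalty for that tier
    matched = sum(call.get("params", {}).get(k) == v
                  for k, v in expected["params"].items())
    frac = matched / max(len(expected["params"]), 1)
    return weight * (2.0 * frac - 1.0)

def total_reward(call_str: str, expected: dict) -> float:
    """Combined reward for one sampled tool call, as scored during GRPO."""
    if format_reward(call_str) == 0.0:
        return -3.0  # unparseable/non-compliant output: worst-case reward
    return 1.0 + correctness_reward(json.loads(call_str), expected)
```

For example, a fully correct `regulatory_filing` call scores 1.0 (format) + 3.0 (correctness) = 4.0, while the same mistake in a `market_data_lookup` call costs far less, which is exactly the "stop treating all tools equally" point.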

