Finitum is a modern, extensible personal finance manager that automatically parses your bank and email notifications to track expenses, transfers, and financial activity. It features a web dashboard, category detection, and secure OAuth integration with Google.
- Automatic Email Parsing: Connect your Gmail account and Finitum will extract expenses and transfers from bank notifications.
- Category Detection: Uses robust keyword matching and normalization to classify transactions into categories (Food, Transport, Online, etc.).
- Dashboard: Interactive Dash/Plotly dashboard for visualizing spending trends, top merchants, category breakdowns, and more.
- User Authentication: Secure signup/signin with JWT-based authentication.
- Google OAuth2 Integration: Securely authorize Gmail access using OAuth2.
- Celery Task Queue: Asynchronous background tasks for fetching and parsing emails.
- PostgreSQL Database: Stores users, transactions, and credentials securely.
- Extensible: Modular parser and category system for easy adaptation to new banks or notification formats.
The recommended way to run Finitum is via Docker Compose. No local Python or database setup required.
Copy `.env.example` to `.env` and fill in the required values:

```env
CONN_STR=postgresql+psycopg://finitum:yourpassword@db:5432/finitum
PGUSER=finitum
PGPASSWORD=yourpassword
PGDATABASE=finitum
REDIS_URL=redis://redis:6379/0
GOOGLE_CLIENT=your-google-client-id
GOOGLE_SECRET=your-google-client-secret
GOOGLE_REDIRECT_URI=http://localhost:9090/google_oauth2callback
SECRET_KEY=your-jwt-secret
TZ=America/Santiago
DEBUG=true
```

You must create Google OAuth credentials for Gmail API access in the Google Cloud Console under Credentials.
```shell
docker compose up --build
```

This will start:
- API server (FastAPI, port 9090)
- Dashboard (Dash/Plotly, port 5050)
- Celery worker (background email parsing)
- Celery beat (periodic tasks)
- PostgreSQL (database, port 5444)
- Redis (cache, port 6379)
- Alembic (runs DB migrations on first up)
- Web Dashboard & API: http://localhost:9090
- API Docs: http://localhost:9090/docs
- Sign Up: Register a user via the `/signup` endpoint or the UI.
- Google OAuth: Visit `/google-authorize` to link your Gmail account.
- Fetch Emails: Celery tasks will periodically fetch and parse new emails.
- Dashboard: Open the dashboard at http://localhost:5050 to view your financial analytics.
```
src/
  api/            # FastAPI server, JWT auth, endpoints
  db/             # SQLModel models, DB service
  email_service/  # Gmail API integration, message parsing
  oauth_service/  # Google OAuth2 logic
  parsers/        # Expense/transference parsing, category logic
  tasks/          # Celery tasks for background processing
  ui/             # Dash/Plotly dashboard
  utils/          # Config, logging, helpers
alembic/          # Database migrations
tests/            # Unit and integration tests
```
Finitum uses regex patterns to parse bank notification emails. Each bank has its own patterns defined in src/parsers/regex.json.
Get raw email samples from your bank for:
- Purchase notifications
- ATM withdrawals
- Transfer notifications
Use "Show Original" in Gmail to get the actual email source, including encoding artifacts. Clean the text before writing patterns; the parser works on the cleaned format.
Use this function to strip HTML from email text:

```python
from bs4 import BeautifulSoup

def remove_html_tags(html_doc: str) -> str:
    soup = BeautifulSoup(html_doc, "html.parser")
    return soup.get_text()
```

Create a directory under `tests/banks/` for your bank:

```shell
mkdir -p tests/banks/your_bank_name
```

Add test files for each transaction type:
- `purchase_clp.txt`: Purchase notification body
- `purchase_subject.txt`: Purchase email subject
- `withdrawal.txt`: ATM withdrawal body
- `withdrawal_subject.txt`: ATM withdrawal subject
- `transference.txt`: Transfer notification body
- `transference_subject.txt`: Transfer email subject
Add your bank's patterns to src/parsers/regex.json:
```json
{
  "your_bank_name": {
    "subject": {
      "exclusions": ["internal transfer", "between accounts"],
      "purchase": "purchase confirmation",
      "withdrawal": "atm withdrawal",
      "transference": "transfer notification"
    },
    "body": {
      "amountPurchase": "Amount\\s+\\$([\\d.,]+)",
      "amountTransference": "Transfer\\s+amount\\s+\\$([\\d.,]+)",
      "amountWithdrawal": "Withdrawal\\s+\\$([\\d.,]+)",
      "date": "\\d{1,2}[-/]\\d{1,2}[-/]\\d{4}",
      "commerce": "Merchant:\\s+(.+?)\\s+Date",
      "transferenceRecipient": "Recipient:\\s+(.+?)\\s+Account"
    }
  }
}
```

Pattern Guidelines:
- Subject patterns: lowercase text fragments to match in email subjects
- Exclusions: subject patterns to ignore (e.g., internal transfers)
- Body patterns: regex to capture specific fields from the email body
- Use `\s+` for flexible whitespace matching
- Capture groups `(...)` extract the actual value
- Handle encoded characters if present (e.g., `=C3=B1` for `ñ`)
- Account for separators like `>`, tabs (`=09`), or HTML artifacts
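As a quick sanity check, the body patterns above can be tried directly in Python. This is a sketch with a made-up notification body; real bank emails will differ:

```python
import re

# Hypothetical cleaned notification body (illustrative only).
body = "Purchase confirmation Amount $12.345 Merchant: COFFEE SHOP Date 05/01/2025"

# Same patterns as the regex.json example; in JSON the backslashes are doubled.
amount = re.search(r"Amount\s+\$([\d.,]+)", body).group(1)
commerce = re.search(r"Merchant:\s+(.+?)\s+Date", body).group(1)
date = re.search(r"\d{1,2}[-/]\d{1,2}[-/]\d{4}", body).group(0)

print(amount, commerce, date)  # 12.345 COFFEE SHOP 05/01/2025
```

Note the lazy `(.+?)` in the commerce pattern: it stops at the first `Date` rather than swallowing the rest of the line.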
Common Pitfalls:
- Emails often have quoted-printable encoding (`=C3=A9` for accented characters)
- HTML emails get stripped but may leave odd spacing artifacts
- Line breaks might be `>` or `\n` depending on the email format
- Currency amounts may use `.` or `,` as thousands separators
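For the quoted-printable case, Python's stdlib `quopri` module can decode the `=XX` escapes before any regexes run. A sketch with a made-up sample fragment:

```python
import quopri

# Hypothetical quoted-printable fragment as seen via "Show Original".
raw = b"Compra con se=C3=B1a en CAF=C3=89=09SANTIAGO"

# Decode the =XX escapes, then interpret the resulting bytes as UTF-8:
# =C3=B1 becomes "ñ", =C3=89 becomes "É", and =09 becomes a tab.
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```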
Add parametrized test cases in tests/test_parse.py:
```python
import pytest

from parsers.base import ExpenseCategory


@pytest.mark.parametrize(
    ("bank", "amount", "commerce", "cat"),
    [
        ("banco_chile", 38844, "STA ISABEL JM CAR", ExpenseCategory.FOOD),
        ("santander", 68885, "Entel pcs", ExpenseCategory.SERVICES),
        ("your_bank_name", 15000, "COFFEE SHOP", ExpenseCategory.FOOD),
    ],
)
def test_amount_data_clp(bank, amount, commerce, cat):
    # Test implementation
    ...
```

Run the suite:

```shell
pytest tests/test_parse.py -v
```

Fix regex patterns until all tests pass. Common issues:
- Commerce not captured: adjust the pattern to match the exact structure
- Amount in the wrong format: handle thousands separators (`.` vs `,`)
- Date format mismatch: update the date regex for DD/MM/YYYY vs DD-MM-YYYY
- Recipient name issues: account for ALL CAPS vs Mixed Case
If a regex doesn't match, dump the actual string being parsed:

```python
with open(f"tests/banks/{bank}/purchase_clp.txt", "r") as f:
    content = f.read()

print(repr(content))  # Escapes hidden whitespace and encoding artifacts
```

Compare the `repr()` output against your regex pattern character by character.
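Going a step further, you can loop every body pattern for a bank over a sample and see all failures at once. A self-contained sketch; the JSON and sample text here are inline stand-ins for `src/parsers/regex.json` and a `tests/banks/` file, which you would read from disk in the repo:

```python
import json
import re

# Inline stand-in for src/parsers/regex.json (note the doubled backslashes).
regex_json = r'''
{"your_bank_name": {"body": {
    "amountPurchase": "Amount\\s+\\$([\\d.,]+)",
    "commerce": "Merchant:\\s+(.+?)\\s+Date"
}}}
'''
# Inline stand-in for tests/banks/your_bank_name/purchase_clp.txt.
content = "Amount $15.000\nMerchant: COFFEE SHOP Date 05/01/2025"

patterns = json.loads(regex_json)["your_bank_name"]["body"]
results = {}
for name, pattern in patterns.items():
    match = re.search(pattern, content)
    results[name] = match.group(1) if match else None
    print(name, "->", results[name] or "NO MATCH")
```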
Categories are defined in src/parsers/categories.json and mapped in src/parsers/base.py.
Add or modify categories in src/parsers/categories.json:
```json
{
  "food": ["grocery", "restaurant", "cafe", "market"],
  "transport": ["uber", "taxi", "metro", "gas station"],
  "your_new_category": ["keyword1", "keyword2", "keyword3"]
}
```

Add your category to `src/parsers/base.py`:

```python
from enum import Enum


class ExpenseCategory(str, Enum):
    FOOD = "food"
    TRANSPORT = "transport"
    YOUR_NEW_CATEGORY = "your_new_category"
```

Categories are detected by matching merchant names against keywords. Test with:
```python
from datetime import datetime

from parsers.base import ExpenseCategory
from parsers.expense import Expense

expense = Expense("100", "COFFEE SHOP", datetime.now())
assert expense.category == ExpenseCategory.FOOD
```

Common problems and solutions:
Emails not being fetched:
- Check Celery logs: `docker compose logs celery_worker`
- Verify the Gmail API scope includes read access
- Check that the Google OAuth token hasn't expired
Regex not matching:
- The email body might have HTML artifacts; check the `remove_html_tags()` function
- Use `repr()` to see the exact string format, including hidden characters
- Test regex patterns at regex101.com
Wrong amounts extracted:
- Check for multiple amounts in the email (bill amount vs. paid amount)
- Verify thousands separator handling (`.` vs `,`)
- Some banks show amounts in multiple places; make sure you're capturing the right one
Categories wrong:
- Add more specific keywords to `categories.json`
- Keywords are matched case-insensitively after normalization
- Check keyword priority: more specific terms should come first
- Get raw email sample (Show Original in Gmail)
- Create test files with exact email content
- Write regex patterns in `regex.json`
- Run tests and iterate on the patterns
- Debug with `repr()` to see the exact format
- Validate in production with a small date range first
- Celery tasks fetch emails in batches (configurable in `tasks/email.py`)
- Rate limits apply to the Gmail API (quota: 1 billion quota units per day)
- Database queries are optimized with proper indexes
- Dashboard caches data for 5 minutes to reduce DB load
- No email content is saved—only extracted transaction data
- All sensitive tokens and credentials are stored securely in the database
- OAuth2 state is managed with Redis for CSRF protection
- Passwords are hashed using strong algorithms
- JWT tokens are used for authentication
MIT License