LiveTranscript Reporter

Upload audio → Transcribe & Analyze → Generate Structured Conversation Report.

0. Quick Start

# Backend
cd backend
npm install && npm run dev

# Frontend (in separate terminal)
cd frontend
npm install && npm run dev

Visit: http://localhost:3000 (dashboard, jobs, upload) /documentation for in‑app docs.

Environment variables: see Environment

1. Purpose

End‑to‑end system to ingest an audio file, transcribe it with speaker diarization, run AI analysis, and surface a professional structured report containing:

Conversation Title
Executive Summary
Key Discussion Points
Action Items & Next Steps
Full Diarized Transcript

2. High‑Level Architecture

Client (Next.js App)
	│
	├── Auth (Supabase Auth)
	│
	├── Upload Audio  ──► Backend (Express API)
	│                         │
	│                         ├── Store raw file (filesystem or Supabase Storage)
	│                         ├── Create audio_jobs row (status=uploaded)
	│                         ├── Orchestration Service
	│                         │     1. Transcribe (AssemblyAI)
	│                         │     2. AI Analysis + Summary (Gemini)
	│                         │     3. Build report_data JSON
	│                         │     4. Update audio_jobs (status=completed)
	│                         │
	│                         └── Persist structured results
	│
	└── Poll / Fetch Job & Report Data ► Render Minimal Report UI

3. Technology Stack

Layer	Tech
Frontend	Next.js (App Router), TypeScript, Tailwind (utility classes)
Auth	Supabase Auth
Backend	Node.js + Express
DB	Postgres (Supabase)
Transcription	AssemblyAI (speaker diarization)
AI Analysis	Google Gemini
Storage	Local `uploads/` (swappable)

3.1 Why These Choices

Split Frontend / Backend: Easier to scale CPU/IO heavy transcription & AI steps separately from edge‑optimized UI.
Orchestration Service: Central point for status transitions, retries, and normalization.
Denormalized report_data: One read = full report; UI stays stateless & render‑only.
LLM Normalization Layer: Shields UI from prompt/format drift; future model swaps require minimal changes.

4. Data Model (Key Fields)

audio_jobs core columns: id, user_id, filename, status, transcript_data, analysis_data, report_data, error_message, processing_started_at, processing_completed_at, created_at, updated_at plus optional telemetry (duration, confidence, speakers, etc.).

4.1 `report_data` Shape

{
	"title": "Conversation Analysis Report",
	"executive_summary": "3–4 sentence narrative...",
	"key_points": ["Point 1", "Point 2"],
	"action_items": [
		{ "task": "Do X", "assignee": "Alice", "priority": "high", "due_date": "2025-10-05" }
	],
	"full_transcript": [ { "speaker": "A", "start": 0, "end": 3200, "text": "Hello", "confidence": 0.93 } ],
	"metadata": { "duration": 187, "speakers_count": 2, "processed_at": "2025-09-30T...Z" }
}

4.2 Action Item Normalization

LLM may return strings or heterogenous objects; all coerced to:

interface ActionItem { task: string; assignee?: string | null; priority: 'high' | 'medium' | 'low'; due_date?: string | null; }

5. Processing Pipeline

Upload → row created (uploaded).
Transcription (AssemblyAI) → transcribing.
Store transcript → analyzing.
Gemini analysis + summary.
Normalize & assemble report_data (title, summaries, key points, action items, transcript segments, metadata).
Mark completed or failed (with error_message).

6. AI Prompt Strategy

Dual structure prompt (comprehensive + meeting). Post‑process: strip code fences, parse JSON, normalize action items. Fallback logic ensures UI always has arrays (possibly empty) rather than null.

7. Frontend Routes

Route	Description
`/dashboard`	Overview stats & recent jobs
`/upload`	Upload audio file
`/jobs`	List & filter jobs
`/jobs/[id]`	Detailed job status & raw JSON previews
`/report/[id]`	Final structured report (5 required sections)
`/documentation`	In‑app condensed documentation (mirrors this README)
`/login` `/register`	Auth flows

8. API (Express)

Endpoint	Method	Notes
`/api/audio/jobs`	GET	User jobs list
`/api/audio/jobs/:id`	GET	Single job incl. report when ready
`/api/audio/upload`	POST	Multipart upload
`/api/audio/process-audio`	POST	Upload + immediately start processing pipeline
`/api/audio/process-audio`	POST	Upload + immediately start processing pipeline
`/api/audio/process/:id`	POST	(Optional trigger)

Standard response: { success, data?, error? }.

9. Environment Variables

Backend:

PORT=4000
FRONTEND_URL=http://localhost:3000
SUPABASE_URL=...
SUPABASE_ANON_KEY=...
SUPABASE_SERVICE_ROLE_KEY=...
GEMINI_API_KEY=...
ASSEMBLYAI_API_KEY=...

Frontend:

NEXT_PUBLIC_API_URL=...
NEXT_PUBLIC_MAX_FILE_SIZE=...
NEXT_PUBLIC_MAX_FILE_SIZE=...
NEXT_PUBLIC_ALLOWED_AUDIO_TYPES=...

10. Local Development Workflow

Start backend & frontend (see Quick Start).
Upload file → observe status progression in /jobs.
Once completed, open /report/<jobId>.

11. Error Handling & Resilience

Graceful fallbacks: placeholders instead of crashing UI; failed state with message for any stage exception.

12. Testing Suggestions

Layer	What
Unit	`formatActionItems`, title generation, transcript normalization
Integration	Upload → complete pipeline
Edge	Long audio, single speaker, low confidence

13. Troubleshooting Cheatsheet

Symptom	Cause	Fix
Empty report fields	Analysis not stored	Check backend logs & API keys
404 on report	Job not completed	Wait / verify status
No action items	LLM output unstructured	Reprocess, confirm normalization
One speaker only	No diarization	Ensure AssemblyAI speaker labels enabled

14. Future Enhancements

PDF export, webhooks (replace polling), streaming transcript, entity enrichment, multi‑tenant sharing, transcript confidence heat‑map.

15. Key Challenges & Resolutions

Challenge :- Integrating Assembly AI and Gemini and orchestrating the whole process as it was my first time doing it.

15. Key Challenges & Resolutions

Challenge :- Integrating Assembly AI and Gemini and orchestrating the whole process as it was my first time doing it.

Summary

The system delivers a reproducible pipeline: upload → transcription (AssemblyAI) → AI analysis (Gemini) → normalized report_data → minimal structured report UI. Architectural decisions prioritize clarity, future scalability (swap polling for events, move storage to cloud), and resilience against AI output variability.

System Design

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
backend		backend
frontend		frontend
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LiveTranscript Reporter

0. Quick Start

1. Purpose

2. High‑Level Architecture

3. Technology Stack

3.1 Why These Choices

4. Data Model (Key Fields)

4.1 `report_data` Shape

4.2 Action Item Normalization

5. Processing Pipeline

6. AI Prompt Strategy

7. Frontend Routes

8. API (Express)

9. Environment Variables

10. Local Development Workflow

11. Error Handling & Resilience

12. Testing Suggestions

13. Troubleshooting Cheatsheet

14. Future Enhancements

15. Key Challenges & Resolutions

15. Key Challenges & Resolutions

Summary

System Design

About

Uh oh!

Releases

Packages

Languages

kush2863/live-transcript

Folders and files

Latest commit

History

Repository files navigation

LiveTranscript Reporter

0. Quick Start

1. Purpose

2. High‑Level Architecture

3. Technology Stack

3.1 Why These Choices

4. Data Model (Key Fields)

4.1 report_data Shape

4.2 Action Item Normalization

5. Processing Pipeline

6. AI Prompt Strategy

7. Frontend Routes

8. API (Express)

9. Environment Variables

10. Local Development Workflow

11. Error Handling & Resilience

12. Testing Suggestions

13. Troubleshooting Cheatsheet

14. Future Enhancements

15. Key Challenges & Resolutions

15. Key Challenges & Resolutions

Summary

System Design

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

4.1 `report_data` Shape

Packages