Audify: AI-Powered Image & Video to Music Composer

Audify is an agent-based application built with LangGraph and Streamlit that transforms visual media (images and videos) into unique, emotionally resonant musical compositions. It uses a team of AI agents to analyze visual input, develop a musical concept, generate a prompt, and compose a track, which you can then refine with your own feedback.


✨ Features

  • Agentic Workflow: Powered by LangGraph, Audify uses a graph of specialized AI agents that collaborate to turn an idea into music.
  • Multi-Modal Input: Generate music from either static images or dynamic videos.
  • Dynamic Video Analysis: For videos, the app uses scenedetect to extract keyframes, which are then collectively analyzed to create a single, cohesive musical score that follows the video's narrative arc (see the sketch after this list).
  • AI Music Theory: The MusicTheorist agent analyzes the visuals to determine mood, genre, tempo, and key instruments.
  • Creative Refinement: A MusicCritic agent enhances the initial musical idea, making it more evocative and detailed for the generation model.
  • Iterative Feedback Loop: Not satisfied with the result? Provide natural language feedback (e.g., "make it faster and more epic") and the RefinementAgent will rewrite the prompt and regenerate the track.
  • Advanced Parameter Tuning: An AI ParameterTuner automatically adjusts technical settings for the music generation model based on the visual analysis.
  • High-Quality Music Generation: Uses the powerful ACE-Step model for music synthesis.
  • Interactive Web UI: A user-friendly interface built with Streamlit.
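
For a sense of how the keyframe extraction step can work, here is a minimal sketch using the scenedetect and MoviePy APIs. The function name and output layout are illustrative assumptions, not Audify's actual code.

```python
# Hypothetical sketch of keyframe extraction; the project's implementation may differ.
from scenedetect import detect, ContentDetector   # PySceneDetect >= 0.6 API
from moviepy.editor import VideoFileClip          # MoviePy 1.x import path
from PIL import Image

def extract_keyframes(video_path: str, out_dir: str = "temp_uploads") -> list[str]:
    """Detect scene cuts and save one representative frame per scene."""
    scenes = detect(video_path, ContentDetector())        # list of (start, end) timecodes
    clip = VideoFileClip(video_path)
    paths = []
    for i, (start, end) in enumerate(scenes):
        # Sample the middle of each scene to avoid transition frames.
        t = (start.get_seconds() + end.get_seconds()) / 2
        frame = clip.get_frame(t)                         # RGB numpy array
        path = f"{out_dir}/keyframe_{i:03d}.jpg"
        Image.fromarray(frame).save(path)
        paths.append(path)
    clip.close()
    return paths
```

The keyframe paths can then be passed together to the vision model, letting it reason about the video's narrative arc as a whole rather than frame by frame.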

πŸ›οΈ Architecture: The Agentic Graph

Audify's core logic is a stateful graph where each node is a specialized agent. The process flows from visual analysis to music generation, with decision points that direct the workflow based on the current state. A minimal code sketch of the graph appears after the list below.

  1. Input Node (Router): The graph's entry point determines the first step based on the input:
    • If a video is provided, it routes to the VideoAnalyzer.
    • If an image is provided, it routes to the MusicTheorist.
    • If the user provides feedback on existing music, it routes to the RefinementAgent.
  2. VideoAnalyzer: If the input is a video, this node extracts keyframes, analyzes them as a sequence to understand the story, and generates a MusicTheory object (mood, genre, etc.) for the entire video.
  3. MusicTheorist: Analyzes a single image to generate a MusicTheory object.
  4. MusicCritic: Takes the MusicTheory object and refines the detailed_prompt, making it richer and more descriptive for the music model.
  5. ParameterTuner: Adjusts technical generation parameters (e.g., omega_scale, guidance_scale) based on the analyzed mood and genre.
  6. LyricsGenerator (Optional): If requested, this agent writes lyrics that match the musical concept.
  7. MusicGenerator: The final step in the main flow. It takes the refined prompt and tuned parameters and uses the ACE-Step model to generate the audio file.
  8. RefinementAgent: This node is triggered by user feedback. It modifies the existing music prompt based on the user's request and sends the new prompt back to the MusicGenerator.
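
As promised above, here is a minimal sketch of how this routing could be wired with LangGraph. The state fields, node names, and stub functions are assumptions inferred from the description, not the repository's exact code; the real agents live in src/nodes/ and the real state in src/state.py.

```python
# Illustrative LangGraph wiring; node names mirror the list above (LyricsGenerator omitted).
from typing import Optional, TypedDict
from langgraph.graph import StateGraph, END

class AppState(TypedDict, total=False):   # assumed shape of the shared graph state
    image_path: Optional[str]
    video_path: Optional[str]
    feedback: Optional[str]
    music_theory: dict
    prompt: str
    params: dict
    audio_path: str

# Placeholder node functions; the real agents call Gemini and ACE-Step.
def video_analyzer(state): return {}
def music_theorist(state): return {}
def music_critic(state): return {}
def parameter_tuner(state): return {}
def music_generator(state): return {}
def refinement_agent(state): return {}

def route_input(state: AppState) -> str:
    """Entry router: choose the first node based on what the user supplied."""
    if state.get("feedback"):
        return "refinement_agent"
    if state.get("video_path"):
        return "video_analyzer"
    return "music_theorist"

graph = StateGraph(AppState)
for name, fn in [
    ("video_analyzer", video_analyzer),
    ("music_theorist", music_theorist),
    ("music_critic", music_critic),
    ("parameter_tuner", parameter_tuner),
    ("music_generator", music_generator),
    ("refinement_agent", refinement_agent),
]:
    graph.add_node(name, fn)

graph.set_conditional_entry_point(route_input)
graph.add_edge("video_analyzer", "music_critic")
graph.add_edge("music_theorist", "music_critic")
graph.add_edge("music_critic", "parameter_tuner")
graph.add_edge("parameter_tuner", "music_generator")
graph.add_edge("refinement_agent", "music_generator")
graph.add_edge("music_generator", END)

app = graph.compile()
# app.invoke({"image_path": "photo.jpg"}) would run the image branch end to end.
```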

πŸ› οΈ Technology Stack

  • Orchestration: LangGraph
  • Web Framework: Streamlit
  • LLM (Analysis & Text): Google Gemini via langchain_google_genai
  • Music Generation Model: ACE-Step
  • Video Processing: MoviePy, scenedetect
  • Deployment: Google Colab, pyngrok

🚀 Setup and Usage

This project is designed to be run in a Google Colab environment to leverage its free GPU resources.

Prerequisites

  • A Google Account
  • Git

Running in Google Colab

  1. Open the Notebook: Open the Audify.ipynb notebook in Google Colab.

  2. Set Up API Keys: You will need API keys for Google Gemini and ngrok. (A short snippet showing how the notebook reads these keys appears after these steps.)

  3. Add Keys to Colab Secrets:

    • In your Colab notebook, click the "🔑" (Secrets) icon in the left sidebar.
    • Add two new secrets:
      • GEMINI_API_KEY: Paste your Google Gemini key here.
      • NGROK_AUTH_TOKEN: Paste your ngrok authtoken here.
    • Make sure to enable the "Notebook access" toggle for both secrets.
  4. Run the Cells:

    • Execute the cells in the notebook sequentially from top to bottom.
    • The first few cells will install all required dependencies and set up the project structure.
    • The final cells will start the Streamlit server and use ngrok to create a public URL.
  5. Access the App:

    • The last cell's output will provide a public ngrok URL (e.g., https://<unique-id>.ngrok-free.app).
    • Click this URL to open the Audify web application in your browser.
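
For reference, the final cells typically amount to something like the sketch below, assuming the notebook reads secrets with google.colab.userdata and tunnels the app with pyngrok; the exact cell contents may differ.

```python
# Sketch of the launch cells: read secrets, start Streamlit, expose it via ngrok.
import os
import subprocess
from google.colab import userdata   # Colab Secrets API
from pyngrok import ngrok

os.environ["GEMINI_API_KEY"] = userdata.get("GEMINI_API_KEY")
ngrok.set_auth_token(userdata.get("NGROK_AUTH_TOKEN"))

# Run the Streamlit app in the background on port 8501.
subprocess.Popen(["streamlit", "run", "app.py",
                  "--server.port", "8501", "--server.headless", "true"])

# Tunnel the local port to a public HTTPS URL and print it.
tunnel = ngrok.connect(8501, "http")
print(tunnel.public_url)
```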

How to Use the App

  1. Upload: Upload an image (.jpg, .png) or a short video (.mp4, .mov).
  2. Analyze: Click the "Analyze & Create Music Concept" button. The AI agents will analyze your media and generate a musical prompt.
  3. Generate: Review the AI-generated prompt and parameters. You can edit them if you wish. Click "Generate Music!".
  4. Listen & Refine: Listen to your new track! If it's not quite right, type your feedback into the refinement box and click "Refine & Regenerate" to try again.

πŸ“ Project Structure

/
├── Audify.ipynb             # The main Google Colab notebook for setup and execution.
├── ACE-Step/                # Cloned repository for the music generation model.
├── app.py                   # The Streamlit web application front-end.
├── outputs/                 # Directory where generated music and videos are saved.
├── temp_uploads/            # Temporary storage for user-uploaded files.
└── src/
    ├── __init__.py
    ├── config.py            # Configuration for models and default parameters.
    ├── graph.py             # Defines the LangGraph agentic workflow.
    ├── models.py            # Pydantic models for data structures (e.g., MusicTheory).
    ├── state.py             # Defines the AppState TypedDict for the graph.
    └── nodes/               # Contains the individual agent modules.
        ├── __init__.py
        ├── music_theorist.py
        ├── video_analyzer.py
        ├── music_critic.py
        ├── lyrics_generator.py
        ├── parameter_tuner.py
        ├── refinement_agent.py
        └── music_generator.py
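
To give a sense of the data structures, src/models.py presumably defines something along these lines; the field names below are assumptions inferred from the features described above, not the file's verbatim contents.

```python
# Hypothetical sketch of the MusicTheory model; the real src/models.py may differ.
from pydantic import BaseModel, Field

class MusicTheory(BaseModel):
    mood: str = Field(description="Overall emotional tone, e.g. 'melancholic' or 'triumphant'")
    genre: str = Field(description="Musical genre, e.g. 'cinematic orchestral' or 'lo-fi hip hop'")
    tempo_bpm: int = Field(description="Suggested tempo in beats per minute")
    key_instruments: list[str] = Field(description="Instruments that should carry the piece")
    detailed_prompt: str = Field(description="Full text prompt handed to the ACE-Step model")
```

Structured output like this is what lets downstream nodes such as the MusicCritic and ParameterTuner reason over discrete fields (mood, genre, tempo) instead of free-form text.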

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.

πŸ™ Acknowledgements

  • The LangChain team for creating LangGraph.
  • The developers of the ACE-Step model for their incredible work in music generation.
  • Google for the powerful Gemini models.
  • The Streamlit team for making it easy to build beautiful data apps.
