Dimensionality Reduction in Cinematography: Mapping Visual Style with UMAP

We often describe cinema with subjective adjectives like "gritty," "vibrant," or "chaotic," but a film is fundamentally a high-dimensional data signal—millions of pixels shifting over time. This project was born from a desire to bridge the gap between Film Theory and Computer Vision by asking a simple question: Can we mathematically quantify "cinematographic style"?

My goal was to build a pipeline that extracts the unique visual fingerprint of any movie, transforming abstract concepts like color grading, pacing, and entropy into analyzable vectors. To achieve this, I developed moviesigdb, an open-source library that digests video content into temporal signals. I used the visually distinct world of Arcane as a proof of concept to demonstrate that algorithms can map the "visual geography" of a story without ever understanding the plot.

The Engine: Introducing moviesigdb

To handle the massive scale of video processing required for this project, I built and published moviesigdb (Movie Signal Database), an open-source Python library designed to extract "color fingerprints" from video files efficiently.

The library is available on PyPI and GitHub, serving as the foundational ETL (Extract, Transform, Load) tool for my cinematic analysis pipeline.

GitHub: https://github.com/shreyanbruh/moviesigdb

PyPI: Official open-source library

pip install moviesigdb

import moviesigdb as msdb

How It Works: The "Barcode" Pipeline

At its core, moviesigdb converts the temporal dimension of video into spatial data. It uses OpenCV to scan video files, but rather than processing every pixel of every frame (which is computationally expensive), it employs an intelligent sampling method:

Fast Frame Extraction: The library calculates exact step sizes to sample frames at a specific frequency (e.g., 1.0 FPS or 0.5 FPS) regardless of the source video's native framerate. This ensures consistent data density across different media formats.
Spatial Compression: For every sampled frame, the library collapses the 2D image (Height × Width) into a single RGB tuple (1 × 1) by calculating the mean color vector. This reduces gigabytes of video data into a lightweight "signal."
The "Image is the Database" Architecture: Perhaps the most distinctive feature of moviesigdb is how it stores data. When generating a "Movie Barcode" (a visualization of the timeline), the library injects the raw numerical data (JSON) directly into the PNG image’s metadata headers (Description field), so the picture and the data it was drawn from travel together in one file.
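
The spatial compression and "image is the database" steps can be sketched as follows. This is a minimal illustration, not moviesigdb's actual code: the function names are hypothetical, synthetic NumPy arrays stand in for decoded video frames, and storing the JSON in a PNG `tEXt` chunk under the keyword `Description` is an assumption about how the metadata field is written.

```python
import json
import struct
import zlib

import numpy as np

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def mean_color(frame):
    """Collapse an H x W x 3 frame into a single (R, G, B) tuple of ints."""
    r, g, b = frame.reshape(-1, 3).mean(axis=0)
    return (int(round(r)), int(round(g)), int(round(b)))


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def barcode_png(signal, height=32):
    """Draw one pixel column per sample and embed the signal as JSON text."""
    # Each scanline starts with filter byte 0 (no filtering), then RGB bytes.
    row = b"\x00" + bytes(c for rgb in signal for c in rgb)
    # IHDR: width, height, 8-bit depth, colour type 2 (RGB), no interlacing.
    ihdr = struct.pack(">IIBBBBB", len(signal), height, 8, 2, 0, 0, 0)
    text = b"Description\x00" + json.dumps({"signal": signal}).encode("latin-1")
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"tEXt", text)
        + _chunk(b"IDAT", zlib.compress(row * height))
        + _chunk(b"IEND", b"")
    )


def read_signal(png_bytes):
    """Walk the chunks and decode the JSON stored under 'Description'."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        kind = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            key, _, value = data.partition(b"\x00")
            if key == b"Description":
                return json.loads(value.decode("latin-1"))["signal"]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return None


# Three flat synthetic "frames" stand in for sampled video frames.
frames = [np.full((4, 6, 3), v, dtype=np.uint8) for v in (10, 120, 250)]
signal = [mean_color(f) for f in frames]
png = barcode_png(signal)
print(read_signal(png))
```

The appeal of this design is that a barcode image can be shared on its own and still be turned back into the exact numbers it was drawn from, with no sidecar file to lose.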

[Figure: an example "Movie Barcode" generated by moviesigdb]

Project Application: Processing 1.6 Million Frames of Arcane

I tasked moviesigdb with digesting the entirety of Arcane (Seasons 1 & 2). The show contains approximately 1.6 million raw frames of animation.

Using the library's optimized extract_frames_fast module, I sampled the series at a fixed rate of 0.5 frames per second. This process converted roughly 18 hours of 4K animation into a precise, temporal color signal.
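
The sampling arithmetic behind these figures can be checked with a short sketch. The 24 FPS native frame rate is an assumption (the source rate is not stated here), and `frame_step` and `sample_indices` are illustrative helpers rather than moviesigdb functions.

```python
def frame_step(native_fps, target_fps):
    """Source frames to advance between samples to hit target_fps."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, round(native_fps / target_fps))


def sample_indices(total_frames, native_fps, target_fps):
    """Indices of the source frames that get sampled."""
    return range(0, total_frames, frame_step(native_fps, target_fps))


NATIVE_FPS = 24.0         # assumed; in practice read it from the video file
TOTAL_FRAMES = 1_600_000  # approximate frame count quoted above

indices = sample_indices(TOTAL_FRAMES, NATIVE_FPS, target_fps=0.5)
hours = TOTAL_FRAMES / NATIVE_FPS / 3600
print(frame_step(NATIVE_FPS, 0.5), len(indices), round(hours, 1))
```

Under that assumption, every 48th frame is kept, which turns about 1.6 million frames into roughly 33,000 color samples and matches the quoted running time of about 18 hours. With OpenCV, the native rate would normally come from the capture's `cv2.CAP_PROP_FPS` property.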

These extracted signals—visualized as the barcodes you see above—served as the raw input for the unsupervised learning (UMAP) model. By first compressing the show into these efficient color signals, I was able to map the series' entire visual trajectory without needing a supercomputer.

[Figure: barcodes for every episode of Arcane]

Final Graph

Extracting the raw colors with moviesigdb was just the first step. A "Movie Barcode" is beautiful, but it is linear—it traps you in the timeline. I wanted to break the timeline and see the structure.

To do this, I needed to group similar scenes together, regardless of when they happened in the show. If a scene in Episode 1 looks identical to a scene in Episode 9, they should be neighbors on the map.

1. Feature Engineering: Beyond Just Color

A simple average color isn't enough to define a style. A static red wall and a frantic red explosion might have the same average pixel value, but they feel completely different.

I built a custom VisualTrajectoryAnalyzer that slides a 30-second window across the entire series. For every window, it calculates an 8-dimensional feature vector drawn from three groups of measurements:

Perceptual Color (CIELAB): How the human eye perceives the scene's palette.
Visual Entropy: A measure of complexity. (Is the image flat and foggy? Or sharp and detailed?)
Pacing Velocity: By measuring the rate of change between frames, we can mathematically quantify "action." High velocity means fast cuts and rapid movement; low velocity means stillness.
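
A rough sketch of how such window features could be computed from the sampled color signal is shown below. It is not the VisualTrajectoryAnalyzer itself: the exact eight components are not enumerated above, so this version produces only five (mean L, a, b, an entropy term, and a velocity term), and every function name, the 16-bin histogram, and the choice to measure entropy over lightness are assumptions.

```python
import numpy as np

# Linear sRGB -> XYZ matrix and D65 reference white.
SRGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
D65_WHITE = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb):
    """Convert 8-bit sRGB values (last axis = R, G, B) to CIELAB."""
    rgb = np.asarray(rgb, dtype=float) / 255.0
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = (linear @ SRGB_TO_XYZ.T) / D65_WHITE
    delta = 6 / 29
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4 / 29)
    lightness = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def shannon_entropy(values, bins=16, value_range=(0.0, 100.0)):
    """Shannon entropy (bits) of a histogram of the given values."""
    clipped = np.clip(values, *value_range)
    counts, _ = np.histogram(clipped, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def pacing_velocity(lab, seconds_per_sample):
    """Mean Lab-space distance travelled per second between samples."""
    steps = np.linalg.norm(np.diff(lab, axis=0), axis=1)
    return float(steps.mean() / seconds_per_sample)


def window_features(signal_rgb, sample_fps=0.5, window_seconds=30.0):
    """Slide a window over the signal; one feature row per window position."""
    lab = srgb_to_lab(signal_rgb)
    size = int(window_seconds * sample_fps)  # 15 samples at 0.5 FPS
    rows = []
    for start in range(len(lab) - size + 1):
        w = lab[start:start + size]
        rows.append([
            *w.mean(axis=0),                     # perceptual color (L, a, b)
            shannon_entropy(w[:, 0]),            # spread of lightness values
            pacing_velocity(w, 1 / sample_fps),  # rate of color change
        ])
    return np.array(rows)


rng = np.random.default_rng(0)
toy_signal = rng.integers(0, 256, size=(20, 3))  # 20 fake mean-color samples
print(window_features(toy_signal).shape)
```

This captures the red wall versus red explosion distinction: both windows can share the same mean color, but the static one has zero velocity and zero entropy while the frantic one scores high on both.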

2. The Algorithm: UMAP

Humans can't visualize 8-dimensional data. To make sense of these vectors, I used UMAP (Uniform Manifold Approximation and Projection).

UMAP is a dimensionality reduction algorithm that acts like a translator. It takes complex, high-dimensional relationships and flattens them into a 2D map. It operates on a simple rule: Distance equals Difference. The rule is most reliable at short range, because UMAP prioritizes preserving each point's local neighborhood; the size of the gap between distant clusters is only a rough guide.

If two dots on the graph are close together, those two scenes are visually similar (e.g., two conversations in Piltover).
If two dots are far apart, they are most likely visually alien to one another (e.g., a bright Hextech lab vs. a dark Zaun alley).
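
Before any projection happens, "distance" simply means distance between feature vectors. The sketch below finds the windows most similar to a given one. It assumes Euclidean distance on standardized features (the preprocessing actually used is not stated), and `standardize` and `nearest_windows` are hypothetical helpers.

```python
import numpy as np


def standardize(features):
    """Z-score each column so no single feature dominates the distance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (features - mean) / std


def nearest_windows(features, query_index, k=3):
    """Indices of the k windows closest to the query window."""
    z = standardize(features)
    dist = np.linalg.norm(z - z[query_index], axis=1)
    dist[query_index] = np.inf  # a window is not its own neighbor
    return np.argsort(dist)[:k]


rng = np.random.default_rng(0)
features = rng.normal(size=(10, 8))  # 10 toy windows, 8 features each
features[5] = features[0] + 1e-3     # window 5 looks almost like window 0
print(nearest_windows(features, query_index=0, k=1))
```

Standardizing matters because a feature measured on a 0 to 100 scale would otherwise drown out one measured in a few bits. With the umap-learn package, the 2D map itself would typically come from `umap.UMAP(n_components=2).fit_transform(...)` applied to features like these.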

Mathematically, UMAP constructs a high-dimensional graph by calculating the conditional probability $p_{j|i}$ that a scene $x_j$ is similar to scene $x_i$. It uses a local radius to ensure that even sparse regions (like the rare "Hextech" scenes) remain connected:

$$p_{j|i} = \exp\left(-\frac{\max\left(0,\ d(x_i, x_j) - \rho_i\right)}{\sigma_i}\right)$$

where $d(x_i, x_j)$ is the distance between the two feature vectors, $\rho_i$ is the distance from $x_i$ to its nearest neighbor, and $\sigma_i$ acts as a normalization factor for the local density.
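
A simplified NumPy sketch of this graph-construction step follows, assuming the standard UMAP form p(j|i) = exp(-max(0, d(xi, xj) - ρi) / σi). The umap-learn implementation uses approximate neighbor search and extra refinements, so the function names, the brute-force distances, and the bisection loop here are illustrative only.

```python
import numpy as np


def membership_strengths(X, k=5, iters=64):
    """Return P with P[i, j] = p(j|i) for the k nearest neighbors of i."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    P = np.zeros((n, n))
    target = np.log2(k)  # UMAP calibrates sigma_i so each row sums to log2(k)
    for i in range(n):
        order = np.argsort(dist[i])
        neighbors = order[order != i][:k]
        d = dist[i, neighbors]
        rho = d[0]                        # distance to the nearest neighbor
        gap = np.maximum(0.0, d - rho)    # nearest neighbor always has gap 0

        def total(sigma):
            return np.exp(-gap / sigma).sum()

        lo, hi = 0.0, 1.0
        while total(hi) < target:         # grow the bracket until it covers target
            hi *= 2.0
        for _ in range(iters):            # bisection on sigma_i
            mid = (lo + hi) / 2.0
            if total(mid) < target:
                lo = mid
            else:
                hi = mid
        P[i, neighbors] = np.exp(-gap / hi)
    return P


def symmetrize(P):
    """Fuzzy union: p_ij = p(j|i) + p(i|j) - p(j|i) * p(i|j)."""
    return P + P.T - P * P.T


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))  # 30 toy scenes with 8 features each
P = membership_strengths(X, k=5)
print(P.sum(axis=1)[:3], np.log2(5))
```

Because ρi is subtracted first, every scene is connected to its nearest neighbor with strength 1 no matter how isolated it is, which is what keeps sparse regions attached to the graph.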

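To generate the final 2D coordinates, the algorithm minimizes the fuzzy set cross-entropy (C) between the high-dimensional graph (P) and the low-dimensional embedding (Q). This forces the 2D layout to respect the complex relationships of the original 8D data:

$$C = \sum_{i \neq j} \left[ p_{ij} \log\frac{p_{ij}}{q_{ij}} + \left(1 - p_{ij}\right) \log\frac{1 - p_{ij}}{1 - q_{ij}} \right]$$

Here $p_{ij}$ is the symmetrized edge weight between scenes $i$ and $j$ in P, and $q_{ij}$ is the similarity of their 2D positions in Q. The first term pulls similar scenes together; the second pushes dissimilar scenes apart.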
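
The objective can be evaluated directly on a toy example. This sketch assumes the usual UMAP low-dimensional similarity q_ij = 1 / (1 + a * d^(2b)); the values a = b = 1 are placeholders (umap-learn fits them from its `min_dist` setting), and the function names are hypothetical.

```python
import numpy as np


def low_dim_similarity(Y, a=1.0, b=1.0):
    """q_ij = 1 / (1 + a * ||y_i - y_j||^(2b)) for every pair of 2D points."""
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 / (1.0 + a * d2 ** b)


def fuzzy_cross_entropy(P, Q, eps=1e-12):
    """Sum over i != j of p*log(p/q) + (1-p)*log((1-p)/(1-q))."""
    off_diagonal = ~np.eye(len(P), dtype=bool)
    p = np.clip(P[off_diagonal], eps, 1 - eps)
    q = np.clip(Q[off_diagonal], eps, 1 - eps)
    attract = p * np.log(p / q)                    # penalizes separating neighbors
    repel = (1 - p) * np.log((1 - p) / (1 - q))    # penalizes merging strangers
    return float((attract + repel).sum())


# Four scenes: 0 and 1 are alike, 2 and 3 are alike, nothing else is.
P = np.zeros((4, 4))
P[0, 1] = P[1, 0] = P[2, 3] = P[3, 2] = 0.9

good = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])  # pairs kept together
bad = np.array([[0.0, 0.0], [5.0, 0.0], [0.1, 0.0], [5.1, 0.0]])   # pairs split apart

loss_good = fuzzy_cross_entropy(P, low_dim_similarity(good))
loss_bad = fuzzy_cross_entropy(P, low_dim_similarity(bad))
print(loss_good < loss_bad)
```

The layout that keeps each similar pair side by side scores a much lower loss than the one that splits them, which is the signal the optimizer follows when it moves points around the 2D map.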

The result is the image you see above: a generated "Atlas" of Arcane, where every point is a scene, and the continents are defined not by land and water, but by light, color, and kinetic energy.

I am considering publishing a Hugging Face dataset for other researchers to use.

Shreyan Banerjee
