A lightweight Python/Flask setup exposing Vertex AI text embeddings (text-embedding-004) and Gemini (google/gemini-2.5-pro) via simple HTTP endpoints.

AI-Governance-Lab/vertex-ai-first-test


Vertex AI – Local Docker (Linux) and Google Cloud Run

For Developers: Minimal Flask endpoints providing Vertex AI text embeddings and Gemini generation, containerized for local use and Cloud Run deployment with multiple registry options (Docker Hub, GCR, Artifact Registry).
For Engineering Managers: Lightweight, stateless services enabling rapid experimentation and controlled rollout (separate images for embeddings vs. generative), supporting environment-based model switching without rebuilds.
For HR / Talent / Non-Technical Stakeholders: Demonstrates a clean, portable AI service template showing how Google Vertex AI models can be safely integrated, scaled on demand, and cost-optimized (scale-to-zero) using modern cloud practices.

flowchart LR
  U[Client] -->|POST JSON| R1[Cloud Run ai-model-endpoint]
  U -->|POST JSON| R2[Cloud Run ai-gemini-endpoint]

  subgraph Containers
    R1 --> A1[main.py Embeddings]
    R2 --> A2[main_gemini.py Gemini]
  end

  A1 -->|init project region| V[(Vertex AI)]
  A2 -->|init project region| V

  V -->|text-embedding-004| A1
  V -->|GEMINI_MODEL default google gemini 2.5 pro| A2

  subgraph Registries
    DH[(Docker Hub)]
    GCR[(gcr.io)]
    AR[(Artifact Registry)]
  end

  DH --> R1
  DH --> R2
  GCR --> R1
  GCR --> R2
  AR --> R1
  AR --> R2

Index

Test locally on Linux, then deploy to Google Cloud Run. Build from inside the docker folder.

Overview

  • gecko image: text embeddings (text-embedding-004).
  • gemini image: text generation (Gemini; default google/gemini-2.5-pro via env).
  • Each app listens on the port given by PORT (default 8080) and binds to 0.0.0.0.
  • Vertex calls require ADC or a service account key.
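The request-handling logic can be sketched as follows. This is a minimal illustration, not the actual contents of main.py: the helper names (`handle_embedding_request`, `embed_fn`) are hypothetical, and the response fields are taken from the example responses shown under "Test endpoints" below.

```python
# Sketch of the embeddings request handler (hypothetical helper names;
# the real main.py wires equivalent logic into a Flask route on 0.0.0.0:8080).
def handle_embedding_request(payload, embed_fn=None):
    """payload: parsed JSON body, e.g. {"text": "..."}.
    embed_fn: callable returning an embedding vector, or None when no
    credentials are available (the Vertex AI call is then skipped)."""
    text = payload.get("text", "")
    response = {"ok": True, "echo": text, "length": len(text)}
    if embed_fn is None:
        # No ADC / service account key mounted: skip the Vertex AI call.
        response["vertex_skipped"] = "no credentials configured"
    else:
        vector = embed_fn(text)  # e.g. text-embedding-004 via the vertexai SDK
        response["embedding_dim"] = len(vector)
    return response
```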

Option A — Local Docker (Linux)

  1. Variables
export PROJECT_ID="gcr_project_id"
export REGION="us-central1"
export IMAGE_REPO="docker.io/crevesky/gcp-free-ai_models"
  2. Build images (run from repo/docker)
cd docker

# Embeddings image (tag: gecko)
docker build -t "$IMAGE_REPO:gecko" -f Dockerfile .

# Gemini image (tag: gemini)
docker build -t "$IMAGE_REPO:gemini" -f Dockerfile_gemini .
  3. Run locally without credentials (Vertex skipped)
# Embeddings
docker run --rm -p 8080:8080 "$IMAGE_REPO:gecko"

# Gemini
docker run --rm -p 8080:8080 "$IMAGE_REPO:gemini"
  4. Run locally with Vertex AI using ADC (recommended)
gcloud auth application-default login

# Embeddings
docker run --rm -p 8080:8080 \
  -e GCP_PROJECT_ID="$PROJECT_ID" \
  -v "$HOME/.config/gcloud:/root/.config/gcloud:ro" \
  "$IMAGE_REPO:gecko"

# Gemini (set model via env; default is google/gemini-2.5-pro)
docker run --rm -p 8080:8080 \
  -e GCP_PROJECT_ID="$PROJECT_ID" \
  -e GCP_REGION="$REGION" \
  -e GEMINI_MODEL="google/gemini-2.5-pro" \
  -v "$HOME/.config/gcloud:/root/.config/gcloud:ro" \
  "$IMAGE_REPO:gemini"
  5. Test endpoints
# Embeddings
curl -X POST -H "Content-Type: application/json" \
  -d '{"text":"Hello from my local container!"}' \
  http://localhost:8080

# Gemini
curl -X POST -H "Content-Type: application/json" \
  -d '{"prompt":"List three benefits of containerizing ML inference."}' \
  http://localhost:8080

Expected (examples):

  • Embeddings without creds: {"ok":true,"echo":"...","length":30,"vertex_skipped":"..."}
  • Embeddings with ADC: {"ok":true,"echo":"...","length":30,"embedding_dim":768}
  • Gemini with ADC: {"ok":true,"gemini_model":"publishers/google/models/gemini-2.5-pro","gemini_text":"..."}
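The curl calls above can also be issued from Python using only the standard library. The function below is a generic JSON-POST helper (not part of this repo); the URLs and payloads match the local examples.

```python
import json
import urllib.request


def post_json(url, payload, timeout=30):
    """POST a JSON payload to `url` and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, against a locally running container:
# post_json("http://localhost:8080", {"text": "Hello from my local container!"})
# post_json("http://localhost:8080", {"prompt": "Summarize Cloud Run."})
```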

Option B — Deploy to Google Cloud Run (Linux)

B1.1 — Using Docker Hub (current)

  1. Auth and set project/region
gcloud auth login
gcloud config set project "$PROJECT_ID"
gcloud config set run/region "$REGION"
  2. Enable APIs
gcloud services enable run.googleapis.com aiplatform.googleapis.com
  3. Push images
docker login
docker push "$IMAGE_REPO:gecko"
docker push "$IMAGE_REPO:gemini"
  4. Grant Vertex AI permission to Cloud Run service account
PROJECT_NUMBER="$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
  --role="roles/aiplatform.user"
  5. Deploy embeddings service
gcloud run deploy "ai-model-endpoint" \
  --image "$IMAGE_REPO:gecko" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION" \
  --min-instances=0 --max-instances=1
  6. Deploy gemini service (set model without rebuilding)
gcloud run deploy "ai-gemini-endpoint" \
  --image "$IMAGE_REPO:gemini" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION",GEMINI_MODEL="google/gemini-2.5-pro" \
  --min-instances=0 --max-instances=1
  7. Test deployed services
# Embeddings
URL1="$(gcloud run services describe ai-model-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"text":"Hello Cloud Run"}' "$URL1"

# Gemini
URL2="$(gcloud run services describe ai-gemini-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"prompt":"Summarize Cloud Run."}' "$URL2"

Notes:

  • The service expects POST with JSON. Opening the URL in a browser (GET) returns a usage hint and does not call the model.
  • To avoid interactive prompts, set defaults:
gcloud config set run/region "$REGION"
gcloud config set core/disable_prompts true

Or add --region and --quiet to deploy commands.


B1.2 — Using Google Container Registry (gcr.io)

  1. Set variables
export PROJECT_ID="gcr_project_id"
export REGION="us-central1"
# Image names in GCR
export GCR_GECKO="gcr.io/$PROJECT_ID/ai-embeddings:gecko"
export GCR_GEMINI="gcr.io/$PROJECT_ID/ai-gemini:gemini"
  2. Enable required APIs (includes Container Registry)
gcloud services enable run.googleapis.com aiplatform.googleapis.com containerregistry.googleapis.com
gcloud config set project "$PROJECT_ID"
gcloud config set run/region "$REGION"
  3. Configure Docker to use gcloud as a credential helper for GCR
gcloud auth configure-docker gcr.io
  4. Build and tag images (run from repo/docker)
cd docker

# Embeddings -> GCR
docker build -t "$GCR_GECKO" -f Dockerfile .

# Gemini -> GCR
docker build -t "$GCR_GEMINI" -f Dockerfile_gemini .
  5. Push to GCR
docker push "$GCR_GECKO"
docker push "$GCR_GEMINI"
  6. Grant Vertex AI permission to Cloud Run service account
PROJECT_NUMBER="$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
  --role="roles/aiplatform.user"
  7. Deploy services from GCR images
# Embeddings
gcloud run deploy "ai-model-endpoint" \
  --image "$GCR_GECKO" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION" \
  --min-instances=0 --max-instances=1

# Gemini
gcloud run deploy "ai-gemini-endpoint" \
  --image "$GCR_GEMINI" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION",GEMINI_MODEL="google/gemini-2.5-pro" \
  --min-instances=0 --max-instances=1
  8. Test
URL1="$(gcloud run services describe ai-model-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"text":"Hello Cloud Run"}' "$URL1"

URL2="$(gcloud run services describe ai-gemini-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"prompt":"Summarize Cloud Run."}' "$URL2"

Tip:

  • GCR is legacy; Google recommends Artifact Registry for new projects. Prefer Docker Hub (B1.1) or the Artifact Registry flow below.

B1.3 — Build with Google Cloud Build (remote) → Artifact Registry

  1. Variables
export PROJECT_ID="gcr_project_id"
export REGION="us-central1"
export REPO="ai-tests"  # existing Artifact Registry repo
export AR_HOST="${REGION}-docker.pkg.dev"
export AR_GECKO="${AR_HOST}/${PROJECT_ID}/${REPO}/gecko:gecko"
export AR_GEMINI="${AR_HOST}/${PROJECT_ID}/${REPO}/gemini:gemini"
gcloud config set project "$PROJECT_ID"
gcloud config set run/region "$REGION"
  2. Enable services and auth to Artifact Registry
gcloud services enable artifactregistry.googleapis.com cloudbuild.googleapis.com run.googleapis.com aiplatform.googleapis.com
gcloud auth configure-docker "$AR_HOST"
  3. Remote build and push (no local docker build)
# Build gecko using docker/Dockerfile (submit the docker/ folder)
gcloud builds submit docker/ --tag "$AR_GECKO" --region "$REGION"

# Build gemini using docker/Dockerfile_gemini (submit repo root with explicit -f)
gcloud builds submit . --region "$REGION" --config <(cat <<'YAML'
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build','-t','$_IMG','-f','docker/Dockerfile_gemini','.']
images: ['$_IMG']
YAML
) --substitutions=_IMG="$AR_GEMINI"
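The inline heredoc config can also live in a standalone file (for example a hypothetical docker/cloudbuild-gemini.yaml; the filename is illustrative), which avoids shell quoting pitfalls and keeps the build reproducible:

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', '$_IMG', '-f', 'docker/Dockerfile_gemini', '.']
images: ['$_IMG']

Then submit it with: gcloud builds submit . --region "$REGION" --config docker/cloudbuild-gemini.yaml --substitutions=_IMG="$AR_GEMINI"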
  4. Deploy from Artifact Registry images
# Embeddings
gcloud run deploy "ai-model-endpoint" \
  --image "$AR_GECKO" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION" \
  --min-instances=0 --max-instances=1

# Gemini
gcloud run deploy "ai-gemini-endpoint" \
  --image "$AR_GEMINI" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION",GEMINI_MODEL="google/gemini-2.5-pro" \
  --min-instances=0 --max-instances=1
  5. Test
URL1="$(gcloud run services describe ai-model-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"text":"Hello Cloud Run"}' "$URL1"

URL2="$(gcloud run services describe ai-gemini-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"prompt":"Summarize Cloud Run."}' "$URL2"

Notes:

  • The gemini build uses an inline Cloud Build config to select Dockerfile_gemini.
  • Add --quiet to gcloud commands to avoid interactive prompts.

Env vars

  • GCP_PROJECT_ID / GCP_REGION: used to initialize Vertex AI (region defaults to us-central1).
  • GEMINI_MODEL: defaults to google/gemini-2.5-pro; also accepts full resource like publishers/google/models/gemini-2.5-pro.
  • PORT: set by Cloud Run (defaults to 8080 locally).
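Since GEMINI_MODEL accepts both the short form and the full resource name, the service has to normalize it before calling Vertex AI. A normalization along these lines would work (an illustrative sketch, not necessarily the exact code in main_gemini.py; `resolve_gemini_model` is a hypothetical helper) and maps both forms to the publishers/... name seen in the example responses:

```python
import os


def resolve_gemini_model(default="google/gemini-2.5-pro"):
    """Resolve GEMINI_MODEL to a full Vertex AI model resource name.
    Accepts 'google/gemini-2.5-pro' or an already-full
    'publishers/google/models/gemini-2.5-pro'. Sketch only."""
    name = os.environ.get("GEMINI_MODEL", default)
    if name.startswith("publishers/"):
        return name  # already a full resource name
    publisher, _, model = name.partition("/")
    return f"publishers/{publisher}/models/{model}"
```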
