A lightweight Python/Flask setup exposing Vertex AI text embeddings (text-embedding-004) and Gemini (google/gemini-2.5-pro) via simple HTTP endpoints.

AI-Governance-Lab/vertex-ai-first-test


Vertex AI – Local Docker (Linux) and Google Cloud Run

For Developers: Minimal Flask endpoints providing Vertex AI text embeddings and Gemini generation, containerized for local use and Cloud Run deployment with multiple registry options (Docker Hub, GCR, Artifact Registry).
For Engineering Managers: Lightweight, stateless services enabling rapid experimentation and controlled rollout (separate images for embeddings vs. generative), supporting environment-based model switching without rebuilds.
For HR / Talent / Non-Technical Stakeholders: Demonstrates a clean, portable AI service template showing how Google Vertex AI models can be safely integrated, scaled on demand, and cost-optimized (scale-to-zero) using modern cloud practices.

flowchart LR
  U[Client] -->|POST JSON| R1[Cloud Run ai-model-endpoint]
  U -->|POST JSON| R2[Cloud Run ai-gemini-endpoint]

  subgraph Containers
    R1 --> A1[main.py Embeddings]
    R2 --> A2[main_gemini.py Gemini]
  end

  A1 -->|init project region| V[(Vertex AI)]
  A2 -->|init project region| V

  V -->|text-embedding-004| A1
  V -->|GEMINI_MODEL default google gemini 2.5 pro| A2

  subgraph Registries
    DH[(Docker Hub)]
    GCR[(gcr.io)]
    AR[(Artifact Registry)]
  end

  DH --> R1
  DH --> R2
  GCR --> R1
  GCR --> R2
  AR --> R1
  AR --> R2

Index

Test locally on Linux, then deploy to Google Cloud Run. Build from inside the docker folder.

Overview

  • gecko image: text embeddings (text-embedding-004).
  • gemini image: text generation (Gemini; default google/gemini-2.5-pro via env).
  • Each app listens on the port given by PORT (default 8080) and binds to 0.0.0.0.
  • Vertex calls require ADC or a service account key.
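The request-handling logic can be sketched as follows. This is a minimal illustration, not the actual contents of main.py: the helper names (`handle_embedding_request`, `embed_fn`) are hypothetical, and the response fields are taken from the example responses shown under "Test endpoints" below.

```python
# Sketch of the embeddings request handler (hypothetical helper names;
# the real main.py wires equivalent logic into a Flask route on 0.0.0.0:8080).
def handle_embedding_request(payload, embed_fn=None):
    """payload: parsed JSON body, e.g. {"text": "..."}.
    embed_fn: callable returning an embedding vector, or None when no
    credentials are available (the Vertex AI call is then skipped)."""
    text = payload.get("text", "")
    response = {"ok": True, "echo": text, "length": len(text)}
    if embed_fn is None:
        # No ADC / service account key mounted: skip the Vertex AI call.
        response["vertex_skipped"] = "no credentials configured"
    else:
        vector = embed_fn(text)  # e.g. text-embedding-004 via the vertexai SDK
        response["embedding_dim"] = len(vector)
    return response
```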

Option A — Local Docker (Linux)

  1. Variables
export PROJECT_ID="gcr_project_id"
export REGION="us-central1"
export IMAGE_REPO="docker.io/crevesky/gcp-free-ai_models"
  2. Build images (run from repo/docker)
cd docker

# Embeddings image (tag: gecko)
docker build -t "$IMAGE_REPO:gecko" -f Dockerfile .

# Gemini image (tag: gemini)
docker build -t "$IMAGE_REPO:gemini" -f Dockerfile_gemini .
  3. Run locally without credentials (Vertex skipped)
# Embeddings
docker run --rm -p 8080:8080 "$IMAGE_REPO:gecko"

# Gemini
docker run --rm -p 8080:8080 "$IMAGE_REPO:gemini"
  4. Run locally with Vertex AI using ADC (recommended)
gcloud auth application-default login

# Embeddings
docker run --rm -p 8080:8080 \
  -e GCP_PROJECT_ID="$PROJECT_ID" \
  -v "$HOME/.config/gcloud:/root/.config/gcloud:ro" \
  "$IMAGE_REPO:gecko"

# Gemini (set model via env; default is google/gemini-2.5-pro)
docker run --rm -p 8080:8080 \
  -e GCP_PROJECT_ID="$PROJECT_ID" \
  -e GCP_REGION="$REGION" \
  -e GEMINI_MODEL="google/gemini-2.5-pro" \
  -v "$HOME/.config/gcloud:/root/.config/gcloud:ro" \
  "$IMAGE_REPO:gemini"
  5. Test endpoints
# Embeddings
curl -X POST -H "Content-Type: application/json" \
  -d '{"text":"Hello from my local container!"}' \
  http://localhost:8080

# Gemini
curl -X POST -H "Content-Type: application/json" \
  -d '{"prompt":"List three benefits of containerizing ML inference."}' \
  http://localhost:8080

Expected (examples):

  • Embeddings without creds: {"ok":true,"echo":"...","length":30,"vertex_skipped":"..."}
  • Embeddings with ADC: {"ok":true,"echo":"...","length":30,"embedding_dim":768}
  • Gemini with ADC: {"ok":true,"gemini_model":"publishers/google/models/gemini-2.5-pro","gemini_text":"..."}
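The curl calls above can also be issued from Python using only the standard library. The function below is a generic JSON-POST helper (not part of this repo); the URLs and payloads match the local examples.

```python
import json
import urllib.request


def post_json(url, payload, timeout=30):
    """POST a JSON payload to `url` and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, against a locally running container:
# post_json("http://localhost:8080", {"text": "Hello from my local container!"})
# post_json("http://localhost:8080", {"prompt": "Summarize Cloud Run."})
```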

Option B — Deploy to Google Cloud Run (Linux)

B1.1 — Using Docker Hub (current)

  1. Auth and set project/region
gcloud auth login
gcloud config set project "$PROJECT_ID"
gcloud config set run/region "$REGION"
  2. Enable APIs
gcloud services enable run.googleapis.com aiplatform.googleapis.com
  3. Push images
docker login
docker push "$IMAGE_REPO:gecko"
docker push "$IMAGE_REPO:gemini"
  4. Grant Vertex AI permission to Cloud Run service account
PROJECT_NUMBER="$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
  --role="roles/aiplatform.user"
  5. Deploy embeddings service
gcloud run deploy "ai-model-endpoint" \
  --image "$IMAGE_REPO:gecko" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION" \
  --min-instances=0 --max-instances=1
  6. Deploy gemini service (set model without rebuilding)
gcloud run deploy "ai-gemini-endpoint" \
  --image "$IMAGE_REPO:gemini" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION",GEMINI_MODEL="google/gemini-2.5-pro" \
  --min-instances=0 --max-instances=1
  7. Test deployed services
# Embeddings
URL1="$(gcloud run services describe ai-model-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"text":"Hello Cloud Run"}' "$URL1"

# Gemini
URL2="$(gcloud run services describe ai-gemini-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"prompt":"Summarize Cloud Run."}' "$URL2"

Notes:

  • The service expects POST with JSON. Opening the URL in a browser (GET) returns a usage hint and does not call the model.
  • To avoid interactive prompts, set defaults:
gcloud config set run/region "$REGION"
gcloud config set core/disable_prompts true

Or add --region and --quiet to deploy commands.


B1.2 — Using Google Container Registry (gcr.io)

  1. Set variables
export PROJECT_ID="gcr_project_id"
export REGION="us-central1"
# Image names in GCR
export GCR_GECKO="gcr.io/$PROJECT_ID/ai-embeddings:gecko"
export GCR_GEMINI="gcr.io/$PROJECT_ID/ai-gemini:gemini"
  2. Enable required APIs (includes Container Registry)
gcloud services enable run.googleapis.com aiplatform.googleapis.com containerregistry.googleapis.com
gcloud config set project "$PROJECT_ID"
gcloud config set run/region "$REGION"
  3. Configure Docker to use gcloud as a credential helper for GCR
gcloud auth configure-docker gcr.io
  4. Build and tag images (run from repo/docker)
cd docker

# Embeddings -> GCR
docker build -t "$GCR_GECKO" -f Dockerfile .

# Gemini -> GCR
docker build -t "$GCR_GEMINI" -f Dockerfile_gemini .
  5. Push to GCR
docker push "$GCR_GECKO"
docker push "$GCR_GEMINI"
  6. Grant Vertex AI permission to Cloud Run service account
PROJECT_NUMBER="$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \
  --role="roles/aiplatform.user"
  7. Deploy services from GCR images
# Embeddings
gcloud run deploy "ai-model-endpoint" \
  --image "$GCR_GECKO" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION" \
  --min-instances=0 --max-instances=1

# Gemini
gcloud run deploy "ai-gemini-endpoint" \
  --image "$GCR_GEMINI" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION",GEMINI_MODEL="google/gemini-2.5-pro" \
  --min-instances=0 --max-instances=1
  8. Test
URL1="$(gcloud run services describe ai-model-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"text":"Hello Cloud Run"}' "$URL1"

URL2="$(gcloud run services describe ai-gemini-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"prompt":"Summarize Cloud Run."}' "$URL2"

Tip:

  • GCR is legacy; Google recommends Artifact Registry for new projects. Prefer Docker Hub (B1.1) or the Artifact Registry flow below.

B1.3 — Build with Google Cloud Build (remote) → Artifact Registry

  1. Variables
export PROJECT_ID="gcr_project_id"
export REGION="us-central1"
export REPO="ai-tests"  # existing Artifact Registry repo
export AR_HOST="${REGION}-docker.pkg.dev"
export AR_GECKO="${AR_HOST}/${PROJECT_ID}/${REPO}/gecko:gecko"
export AR_GEMINI="${AR_HOST}/${PROJECT_ID}/${REPO}/gemini:gemini"
gcloud config set project "$PROJECT_ID"
gcloud config set run/region "$REGION"
  2. Enable services and auth to Artifact Registry
gcloud services enable artifactregistry.googleapis.com cloudbuild.googleapis.com run.googleapis.com aiplatform.googleapis.com
gcloud auth configure-docker "$AR_HOST"
  3. Remote build and push (no local docker build)
# Build gecko using docker/Dockerfile (submit the docker/ folder)
gcloud builds submit docker/ --tag "$AR_GECKO" --region "$REGION"

# Build gemini using docker/Dockerfile_gemini (submit repo root with explicit -f)
gcloud builds submit . --region "$REGION" --config <(cat <<'YAML'
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build','-t','$_IMG','-f','docker/Dockerfile_gemini','.']
images: ['$_IMG']
YAML
) --substitutions=_IMG="$AR_GEMINI"
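The inline heredoc config can also live in a standalone file (for example a hypothetical docker/cloudbuild-gemini.yaml; the filename is illustrative), which avoids shell quoting pitfalls and keeps the build reproducible:

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', '$_IMG', '-f', 'docker/Dockerfile_gemini', '.']
images: ['$_IMG']

Then submit it with: gcloud builds submit . --region "$REGION" --config docker/cloudbuild-gemini.yaml --substitutions=_IMG="$AR_GEMINI"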
  4. Deploy from Artifact Registry images
# Embeddings
gcloud run deploy "ai-model-endpoint" \
  --image "$AR_GECKO" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION" \
  --min-instances=0 --max-instances=1

# Gemini
gcloud run deploy "ai-gemini-endpoint" \
  --image "$AR_GEMINI" \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="$PROJECT_ID",GCP_REGION="$REGION",GEMINI_MODEL="google/gemini-2.5-pro" \
  --min-instances=0 --max-instances=1
  5. Test
URL1="$(gcloud run services describe ai-model-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"text":"Hello Cloud Run"}' "$URL1"

URL2="$(gcloud run services describe ai-gemini-endpoint --format='value(status.url)')"
curl -X POST -H "Content-Type: application/json" -d '{"prompt":"Summarize Cloud Run."}' "$URL2"

Notes:

  • The gemini build uses an inline Cloud Build config to select Dockerfile_gemini.
  • Add --quiet to gcloud commands to avoid interactive prompts.

Env vars

  • GCP_PROJECT_ID / GCP_REGION: used to initialize Vertex AI (region defaults to us-central1).
  • GEMINI_MODEL: defaults to google/gemini-2.5-pro; also accepts full resource like publishers/google/models/gemini-2.5-pro.
  • PORT: set by Cloud Run (defaults to 8080 locally).
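Since GEMINI_MODEL accepts both the short form and the full resource name, the service has to normalize it before calling Vertex AI. A normalization along these lines would work (an illustrative sketch, not necessarily the exact code in main_gemini.py; `resolve_gemini_model` is a hypothetical helper) and maps both forms to the publishers/... name seen in the example responses:

```python
import os


def resolve_gemini_model(default="google/gemini-2.5-pro"):
    """Resolve GEMINI_MODEL to a full Vertex AI model resource name.
    Accepts 'google/gemini-2.5-pro' or an already-full
    'publishers/google/models/gemini-2.5-pro'. Sketch only."""
    name = os.environ.get("GEMINI_MODEL", default)
    if name.startswith("publishers/"):
        return name  # already a full resource name
    publisher, _, model = name.partition("/")
    return f"publishers/{publisher}/models/{model}"
```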
