Tags: mudler/parakeet.cpp

v0.3.2

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
ci(release): ship the C-API shared library and header (#25) (#29)

The release bundles only carried parakeet-cli, so integrating via
ctypes/dlopen meant compiling from source. Add a separate shared-library
bundle per (platform, backend, arch) alongside each CLI bundle.

The lib is built in its own build dir with PARAKEET_SHARED=ON and
PARAKEET_BUILD_CLI=OFF, so the CLI bundles stay single self-contained
binaries. Each lib bundle carries libparakeet.{so,dylib,dll}, the
parakeet_capi.h header, LICENSE and README (plus the same cudart/cublas
libraries on CUDA builds). The release job already globs dist/*, so the
new bundles attach with no further change.

Two things that would have silently broken it:

- BUILD_SHARED_LIBS=OFF builds ggml's static objects without -fPIC, so
  folding them into a shared lib fails to link on Linux. The shared-lib
  configure steps set CMAKE_POSITION_INDEPENDENT_CODE=ON.

- The C-API header marks symbols with plain extern "C", no dllexport, so
  an MSVC DLL would export nothing. Set WINDOWS_EXPORT_ALL_SYMBOLS on the
  parakeet target when PARAKEET_SHARED is on. No effect on Linux/macOS.
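
For contrast, the explicit-annotation approach that the commit avoids can be
sketched as follows. This is not the project's header: DEMO_API,
DEMO_BUILD_SHARED and both function names are hypothetical, chosen only to
show why a plain extern "C" declaration is not exported from an MSVC DLL by
default while an annotated one is.

```cpp
// Hypothetical export macro; parakeet_capi.h does not define this.
#if defined(_WIN32) && defined(DEMO_BUILD_SHARED)
#  define DEMO_API __declspec(dllexport)                   // MSVC-style DLL export
#elif defined(__GNUC__)
#  define DEMO_API __attribute__((visibility("default")))  // GCC/Clang visibility
#else
#  define DEMO_API
#endif

// Plain extern "C": C linkage only. On MSVC this is not exported from a DLL
// unless something else (a .def file, or CMake's WINDOWS_EXPORT_ALL_SYMBOLS)
// exports it.
extern "C" int demo_plain_version(void) { return 1; }

// Annotated: exported from the DLL without any build-system help.
extern "C" DEMO_API int demo_exported_version(void) { return 1; }
```

Setting WINDOWS_EXPORT_ALL_SYMBOLS, as the commit does, avoids touching the
header at the cost of exporting every symbol of the target.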

Verified on Linux CPU: a self-contained libparakeet.so depending only on
standard system libs, with the parakeet_capi_* symbols exported.
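
A consumer can repeat that check at runtime. The sketch below is an
assumption-laden illustration, not code from the repository: library_exports
is a hypothetical helper, and the symbol name has to come from
parakeet_capi.h, since this message only states the parakeet_capi_ prefix.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>  // LoadLibraryA, GetProcAddress, FreeLibrary
#else
#include <dlfcn.h>    // dlopen, dlsym, dlclose
#endif

// Hypothetical helper: true if the library at `path` loads and exports `symbol`.
bool library_exports(const std::string& path, const std::string& symbol) {
#ifdef _WIN32
    HMODULE handle = LoadLibraryA(path.c_str());
    if (!handle) return false;
    const bool found = GetProcAddress(handle, symbol.c_str()) != nullptr;
    FreeLibrary(handle);
    return found;
#else
    // RTLD_NOW resolves all dependencies up front, so a library that is not
    // self-contained fails here rather than at the first call.
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!handle) return false;
    const bool found = dlsym(handle, symbol.c_str()) != nullptr;
    dlclose(handle);
    return found;
#endif
}
```

A caller would pass the path to the unpacked libparakeet.{so,dylib,dll} and
one function name copied from the shipped header. On older glibc versions
the program may need to link with -ldl.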

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

v0.3.1

Verified

fix(server): build the model fetcher on Windows (MSVC) (#28)

model_fetch.cpp included POSIX-only <sys/wait.h>/<unistd.h> and used
fork/execvp/waitpid, so the Windows release build failed with
'Cannot open include file: sys/wait.h'. Add a Windows path using _spawnvp
(_P_WAIT), which runs curl/wget directly without a shell, keeping the
no-shell-injection property of the POSIX exec path. have_tool now uses
'where' on Windows and 'command -v' on POSIX.
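
The two code paths can be sketched roughly as below. This is a minimal
illustration under assumptions, not the contents of model_fetch.cpp: the
signatures of run_tool and have_tool are hypothetical, and the character
whitelist in have_tool is an addition made here because that lookup does go
through a shell.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>
#ifdef _WIN32
#include <process.h>   // _spawnvp, _P_WAIT
#else
#include <sys/wait.h>  // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>    // fork, execvp, _exit
#endif

// Hypothetical helper: run args[0] with the given arguments and wait for it.
// No shell is involved, so metacharacters in arguments are passed literally.
// Returns the child's exit code, or -1 on spawn/wait failure or abnormal exit
// (on POSIX, a failed exec shows up as exit code 127).
int run_tool(const std::vector<std::string>& args) {
    if (args.empty()) return -1;
    std::vector<char*> argv;
    for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);  // argv must be null-terminated
#ifdef _WIN32
    return static_cast<int>(_spawnvp(_P_WAIT, argv[0], argv.data()));
#else
    const pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if exec failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
#endif
}

// Hypothetical helper: true if `name` is found on PATH. This one does use a
// shell, so only plain tool names are accepted.
bool have_tool(const std::string& name) {
    if (name.empty()) return false;
    for (unsigned char c : name)
        if (!std::isalnum(c) && c != '-' && c != '_' && c != '.') return false;
#ifdef _WIN32
    const std::string cmd = "where " + name + " >NUL 2>&1";
#else
    const std::string cmd = "command -v " + name + " >/dev/null 2>&1";
#endif
    return std::system(cmd.c_str()) == 0;
}
```

A fetcher built on these would probe have_tool("curl"), fall back to
have_tool("wget"), and pass the URL as its own argv element to run_tool.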

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

v0.3.0

Verified

OpenAI-compatible HTTP server + docker image (#8)

* feat(server): OpenAI-compatible transcription server example

Squashed rebase of the worktree-openai-server branch (PR #8) onto current
master. Adds examples/server (httplib-based POST /v1/audio/transcriptions),
model resolver/fetcher, OpenAI response formatter, and unit + e2e tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(docker): publish a dedicated parakeet-server image

Split the Dockerfile into a shared build stage plus two runtime targets:
runtime (cli, default, unchanged) and runtime-server (entrypoint
parakeet-server --host 0.0.0.0, EXPOSE 8080, curl for alias fetch). The
docker workflow now builds and publishes both ghcr images, cli and server,
for each (variant, arch); the server build reuses the cli build stage so
ggml compiles once per job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: document the OpenAI server and docker images; point to LocalAI for production

Add an OpenAI-compatible server section to the main README (build, curl and
OpenAI-client usage) and extend the Docker section to cover the new
parakeet.cpp-server image alongside the cli. Note LocalAI as the production
path in both the main README and examples/server/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: add server-e2e job driving the real parakeet-server over HTTP

Builds parakeet-server and runs tests/server_e2e.sh (PARAKEET_SERVER_E2E=1):
fetches the ~125 MB tdt_ctc-110m-q4_k model by alias, starts the server, and
hits POST /v1/audio/transcriptions with a real WAV in json/text/verbose_json
(plus word timestamps), checking the transcription and the 400 error paths.
Runs on pull_request and workflow_dispatch, like closed-loop; needs no
NeMo/Python venv.
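
The response formats exercised above can be sketched as a small formatter.
This is an illustrative assumption, not the formatter in examples/server: all
names are hypothetical, the verbose_json body is reduced to three fields
(task, duration, text), and rejecting unknown formats with a 400 is a guess
at one of the error paths the test covers.

```cpp
#include <cstdio>
#include <optional>
#include <string>

enum class ResponseFormat { Json, Text, VerboseJson };

// Hypothetical: map the response_format form field to an enum. An empty
// value defaults to json; std::nullopt is what a server could turn into 400.
std::optional<ResponseFormat> parse_format(const std::string& s) {
    if (s.empty() || s == "json") return ResponseFormat::Json;
    if (s == "text") return ResponseFormat::Text;
    if (s == "verbose_json") return ResponseFormat::VerboseJson;
    return std::nullopt;
}

// Escape a string for embedding inside a JSON string literal.
std::string json_escape(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {  // remaining control characters
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Build the response body for one transcription.
std::string format_body(ResponseFormat f, const std::string& text, double duration_s) {
    if (f == ResponseFormat::Text) return text;
    const std::string quoted = "\"" + json_escape(text) + "\"";
    if (f == ResponseFormat::Json) return "{\"text\":" + quoted + "}";
    char dur[32];
    std::snprintf(dur, sizeof dur, "%.2f", duration_s);
    return std::string("{\"task\":\"transcribe\",\"duration\":") + dur +
           ",\"text\":" + quoted + "}";
}
```

The OpenAI verbose_json response also carries language, segments and, when
requested, words; those are left out here to keep the sketch short.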

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

v0.2.0

Verified

ci: pre-built release binaries for linux, macos and windows (#22)

* ci: pre-built release binaries for linux, macos and windows (#21)

Adds a release workflow that builds self-contained parakeet-cli bundles
for every v* tag: linux x64 (cpu, vulkan, cuda) and arm64 (cpu), macos
arm64 (metal) and x64 (cpu), windows x64 (cpu, vulkan, cuda) plus a
separate cudart runtime zip. Assets attach to the GitHub release for
the tag, creating a draft release when none exists yet.

Fixes #21

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: point the README at the pre-built release bundles

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ci: capture the usage banner before grepping in the smoke tests

parakeet-cli exits 2 when invoked bare; under the runner's bash -e -o
pipefail that exit code fails the pipeline even though grep matched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ci: drop the temporary branch trigger used for matrix validation

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ci: let ggml pick the CUDA architectures, like llama.cpp releases

Dropping the hand-rolled CMAKE_CUDA_ARCHITECTURES lists lets ggml's
curated non-native default apply: PTX for the datacenter generations
(75, 80, 90), real code for the common consumer cards (86, 89, 120a),
and 121a-real for GB10 on CUDA 13. Smaller binaries, faster builds,
and the list stays current with submodule bumps.

Temporarily re-adds the branch trigger to validate the CUDA builds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

v0.1.2

Verified

ci: build and publish the parakeet-cli container image to ghcr (#7)

* ci: build and publish the parakeet-cli container image to ghcr

Add a multi-stage Dockerfile and a docker workflow that builds the
parakeet-cli image and pushes it to ghcr.io/<owner>/parakeet.cpp-cli.

- Dockerfile: fat build stage compiles parakeet-cli plus the ggml backends,
  a slim runtime stage carries only the binary and the ggml .so files. One
  Dockerfile, CPU and CUDA variants selected via BUILD_BASE / RUNTIME_BASE /
  CMAKE_EXTRA_ARGS build args. GGML_NATIVE=OFF so the image is portable
  across x86-64 hosts. The ggml submodule is re-inited as a throwaway git
  repo in the build stage so the CMake-driven patch step (git apply) works
  regardless of how the submodule arrived in the context.
- docker.yml: matrix over cpu/cuda, builds on every push/PR (build-only gate
  on PRs), pushes to ghcr on master + tags + dispatch. Tags via
  metadata-action: latest / sha / vX.Y.Z, with a -cuda suffix for the CUDA
  variant. Uses GITHUB_TOKEN, gha build cache.
- .dockerignore keeps the context small (excludes .git, build dirs, models,
  benchmark media) while keeping the ggml source.
- README: Docker section with CPU and CUDA run examples.

Verified the CPU image end to end: builds at 127 MB, parakeet-cli runs, and
transcribing tests/fixtures/speech.wav with a mounted q5_k 110m model yields
the exact NeMo reference transcript.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* ci(docker): build multi-arch (amd64 + arm64) images for both variants

Publish each variant (cpu, cuda) as a multi-arch manifest covering
linux/amd64 and linux/arm64. The arm64 CUDA image runs natively on Grace /
GB10-class hosts.

Every arch is built natively, no QEMU: amd64 on ubuntu-24.04, arm64 on the
ubuntu-24.04-arm hosted runner (free for public repos). Emulated nvcc builds
would be far too slow. The per-arch images are pushed by digest and a merge
job stitches them into one manifest per variant, tagged via metadata-action.

Verified the arm64 CPU image builds and runs (aarch64) under emulation
locally, and confirmed the ubuntu and nvidia/cuda base images all ship arm64.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* ci(docker): build CUDA images on CUDA 13 for Blackwell / GB10 (DGX Spark)

CUDA 12.6 tops out at sm_90, so the CUDA images would not run on GB10 /
Grace-Blackwell. The vendored ggml's CUDA CMake adds 120a-real at CUDA >= 12.8
and 121a-real (GB10 / DGX Spark / Thor) at CUDA >= 12.9, all under our
GGML_NATIVE=OFF default. Bumping both arches to nvidia/cuda:13.0.1 therefore
compiles Turing through Blackwell with no manual arch list: amd64 picks up
Hopper / Ada / RTX 50, arm64 picks up GH200 (sm_90 PTX) and GB10 (sm_121).
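
The version thresholds described above can be written out as a small model.
It restates only what this message says (120a-real from CUDA 12.8, 121a-real
from CUDA 12.9). It is not ggml's CMake logic, and the function name is
hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the thresholds in this commit message: which
// Blackwell architectures get added for a given CUDA toolkit version.
std::vector<std::string> extra_blackwell_archs(int major, int minor) {
    std::vector<std::string> archs;
    const int version = major * 100 + minor;  // 12.8 becomes 1208
    if (version >= 1208) archs.push_back("120a-real");
    if (version >= 1209) archs.push_back("121a-real");
    return archs;
}
```

Under this model CUDA 12.6 adds nothing, which matches the statement that it
tops out at sm_90, while CUDA 13.0 picks up both entries.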

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* ci(docker): fix CUDA link (GGML_CUDA_NO_VMM), trim arm64 archs, CPU-only PR gate

The CUDA builds failed at link: libggml-cuda.so had undefined references to
the CUDA driver API (cuMemCreate, cuMemMap, cuDeviceGet, ...). Those come from
ggml's VMM memory pool, which links libcuda -- a lib a GPU-less build
container does not have. Build with -DGGML_CUDA_NO_VMM=ON: every cuMem* call
is under #if defined(GGML_USE_VMM), which this flag disables, so the symbols
and the libcuda link dependency both go away. Verified locally: the amd64
CUDA image now links clean, ships libggml-cuda.so, and resolves libcudart /
libcublas from the CUDA 13 runtime base.
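
The mechanism can be illustrated with a reduced sketch. It is not ggml's
source: DEMO_USE_VMM and demo_driver_mem_create are hypothetical stand-ins
for GGML_USE_VMM and the cuMem* driver calls. With the macro undefined, the
driver function is never declared or referenced, so nothing needs the driver
library at link time.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

#if defined(DEMO_USE_VMM)
// Declared only; the definition would live in a driver library that a
// GPU-less build container does not have.
extern "C" int demo_driver_mem_create(void** out, std::size_t size);
#endif

// Reports which allocation path was compiled in.
std::string pool_kind() {
#if defined(DEMO_USE_VMM)
    return "vmm";
#else
    return "plain";
#endif
}

// Allocates from whichever pool was compiled in.
void* pool_alloc(std::size_t size) {
#if defined(DEMO_USE_VMM)
    void* p = nullptr;
    return demo_driver_mem_create(&p, size) == 0 ? p : nullptr;  // needs the driver lib
#else
    return std::malloc(size);  // no driver symbol referenced
#endif
}
```

Compiling this with -DDEMO_USE_VMM and no driver library reproduces the kind
of undefined-reference link error described above.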

Also cut build time, which had blown out to 43 min on the arm64 CUDA job:
- arm64 CUDA targets only Grace GPUs now (CUDA_ARCHS=90;121-real -> GH200 +
  GB10/Spark) instead of ggml's full 7-arch list. Added a dedicated quoted
  CUDA_ARCHS build-arg so the ';' list separator survives the shell (the
  unquoted CMAKE_EXTRA_ARGS would split it as a command separator).
- pull_request now builds the CPU variant only (fast Dockerfile gate) via a
  dynamic matrix from a setup job. CUDA builds only on push / tag / dispatch,
  which also publish. Use workflow_dispatch to exercise CUDA before merging.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

v0.1.1

parakeet.cpp: C++/ggml port of NVIDIA NeMo Parakeet ASR

Self-contained snapshot. FastConformer TDT / CTC / RNNT / hybrid models with
a log-mel front-end, CPU and GPU (CUDA / HIP / Vulkan / Metal) ggml graphs,
quantization (f16, q8_0, q6_k, q5_k, q4_k), a CLI, and a flat C-API
(include/parakeet_capi.h) consumed by the LocalAI parakeet-cpp backend.
Includes the NeMo parity suite and HF publishing tooling
(scripts/publish_hf.py -> mudler/parakeet-cpp-gguf).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

localai-backend-pin

parakeet.cpp: C++/ggml port of NVIDIA NeMo Parakeet ASR

Self-contained snapshot. FastConformer TDT / CTC / RNNT / hybrid models with
a log-mel front-end, CPU and GPU (CUDA / HIP / Vulkan / Metal) ggml graphs,
quantization (f16, q8_0, q6_k, q5_k, q4_k), a CLI, and a flat C-API
(include/parakeet_capi.h) consumed by the LocalAI parakeet-cpp backend.
Includes the NeMo parity suite and HF publishing tooling
(scripts/publish_hf.py -> mudler/parakeet-cpp-gguf).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

v0.1.0

parakeet.cpp: C++/ggml port of NVIDIA NeMo Parakeet ASR

Self-contained snapshot. FastConformer TDT / CTC / RNNT / hybrid models with
a log-mel front-end, CPU and GPU (CUDA / HIP / Vulkan / Metal) ggml graphs,
quantization (f16, q8_0, q6_k, q5_k, q4_k), a CLI, and a flat C-API
(include/parakeet_capi.h) consumed by the LocalAI parakeet-cpp backend.
Includes the NeMo parity suite and HF publishing tooling
(scripts/publish_hf.py -> mudler/parakeet-cpp-gguf).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]