llama


README

llama

This package provides Go bindings to llama.cpp.

Vendoring

Ollama vendors llama.cpp and ggml. While we generally strive to contribute changes back upstream to avoid drift, we carry a small set of patches which are applied to the tracking commit.

If you are updating the vendored code, start by running the following command to establish the tracking llama.cpp repo in the ./vendor/ directory.

make -f Makefile.sync apply-patches
Updating Base Commit

To pin to a new base commit, update FETCH_HEAD in Makefile.sync.

When updating to a newer base commit, the existing patches may not apply cleanly and may require manual merge resolution.

Start by applying the patches. If any of the patches have conflicts, git am will stop at the first failure.

make -f Makefile.sync apply-patches

If there are conflicts, you will see an error message. Resolve the conflicts in ./vendor/, continue the patch series with git am --continue, then rerun make -f Makefile.sync apply-patches. Repeat until all patches apply successfully.

Once all patches are applied, commit the changes to the tracking repository, then regenerate the patch files and sync the vendored code:

make -f Makefile.sync format-patches sync
Generating Patches

When working on new fixes or features that impact the vendored code, use the following workflow. First, get a clean tracking repo with all current patches applied:

make -f Makefile.sync clean apply-patches

Iterate until you're ready to submit PRs. Once your code is ready, commit your change in the ./vendor/ directory, then generate the patches for ollama with:

make -f Makefile.sync format-patches

In your ./vendor/ directory, create a branch, cherry-pick the new commit onto that branch, and submit a PR upstream to llama.cpp.

Commit the changes in the ollama repo and submit a PR to Ollama; the PR will include the vendored code update with your change, along with the patches.

After your upstream PR is merged, follow the Updating Base Commit instructions above, but remove your patch first before running apply-patches, since the new base commit already contains your change.

Documentation


Constants

This section is empty.

Variables

var ErrKvCacheFull = errors.New("could not find a kv cache slot")

Functions

func BackendInit

func BackendInit()

func FreeModel

func FreeModel(model *Model)

func GetModelArch

func GetModelArch(modelPath string) (string, error)

func SchemaToGrammar

func SchemaToGrammar(schema []byte) []byte

SchemaToGrammar converts the provided JSON schema to a grammar. It returns nil if the provided schema is invalid JSON or an invalid JSON schema.
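As a sketch, converting a small schema and checking the nil return; the import path github.com/ollama/ollama/llama is an assumption based on the module:

	package main

	import (
		"fmt"
		"log"

		"github.com/ollama/ollama/llama" // assumed import path
	)

	func main() {
		schema := []byte(`{"type":"object","properties":{"answer":{"type":"string"}},"required":["answer"]}`)
		grammar := llama.SchemaToGrammar(schema)
		if grammar == nil {
			log.Fatal("invalid JSON or JSON schema")
		}
		// The grammar text can be used for constrained sampling,
		// e.g. as SamplingParams.Grammar.
		fmt.Println(string(grammar))
	}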

Types

type Batch

type Batch struct {
	// contains filtered or unexported fields
}

func NewBatch

func NewBatch(batchSize int, maxSeq int, embedSize int) (*Batch, error)

NewBatch creates a new batch for either word tokens or image embeddings (if embedSize is non-zero); batches cannot contain both types at the same time. batchSize is the maximum number of entries that can be added per sequence.

func (*Batch) Add

func (b *Batch) Add(token int, embed []float32, pos int, logits bool, seqIds ...int)

Add adds either a token or an image embedding to the batch, depending on how the batch was initialized; the other argument is ignored. The entry is added at the given position for the given sequence IDs, and logits indicates whether logits should be computed for it.
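A sketch of filling a token batch for a single sequence; the token IDs are hypothetical stand-ins for output from Model.Tokenize:

	// Token batch: embedSize == 0, one sequence, up to 512 entries.
	batch, err := llama.NewBatch(512, 1, 0)
	if err != nil {
		log.Fatal(err)
	}
	defer batch.Free()

	tokens := []int{1, 15043, 3186} // hypothetical IDs from Model.Tokenize
	for i, t := range tokens {
		// Request logits only for the last token of the prompt.
		batch.Add(t, nil, i, i == len(tokens)-1, 0)
	}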

func (*Batch) Clear

func (b *Batch) Clear()

func (*Batch) Free

func (b *Batch) Free()

func (*Batch) IsEmbedding

func (b *Batch) IsEmbedding() bool

func (*Batch) NumTokens

func (b *Batch) NumTokens() int

func (*Batch) Size

func (b *Batch) Size() int

type Context

type Context struct {
	// contains filtered or unexported fields
}

func NewContextWithModel

func NewContextWithModel(model *Model, params ContextParams) (*Context, error)

func (*Context) Decode

func (c *Context) Decode(batch *Batch) error
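Putting the pieces together, a minimal end-to-end sketch of loading a model, creating a context, and decoding a tokenized prompt. The import path and model path are assumptions, and the context parameter values (including the "f16" KV cache type) are illustrative rather than recommended defaults:

	package main

	import (
		"log"

		"github.com/ollama/ollama/llama" // assumed import path
	)

	func main() {
		llama.BackendInit()

		model, err := llama.LoadModelFromFile("/path/to/model.gguf", llama.ModelParams{
			UseMmap: true,
		})
		if err != nil {
			log.Fatal(err)
		}
		defer llama.FreeModel(model)

		// 2048-token context, 512-entry batches, 1 sequence, 4 threads,
		// no flash attention, "f16" KV cache (illustrative values).
		ctxParams := llama.NewContextParams(2048, 512, 1, 4, false, "f16")
		lc, err := llama.NewContextWithModel(model, ctxParams)
		if err != nil {
			log.Fatal(err)
		}

		tokens, err := model.Tokenize("Why is the sky blue?", true, true)
		if err != nil {
			log.Fatal(err)
		}

		batch, err := llama.NewBatch(512, 1, 0)
		if err != nil {
			log.Fatal(err)
		}
		defer batch.Free()
		for i, t := range tokens {
			batch.Add(t, nil, i, i == len(tokens)-1, 0)
		}

		// Decode may return ErrKvCacheFull if no cache slot is available.
		if err := lc.Decode(batch); err != nil {
			log.Fatal(err)
		}

		logits := lc.GetLogitsIth(batch.NumTokens() - 1)
		_ = logits // feed into a sampler, e.g. SamplingContext.Sample
	}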

func (*Context) GetEmbeddingsIth

func (c *Context) GetEmbeddingsIth(i int) []float32

func (*Context) GetEmbeddingsSeq

func (c *Context) GetEmbeddingsSeq(seqId int) []float32

GetEmbeddingsSeq gets the embeddings for a sequence ID.
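Continuing the decode sketch above with an embedding model, one might read the result like this; whether sequence-level output is available is assumed to depend on the context's pooling configuration:

	// Pooled embedding for sequence 0; assumed to be nil when the
	// context does not produce sequence-level embeddings.
	embd := lc.GetEmbeddingsSeq(0)
	if embd == nil {
		// Fall back to the per-token embedding of the last entry.
		embd = lc.GetEmbeddingsIth(batch.NumTokens() - 1)
	}
	fmt.Println(len(embd)) // expected to equal model.NEmbd()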

func (*Context) GetLogitsIth

func (c *Context) GetLogitsIth(i int) []float32

GetLogitsIth gets the logits for the ith token.

func (*Context) KvCacheCanShift

func (c *Context) KvCacheCanShift() bool

func (*Context) KvCacheClear

func (c *Context) KvCacheClear()

func (*Context) KvCacheSeqAdd

func (c *Context) KvCacheSeqAdd(seqId int, p0 int, p1 int, delta int)

func (*Context) KvCacheSeqCp

func (c *Context) KvCacheSeqCp(srcSeqId int, dstSeqId int, p0 int, p1 int)

func (*Context) KvCacheSeqRm

func (c *Context) KvCacheSeqRm(seqId int, p0 int, p1 int) bool
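As an illustration of combining the KV-cache calls, a hypothetical context-shift helper (not part of this package) that frees the oldest cached positions for sequence 0 and slides the remainder down so decoding can continue past the context limit; the half-open [p0, p1) range semantics are assumed from llama.cpp's KV-cache API:

	// shiftContext keeps the first numKeep positions (e.g. the system
	// prompt), discards half of what follows, and shifts the rest down.
	// numPast is the number of positions currently in the cache.
	func shiftContext(lc *llama.Context, numKeep, numPast int) {
		if !lc.KvCacheCanShift() {
			lc.KvCacheClear() // backend cannot shift; drop everything
			return
		}
		discard := (numPast - numKeep) / 2
		lc.KvCacheSeqRm(0, numKeep, numKeep+discard)            // free [numKeep, numKeep+discard)
		lc.KvCacheSeqAdd(0, numKeep+discard, numPast, -discard) // shift the remainder down
	}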

func (*Context) Model

func (c *Context) Model() *Model

func (*Context) Synchronize

func (c *Context) Synchronize()

type ContextParams

type ContextParams struct {
	// contains filtered or unexported fields
}

func NewContextParams

func NewContextParams(numCtx int, batchSize int, numSeqMax int, threads int, flashAttention bool, kvCacheType string) ContextParams

type Devices

type Devices struct {
	ml.DeviceID
	LlamaID uint64
}

func EnumerateGPUs

func EnumerateGPUs() []Devices

type Grammar

type Grammar struct {
	// contains filtered or unexported fields
}

func NewGrammar

func NewGrammar(grammar string, vocabIds []uint32, vocabValues []string, eogTokens []int32) *Grammar

func (*Grammar) Accept

func (g *Grammar) Accept(token int32)

func (*Grammar) Apply

func (g *Grammar) Apply(tokens []TokenData)

func (*Grammar) Free

func (g *Grammar) Free()
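A sketch of grammar-constrained selection, assuming an initialized *llama.Model m and a logits slice already fetched from the context; building the vocabulary arrays this way, Apply masking logits in place, and NewGrammar returning nil on failure are all assumptions:

	n := m.NumVocab()
	ids := make([]uint32, n)
	pieces := make([]string, n)
	var eog []int32
	for i := 0; i < n; i++ {
		ids[i] = uint32(i)
		pieces[i] = m.TokenToPiece(i)
		if m.TokenIsEog(i) {
			eog = append(eog, int32(i))
		}
	}

	// Constrain output to a literal "yes" or "no" (GBNF syntax).
	g := llama.NewGrammar(`root ::= "yes" | "no"`, ids, pieces, eog)
	if g == nil {
		log.Fatal("failed to compile grammar") // assumed nil-on-failure
	}
	defer g.Free()

	// Build candidates from logits, mask forbidden tokens, pick the
	// argmax for simplicity, then advance the grammar state.
	cands := make([]llama.TokenData, n)
	for i := range cands {
		cands[i] = llama.TokenData{ID: int32(i), Logit: logits[i]}
	}
	g.Apply(cands)
	best := cands[0]
	for _, td := range cands[1:] {
		if td.Logit > best.Logit {
			best = td
		}
	}
	g.Accept(best.ID)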

type Model

type Model struct {
	// contains filtered or unexported fields
}

func LoadModelFromFile

func LoadModelFromFile(modelPath string, params ModelParams) (*Model, error)

func (*Model) AddBOSToken

func (m *Model) AddBOSToken() bool

func (*Model) ApplyLoraFromFile

func (m *Model) ApplyLoraFromFile(context *Context, loraPath string, scale float32, threads int) error

func (*Model) NEmbd

func (m *Model) NEmbd() int

func (*Model) NumVocab

func (m *Model) NumVocab() int

func (*Model) TokenIsEog

func (m *Model) TokenIsEog(token int) bool

func (*Model) TokenToPiece

func (m *Model) TokenToPiece(token int) string

func (*Model) Tokenize

func (m *Model) Tokenize(text string, addSpecial bool, parseSpecial bool) ([]int, error)
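A tokenize/detokenize round trip, assuming an initialized *llama.Model m:

	tokens, err := m.Tokenize("The quick brown fox", true, true) // add and parse special tokens
	if err != nil {
		log.Fatal(err)
	}

	var sb strings.Builder
	for _, t := range tokens {
		sb.WriteString(m.TokenToPiece(t))
	}
	fmt.Printf("%d tokens: %q\n", len(tokens), sb.String())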

func (*Model) Vocab

func (m *Model) Vocab() *C.struct_llama_vocab

type ModelParams

type ModelParams struct {
	Devices      []uint64
	NumGpuLayers int
	MainGpu      int
	UseMmap      bool
	TensorSplit  []float32
	Progress     func(float32)
	VocabOnly    bool
}

type MtmdChunk

type MtmdChunk struct {
	Embed  []float32
	Tokens []int
}

type MtmdContext

type MtmdContext struct {
	// contains filtered or unexported fields
}

MtmdContext wraps llama.cpp's multimodal (mtmd) support for vision processing.

func NewMtmdContext

func NewMtmdContext(llamaContext *Context, modelPath string) (*MtmdContext, error)

func (*MtmdContext) Free

func (c *MtmdContext) Free()

func (*MtmdContext) MultimodalTokenize

func (c *MtmdContext) MultimodalTokenize(llamaContext *Context, data []byte) ([]MtmdChunk, error)
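A sketch of multimodal tokenization, assuming an initialized *llama.Context lc backed by a vision-capable model; the projector path and image file are hypothetical:

	mc, err := llama.NewMtmdContext(lc, "/path/to/mmproj.gguf") // hypothetical projector model path
	if err != nil {
		log.Fatal(err)
	}
	defer mc.Free()

	img, err := os.ReadFile("photo.png") // hypothetical image file
	if err != nil {
		log.Fatal(err)
	}

	chunks, err := mc.MultimodalTokenize(lc, img)
	if err != nil {
		log.Fatal(err)
	}
	for _, ch := range chunks {
		// Each chunk carries either text Tokens or an image Embed;
		// add whichever is present to the appropriate Batch.
		if len(ch.Tokens) > 0 {
			_ = ch.Tokens // text chunk: add to a token batch
		} else {
			_ = ch.Embed // image chunk: add to an embedding batch
		}
	}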

type SamplingContext

type SamplingContext struct {
	// contains filtered or unexported fields
}

SamplingContext implements sampling. TODO: this is a temporary wrapper to allow calling C++ code from cgo.

func NewSamplingContext

func NewSamplingContext(model *Model, params SamplingParams) (*SamplingContext, error)

func (*SamplingContext) Accept

func (s *SamplingContext) Accept(id int, applyGrammar bool)

func (*SamplingContext) Reset

func (s *SamplingContext) Reset()

func (*SamplingContext) Sample

func (s *SamplingContext) Sample(llamaContext *Context, idx int) int
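Continuing the decode sketch above, a hypothetical generation loop; the sampling parameter values are illustrative only:

	sc, err := llama.NewSamplingContext(model, llama.SamplingParams{
		TopK:        40,
		TopP:        0.9,
		Temp:        0.8,
		RepeatLastN: 64,
		Seed:        42, // fixed seed for reproducibility
	})
	if err != nil {
		log.Fatal(err)
	}

	pos := len(tokens) // next position after the prompt
	for i := 0; i < 128; i++ {
		// Sample from the logits of the batch's last entry.
		id := sc.Sample(lc, batch.NumTokens()-1)
		sc.Accept(id, false) // false: no grammar to advance
		if model.TokenIsEog(id) {
			break
		}
		fmt.Print(model.TokenToPiece(id))

		// Decode the sampled token to extend the sequence.
		batch.Clear()
		batch.Add(id, nil, pos, true, 0)
		pos++
		if err := lc.Decode(batch); err != nil {
			log.Fatal(err)
		}
	}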

type SamplingParams

type SamplingParams struct {
	TopK           int
	TopP           float32
	MinP           float32
	TypicalP       float32
	Temp           float32
	RepeatLastN    int
	PenaltyRepeat  float32
	PenaltyFreq    float32
	PenaltyPresent float32
	PenalizeNl     bool
	Seed           uint32
	Grammar        string
}

type TokenData

type TokenData struct {
	ID    int32
	Logit float32
}

Directories

Path Synopsis
llama.cpp
src