llama


README

llama

This package provides Go bindings to llama.cpp.

Vendoring

Ollama vendors llama.cpp and ggml. While we generally strive to contribute changes back upstream to avoid drift, we carry a small set of patches which are applied to the tracking commit.

If you are updating the vendored code, start by running the following command to establish the tracking llama.cpp repo in the ./vendor/ directory.

make -f Makefile.sync apply-patches
Updating Base Commit

To pin to a new base commit, update FETCH_HEAD in Makefile.sync.

When updating to a newer base commit, the existing patches may not apply cleanly and may require manual merge resolution.

Start by applying the patches. If any of the patches have conflicts, git am will stop at the first failure.

make -f Makefile.sync apply-patches

If there are conflicts, you will see an error message. Resolve the conflicts in ./vendor/, continue the patch series with git am --continue, then rerun make -f Makefile.sync apply-patches. Repeat until all patches apply successfully.

Once all patches are applied, commit the changes to the tracking repository, then regenerate the patch files and sync the vendored code:

make -f Makefile.sync format-patches sync
Generating Patches

When working on new fixes or features that impact the vendored code, use the following workflow. First, get a clean tracking repo with all current patches applied:

make -f Makefile.sync clean apply-patches

Iterate until you're ready to submit PRs. Once your code is ready, commit your change in the ./vendor/ directory, then generate the patches for ollama with:

make -f Makefile.sync format-patches

In your ./vendor/ directory, create a branch, cherry-pick the new commit onto that branch, and submit a PR upstream to llama.cpp.

Commit the changes in the ollama repo and submit a PR to Ollama; the PR will include the vendored code update with your change, along with the patches.

After your upstream PR is merged, follow the Updating Base Commit instructions above, but remove your patch first before running apply-patches, since the new base commit already contains your change.

Documentation


Constants

This section is empty.

Variables

var ErrKvCacheFull = errors.New("could not find a kv cache slot")

Functions

func BackendInit

func BackendInit()

func FreeModel

func FreeModel(model *Model)

func GetModelArch

func GetModelArch(modelPath string) (string, error)

func SchemaToGrammar

func SchemaToGrammar(schema []byte) []byte

SchemaToGrammar converts the provided JSON schema to a grammar. It returns nil if the provided schema is invalid JSON or an invalid JSON schema.
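As a sketch, converting a small schema and checking the nil return; the import path github.com/ollama/ollama/llama is an assumption based on the module:

	package main

	import (
		"fmt"
		"log"

		"github.com/ollama/ollama/llama" // assumed import path
	)

	func main() {
		schema := []byte(`{"type":"object","properties":{"answer":{"type":"string"}},"required":["answer"]}`)
		grammar := llama.SchemaToGrammar(schema)
		if grammar == nil {
			log.Fatal("invalid JSON or JSON schema")
		}
		// The grammar text can be used for constrained sampling,
		// e.g. as SamplingParams.Grammar.
		fmt.Println(string(grammar))
	}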

Types

type Batch

type Batch struct {
	// contains filtered or unexported fields
}

func NewBatch

func NewBatch(batchSize int, maxSeq int, embedSize int) (*Batch, error)

NewBatch creates a new batch for either word tokens or image embeddings (if embedSize is non-zero); batches cannot contain both types at the same time. batchSize is the maximum number of entries that can be added per sequence.

func (*Batch) Add

func (b *Batch) Add(token int, embed []float32, pos int, logits bool, seqIds ...int)

Add adds either a token or an image embedding to the batch, depending on how the batch was initialized; the other argument is ignored. The entry is added at the given position for the given sequence IDs, and logits indicates whether logits should be computed for it.
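A sketch of filling a token batch for a single sequence; the token IDs are hypothetical stand-ins for output from Model.Tokenize:

	// Token batch: embedSize == 0, one sequence, up to 512 entries.
	batch, err := llama.NewBatch(512, 1, 0)
	if err != nil {
		log.Fatal(err)
	}
	defer batch.Free()

	tokens := []int{1, 15043, 3186} // hypothetical IDs from Model.Tokenize
	for i, t := range tokens {
		// Request logits only for the last token of the prompt.
		batch.Add(t, nil, i, i == len(tokens)-1, 0)
	}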

func (*Batch) Clear

func (b *Batch) Clear()

func (*Batch) Free

func (b *Batch) Free()

func (*Batch) IsEmbedding

func (b *Batch) IsEmbedding() bool

func (*Batch) NumTokens

func (b *Batch) NumTokens() int

func (*Batch) Size

func (b *Batch) Size() int

type Context

type Context struct {
	// contains filtered or unexported fields
}

func NewContextWithModel

func NewContextWithModel(model *Model, params ContextParams) (*Context, error)

func (*Context) Decode

func (c *Context) Decode(batch *Batch) error
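Putting the pieces together, a minimal end-to-end sketch of loading a model, creating a context, and decoding a tokenized prompt. The import path and model path are assumptions, and the context parameter values (including the "f16" KV cache type) are illustrative rather than recommended defaults:

	package main

	import (
		"log"

		"github.com/ollama/ollama/llama" // assumed import path
	)

	func main() {
		llama.BackendInit()

		model, err := llama.LoadModelFromFile("/path/to/model.gguf", llama.ModelParams{
			UseMmap: true,
		})
		if err != nil {
			log.Fatal(err)
		}
		defer llama.FreeModel(model)

		// 2048-token context, 512-entry batches, 1 sequence, 4 threads,
		// no flash attention, "f16" KV cache (illustrative values).
		ctxParams := llama.NewContextParams(2048, 512, 1, 4, false, "f16")
		lc, err := llama.NewContextWithModel(model, ctxParams)
		if err != nil {
			log.Fatal(err)
		}

		tokens, err := model.Tokenize("Why is the sky blue?", true, true)
		if err != nil {
			log.Fatal(err)
		}

		batch, err := llama.NewBatch(512, 1, 0)
		if err != nil {
			log.Fatal(err)
		}
		defer batch.Free()
		for i, t := range tokens {
			batch.Add(t, nil, i, i == len(tokens)-1, 0)
		}

		// Decode may return ErrKvCacheFull if no cache slot is available.
		if err := lc.Decode(batch); err != nil {
			log.Fatal(err)
		}

		logits := lc.GetLogitsIth(batch.NumTokens() - 1)
		_ = logits // feed into a sampler, e.g. SamplingContext.Sample
	}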

func (*Context) GetEmbeddingsIth

func (c *Context) GetEmbeddingsIth(i int) []float32

func (*Context) GetEmbeddingsSeq

func (c *Context) GetEmbeddingsSeq(seqId int) []float32

GetEmbeddingsSeq gets the embeddings for a sequence ID.
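Continuing the decode sketch above with an embedding model, one might read the result like this; whether sequence-level output is available is assumed to depend on the context's pooling configuration:

	// Pooled embedding for sequence 0; assumed to be nil when the
	// context does not produce sequence-level embeddings.
	embd := lc.GetEmbeddingsSeq(0)
	if embd == nil {
		// Fall back to the per-token embedding of the last entry.
		embd = lc.GetEmbeddingsIth(batch.NumTokens() - 1)
	}
	fmt.Println(len(embd)) // expected to equal model.NEmbd()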

func (*Context) GetLogitsIth

func (c *Context) GetLogitsIth(i int) []float32

GetLogitsIth gets the logits for the ith token.

func (*Context) KvCacheCanShift

func (c *Context) KvCacheCanShift() bool

func (*Context) KvCacheClear

func (c *Context) KvCacheClear()

func (*Context) KvCacheSeqAdd

func (c *Context) KvCacheSeqAdd(seqId int, p0 int, p1 int, delta int)

func (*Context) KvCacheSeqCp

func (c *Context) KvCacheSeqCp(srcSeqId int, dstSeqId int, p0 int, p1 int)

func (*Context) KvCacheSeqRm

func (c *Context) KvCacheSeqRm(seqId int, p0 int, p1 int) bool
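As an illustration of combining the KV-cache calls, a hypothetical context-shift helper (not part of this package) that frees the oldest cached positions for sequence 0 and slides the remainder down so decoding can continue past the context limit; the half-open [p0, p1) range semantics are assumed from llama.cpp's KV-cache API:

	// shiftContext keeps the first numKeep positions (e.g. the system
	// prompt), discards half of what follows, and shifts the rest down.
	// numPast is the number of positions currently in the cache.
	func shiftContext(lc *llama.Context, numKeep, numPast int) {
		if !lc.KvCacheCanShift() {
			lc.KvCacheClear() // backend cannot shift; drop everything
			return
		}
		discard := (numPast - numKeep) / 2
		lc.KvCacheSeqRm(0, numKeep, numKeep+discard)            // free [numKeep, numKeep+discard)
		lc.KvCacheSeqAdd(0, numKeep+discard, numPast, -discard) // shift the remainder down
	}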

func (*Context) Model

func (c *Context) Model() *Model

func (*Context) Synchronize

func (c *Context) Synchronize()

type ContextParams

type ContextParams struct {
	// contains filtered or unexported fields
}

func NewContextParams

func NewContextParams(numCtx int, batchSize int, numSeqMax int, threads int, flashAttention bool, kvCacheType string) ContextParams

type Devices

type Devices struct {
	ml.DeviceID
	LlamaID uint64
}

func EnumerateGPUs

func EnumerateGPUs() []Devices

type Grammar

type Grammar struct {
	// contains filtered or unexported fields
}

func NewGrammar

func NewGrammar(grammar string, vocabIds []uint32, vocabValues []string, eogTokens []int32) *Grammar

func (*Grammar) Accept

func (g *Grammar) Accept(token int32)

func (*Grammar) Apply

func (g *Grammar) Apply(tokens []TokenData)

func (*Grammar) Free

func (g *Grammar) Free()
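A sketch of grammar-constrained selection, assuming an initialized *llama.Model m and a logits slice already fetched from the context; building the vocabulary arrays this way, Apply masking logits in place, and NewGrammar returning nil on failure are all assumptions:

	n := m.NumVocab()
	ids := make([]uint32, n)
	pieces := make([]string, n)
	var eog []int32
	for i := 0; i < n; i++ {
		ids[i] = uint32(i)
		pieces[i] = m.TokenToPiece(i)
		if m.TokenIsEog(i) {
			eog = append(eog, int32(i))
		}
	}

	// Constrain output to a literal "yes" or "no" (GBNF syntax).
	g := llama.NewGrammar(`root ::= "yes" | "no"`, ids, pieces, eog)
	if g == nil {
		log.Fatal("failed to compile grammar") // assumed nil-on-failure
	}
	defer g.Free()

	// Build candidates from logits, mask forbidden tokens, pick the
	// argmax for simplicity, then advance the grammar state.
	cands := make([]llama.TokenData, n)
	for i := range cands {
		cands[i] = llama.TokenData{ID: int32(i), Logit: logits[i]}
	}
	g.Apply(cands)
	best := cands[0]
	for _, td := range cands[1:] {
		if td.Logit > best.Logit {
			best = td
		}
	}
	g.Accept(best.ID)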

type Model

type Model struct {
	// contains filtered or unexported fields
}

func LoadModelFromFile

func LoadModelFromFile(modelPath string, params ModelParams) (*Model, error)

func (*Model) AddBOSToken

func (m *Model) AddBOSToken() bool

func (*Model) ApplyLoraFromFile

func (m *Model) ApplyLoraFromFile(context *Context, loraPath string, scale float32, threads int) error

func (*Model) NEmbd

func (m *Model) NEmbd() int

func (*Model) NumVocab

func (m *Model) NumVocab() int

func (*Model) TokenIsEog

func (m *Model) TokenIsEog(token int) bool

func (*Model) TokenToPiece

func (m *Model) TokenToPiece(token int) string

func (*Model) Tokenize

func (m *Model) Tokenize(text string, addSpecial bool, parseSpecial bool) ([]int, error)
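A tokenize/detokenize round trip, assuming an initialized *llama.Model m:

	tokens, err := m.Tokenize("The quick brown fox", true, true) // add and parse special tokens
	if err != nil {
		log.Fatal(err)
	}

	var sb strings.Builder
	for _, t := range tokens {
		sb.WriteString(m.TokenToPiece(t))
	}
	fmt.Printf("%d tokens: %q\n", len(tokens), sb.String())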

func (*Model) Vocab

func (m *Model) Vocab() *C.struct_llama_vocab

type ModelParams

type ModelParams struct {
	Devices      []uint64
	NumGpuLayers int
	MainGpu      int
	UseMmap      bool
	TensorSplit  []float32
	Progress     func(float32)
	VocabOnly    bool
}

type MtmdChunk

type MtmdChunk struct {
	Embed  []float32
	Tokens []int
}

type MtmdContext

type MtmdContext struct {
	// contains filtered or unexported fields
}

MtmdContext wraps llama.cpp's multimodal (mtmd) support for vision processing.

func NewMtmdContext

func NewMtmdContext(llamaContext *Context, modelPath string) (*MtmdContext, error)

func (*MtmdContext) Free

func (c *MtmdContext) Free()

func (*MtmdContext) MultimodalTokenize

func (c *MtmdContext) MultimodalTokenize(llamaContext *Context, data []byte) ([]MtmdChunk, error)
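A sketch of multimodal tokenization, assuming an initialized *llama.Context lc backed by a vision-capable model; the projector path and image file are hypothetical:

	mc, err := llama.NewMtmdContext(lc, "/path/to/mmproj.gguf") // hypothetical projector model path
	if err != nil {
		log.Fatal(err)
	}
	defer mc.Free()

	img, err := os.ReadFile("photo.png") // hypothetical image file
	if err != nil {
		log.Fatal(err)
	}

	chunks, err := mc.MultimodalTokenize(lc, img)
	if err != nil {
		log.Fatal(err)
	}
	for _, ch := range chunks {
		// Each chunk carries either text Tokens or an image Embed;
		// add whichever is present to the appropriate Batch.
		if len(ch.Tokens) > 0 {
			_ = ch.Tokens // text chunk: add to a token batch
		} else {
			_ = ch.Embed // image chunk: add to an embedding batch
		}
	}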

type SamplingContext

type SamplingContext struct {
	// contains filtered or unexported fields
}

SamplingContext implements sampling. TODO: this is a temporary wrapper to allow calling C++ code from cgo.

func NewSamplingContext

func NewSamplingContext(model *Model, params SamplingParams) (*SamplingContext, error)

func (*SamplingContext) Accept

func (s *SamplingContext) Accept(id int, applyGrammar bool)

func (*SamplingContext) Reset

func (s *SamplingContext) Reset()

func (*SamplingContext) Sample

func (s *SamplingContext) Sample(llamaContext *Context, idx int) int
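Continuing the decode sketch above, a hypothetical generation loop; the sampling parameter values are illustrative only:

	sc, err := llama.NewSamplingContext(model, llama.SamplingParams{
		TopK:        40,
		TopP:        0.9,
		Temp:        0.8,
		RepeatLastN: 64,
		Seed:        42, // fixed seed for reproducibility
	})
	if err != nil {
		log.Fatal(err)
	}

	pos := len(tokens) // next position after the prompt
	for i := 0; i < 128; i++ {
		// Sample from the logits of the batch's last entry.
		id := sc.Sample(lc, batch.NumTokens()-1)
		sc.Accept(id, false) // false: no grammar to advance
		if model.TokenIsEog(id) {
			break
		}
		fmt.Print(model.TokenToPiece(id))

		// Decode the sampled token to extend the sequence.
		batch.Clear()
		batch.Add(id, nil, pos, true, 0)
		pos++
		if err := lc.Decode(batch); err != nil {
			log.Fatal(err)
		}
	}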

type SamplingParams

type SamplingParams struct {
	TopK           int
	TopP           float32
	MinP           float32
	TypicalP       float32
	Temp           float32
	RepeatLastN    int
	PenaltyRepeat  float32
	PenaltyFreq    float32
	PenaltyPresent float32
	PenalizeNl     bool
	Seed           uint32
	Grammar        string
}

type TokenData

type TokenData struct {
	ID    int32
	Logit float32
}

Directories

Path Synopsis
llama.cpp
src