llama

package
v0.0.0-...-e65ccaf
Published: Feb 20, 2025 License: MIT Imports: 15 Imported by: 0

README

llama

This package provides Go bindings to llama.cpp.

Vendoring

Ollama vendors llama.cpp and ggml. While we generally strive to contribute changes back upstream to avoid drift, we carry a small set of patches that are applied on top of the tracking commit.

If you update the vendored code, start by running the following command to establish the tracking llama.cpp repo in the ./vendor/ directory.

make -f Makefile.sync apply-patches

Updating Base Commit

Pin to new base commit

To change the base commit, update FETCH_HEAD in Makefile.sync.

When updating to a newer base commit, the existing patches may not apply cleanly and require manual merge resolution.

Start by applying the patches. If any patch has conflicts, git am will stop at the first failure.

make -f Makefile.sync apply-patches

If there are conflicts, you will see an error message. Resolve the conflicts in ./vendor/, continue the patch series with git am --continue, then rerun make -f Makefile.sync apply-patches. Repeat until all patches apply successfully.

Once all patches are applied, commit the changes to the tracking repository.

make -f Makefile.sync format-patches sync

Generating Patches

When working on new fixes or features that impact vendored code, use the following workflow. First, get a clean tracking repo with all current patches applied:

make -f Makefile.sync clean apply-patches

Iterate until you're ready to submit PRs. Once your code is ready, commit a change in the ./vendor/ directory, then generate the patches for ollama with

make -f Makefile.sync format-patches

In the ./vendor/ directory, create a branch, cherry-pick the new commit onto it, and submit a PR upstream to llama.cpp.

Commit the changes in the Ollama repo and submit a PR, which will include the vendored code update with your change, along with the patches.

After your PR upstream is merged, follow the Updating Base Commit instructions above; however, first remove your patch before running apply-patches, since the new base commit already contains your change.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var ErrKvCacheFull = errors.New("could not find a kv cache slot")

Functions

func BackendInit

func BackendInit()

func EnableDebug

func EnableDebug()

func FreeModel

func FreeModel(model *Model)

func GetModelArch

func GetModelArch(modelPath string) (string, error)

func PrintSystemInfo

func PrintSystemInfo() string

func Quantize

func Quantize(infile, outfile string, ftype uint32) error

func SchemaToGrammar

func SchemaToGrammar(schema []byte) []byte

SchemaToGrammar converts the provided JSON schema to a grammar. It returns nil if the provided schema is invalid JSON or an invalid JSON schema.
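Because an invalid schema yields nil rather than an error, callers should check the result before use. The sketch below is a hypothetical stand-in, not the real binding (which generates a GBNF grammar via llama.cpp); it mirrors only the documented nil-on-invalid-JSON contract, using encoding/json to illustrate the calling pattern:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaToGrammarStub is a hypothetical stand-in for SchemaToGrammar.
// It reproduces only the documented contract: nil is returned when the
// input is not valid JSON. The real binding returns a generated grammar.
func schemaToGrammarStub(schema []byte) []byte {
	if !json.Valid(schema) {
		return nil
	}
	// Placeholder output; the real function returns the GBNF grammar.
	return []byte("root ::= ...")
}

func main() {
	valid := []byte(`{"type": "object", "properties": {"name": {"type": "string"}}}`)
	invalid := []byte(`{"type": "object",`)

	if g := schemaToGrammarStub(valid); g != nil {
		fmt.Println("grammar generated")
	}
	if g := schemaToGrammarStub(invalid); g == nil {
		fmt.Println("invalid schema: nil grammar")
	}
}
```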

Types

type Batch

type Batch struct {
	// contains filtered or unexported fields
}

func NewBatch

func NewBatch(batchSize int, maxSeq int, embedSize int) (*Batch, error)

Creates a new batch for either word tokens or image embeddings (if embedSize is non-zero). Batches cannot contain both types at the same time. batchSize is the maximum number of entries that can be added per sequence.

func (*Batch) Add

func (b *Batch) Add(token int, embed []float32, pos int, logits bool, seqIds ...int)

Add adds either a token or an image embedding to the batch, depending on the type the batch was initialized with; the other argument is ignored. The entry is added at the given position for the given sequence ids, and logits optionally requests logit output for this entry.
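A batch created for tokens ignores the embed argument and vice versa. The miniature below is a hypothetical pure-Go sketch of that documented invariant, not the real cgo-backed Batch:

```go
package main

import "fmt"

// miniBatch is a hypothetical miniature of the documented Batch semantics:
// a batch created with a non-zero embedSize holds embeddings, otherwise it
// holds tokens, and Add ignores whichever argument does not apply.
type miniBatch struct {
	embedSize int
	tokens    []int
	embeds    [][]float32
	pos       []int
}

func (b *miniBatch) IsEmbedding() bool { return b.embedSize != 0 }

func (b *miniBatch) Add(token int, embed []float32, pos int) {
	if b.IsEmbedding() {
		b.embeds = append(b.embeds, embed) // token is ignored
	} else {
		b.tokens = append(b.tokens, token) // embed is ignored
	}
	b.pos = append(b.pos, pos)
}

func (b *miniBatch) NumTokens() int { return len(b.pos) }

func main() {
	tb := &miniBatch{} // token batch: embedSize is zero
	tb.Add(42, nil, 0)
	tb.Add(7, nil, 1)
	fmt.Println(tb.IsEmbedding(), tb.NumTokens()) // false 2

	eb := &miniBatch{embedSize: 4} // embedding batch
	eb.Add(0, []float32{0.1, 0.2, 0.3, 0.4}, 0)
	fmt.Println(eb.IsEmbedding(), eb.NumTokens()) // true 1
}
```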

func (*Batch) Clear

func (b *Batch) Clear()

func (*Batch) Free

func (b *Batch) Free()

func (*Batch) IsEmbedding

func (b *Batch) IsEmbedding() bool

func (*Batch) NumTokens

func (b *Batch) NumTokens() int

func (*Batch) Size

func (b *Batch) Size() int

type ClipContext

type ClipContext struct {
	// contains filtered or unexported fields
}

ClipContext handles vision processing.

func NewClipContext

func NewClipContext(llamaContext *Context, modelPath string) (*ClipContext, error)

func (*ClipContext) Free

func (c *ClipContext) Free()

func (*ClipContext) NewEmbed

func (c *ClipContext) NewEmbed(llamaContext *Context, data []byte) ([][]float32, error)

type Context

type Context struct {
	// contains filtered or unexported fields
}

func NewContextWithModel

func NewContextWithModel(model *Model, params ContextParams) (*Context, error)

func (*Context) Decode

func (c *Context) Decode(batch *Batch) error

func (*Context) GetEmbeddingsIth

func (c *Context) GetEmbeddingsIth(i int) []float32

func (*Context) GetEmbeddingsSeq

func (c *Context) GetEmbeddingsSeq(seqId int) []float32

GetEmbeddingsSeq gets the embeddings for a sequence id.
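The returned value is a plain []float32. A common follow-up step, not part of this package, is L2-normalizing the vector before cosine-similarity comparisons; a small sketch:

```go
package main

import (
	"fmt"
	"math"
)

// normalize L2-normalizes an embedding vector such as the one returned by
// Context.GetEmbeddingsSeq. This helper is illustrative and not part of
// the llama package.
func normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	n := math.Sqrt(sum)
	if n == 0 {
		return v
	}
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = float32(float64(x) / n)
	}
	return out
}

func main() {
	emb := []float32{3, 4} // stand-in for a real embedding
	fmt.Println(normalize(emb))
}
```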

func (*Context) KvCacheClear

func (c *Context) KvCacheClear()

func (*Context) KvCacheDefrag

func (c *Context) KvCacheDefrag()

func (*Context) KvCacheSeqAdd

func (c *Context) KvCacheSeqAdd(seqId int, p0 int, p1 int, delta int)

func (*Context) KvCacheSeqCp

func (c *Context) KvCacheSeqCp(srcSeqId int, dstSeqId int, p0 int, p1 int)

func (*Context) KvCacheSeqRm

func (c *Context) KvCacheSeqRm(seqId int, p0 int, p1 int) bool

func (*Context) Model

func (c *Context) Model() *Model

func (*Context) SetCrossAttention

func (c *Context) SetCrossAttention(state bool)

func (*Context) Synchronize

func (c *Context) Synchronize()

type ContextParams

type ContextParams struct {
	// contains filtered or unexported fields
}

func NewContextParams

func NewContextParams(numCtx int, batchSize int, numSeqMax int, threads int, flashAttention bool, kvCacheType string) ContextParams

type MllamaContext

type MllamaContext struct {
	// contains filtered or unexported fields
}

func NewMllamaContext

func NewMllamaContext(llamaContext *Context, modelPath string) (*MllamaContext, error)

func (*MllamaContext) EmbedSize

func (m *MllamaContext) EmbedSize(llamaContext *Context) int

func (*MllamaContext) Free

func (m *MllamaContext) Free()

func (*MllamaContext) NewEmbed

func (m *MllamaContext) NewEmbed(llamaContext *Context, data []byte, aspectRatioId int) ([][]float32, error)

type Model

type Model struct {
	// contains filtered or unexported fields
}

func LoadModelFromFile

func LoadModelFromFile(modelPath string, params ModelParams) (*Model, error)

func (*Model) AddBOSToken

func (m *Model) AddBOSToken() bool

func (*Model) ApplyLoraFromFile

func (m *Model) ApplyLoraFromFile(context *Context, loraPath string, scale float32, threads int) error

func (*Model) NEmbd

func (m *Model) NEmbd() int

func (*Model) NumVocab

func (m *Model) NumVocab() int

func (*Model) TokenIsEog

func (m *Model) TokenIsEog(token int) bool

func (*Model) TokenToPiece

func (m *Model) TokenToPiece(token int) string

func (*Model) Tokenize

func (m *Model) Tokenize(text string, addSpecial bool, parseSpecial bool) ([]int, error)

type ModelParams

type ModelParams struct {
	NumGpuLayers int
	MainGpu      int
	UseMmap      bool
	UseMlock     bool
	TensorSplit  []float32
	Progress     func(float32)
	VocabOnly    bool
}
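The fields above map onto llama.cpp's model-loading options. The sketch below mirrors the struct locally so it is self-contained without cgo; the field values are illustrative assumptions, not package defaults:

```go
package main

import "fmt"

// ModelParams mirrors the exported struct above so this sketch compiles
// standalone. The values chosen in main are illustrative, not defaults.
type ModelParams struct {
	NumGpuLayers int
	MainGpu      int
	UseMmap      bool
	UseMlock     bool
	TensorSplit  []float32
	Progress     func(float32)
	VocabOnly    bool
}

func main() {
	params := ModelParams{
		NumGpuLayers: 32,                   // offload 32 layers to the GPU
		MainGpu:      0,                    // primary device index
		UseMmap:      true,                 // memory-map the model file
		TensorSplit:  []float32{0.5, 0.5},  // split tensors across two GPUs
		Progress: func(p float32) {
			// invoked periodically as loading progresses
			fmt.Printf("loading: %.0f%%\n", p*100)
		},
	}
	// With the real package this would be:
	//   model, err := llama.LoadModelFromFile("model.gguf", params)
	params.Progress(1.0)
	fmt.Println("gpu layers:", params.NumGpuLayers)
}
```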

type SamplingContext

type SamplingContext struct {
	// contains filtered or unexported fields
}

TODO: SamplingContext is a temporary wrapper to allow calling C++ sampling code from cgo.

func NewSamplingContext

func NewSamplingContext(model *Model, params SamplingParams) (*SamplingContext, error)

func (*SamplingContext) Accept

func (s *SamplingContext) Accept(id int, applyGrammar bool)

func (*SamplingContext) Reset

func (s *SamplingContext) Reset()

func (*SamplingContext) Sample

func (s *SamplingContext) Sample(llamaContext *Context, idx int) int

type SamplingParams

type SamplingParams struct {
	TopK           int
	TopP           float32
	MinP           float32
	TypicalP       float32
	Temp           float32
	RepeatLastN    int
	PenaltyRepeat  float32
	PenaltyFreq    float32
	PenaltyPresent float32
	Mirostat       int
	MirostatTau    float32
	MirostatEta    float32
	PenalizeNl     bool
	Seed           uint32
	Grammar        string
}
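A typical configuration sets only a few of these fields and leaves the rest zero-valued. The sketch mirrors the struct locally so it compiles standalone; the values are common illustrative choices, not the package's defaults:

```go
package main

import "fmt"

// SamplingParams mirrors the exported struct above so this sketch is
// self-contained. The values in main are illustrative assumptions.
type SamplingParams struct {
	TopK           int
	TopP           float32
	MinP           float32
	TypicalP       float32
	Temp           float32
	RepeatLastN    int
	PenaltyRepeat  float32
	PenaltyFreq    float32
	PenaltyPresent float32
	Mirostat       int
	MirostatTau    float32
	MirostatEta    float32
	PenalizeNl     bool
	Seed           uint32
	Grammar        string
}

func main() {
	params := SamplingParams{
		TopK:          40,  // keep the 40 most likely tokens
		TopP:          0.9, // nucleus sampling threshold
		Temp:          0.8, // softmax temperature
		RepeatLastN:   64,  // penalize repeats within the last 64 tokens
		PenaltyRepeat: 1.1, // repetition penalty factor
		Seed:          42,  // fixed seed for reproducible sampling
	}
	// With the real package this would be:
	//   sc, err := llama.NewSamplingContext(model, params)
	fmt.Println(params.TopK, params.Seed)
}
```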

Directories

Path Synopsis
llama.cpp
src