llama

package
v0.0.0-...-e65ccaf
Published: Feb 20, 2025 License: MIT Imports: 15 Imported by: 0

README

llama

This package provides Go bindings to llama.cpp.

Vendoring

Ollama vendors llama.cpp and ggml. While we generally strive to contribute changes back upstream to avoid drift, we carry a small set of patches that are applied on top of the tracking commit.

If you update the vendored code, start by running the following command to establish the tracking llama.cpp repo in the ./vendor/ directory.

make -f Makefile.sync apply-patches

Updating Base Commit

Pin to new base commit

To change the base commit, update FETCH_HEAD in Makefile.sync.

When updating to a newer base commit, the existing patches may not apply cleanly and require manual merge resolution.

Start by applying the patches. If any patch has conflicts, git am will stop at the first failure.

make -f Makefile.sync apply-patches

If there are conflicts, you will see an error message. Resolve the conflicts in ./vendor/, continue the patch series with git am --continue, then rerun make -f Makefile.sync apply-patches. Repeat until all patches apply successfully.

Once all patches are applied, commit the changes to the tracking repository.

make -f Makefile.sync format-patches sync

Generating Patches

When working on new fixes or features that impact vendored code, use the following workflow. First, get a clean tracking repo with all current patches applied:

make -f Makefile.sync clean apply-patches

Iterate until you're ready to submit PRs. Once your code is ready, commit a change in the ./vendor/ directory, then generate the patches for ollama with

make -f Makefile.sync format-patches

In the ./vendor/ directory, create a branch, cherry-pick the new commit onto it, and submit a PR upstream to llama.cpp.

Commit the changes in the Ollama repo and submit a PR, which will include the vendored code update with your change, along with the patches.

After your PR upstream is merged, follow the Updating Base Commit instructions above; however, first remove your patch before running apply-patches, since the new base commit already contains your change.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var ErrKvCacheFull = errors.New("could not find a kv cache slot")

Functions

func BackendInit

func BackendInit()

func EnableDebug

func EnableDebug()

func FreeModel

func FreeModel(model *Model)

func GetModelArch

func GetModelArch(modelPath string) (string, error)

func PrintSystemInfo

func PrintSystemInfo() string

func Quantize

func Quantize(infile, outfile string, ftype uint32) error

func SchemaToGrammar

func SchemaToGrammar(schema []byte) []byte

SchemaToGrammar converts the provided JSON schema to a grammar. It returns nil if the provided schema is invalid JSON or an invalid JSON schema.
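Because an invalid schema yields nil rather than an error, callers should check the result before use. The sketch below is a hypothetical stand-in, not the real binding (which generates a GBNF grammar via llama.cpp); it mirrors only the documented nil-on-invalid-JSON contract, using encoding/json to illustrate the calling pattern:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaToGrammarStub is a hypothetical stand-in for SchemaToGrammar.
// It reproduces only the documented contract: nil is returned when the
// input is not valid JSON. The real binding returns a generated grammar.
func schemaToGrammarStub(schema []byte) []byte {
	if !json.Valid(schema) {
		return nil
	}
	// Placeholder output; the real function returns the GBNF grammar.
	return []byte("root ::= ...")
}

func main() {
	valid := []byte(`{"type": "object", "properties": {"name": {"type": "string"}}}`)
	invalid := []byte(`{"type": "object",`)

	if g := schemaToGrammarStub(valid); g != nil {
		fmt.Println("grammar generated")
	}
	if g := schemaToGrammarStub(invalid); g == nil {
		fmt.Println("invalid schema: nil grammar")
	}
}
```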

Types

type Batch

type Batch struct {
	// contains filtered or unexported fields
}

func NewBatch

func NewBatch(batchSize int, maxSeq int, embedSize int) (*Batch, error)

Creates a new batch for either word tokens or image embeddings (if embedSize is non-zero). Batches cannot contain both types at the same time. batchSize is the maximum number of entries that can be added per sequence.

func (*Batch) Add

func (b *Batch) Add(token int, embed []float32, pos int, logits bool, seqIds ...int)

Add adds either a token or an image embedding to the batch, depending on the type the batch was initialized with; the other argument is ignored. The entry is added at the given position for the given sequence ids, and logits optionally requests logit output for this entry.
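A batch created for tokens ignores the embed argument and vice versa. The miniature below is a hypothetical pure-Go sketch of that documented invariant, not the real cgo-backed Batch:

```go
package main

import "fmt"

// miniBatch is a hypothetical miniature of the documented Batch semantics:
// a batch created with a non-zero embedSize holds embeddings, otherwise it
// holds tokens, and Add ignores whichever argument does not apply.
type miniBatch struct {
	embedSize int
	tokens    []int
	embeds    [][]float32
	pos       []int
}

func (b *miniBatch) IsEmbedding() bool { return b.embedSize != 0 }

func (b *miniBatch) Add(token int, embed []float32, pos int) {
	if b.IsEmbedding() {
		b.embeds = append(b.embeds, embed) // token is ignored
	} else {
		b.tokens = append(b.tokens, token) // embed is ignored
	}
	b.pos = append(b.pos, pos)
}

func (b *miniBatch) NumTokens() int { return len(b.pos) }

func main() {
	tb := &miniBatch{} // token batch: embedSize is zero
	tb.Add(42, nil, 0)
	tb.Add(7, nil, 1)
	fmt.Println(tb.IsEmbedding(), tb.NumTokens()) // false 2

	eb := &miniBatch{embedSize: 4} // embedding batch
	eb.Add(0, []float32{0.1, 0.2, 0.3, 0.4}, 0)
	fmt.Println(eb.IsEmbedding(), eb.NumTokens()) // true 1
}
```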

func (*Batch) Clear

func (b *Batch) Clear()

func (*Batch) Free

func (b *Batch) Free()

func (*Batch) IsEmbedding

func (b *Batch) IsEmbedding() bool

func (*Batch) NumTokens

func (b *Batch) NumTokens() int

func (*Batch) Size

func (b *Batch) Size() int

type ClipContext

type ClipContext struct {
	// contains filtered or unexported fields
}

ClipContext handles vision processing.

func NewClipContext

func NewClipContext(llamaContext *Context, modelPath string) (*ClipContext, error)

func (*ClipContext) Free

func (c *ClipContext) Free()

func (*ClipContext) NewEmbed

func (c *ClipContext) NewEmbed(llamaContext *Context, data []byte) ([][]float32, error)

type Context

type Context struct {
	// contains filtered or unexported fields
}

func NewContextWithModel

func NewContextWithModel(model *Model, params ContextParams) (*Context, error)

func (*Context) Decode

func (c *Context) Decode(batch *Batch) error

func (*Context) GetEmbeddingsIth

func (c *Context) GetEmbeddingsIth(i int) []float32

func (*Context) GetEmbeddingsSeq

func (c *Context) GetEmbeddingsSeq(seqId int) []float32

GetEmbeddingsSeq gets the embeddings for a sequence id.
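The returned value is a plain []float32. A common follow-up step, not part of this package, is L2-normalizing the vector before cosine-similarity comparisons; a small sketch:

```go
package main

import (
	"fmt"
	"math"
)

// normalize L2-normalizes an embedding vector such as the one returned by
// Context.GetEmbeddingsSeq. This helper is illustrative and not part of
// the llama package.
func normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	n := math.Sqrt(sum)
	if n == 0 {
		return v
	}
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = float32(float64(x) / n)
	}
	return out
}

func main() {
	emb := []float32{3, 4} // stand-in for a real embedding
	fmt.Println(normalize(emb))
}
```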

func (*Context) KvCacheClear

func (c *Context) KvCacheClear()

func (*Context) KvCacheDefrag

func (c *Context) KvCacheDefrag()

func (*Context) KvCacheSeqAdd

func (c *Context) KvCacheSeqAdd(seqId int, p0 int, p1 int, delta int)

func (*Context) KvCacheSeqCp

func (c *Context) KvCacheSeqCp(srcSeqId int, dstSeqId int, p0 int, p1 int)

func (*Context) KvCacheSeqRm

func (c *Context) KvCacheSeqRm(seqId int, p0 int, p1 int) bool

func (*Context) Model

func (c *Context) Model() *Model

func (*Context) SetCrossAttention

func (c *Context) SetCrossAttention(state bool)

func (*Context) Synchronize

func (c *Context) Synchronize()

type ContextParams

type ContextParams struct {
	// contains filtered or unexported fields
}

func NewContextParams

func NewContextParams(numCtx int, batchSize int, numSeqMax int, threads int, flashAttention bool, kvCacheType string) ContextParams

type MllamaContext

type MllamaContext struct {
	// contains filtered or unexported fields
}

func NewMllamaContext

func NewMllamaContext(llamaContext *Context, modelPath string) (*MllamaContext, error)

func (*MllamaContext) EmbedSize

func (m *MllamaContext) EmbedSize(llamaContext *Context) int

func (*MllamaContext) Free

func (m *MllamaContext) Free()

func (*MllamaContext) NewEmbed

func (m *MllamaContext) NewEmbed(llamaContext *Context, data []byte, aspectRatioId int) ([][]float32, error)

type Model

type Model struct {
	// contains filtered or unexported fields
}

func LoadModelFromFile

func LoadModelFromFile(modelPath string, params ModelParams) (*Model, error)

func (*Model) AddBOSToken

func (m *Model) AddBOSToken() bool

func (*Model) ApplyLoraFromFile

func (m *Model) ApplyLoraFromFile(context *Context, loraPath string, scale float32, threads int) error

func (*Model) NEmbd

func (m *Model) NEmbd() int

func (*Model) NumVocab

func (m *Model) NumVocab() int

func (*Model) TokenIsEog

func (m *Model) TokenIsEog(token int) bool

func (*Model) TokenToPiece

func (m *Model) TokenToPiece(token int) string

func (*Model) Tokenize

func (m *Model) Tokenize(text string, addSpecial bool, parseSpecial bool) ([]int, error)

type ModelParams

type ModelParams struct {
	NumGpuLayers int
	MainGpu      int
	UseMmap      bool
	UseMlock     bool
	TensorSplit  []float32
	Progress     func(float32)
	VocabOnly    bool
}
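The fields above map onto llama.cpp's model-loading options. The sketch below mirrors the struct locally so it is self-contained without cgo; the field values are illustrative assumptions, not package defaults:

```go
package main

import "fmt"

// ModelParams mirrors the exported struct above so this sketch compiles
// standalone. The values chosen in main are illustrative, not defaults.
type ModelParams struct {
	NumGpuLayers int
	MainGpu      int
	UseMmap      bool
	UseMlock     bool
	TensorSplit  []float32
	Progress     func(float32)
	VocabOnly    bool
}

func main() {
	params := ModelParams{
		NumGpuLayers: 32,                   // offload 32 layers to the GPU
		MainGpu:      0,                    // primary device index
		UseMmap:      true,                 // memory-map the model file
		TensorSplit:  []float32{0.5, 0.5},  // split tensors across two GPUs
		Progress: func(p float32) {
			// invoked periodically as loading progresses
			fmt.Printf("loading: %.0f%%\n", p*100)
		},
	}
	// With the real package this would be:
	//   model, err := llama.LoadModelFromFile("model.gguf", params)
	params.Progress(1.0)
	fmt.Println("gpu layers:", params.NumGpuLayers)
}
```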

type SamplingContext

type SamplingContext struct {
	// contains filtered or unexported fields
}

TODO: SamplingContext is a temporary wrapper to allow calling C++ sampling code from cgo.

func NewSamplingContext

func NewSamplingContext(model *Model, params SamplingParams) (*SamplingContext, error)

func (*SamplingContext) Accept

func (s *SamplingContext) Accept(id int, applyGrammar bool)

func (*SamplingContext) Reset

func (s *SamplingContext) Reset()

func (*SamplingContext) Sample

func (s *SamplingContext) Sample(llamaContext *Context, idx int) int

type SamplingParams

type SamplingParams struct {
	TopK           int
	TopP           float32
	MinP           float32
	TypicalP       float32
	Temp           float32
	RepeatLastN    int
	PenaltyRepeat  float32
	PenaltyFreq    float32
	PenaltyPresent float32
	Mirostat       int
	MirostatTau    float32
	MirostatEta    float32
	PenalizeNl     bool
	Seed           uint32
	Grammar        string
}
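A typical configuration sets only a few of these fields and leaves the rest zero-valued. The sketch mirrors the struct locally so it compiles standalone; the values are common illustrative choices, not the package's defaults:

```go
package main

import "fmt"

// SamplingParams mirrors the exported struct above so this sketch is
// self-contained. The values in main are illustrative assumptions.
type SamplingParams struct {
	TopK           int
	TopP           float32
	MinP           float32
	TypicalP       float32
	Temp           float32
	RepeatLastN    int
	PenaltyRepeat  float32
	PenaltyFreq    float32
	PenaltyPresent float32
	Mirostat       int
	MirostatTau    float32
	MirostatEta    float32
	PenalizeNl     bool
	Seed           uint32
	Grammar        string
}

func main() {
	params := SamplingParams{
		TopK:          40,  // keep the 40 most likely tokens
		TopP:          0.9, // nucleus sampling threshold
		Temp:          0.8, // softmax temperature
		RepeatLastN:   64,  // penalize repeats within the last 64 tokens
		PenaltyRepeat: 1.1, // repetition penalty factor
		Seed:          42,  // fixed seed for reproducible sampling
	}
	// With the real package this would be:
	//   sc, err := llama.NewSamplingContext(model, params)
	fmt.Println(params.TopK, params.Seed)
}
```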

Directories

Path Synopsis
llama.cpp
src