
Conversation

codelion (Member) commented on Nov 10, 2024

  • Ability to use a local built-in inference server
  • Allows logprobs in output responses (not supported in Ollama)
  • Allows multiple response sampling (not supported in Ollama)
  • Supports multiple LoRAs (not supported in Ollama)
  • Supports prompt caching
  • Supports alternative decoding techniques like cot_decoding and entropy_decoding (see the usage sketch after this list)
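The features above are exposed through an OpenAI-compatible chat completions interface, so a client call can exercise multiple sampling, logprobs, and an alternative decoding technique in one request. The sketch below is illustrative only: the base URL, model id, and the `decoding` extra-body field are assumptions for this example, not parameter names confirmed by this PR.

```python
# Minimal client-side sketch against a local OpenAI-compatible inference server.
# Assumptions: the server listens on localhost:8000/v1 and accepts an extra
# "decoding" field to select cot_decoding / entropy_decoding.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",   # any HF model id (placeholder)
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    n=3,                                        # multiple response sampling
    logprobs=True,                              # token logprobs in the output
    top_logprobs=5,
    extra_body={"decoding": "cot_decoding"},    # hypothetical field name
)

for choice in response.choices:
    print(choice.message.content)
```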
- allow loading any model from HF and any LoRA adapter (see the sketch after this list)
- support caching
- support batches
- support optimized attention
- add dynamic temperature
- add support for logprobs
- fix logprobs return
- fix loading of multiple LoRAs and setting of adapters
- bump version for new release
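For the LoRA-related commits, the sketch below shows the general transformers + peft pattern for loading a base model from the Hugging Face Hub, registering several LoRA adapters, and switching between them. It illustrates the idea only; the actual implementation in this PR may differ, and the model and adapter ids are hypothetical placeholders.

```python
# Sketch: load a base model from HF and attach multiple LoRA adapters with peft.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"          # any HF model id (placeholder)
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

# Load the first adapter under a name, then register additional adapters.
model = PeftModel.from_pretrained(base, "your-org/lora-math", adapter_name="math")
model.load_adapter("your-org/lora-code", adapter_name="code")

# Switch the active adapter at request time.
model.set_adapter("code")
```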
@codelion codelion merged commit 7381008 into main Nov 13, 2024
@codelion codelion deleted the feat-add-local-inference branch November 13, 2024 02:47
