From the course: Advanced Guide to ChatGPT, Embeddings, and Other Large Language Models (LLMs)


Cost projecting and deploying LLMs to production

- When we think about cost projecting with large language models, we're generally talking about either closed-source or open-source models. So, let's start with closed-source. When you're deploying a closed-source LLM, like GPT-4 or Claude or Cohere, you're generally being charged by the token, or more precisely, by batches of tokens. So, deploying an API implementation of a prompt on an LLM is more or less just a matter of counting the number of input and output tokens and matching that count against the pricing. As an example, let's think about OpenAI's embedding product. When we used OpenAI's embeddings in previous lessons, what I didn't tell you is that they charge, at least at the time of recording, $0.0004 for every 1,000 tokens that you embed using the engine that we did, Ada-002. Now, if we assume an average of 500 tokens per document, which is roughly a page of text, then the cost per document would be $0.0002. So, to embed a million documents, it would cost approximately $200.…
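The arithmetic above can be sketched as a small cost-projection helper. This is a minimal sketch, not an official OpenAI utility; the $0.0004-per-1,000-tokens rate and the 500-token-per-document average come from the lesson, and the function name `estimate_embedding_cost` is ours for illustration. Real pricing changes over time, so check the provider's current rates before budgeting.

```python
# Sketch of the cost projection described in the lesson.
# Assumed pricing from the transcript (time of recording): $0.0004 per 1,000 tokens
# for the Ada-002 embedding engine. Verify current rates before relying on this.
PRICE_PER_1K_TOKENS = 0.0004


def estimate_embedding_cost(num_documents: int,
                            avg_tokens_per_doc: int = 500,
                            price_per_1k_tokens: float = PRICE_PER_1K_TOKENS) -> float:
    """Return the estimated dollar cost of embedding `num_documents`.

    Cost = total tokens / 1,000 * price per 1,000 tokens.
    """
    total_tokens = num_documents * avg_tokens_per_doc
    return total_tokens / 1000 * price_per_1k_tokens


# One ~500-token document (roughly a page of text):
per_doc = estimate_embedding_cost(1)        # 0.0002 dollars
# One million such documents:
per_million = estimate_embedding_cost(1_000_000)  # 200.0 dollars
print(f"Per document: ${per_doc}")
print(f"Per million documents: ${per_million}")
```

In practice, you would replace the fixed per-document average with actual token counts (for example, from a tokenizer) to get a tighter projection before committing to an embedding run.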
