
Commit f840ab1

Made README.md shorter
1 parent 00f238d commit f840ab1

File tree: 1 file changed (+6, −192 lines)

README.md

Lines changed: 6 additions & 192 deletions
To see full documentation and examples, go to [docs](https://docs.LangDrive.ai).

## Getting started

Thank you for taking an interest in LangDrive!

LangDrive's set of connectors and services makes training LLMs easy, and you can get started with just a CSV file. By providing a Hugging Face API key, you can train models and even host them in the cloud 😉

Import LangDrive into your project, or configure and execute LangDrive directly from the CLI. The remainder of this article explores both approaches for training and deploying models with LangDrive. Along the way, we will use a YAML doc to help with connecting to data and services.

The simplest way to get started with LangDrive is through your CLI. For a more detailed overview of getting started using the YAML config and API, please visit the [docs](https://docs.LangDrive.ai).


#### Using the CLI

Node developers can train and deploy a model in 2 simple steps:

1. `npm install langdrive`
2. `langdrive train --csv ./path/to/csvFileName.csv --hftoken apikey123 --deploy`

In this case, LangDrive will retrieve the data, train a model, host its weights on Hugging Face, and return an inference endpoint you may use to query the LLM.

The command `langdrive train` is used to train the LLM; see how to configure the command below.

args:
- `deployToHf`: true | false
- `hfModelPath`: The full path to the Hugging Face model repo where the model should be deployed. Format: username/model

It is assumed you do not want to deploy your model if you run `langdrive train` on its own; in that case, a link to where you can download the weights will be provided. Adding `--deploy` will return a link to the inference endpoint.

More information on how to ingest simple data using the CLI can be found in the [CLI](./cli.md) docs. For more complex examples, read on...
#### Getting Started with YAML

Getting the data and services you need shouldn't be the hardest part of training your models! Using YAML, you can configure more advanced data-retrieval and training/deployment strategies. Once configured, these settings are available to the standalone API and the CLI.

Refer to the [YAML](./yaml.md) docs for more information.
###### Step 1: Configure Your Data Connectors

Our growing list of data connectors allows anyone to retrieve data through a simple config doc. As LangDrive grows, our set of open-source integrations will grow too. At the moment, you can connect to your data using our `email`, `firestore`, and `gdrive` classes.

In essence, configuring these data connectors is as straightforward as:

    firestore:
      clientJson: "secrets/firebase_service_client.json"
      databaseURL: "env:FIREBASE_DATABASE_URL"

    drive:
      clientJson: "secrets/drive_service_client.json"

    email:
      password: env:GMAIL_PASSWORD
      email: env:GMAIL_EMAIL

You may specify .env variables using `env:` as a prefix for your secret information.

In our example above, the `clientJson` attribute is a [Firebase service account file](https://firebase.google.com/support/guides/service-accounts).

Once this information is provided, the entire __OAuth process__ will be handled automatically on your behalf when using any associated library, regardless of whether it's used from the CLI or the API. Please refer to our notes on [security](./security/authentication.md) for more information on the OAuth2 process when using Google.
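The `env:` prefix convention above can be implemented in a few lines. The sketch below is a hypothetical illustration of the idea, not LangDrive's actual code:

```javascript
// Resolve config values of the form "env:NAME" to process.env.NAME;
// any other value is passed through unchanged.
function resolveEnvValue(value) {
  if (typeof value === "string" && value.startsWith("env:")) {
    const name = value.slice("env:".length);
    if (!(name in process.env)) {
      throw new Error(`Missing environment variable: ${name}`);
    }
    return process.env[name];
  }
  return value;
}
```

With `GMAIL_EMAIL` set, `resolveEnvValue("env:GMAIL_EMAIL")` returns its value, while a plain value like `"secrets/drive_service_client.json"` is returned as-is.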
###### Step 2: Configure Your LLM Tools

Once you have your data connectors set up, configure your training and deployment information. The last step will be to connect the two.

Training on Hugging Face and hosting the weights on the Hugging Face Hub:
```
huggingface:
  hfToken: env:HUGGINGFACE_API_KEY
  deployToHf: false
```

<b>NOTE</b>: To specify the model you want to train and where to host it:
```
huggingface:
  hfToken: env:HUGGINGFACE_API_KEY
  baseModel:
    name: "vilsonrodrigues/falcon-7b-instruct-sharded"
  trainedModel:
    name: "karpathic/falcon-7b-instruct-tuned"
  deployToHf: true
```

Simple enough, huh? Here comes the final step.
###### Step 3: Connecting Your Data to Your LLM

To connect data to your LLM tool, we will need to create a new YAML entry, `train:`.

Here we specify the data we want to train on. In the simplest case, a CSV file, we can use the `path` value to specify its location.

LangDrive.yaml
```
train:
  path: ../shared.csv # Default path for input and output
  inputValue: input # Attribute to extract from path
  outputValue: output
```
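For reference, a minimal CSV matching the `inputValue`/`outputValue` config above might look like this (hypothetical rows):

```
input,output
"What is LangDrive?","A set of connectors and services for training LLMs."
"How do I train from the CLI?","Run langdrive train with a CSV and a Hugging Face token."
```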
Now let's show how to query data from one of the third-party services we configured earlier.

Within the `train` entry, setting a `service` and `query` will do the trick. Set a data connector as the `service` and one of its methods (and its args) as the `query` value. This will require exploring the class documentation.

LangDrive.yaml
```
train:
  service: 'firebase'
  query:
    filterCollectionWithMultipleWhereClauseWithLimit:
      collection: "chat-state"
      filterKey: []
      filterData: []
      operation: []
      limit: 5
```

In the example above, we use the `filterCollectionWithMultipleWhereClauseWithLimit` method from LangDrive's Firestore class, passing arguments as specified in the LangDrive [Firestore](./api/firestore.md) docs. `collection` is the Firestore collection to retrieve data from (limited here to the first 5 entries). `filterKey` and `filterData` are not specified in this example, but they contain the field name/key to filter on. The `operation` value specifies the Firestore query operator to use (for example, '==', '>=', '<=').

Note:

> If the retrieved data has two columns (or attributes), they are assumed to be in the order [input, output]. If more columns exist, LangDrive grabs the first two columns after first looking for an 'inputValue' and 'outputValue' column. The same logic applies to information retrieved from a query, and it works similarly for nested JSON objects (e.g. `attr1.attr2`).
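The column-selection rule in the note can be sketched as follows. This is a hypothetical illustration of the described behavior, not LangDrive's implementation:

```javascript
// Map rows to [input, output] pairs: prefer explicit 'inputValue'/'outputValue'
// columns; otherwise fall back to the first two columns in order.
function selectPairs(rows) {
  return rows.map((row) => {
    const keys = Object.keys(row);
    const inputKey = keys.includes("inputValue") ? "inputValue" : keys[0];
    const remaining = keys.filter((k) => k !== inputKey);
    const outputKey = keys.includes("outputValue") ? "outputValue" : remaining[0];
    return { input: row[inputKey], output: row[outputKey] };
  });
}
```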

#### Getting Started with the API

Our classes can be exposed in the typical manner. For more information on any one class, please refer to its corresponding documentation.

Coming Soon: Deploy self-hosted, cloud-based training infrastructure on AWS, Google Cloud Platform, Heroku, or Hugging Face. The code is currently being used internally and is under development prior to general release; it is available in-repo under `/src/train`.

If you would like to interact directly with our training endpoint, you can call our hosted training image via the LangDrive API.

Endpoint: `POST https://api.LangDrive.ai/train`

###### Request Body

The request accepts the following data in JSON format.
```
{
  "baseModel": "string",
  "hfToken": "string",
  "deployToHf": "Boolean",
  "trainingData": "Array",
  "hfModelPath": "string"
}
```

`baseModel`: The original model to train. This can be one of our supported models, listed below, or a Hugging Face model.

- Type: String
- Required: Yes

`hfToken`: Your Hugging Face token with write permissions. Learn how to create a Hugging Face token [here](https://huggingface.co/docs/hub/security-tokens).

- Type: String
- Required: Yes

`deployToHf`: A boolean representing whether or not to deploy the model to Hugging Face after training.

- Type: Boolean
- Required: Yes

`trainingData`: An array of objects. Each object must have two attributes: `input` and `output`. The `input` attribute represents the user's input, and the `output` attribute represents the model's output.

- Type: Array
- Required: Yes

`hfModelPath`: The Hugging Face model repository to deploy the model to after training is complete.

- Type: String
- Required: No
###### Response Body

The request returns the following data in JSON format.
```
HTTP/1.1 200
Content-type: application/json
{
  "success": "true"
}
```
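Putting the request and response shapes together, a call to the endpoint might look like this in Node. The endpoint and field names come from the tables above; the model names and token are placeholders:

```javascript
// Sketch: build the POST request for the LangDrive training endpoint.
// Field names follow the request-body table above; values are placeholders.
function buildTrainRequest(hfToken, trainingData) {
  return {
    url: "https://api.LangDrive.ai/train",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        baseModel: "vilsonrodrigues/falcon-7b-instruct-sharded",
        hfToken, // write-scoped Hugging Face token
        deployToHf: true,
        trainingData, // [{ input, output }, ...]
        hfModelPath: "your-username/your-tuned-model", // optional
      }),
    },
  };
}

// Usage with fetch (Node 18+); a successful call returns { "success": "true" }:
// const { url, options } = buildTrainRequest(process.env.HUGGINGFACE_API_KEY,
//   [{ input: "Hi", output: "Hello!" }]);
// const result = await fetch(url, options).then((r) => r.json());
```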
#### Model Training

We plan to expand the number of models available for training. At the moment, only sharded models work, as these models are trained using PEFT.

###### Models Support Matrix

Causal Language Modeling:

| Model | Supported |
|--------------|------|
| Falcon-7b-sharded | ✅ |
| GPT-2 | Coming Soon |
| Bloom | Coming Soon |
| OPT | Coming Soon |
| LLaMA | Coming Soon |
| ChatGLM | Coming Soon |
Model Type Support:

| Model Type | Support |
|--------------|------|
| Conditional Generation | ✅ |
| Sequence Classification | ✅ |
| Token Classification | ✅ |
| Text-to-Image Generation | |
| Image Classification | |
| Image to Text (multi-modal models) | |
| Semantic Segmentation | |

-----