* Baby Llama - Porting run.c for integration and fixing clang type conversion errors
* Custom preprocess implementation
* Free memory only after the inference is done
* Implement postprocess
* Setting fast compiler option
* Reading checkpoint path and tokenizer path from config file using folly
* Removing run.c from cmake
* Replace auto with appropriate data type
* Using smart pointers and initializing the vector with appropriate size upfront
* Using smart pointers
* Directly converting the tensor values to prompt token ids
* Moving run.c and common variables to .cc file
* Moving run.c to a separate folder
* Uncommenting the original run.c main method
* Implemented destructor to free up resources
* Supporting files for unit test
* Processing all the batch inputs
* Setting InferenceMode guard
* Updating InferenceMode to use torch::InferenceMode
* Updating class name to BabyLlamaHandler
* Renaming llm_handler target to babyllama_handler
* Adding dummy pt file
* Typo fix
* Calculate tokens per second for batch input
* Adding README.md for babyllama example
* Fixing out-of-bound memory access in babyllama example
* Move model instance out of ts_backend
* Use shared_ptr<void> for the model to decouple it from TorchScript
* Move BaseHandler to backends/handler
* Move model instance into core
* Remove TorchScript as a backend and implement it as a handler
* Move torchscript test out of backend folder
* Remove dummy.pt in babyllama, update README, and move babyllama test to new examples/examples_test.cc file
* Fix spell check
* Move cpp babyllama example to main example folder
* Add last successful location to error message in handle function
* Fix babyllama batching by changing input/output from tensor to IValue
* Rename prompt file
* Fix spellcheck

---------

Signed-off-by: Shrinath Suresh <[email protected]>
Co-authored-by: Shrinath Suresh <[email protected]>
The TorchServe cpp backend can run as a process, similar to the [TorchServe Python backend](https://github.com/pytorch/serve/tree/master/ts). By default, the cpp backend supports TorchScript models. Other platforms such as MXNet and ONNX can be supported through custom handlers, following the TorchScript example [src/backends/handler/torch_scripted_handler.hh](https://github.com/pytorch/serve/blob/master/src/backends/handler/torch_scripted_handler.hh).
### Custom Handler
By default, TorchServe cpp provides a handler for TorchScript, [src/backends/handler/torch_scripted_handler.hh](https://github.com/pytorch/serve/blob/master/src/backends/handler/torch_scripted_handler.hh). It uses [BaseHandler](https://github.com/pytorch/serve/blob/master/src/backends/handler/base_handler.hh), which defines the APIs for customizing a handler.
Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/base_handler) of an unzipped model mar file.
##### Using Custom Handler
* Build the custom handler shared library, for example the [Mnist handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/examples/image_classifier/mnist).
* Set the runtime to "LSP" in the model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments).
* Set the handler to "libmnist_handler:MnistHandler" in the model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments).

Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/mnist_handler) of an unzipped model mar file.
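Putting these options together, a hypothetical archiving invocation might look as follows. This is a sketch for illustration: the model name and serialized file name are assumptions, while the `--runtime` and `--handler` values come from the steps above.

```bash
# Package a TorchScript model together with the custom C++ handler.
# The serialized file name (mnist_script.pt) is an assumption for illustration.
torch-model-archiver --model-name mnist --version 1.0 \
  --serialized-file mnist_script.pt \
  --handler libmnist_handler:MnistHandler \
  --runtime LSP
```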
##### BabyLlama Example
The babyllama example can be found [here](https://github.com/pytorch/serve/blob/master/cpp/src/examples/babyllama/).
To run the example, we first need to download the model weights as well as the tokenizer file:
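A minimal sketch of the download step, assuming the example uses Karpathy's tinyllamas `stories15M.bin` checkpoint and the `llama2.c` tokenizer (the exact files are an assumption, not stated in this section):

```bash
# Download a small llama2.c-style checkpoint and its tokenizer (assumed sources).
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
```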
Subsequently, we need to adjust the paths according to our local file structure in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler/config.json).
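For illustration, the adjusted config might look like the following. The field names and paths are assumptions, based on the handler reading a checkpoint path and a tokenizer path from the config file:

```bash
# Write a config.json pointing at the local files
# (field names and paths are assumptions for illustration).
cat > config.json <<'EOF'
{
  "checkpoint_path": "/path/to/stories15M.bin",
  "tokenizer_path": "/path/to/tokenizer.bin"
}
EOF
```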
The model archive `llm.mar` can then be registered and an initial worker started:

```bash
curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=llm.mar"
```
The handler name `libbabyllama_handler:BabyLlamaHandler` consists of the name of our shared library (as defined in our [CMakeLists.txt](https://github.com/pytorch/serve/blob/master/serve/cpp/src/examples/CMakeLists.txt)) and the name of the class we chose for our [custom handler class](https://github.com/pytorch/serve/blob/master/serve/cpp/src/examples/babyllama/baby_llama_handler.cc), which derives from BaseHandler.
To test the model we can run:
```bash
cd serve/cpp/test/resources/torchscript_model/babyllama/
```
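From there, a prediction request can be sent to the running model. A minimal sketch, assuming the model was registered under the name `llm` and that the prompt file in this directory is named `prompt.txt` (the file name is an assumption):

```bash
# Send the prompt file as the request body to the model's prediction endpoint.
curl http://localhost:8080/predictions/llm -T prompt.txt
```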
* Run model registration and prediction: [Using BaseHandler](serve/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L54) or [Using customized handler](serve/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L72).
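To execute these tests locally, one might run the gtest binary produced by the cpp build. The binary name and path below are assumptions about the build layout, not taken from the docs:

```bash
# Hypothetical path and name of the gtest binary produced by the cpp build;
# --gtest_filter is the standard googletest flag for selecting tests.
cd serve/cpp/_build/test
./torchserve_cpp_test --gtest_filter='*TorchScripted*'
```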