
Commit 3ecaf0b

Feature/cpp baby llama rework (#2903)
* Baby Llama:
  * Porting run.c for integration and fixed clang type conversion errors
  * Custom preprocess implementation
  * Free memory only after the inference is done
  * Implement Postprocess
  * Setting Fast compiler option
  * Reading checkpoint path and tokenizer path from config file using folly
  * Removing run.c from cmake
  * Replace auto with appropriate data type
  * Using smartpointers and initializing the vector with appropriate size upfront
  * Using smartpointers
  * Directly converting the tensor values to prompt token ids
  * Moving run.c and common variables to .cc file
  * Moving run.c to a separate folder
  * Uncommenting the original run.c main method
  * Implemented destructor to free up resources
  * Supporting files for unit test
  * Processing all the batch inputs
  * Setting InferenceMode guard
  * Updating InferenceMode to use torch::InferenceMode
  * Updating class name to BabyLlamaHandler
  * Renaming llm_handler target to babyllama_handler
  * Adding dummy pt file
  * Typo fix
  * Calculate tokens per second for batch input
  * Adding README.md for babyllama example
  * Fixing out-of-bound mem access in babyllama example
  * Move model instance out of ts_backend
  * Use shared_ptr<void> for model to detangle from torchscript
  * Move BaseHandler to backends/handler
  * Move model instance into core
  * Remove Torchscript as a backend and implement it as a handler
  * Move torchscript test out of backend folder
  * Remove dummy.pt in babyllama + update README + move babyllama test to new examples/examples_test.cc file

  Signed-off-by: Shrinath Suresh <[email protected]>
* fix spell check
* Move cpp babyllama example to main example folder
* Add last successful location to error message in handle function
* Fix babyllama batching by changing input/output from tensor to IValue
* rename prompt file
* Fix spellcheck

Co-authored-by: Shrinath Suresh <[email protected]>
1 parent 9e6f1c2 commit 3ecaf0b
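One commit-message bullet above notes fixing babyllama batching "by changing input/output from tensor to IValue", i.e. each batch element travels through the handler as a generic `torch::IValue` rather than a raw tensor. A rough, hypothetical libtorch sketch of that pattern (not code from this commit; the token values are made up):

```cpp
// Illustrative only: wrap each batch element in an IValue, unwrap where a tensor is needed.
#include <torch/torch.h>

#include <iostream>
#include <vector>

int main() {
  // Requests with different prompt lengths can live in the same container.
  std::vector<c10::IValue> batch;
  batch.emplace_back(torch::tensor({1, 2, 3}));  // token ids for request 0
  batch.emplace_back(torch::tensor({4, 5}));     // token ids for request 1

  for (const c10::IValue& item : batch) {
    torch::Tensor ids = item.toTensor();  // unwrap only at the point a tensor is required
    std::cout << ids.numel() << " tokens\n";
  }
  return 0;
}
```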


41 files changed: 1812 additions, 470 deletions

cpp/README.md

+46-23
@@ -12,7 +12,7 @@ python ts_scripts/install_dependencies.py --cpp [--cuda=cu121|cu118]
 ### Building the backend
 ```
 ## Dev Build
-cd serve/cpp
+cd serve/cpp
 ./build.sh [-g cu121|cu118]

 ## Install TorchServe from source
@@ -34,32 +34,60 @@ cd serve
 torchserve torchserve --ncs --start --model-store model_store
 ```
 ## Backend
-TorchServe cpp backend can run as a process, which is similar to [TorchServe Python backend](https://github.com/pytorch/serve/tree/master/ts). By default, TorchServe supports torch scripted model in cpp backend. [src/backends/core/backend.hh](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/core/backend.hh) defines the APIs of backend to support multiple different platforms such as MxNet, ONNX and so on.
-* [Backend](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/core/backend.hh#L60) defines function `LoadModelInternal` to support model loading on different platforms.
-* [ModelInstance](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/core/backend.hh#L25) represents a model copy. The function `Predict` is to support prediction on different platforms.
-### TorchScripted Backend
-By default, TorchServe cpp provides [TorchScripted backend](https://github.com/pytorch/serve/tree/cpp_backend/cpp/src/backends/torch_scripted). Its [base handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh) defines APIs to customize handler.
-* [Initialize](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L29)
-* [LoadModel](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L37)
-* [Preprocess](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L40)
-* [Inference](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L46)
-* [Postprocess](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L53)
+TorchServe cpp backend can run as a process, which is similar to [TorchServe Python backend](https://github.com/pytorch/serve/tree/master/ts). By default, TorchServe supports torch scripted model in cpp backend. Other platforms such as MxNet, ONNX can be supported through custom handlers following the TorchScript example [src/backends/handler/torch_scripted_handler.hh](https://github.com/pytorch/serve/blob/master/src/backends/handler/torch_scripted_handler.hh).
+### Custom Handler
+By default, TorchServe cpp provides a handler for TorchScript [src/backends/handler/torch_scripted_handler.hh](https://github.com/pytorch/serve/blob/master/src/backends/handler/torch_scripted_handler.hh). Its uses the [BaseHandler](https://github.com/pytorch/serve/blob/master/src/backends/handler/base_handler.hh) which defines the APIs to customize handler.
+* [Initialize](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L29)
+* [LoadModel](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L37)
+* [Preprocess](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L40)
+* [Inference](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L46)
+* [Postprocess](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L53)
 #### Example
-##### Using BaseHandler
-* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
-* set handler as "BaseHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
+##### Using TorchScriptHandler
+* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
+* set handler as "TorchScriptHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
 ```
-torch-model-archiver --model-name mnist_base --version 1.0 --serialized-file mnist_script.pt --handler BaseHandler --runtime LSP
+torch-model-archiver --model-name mnist_base --version 1.0 --serialized-file mnist_script.pt --handler TorchScriptHandler --runtime LSP
 ```
 Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/base_handler) of unzipped model mar file.
-##### Using customized handler
+##### Using Custom Handler
 * build customized handler shared lib. For example [Mnist handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/examples/image_classifier/mnist).
-* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
+* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
 * set handler as "libmnist_handler:MnistHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
 ```
 torch-model-archiver --model-name mnist_handler --version 1.0 --serialized-file mnist_script.pt --handler libmnist_handler:MnistHandler --runtime LSP
 ```
 Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/mnist_handler) of unzipped model mar file.
+##### BabyLLama Example
+The babyllama example can be found [here](https://github.com/pytorch/serve/blob/master/cpp/src/examples/babyllama/).
+To run the example we need to download the weights as well as tokenizer files:
+```bash
+wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
+wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
+```
+Subsequently, we need to adjust the paths according to our local file structure in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler/config.json).
+```bash
+{
+"checkpoint_path" : "/home/ubuntu/serve/cpp/stories15M.bin",
+"tokenizer_path" : "/home/ubuntu/serve/cpp/src/examples/babyllama/tokenizer.bin"
+}
+```
+Then we can create the mar file and deploy it with:
+```bash
+cd serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler
+torch-model-archiver --model-name llm --version 1.0 --handler libbabyllama_handler:BabyLlamaHandler --runtime LSP --extra-files config.json
+mkdir model_store && mv llm.mar model_store/
+torchserve --ncs --start --model-store model_store
+
+curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=llm.mar"
+```
+The handler name `libbabyllama_handler:BabyLlamaHandler` consists of our shared library name (as defined in our [CMakeLists.txt](https://github.com/pytorch/serve/blob/master/serve/cpp/src/examples/CMakeLists.txt)) as well as the class name we chose for our [custom handler class](https://github.com/pytorch/serve/blob/master/serve/cpp/src/examples/babyllama/baby_llama_handler.cc) which derives its properties from BaseHandler.
+
+To test the model we can run:
+```bash
+cd serve/cpp/test/resources/torchscript_model/babyllama/
+curl http://localhost:8080/predictions/llm -T prompt.txt
+```
 ##### Mnist example
 * Transform data on client side. For example:
 ```
@@ -75,9 +103,4 @@ image = Image.open("examples/image_classifier/mnist/test_data/0.png")
 image = image_processing(image)
 torch.save(image, "0_png.pt")
 ```
-* Run model registration and prediction: [Using BaseHandler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L54) or [Using customized handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L72).
-
-
-
-
-
+* Run model registration and prediction: [Using BaseHandler](serve/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L54) or [Using customized handler](serve/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L72).
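The README text in the diff above explains that a custom handler string such as `libbabyllama_handler:BabyLlamaHandler` combines the shared-library name with the handler class name. A minimal sketch of the export convention this implies is shown below; the class name `DemoHandler` and the stand-in base type are hypothetical, and the real example derives from `torchserve::BaseHandler` in `cpp/src/examples/babyllama/baby_llama_handler.cc`:

```cpp
// Sketch only: a stand-in base type keeps the snippet self-contained.
struct BaseHandlerStandIn {
  virtual ~BaseHandlerStandIn() = default;
  // The real BaseHandler declares the Initialize/LoadModel/Preprocess/
  // Inference/Postprocess customization points listed in the README above.
};

struct DemoHandler : BaseHandlerStandIn {};

// Unmangled factory symbols: Backend::LoadHandler (see backend.cc later in
// this commit) builds the names "allocator" + ClassName and
// "deleter" + ClassName from the --handler string and resolves them from the
// shared library.
extern "C" BaseHandlerStandIn* allocatorDemoHandler() { return new DemoHandler(); }
extern "C" void deleterDemoHandler(BaseHandlerStandIn* handler) { delete handler; }
```

Compiled with `-shared -fPIC` into `libdemo_handler.so`, such a library could then be referenced as `libdemo_handler:DemoHandler` in the archiver's `--handler` option, mirroring the mnist and babyllama examples.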

cpp/build.sh

+4
@@ -212,6 +212,10 @@ function build() {
 mv $DEPS_DIR/../src/examples/libmnist_handler.so $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.so
 fi

+if [ -f "$DEPS_DIR/../src/examples/libbabyllama_handler.so" ]; then
+  mv $DEPS_DIR/../src/examples/libbabyllama_handler.so $DEPS_DIR/../../test/resources/torchscript_model/babyllama/babyllama_handler/libbabyllama_handler.so
+fi
+
 cd $DEPS_DIR/../..
 if [ -f "$DEPS_DIR/../test/torchserve_cpp_test" ]; then
   $DEPS_DIR/../test/torchserve_cpp_test

cpp/src/backends/CMakeLists.txt

+14-27
@@ -15,40 +15,27 @@ target_link_libraries(ts_backends_protocol PRIVATE ts_utils ${FOLLY_LIBRARIES})
 install(TARGETS ts_backends_protocol DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)

 # build library ts_backend_core
-set(TS_BACKENDS_CORE_SOURCE_FILES "")
-list(APPEND TS_BACKENDS_CORE_SOURCE_FILES ${TS_BACKENDS_CORE_SRC_DIR}/backend.cc)
-add_library(ts_backends_core SHARED ${TS_BACKENDS_CORE_SOURCE_FILES})
+set(BACKEND_SOURCE_FILES "")
+list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/core/backend.cc)
+list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/core/model_instance.cc)
+list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/handler/base_handler.cc)
+list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/handler/torch_scripted_handler.cc)
+add_library(ts_backends_core SHARED ${BACKEND_SOURCE_FILES})
 target_include_directories(ts_backends_core PUBLIC ${TS_BACKENDS_CORE_SRC_DIR})
-target_link_libraries(ts_backends_core PRIVATE ts_utils ts_backends_protocol ${FOLLY_LIBRARIES})
+target_link_libraries(ts_backends_core PUBLIC ts_utils ts_backends_protocol ${FOLLY_LIBRARIES})
 install(TARGETS ts_backends_core DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)

-# build library ts_backend_torch_scripted
-set(TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES "")
-list(APPEND TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/torch_scripted_backend.cc)
-list(APPEND TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/handler/base_handler.cc)
-add_library(ts_backends_torch_scripted SHARED ${TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES})
-target_include_directories(ts_backends_torch_scripted PUBLIC
-  ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR} ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/handler ${TORCH_INCLUDE_DIRS})
-target_link_libraries(ts_backends_torch_scripted PUBLIC ts_utils ts_backends_core ${TORCH_LIBRARIES})
-install(TARGETS ts_backends_torch_scripted DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)
-
-# build library ts_backend_torch_deploy
-#set(TS_BACKENDS_TORCH_DEPLOY_SOURCE_FILES "")
-#add_library(ts_backends_torch_deploy SHARED ${TS_BACKENDS_TORCH_DEPLOY_SOURCE_FILES})
-#target_include_directories(ts_backends_torch_deploy PUBLIC ${TS_BACKENDS_TORCH_DEPLOY_SRC_DIR})
-#target_link_libraries(ts_backends_torch_deploy PRIVATE ts_utils ts_backends_core ${TORCH_LIBRARIES})
-
 # build exe model_worker_socket
-add_executable(model_worker_socket
+add_executable(model_worker_socket
   "${TS_BACKENDS_PROCESS_SRC_DIR}/model_worker_socket.cc"
   "${TS_BACKENDS_PROCESS_SRC_DIR}/model_worker.cc"
 )
-target_include_directories(model_worker_socket PRIVATE
+target_include_directories(model_worker_socket PRIVATE
   ${TS_BACKENDS_CORE_SRC_DIR}
-  ${TS_BACKENDS_PROTOCOL_SRC_DIR}
-  ${TS_BACKENDS_PROCESS_SRC_DIR}
-  ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}
+  ${TS_BACKENDS_PROTOCOL_SRC_DIR}
+  ${TS_BACKENDS_PROCESS_SRC_DIR}
+  ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}
 )
-target_link_libraries(model_worker_socket
-  PRIVATE ts_backends_core ts_backends_protocol ts_backends_torch_scripted ${FOLLY_LIBRARIES})
+target_link_libraries(model_worker_socket
+  PRIVATE ts_backends_core ts_backends_protocol ${FOLLY_LIBRARIES} ${TORCH_LIBRARIES})
 install(TARGETS model_worker_socket DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/bin)

cpp/src/backends/core/backend.cc

+92-4
@@ -1,6 +1,63 @@
 #include "src/backends/core/backend.hh"

+#include <memory>
+
+#include "src/backends/handler/handler_factory.hh"
+
 namespace torchserve {
+Backend::Backend() {}
+
+Backend::~Backend() {
+  handler_.reset();
+  model_instance_table_.clear();
+  // Todo: do proper cleanup
+  // dl_loader_->CloseDL();
+}
+
+bool Backend::Initialize(const std::string &model_dir) {
+  random_generator_.seed(time(0));
+  manifest_ = std::make_shared<torchserve::Manifest>();
+  // TODO: windows
+  if (!manifest_->Initialize(
+          fmt::format("{}/MAR-INF/MANIFEST.json", model_dir))) {
+    return false;
+  }
+
+  LoadHandler(model_dir);
+
+  if (!handler_) {
+    return false;
+  }
+
+  handler_->Initialize(model_dir, manifest_);
+
+  return true;
+}
+
+void Backend::LoadHandler(const std::string &model_dir) {
+  const std::string &handler_str = manifest_->GetModel().handler;
+  std::size_t delimiter_pos = handler_str.find(manifest_->kHandler_Delimiter);
+  if (delimiter_pos != std::string::npos) {
+#ifdef __APPLE__
+    std::string lib_path = fmt::format("{}/{}.dylib", model_dir,
+                                       handler_str.substr(0, delimiter_pos));
+#else
+    std::string lib_path = fmt::format("{}/{}.so", model_dir,
+                                       handler_str.substr(0, delimiter_pos));
+#endif
+    std::string handler_class_name = handler_str.substr(delimiter_pos + 1);
+    std::string allocator_func = fmt::format("allocator{}", handler_class_name);
+    std::string deleter_func = fmt::format("deleter{}", handler_class_name);
+
+    dl_loader_ = std::make_unique<DLLoader<BaseHandler>>(
+        lib_path, allocator_func, deleter_func);
+    dl_loader_->OpenDL();
+    handler_ = dl_loader_->GetInstance();
+  } else {
+    handler_ = HandlerFactory::GetInstance().createHandler(handler_str);
+  }
+}
+
 std::unique_ptr<torchserve::LoadModelResponse> Backend::LoadModel(
     std::shared_ptr<torchserve::LoadModelRequest> load_model_request) {
   /**
@@ -13,12 +70,43 @@ std::unique_ptr<torchserve::LoadModelResponse> Backend::LoadModel(
    * - status_READY: return the model instance if it is already.
    *
    * Common steps:
-   * https://github.com/pytorch/serve/blob/master/ts/model_loader.py#L62
+   * serve/blob/master/ts/model_loader.py#L62
    */

+  // TODO: support request envelope:
+  // serve/tree/master/ts/torch_handler/request_envelope
+
   return LoadModelInternal(std::move(load_model_request));
 }

+std::unique_ptr<LoadModelResponse> Backend::LoadModelInternal(
+    std::shared_ptr<LoadModelRequest> load_model_request) {
+  std::string model_instance_id = BuildModelInstanceId(load_model_request);
+  try {
+    model_instance_table_[model_instance_id] = {
+        ModelInstanceStatus::INIT, std::shared_ptr<ModelInstance>(nullptr)};
+
+    auto result = handler_->LoadModel(load_model_request);
+    SetModelInstanceInfo(model_instance_id, ModelInstanceStatus::READY,
+                         std::make_shared<ModelInstance>(
+                             model_instance_id, std::move(result.first),
+                             handler_, std::move(result.second)));
+
+    ready_model_instance_ids_.emplace_back(model_instance_id);
+    std::string message =
+        fmt::format("loaded model {}", load_model_request->model_name);
+    return std::make_unique<LoadModelResponse>(
+        // TODO: check current response msg content
+        200, message);
+  } catch (const c10::Error &e) {
+    SetModelInstanceInfo(model_instance_id, ModelInstanceStatus::FAILED,
+                         std::shared_ptr<ModelInstance>(nullptr));
+    return std::make_unique<LoadModelResponse>(
+        // TODO: check existing
+        500, e.msg());
+  }
+}
+
 std::string Backend::BuildModelInstanceId(
     std::shared_ptr<torchserve::LoadModelRequest> load_model_request) {
   std::string device_type("cpu");
@@ -30,15 +118,15 @@ std::string Backend::BuildModelInstanceId(
 }

 void Backend::SetModelInstanceInfo(
-    const std::string& model_instance_id, ModelInstanceStatus new_status,
+    const std::string &model_instance_id, ModelInstanceStatus new_status,
     std::shared_ptr<torchserve::ModelInstance> new_model_instance) {
   model_instance_table_[model_instance_id].status = new_status;
   model_instance_table_[model_instance_id].model_instance =
       std::move(new_model_instance);
 }

 torchserve::Backend::ModelInstanceStatus Backend::GetModelInstanceStatus(
-    const std::string& model_instance_id) {
+    const std::string &model_instance_id) {
   auto model_instance_info = model_instance_table_.find(model_instance_id);
   if (model_instance_info == model_instance_table_.end()) {
     return torchserve::Backend::ModelInstanceStatus::NOT_INIT;
@@ -47,7 +135,7 @@ torchserve::Backend::ModelInstanceStatus Backend::GetModelInstanceStatus(
 }

 std::shared_ptr<torchserve::ModelInstance> Backend::GetModelInstance(
-    const std::string& model_instance_id) {
+    const std::string &model_instance_id) {
   auto model_instance_info = model_instance_table_.find(model_instance_id);
   if (model_instance_info == model_instance_table_.end()) {
     return std::shared_ptr<torchserve::ModelInstance>(nullptr);
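`Backend::LoadHandler` in the first hunk of this file delegates the actual symbol lookup to a `DLLoader<BaseHandler>`, which is not part of this diff. A self-contained sketch of the underlying mechanism using plain `dlopen`/`dlsym`; the library path, class name, and opaque handler type here are illustrative, not the project's `DLLoader`:

```cpp
#include <dlfcn.h>

#include <cstdio>
#include <string>

struct BaseHandlerStandIn;  // opaque stand-in for torchserve::BaseHandler

int main() {
  // From a handler string "lib<name>:<ClassName>", the backend derives the
  // library file name and the allocator/deleter symbol names.
  const std::string class_name = "DemoHandler";
  const std::string allocator_func = "allocator" + class_name;
  const std::string deleter_func = "deleter" + class_name;

  void* lib = dlopen("./libdemo_handler.so", RTLD_LAZY);
  if (!lib) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }

  using Allocator = BaseHandlerStandIn* (*)();
  using Deleter = void (*)(BaseHandlerStandIn*);
  auto alloc = reinterpret_cast<Allocator>(dlsym(lib, allocator_func.c_str()));
  auto del = reinterpret_cast<Deleter>(dlsym(lib, deleter_func.c_str()));
  if (!alloc || !del) {
    std::fprintf(stderr, "missing allocator/deleter symbol\n");
    dlclose(lib);
    return 1;
  }

  BaseHandlerStandIn* handler = alloc();  // backend would now call Initialize/LoadModel
  del(handler);                           // release through the library's own deleter
  dlclose(lib);
  return 0;
}
```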
