Commit a07b7d9

Llama.cpp example for cpp backend (#2904)
* Version1 of llm inference with cpp backend
* Updating llm handler - loadmodel, preprocess, inference methods
* Fixed infinite lock by adding request ids to the preprocess method
* Adding test script for finding tokens per second llama-7b-chat and ggml version
* GGUF Compatibility
* Fixing unit tests
* Fix typo
* Using folly to read config path
* Removing debug couts
* Processing all the items in the batch
* Adopted llama.cpp api changes
* Adapt to removal of TS backend
* Re-add test for llama.cpp example
* Add llama.cpp as a submodule
* Point to correct llama.cpp installation
* Build llama.cpp in build.sh
* Skip llama.cpp example test if model weights are not available
* Renamed torchscript_model folder into examples
* Adjust to new base_handler interface
* Remove debug statement
* Rename llamacpp class + remove dummy.pt file
* Move llamacpp config.json
* Moved and created prompt file
* Reset context for multiple batch entries
* Add doc for llamacpp example
* Fix spell check
* Replace output example in llamacpp example
* Move cpp example src into main examples folder
* Convert cerr/cout into logs

Signed-off-by: Shrinath Suresh <[email protected]>
Co-authored-by: Shrinath Suresh <[email protected]>
1 parent 3ecaf0b commit a07b7d9

File tree

40 files changed: +564 −67 lines

.gitmodules

+3

@@ -1,3 +1,6 @@
 [submodule "third_party/google/rpc"]
 	path = third_party/google/rpc
 	url = https://github.com/googleapis/googleapis.git
+[submodule "cpp/third-party/llama.cpp"]
+	path = cpp/third-party/llama.cpp
+	url = https://github.com/ggerganov/llama.cpp.git
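Note: the new submodule is not fetched automatically on an existing checkout; cpp/build.sh (below) now runs `git submodule update --init --recursive` so the pinned llama.cpp revision is present before building.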

cpp/README.md

+5 −5

@@ -49,23 +49,23 @@ By default, TorchServe cpp provides a handler for TorchScript [src/backends/hand
 ```
 torch-model-archiver --model-name mnist_base --version 1.0 --serialized-file mnist_script.pt --handler TorchScriptHandler --runtime LSP
 ```
-Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/base_handler) of unzipped model mar file.
+Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/examples/mnist/base_handler) of unzipped model mar file.
 ##### Using Custom Handler
 * build customized handler shared lib. For example [Mnist handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/examples/image_classifier/mnist).
 * set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
 * set handler as "libmnist_handler:MnistHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
 ```
 torch-model-archiver --model-name mnist_handler --version 1.0 --serialized-file mnist_script.pt --handler libmnist_handler:MnistHandler --runtime LSP
 ```
-Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/mnist_handler) of unzipped model mar file.
+Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/examples/mnist/mnist_handler) of unzipped model mar file.
 ##### BabyLLama Example
 The babyllama example can be found [here](https://github.com/pytorch/serve/blob/master/cpp/src/examples/babyllama/).
 To run the example we need to download the weights as well as tokenizer files:
 ```bash
 wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
 wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
 ```
-Subsequently, we need to adjust the paths according to our local file structure in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler/config.json).
+Subsequently, we need to adjust the paths according to our local file structure in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/examples/babyllama/babyllama_handler/config.json).
 ```bash
 {
 "checkpoint_path" : "/home/ubuntu/serve/cpp/stories15M.bin",
@@ -74,7 +74,7 @@ Subsequently, we need to adjust the paths according to our local file structure
 ```
 Then we can create the mar file and deploy it with:
 ```bash
-cd serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler
+cd serve/cpp/test/resources/examples/babyllama/babyllama_handler
 torch-model-archiver --model-name llm --version 1.0 --handler libbabyllama_handler:BabyLlamaHandler --runtime LSP --extra-files config.json
 mkdir model_store && mv llm.mar model_store/
 torchserve --ncs --start --model-store model_store
@@ -85,7 +85,7 @@ The handler name `libbabyllama_handler:BabyLlamaHandler` consists of our shared
 
 To test the model we can run:
 ```bash
-cd serve/cpp/test/resources/torchscript_model/babyllama/
+cd serve/cpp/test/resources/examples/babyllama/
 curl http://localhost:8080/predictions/llm -T prompt.txt
 ```
 ##### Mnist example

cpp/build.sh

+11 −10

@@ -136,6 +136,14 @@ function install_yaml_cpp() {
   cd "$BWD" || exit
 }
 
+function build_llama_cpp() {
+  BWD=$(pwd)
+  LLAMA_CPP_SRC_DIR=$BASE_DIR/third-party/llama.cpp
+  cd "${LLAMA_CPP_SRC_DIR}"
+  make
+  cd "$BWD" || exit
+}
+
 function build() {
   MAYBE_BUILD_QUIC=""
   if [ "$WITH_QUIC" == true ] ; then
@@ -206,16 +214,6 @@ function build() {
   echo -e "${COLOR_GREEN}torchserve_cpp build is complete. To run unit test: \
     ./_build/test/torchserve_cpp_test ${COLOR_OFF}"
 
-  if [ -f "$DEPS_DIR/../src/examples/libmnist_handler.dylib" ]; then
-    mv $DEPS_DIR/../src/examples/libmnist_handler.dylib $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.dylib
-  elif [ -f "$DEPS_DIR/../src/examples/libmnist_handler.so" ]; then
-    mv $DEPS_DIR/../src/examples/libmnist_handler.so $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.so
-  fi
-
-  if [ -f "$DEPS_DIR/../src/examples/libbabyllama_handler.so" ]; then
-    mv $DEPS_DIR/../src/examples/libbabyllama_handler.so $DEPS_DIR/../../test/resources/torchscript_model/babyllama/babyllama_handler/libbabyllama_handler.so
-  fi
-
   cd $DEPS_DIR/../..
   if [ -f "$DEPS_DIR/../test/torchserve_cpp_test" ]; then
     $DEPS_DIR/../test/torchserve_cpp_test
@@ -311,10 +309,13 @@ mkdir -p "$LIBS_DIR"
 # Must execute from the directory containing this script
 cd $BASE_DIR
 
+git submodule update --init --recursive
+
 install_folly
 install_kineto
 install_libtorch
 install_yaml_cpp
+build_llama_cpp
 build
 symlink_torch_libs
 symlink_yaml_cpp_lib
cpp/src/examples/CMakeLists.txt

+3 −13

@@ -1,16 +1,6 @@
-set(MNIST_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/src/examples/image_classifier/mnist")
 
-set(MNIST_SOURCE_FILES "")
-list(APPEND MNIST_SOURCE_FILES ${MNIST_SRC_DIR}/mnist_handler.cc)
-add_library(mnist_handler SHARED ${MNIST_SOURCE_FILES})
-target_include_directories(mnist_handler PUBLIC ${MNIST_SRC_DIR})
-target_link_libraries(mnist_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
+add_subdirectory("../../../examples/cpp/babyllama/" "../../../test/resources/examples/babyllama/babyllama_handler/")
 
+add_subdirectory("../../../examples/cpp/llamacpp/" "../../../test/resources/examples/llamacpp/llamacpp_handler/")
 
-set(BABYLLAMA_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/src/examples/babyllama")
-set(BABYLLAMA_SOURCE_FILES "")
-list(APPEND BABYLLAMA_SOURCE_FILES ${BABYLLAMA_SRC_DIR}/baby_llama_handler.cc)
-add_library(babyllama_handler SHARED ${BABYLLAMA_SOURCE_FILES})
-target_include_directories(babyllama_handler PUBLIC ${BABYLLAMA_SRC_DIR})
-target_link_libraries(babyllama_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
-target_compile_options(babyllama_handler PRIVATE -Wall -Wextra -Ofast)
+add_subdirectory("../../../examples/cpp/mnist/" "../../../test/resources/examples/mnist/mnist_handler/")
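Each add_subdirectory call here uses CMake's two-argument form: the first argument is the example's source directory and the second is its binary directory, so the built handler shared libraries land directly in the test resource folders. This is what makes the post-build mv of libmnist_handler/libbabyllama_handler in cpp/build.sh (deleted above) unnecessary.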

cpp/test/backends/otf_protocol_and_handler_test.cc

+6 −7

@@ -24,7 +24,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
       // model_name length
       .WillOnce(::testing::Return(5))
       // model_path length
-      .WillOnce(::testing::Return(51))
+      .WillOnce(::testing::Return(42))
       // batch_size
       .WillOnce(::testing::Return(1))
       // handler length
@@ -44,9 +44,8 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
         strncpy(data, "mnist", length);
       }))
       .WillOnce(testing::Invoke([=](size_t length, char* data) {
-        ASSERT_EQ(length, 51);
-        strncpy(data, "test/resources/torchscript_model/mnist/base_handler",
-                length);
+        ASSERT_EQ(length, 42);
+        strncpy(data, "test/resources/examples/mnist/base_handler", length);
       }))
       .WillOnce(testing::Invoke([=](size_t length, char* data) {
         ASSERT_EQ(length, 11);
@@ -60,7 +59,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
   EXPECT_CALL(*client_socket, SendAll(testing::_, testing::_)).Times(1);
   auto load_model_request = OTFMessage::RetrieveLoadMsg(*client_socket);
   ASSERT_EQ(load_model_request->model_dir,
-            "test/resources/torchscript_model/mnist/base_handler");
+            "test/resources/examples/mnist/base_handler");
   ASSERT_EQ(load_model_request->model_name, "mnist");
   ASSERT_EQ(load_model_request->envelope, "");
   ASSERT_EQ(load_model_request->model_name, "mnist");
@@ -71,7 +70,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
   auto backend = std::make_shared<torchserve::Backend>();
   MetricsRegistry::Initialize("test/resources/metrics/default_config.yaml",
                               MetricsContext::BACKEND);
-  backend->Initialize("test/resources/torchscript_model/mnist/base_handler");
+  backend->Initialize("test/resources/examples/mnist/base_handler");
 
   // load the model
   auto load_model_response = backend->LoadModel(load_model_request);
@@ -126,7 +125,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
       .WillOnce(testing::Invoke([=](size_t length, char* data) {
         ASSERT_EQ(length, 3883);
         // strncpy(data, "valu", length);
-        std::ifstream input("test/resources/torchscript_model/mnist/0_png.pt",
+        std::ifstream input("test/resources/examples/mnist/0_png.pt",
                             std::ios::in | std::ios::binary);
         std::vector<char> image((std::istreambuf_iterator<char>(input)),
                                 (std::istreambuf_iterator<char>()));

cpp/test/examples/examples_test.cc

+32 −4

@@ -1,10 +1,38 @@
+#include <fstream>
+
 #include "test/utils/common.hh"
 
 TEST_F(ModelPredictTest, TestLoadPredictBabyLlamaHandler) {
+  std::string base_dir = "test/resources/examples/babyllama/";
+  std::string file1 = base_dir + "babyllama_handler/stories15M.bin";
+  std::string file2 = base_dir + "babyllama_handler/tokenizer.bin";
+
+  std::ifstream f1(file1);
+  std::ifstream f2(file2);
+
+  if (!f1.good() && !f2.good())
+    GTEST_SKIP()
+        << "Skipping TestLoadPredictBabyLlamaHandler because of missing files: "
+        << file1 << " or " << file2;
+
+  this->LoadPredict(
+      std::make_shared<torchserve::LoadModelRequest>(
+          base_dir + "babyllama_handler", "llm", -1, "", "", 1, false),
+      base_dir + "babyllama_handler", base_dir + "prompt.txt", "llm_ts", 200);
+}
+
+TEST_F(ModelPredictTest, TestLoadPredictLlmHandler) {
+  std::string base_dir = "test/resources/examples/llamacpp/";
+  std::string file1 = base_dir + "llamacpp_handler/llama-2-7b-chat.Q5_0.gguf";
+  std::ifstream f(file1);
+
+  if (!f.good())
+    GTEST_SKIP()
+        << "Skipping TestLoadPredictLlmHandler because of missing file: "
+        << file1;
+
   this->LoadPredict(
       std::make_shared<torchserve::LoadModelRequest>(
-          "test/resources/torchscript_model/babyllama/babyllama_handler", "llm",
-          -1, "", "", 1, false),
-      "test/resources/torchscript_model/babyllama/babyllama_handler",
-      "test/resources/torchscript_model/babyllama/prompt.txt", "llm_ts", 200);
+          base_dir + "llamacpp_handler", "llamacpp", -1, "", "", 1, false),
+      base_dir + "llamacpp_handler", base_dir + "prompt.txt", "llm_ts", 200);
 }
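Note that the babyllama skip condition uses `&&`, so the test is skipped only when both stories15M.bin and tokenizer.bin are missing; if exactly one file is present the test still runs and will fail at load time. `||` would skip whenever either file is absent.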
cpp/test/resources/examples/babyllama/babyllama_handler/config.json (new file; path per the checkpoint_path below)

+4

@@ -0,0 +1,4 @@
+{
+  "checkpoint_path" : "test/resources/examples/babyllama/babyllama_handler/stories15M.bin",
+  "tokenizer_path" : "test/resources/examples/babyllama/babyllama_handler/tokenizer.bin"
+}
cpp/test/resources/examples/llamacpp/llamacpp_handler/MAR-INF/MANIFEST.json (new file; path inferred from the llamacpp handler test resources)

+10

@@ -0,0 +1,10 @@
+{
+  "createdOn": "28/07/2020 06:32:08",
+  "runtime": "LSP",
+  "model": {
+    "modelName": "llamacpp",
+    "handler": "libllamacpp_handler:LlamaCppHandler",
+    "modelVersion": "2.0"
+  },
+  "archiverVersion": "0.2.0"
+}
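As with the babyllama example in the README, the handler entry `libllamacpp_handler:LlamaCppHandler` names the handler shared library and the C++ class inside it that the backend instantiates.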

cpp/test/resources/torchscript_model/babyllama/babyllama_handler/config.json

-5
This file was deleted.

cpp/test/torch_scripted/torch_scripted_test.cc

+15 −18

@@ -9,47 +9,44 @@
 
 TEST_F(ModelPredictTest, TestLoadPredictBaseHandler) {
   this->LoadPredict(std::make_shared<torchserve::LoadModelRequest>(
-                        "test/resources/torchscript_model/mnist/mnist_handler",
+                        "test/resources/examples/mnist/mnist_handler",
                         "mnist_scripted_v2", -1, "", "", 1, false),
-                    "test/resources/torchscript_model/mnist/base_handler",
-                    "test/resources/torchscript_model/mnist/0_png.pt",
-                    "mnist_ts", 200);
+                    "test/resources/examples/mnist/base_handler",
+                    "test/resources/examples/mnist/0_png.pt", "mnist_ts", 200);
 }
 
 TEST_F(ModelPredictTest, TestLoadPredictMnistHandler) {
   this->LoadPredict(std::make_shared<torchserve::LoadModelRequest>(
-                        "test/resources/torchscript_model/mnist/mnist_handler",
+                        "test/resources/examples/mnist/mnist_handler",
                         "mnist_scripted_v2", -1, "", "", 1, false),
-                    "test/resources/torchscript_model/mnist/mnist_handler",
-                    "test/resources/torchscript_model/mnist/0_png.pt",
-                    "mnist_ts", 200);
+                    "test/resources/examples/mnist/mnist_handler",
+                    "test/resources/examples/mnist/0_png.pt", "mnist_ts", 200);
 }
 
 TEST_F(ModelPredictTest, TestBackendInitWrongModelDir) {
-  auto result = backend_->Initialize("test/resources/torchscript_model/mnist");
+  auto result = backend_->Initialize("test/resources/examples/mnist");
   ASSERT_EQ(result, false);
 }
 
 TEST_F(ModelPredictTest, TestBackendInitWrongHandler) {
-  auto result = backend_->Initialize(
-      "test/resources/torchscript_model/mnist/wrong_handler");
+  auto result =
+      backend_->Initialize("test/resources/examples/mnist/wrong_handler");
   ASSERT_EQ(result, false);
 }
 
 TEST_F(ModelPredictTest, TestLoadModelFailure) {
-  backend_->Initialize("test/resources/torchscript_model/mnist/wrong_model");
+  backend_->Initialize("test/resources/examples/mnist/wrong_model");
   auto result =
       backend_->LoadModel(std::make_shared<torchserve::LoadModelRequest>(
-          "test/resources/torchscript_model/mnist/wrong_model",
-          "mnist_scripted_v2", -1, "", "", 1, false));
+          "test/resources/examples/mnist/wrong_model", "mnist_scripted_v2", -1,
+          "", "", 1, false));
   ASSERT_EQ(result->code, 500);
 }
 
 TEST_F(ModelPredictTest, TestLoadPredictMnistHandlerFailure) {
   this->LoadPredict(std::make_shared<torchserve::LoadModelRequest>(
-                        "test/resources/torchscript_model/mnist/mnist_handler",
+                        "test/resources/examples/mnist/mnist_handler",
                         "mnist_scripted_v2", -1, "", "", 1, false),
-                    "test/resources/torchscript_model/mnist/mnist_handler",
-                    "test/resources/torchscript_model/mnist/0.png", "mnist_ts",
-                    500);
+                    "test/resources/examples/mnist/mnist_handler",
+                    "test/resources/examples/mnist/0.png", "mnist_ts", 500);
 }

cpp/test/utils/model_archiver_test.cc

+1 −1

@@ -6,7 +6,7 @@ namespace torchserve {
 TEST(ManifestTest, TestInitialize) {
   torchserve::Manifest manifest;
   manifest.Initialize(
-      "test/resources/torchscript_model/mnist/base_handler/MAR-INF/"
+      "test/resources/examples/mnist/base_handler/MAR-INF/"
       "MANIFEST.json");
   ASSERT_EQ(manifest.GetCreatOn(), "28/07/2020 06:32:08");
   ASSERT_EQ(manifest.GetArchiverVersion(), "0.2.0");

cpp/third-party/llama.cpp

Submodule llama.cpp added at cd4fddb

examples/cpp/babyllama/CMakeLists.txt

+5

@@ -0,0 +1,5 @@
+
+add_library(babyllama_handler SHARED src/baby_llama_handler.cc)
+
+target_link_libraries(babyllama_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
+target_compile_options(babyllama_handler PRIVATE -Wall -Wextra -Ofast)

examples/cpp/babyllama/config.json

+4

@@ -0,0 +1,4 @@
+{
+  "checkpoint_path" : "/home/ubuntu/serve/examples/cpp/babyllama/stories15M.bin",
+  "tokenizer_path" : "/home/ubuntu/serve/examples/cpp/babyllama/tokenizer.bin"
+}

cpp/src/examples/babyllama/baby_llama_handler.cc → examples/cpp/babyllama/src/baby_llama_handler.cc (renamed)

+2 −3

@@ -1,11 +1,11 @@
-#include "src/examples/babyllama/baby_llama_handler.hh"
+#include "baby_llama_handler.hh"
 
 #include <folly/FileUtil.h>
 #include <folly/json.h>
 
 #include <typeinfo>
 
-#include "src/examples/babyllama/llama2.c/run.c"
+#include "llama2.c/run.c"
 
 namespace llm {
 
@@ -233,7 +233,6 @@ c10::IValue BabyLlamaHandler::Inference(
   } catch (...) {
     TS_LOG(ERROR, "Failed to run inference on this batch");
   }
-  std::cout << "WOOT?" << std::endl;
   return batch_output_vector;
 }
 
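The handler pulls in llama2.c's inference code by #include-ing run.c directly, so the C implementation is compiled into the handler's translation unit and no separate llama2.c library target is needed; the rename simply shortens the include paths now that the source lives next to the handler.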
examples/cpp/llamacpp/CMakeLists.txt

+20

@@ -0,0 +1,20 @@
+set(LLAMACPP_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/third-party/llama.cpp")
+
+add_library(llamacpp_handler SHARED src/llamacpp_handler.cc)
+
+set(MY_OBJECT_FILES
+    ${LLAMACPP_SRC_DIR}/ggml.o
+    ${LLAMACPP_SRC_DIR}/llama.o
+    ${LLAMACPP_SRC_DIR}/common.o
+    ${LLAMACPP_SRC_DIR}/ggml-quants.o
+    ${LLAMACPP_SRC_DIR}/ggml-alloc.o
+    ${LLAMACPP_SRC_DIR}/grammar-parser.o
+    ${LLAMACPP_SRC_DIR}/console.o
+    ${LLAMACPP_SRC_DIR}/build-info.o
+    ${LLAMACPP_SRC_DIR}/ggml-backend.o
+
+)
+
+target_sources(llamacpp_handler PRIVATE ${MY_OBJECT_FILES})
+target_include_directories(llamacpp_handler PUBLIC ${LLAMACPP_SRC_DIR})
+target_link_libraries(llamacpp_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
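The object files above come from the llama.cpp `make` step in cpp/build.sh and are compiled into the handler shared library alongside llamacpp_handler.cc. For orientation, here is a minimal, hypothetical sketch of the llama.cpp C API lifecycle such a handler builds on (function names follow llama.h around the pinned revision; exact signatures may differ, and this is not the actual LlamaCppHandler source):

```cpp
// Hypothetical orientation sketch, NOT the actual LlamaCppHandler:
// the load/free lifecycle of the llama.cpp C API that the handler
// links against. Names follow llama.h around the pinned revision;
// signatures may differ at other revisions.
#include "llama.h"

int main() {
  llama_backend_init(/*numa=*/false);

  // Load the GGUF weights used by the example test
  // (llama-2-7b-chat.Q5_0.gguf).
  llama_model_params model_params = llama_model_default_params();
  llama_model* model =
      llama_load_model_from_file("llama-2-7b-chat.Q5_0.gguf", model_params);
  if (model == nullptr) {
    return 1;  // weights missing or unreadable
  }

  llama_context_params ctx_params = llama_context_default_params();
  llama_context* ctx = llama_new_context_with_model(model, ctx_params);

  // ... tokenize the prompt, call llama_decode() in a loop, sample tokens ...

  llama_free(ctx);
  llama_free_model(model);
  llama_backend_free();
  return 0;
}
```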
