
Commit af7ee3d

committed Mar 9, 2024
Add note about gpu placement + simplify call to model->run
1 parent 0933c7d commit af7ee3d

File tree

3 files changed (+5, -5 lines)

examples/cpp/aot_inductor/bert/README.md (+2)

@@ -2,6 +2,8 @@ This example uses AOTInductor to compile the [google-bert/bert-base-uncased](htt

 Then, this example loads model and runs prediction using libtorch. The handler C++ source code for this examples can be found [here](src).

+**Note**: Due to an issue in PyTorch 2.2.1, the AOTInductor model cannot be placed on a specific GPU through the API. This issue is resolved in the PyTorch 2.3 nightlies. Please install the nightlies if you want to run multiple model workers on different GPUs.
+
 ### Setup
 1. Follow the instructions in [README.md](../../../../cpp/README.md) to build the TorchServe C++ backend.
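For readers wondering what the per-GPU placement mentioned in the note looks like once the fix is available, here is a minimal sketch, not code from this commit: it loads the compiled model.so through the CUDA runner and passes a device string. The path, input shape, and device index are placeholders, and the constructor arguments are assumed from the PyTorch 2.3-era AOTIModelContainerRunnerCuda header, so they may differ by version.

#include <torch/csrc/inductor/aoti_runner/model_container_runner_cuda.h>
#include <torch/torch.h>

int main() {
  // Hypothetical sketch: pin an AOTInductor model to one GPU per worker.
  // With PyTorch 2.2.1 the device string is not honored (the issue the
  // note describes); the PyTorch 2.3 nightlies respect it.
  torch::inductor::AOTIModelContainerRunnerCuda runner(
      "/path/to/model.so",  // placeholder: the AOTInductor-compiled model
      1,                    // number of model instances in the container
      "cuda:1");            // target device for this worker, e.g. "cuda:0", "cuda:1"

  // Inputs must live on the same device the model was placed on;
  // shape and dtype here are placeholders for tokenized BERT input.
  std::vector<torch::Tensor> inputs = {torch::ones(
      {1, 16}, torch::dtype(torch::kLong).device(torch::kCUDA, 1))};
  auto outputs = runner.run(inputs);
  return 0;
}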

examples/cpp/aot_inductor/bert/src/bert_handler.cc (+1, -5)

@@ -157,12 +157,8 @@ c10::IValue BertCppHandler::Inference(
   } else {
     runner = std::static_pointer_cast<torch::inductor::AOTIModelContainerRunnerCpu>(model);
   }
-#if TORCH_VERSION_MAJOR == 2 && TORCH_VERSION_MINOR == 2
-  auto batch_output_tensor_vector = runner->run(inputs.toTensorVector());
-#else
-  std::vector<torch::Tensor> tmp = inputs.toTensorVector();
+  auto tmp = inputs.toTensorVector();
   auto batch_output_tensor_vector = runner->run(tmp);
-#endif
   return c10::IValue(batch_output_tensor_vector[0]);
 } catch (std::runtime_error& e) {
   TS_LOG(ERROR, e.what());
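Why binding the tensor vector to a named variable removes the need for the version check: a named vector is an lvalue, which binds both to a by-value parameter and to a non-const lvalue reference, whereas the temporary returned by inputs.toTensorVector() binds only to the former. A minimal sketch of the distinction; the two signatures below are assumptions about how run() changed between the PyTorch 2.2 and 2.3 headers, inferred from the old #if branch:

// Assumed 2.2-era shape:  std::vector<torch::Tensor> run(std::vector<torch::Tensor> inputs);
// Assumed 2.3-era shape:  std::vector<torch::Tensor> run(std::vector<torch::Tensor>& inputs);

// runner->run(inputs.toTensorVector());  // temporary (rvalue): fine by value,
//                                        // ill-formed for a non-const reference
auto tmp = inputs.toTensorVector();       // named lvalue
auto out = runner->run(tmp);              // binds to either signature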

examples/cpp/aot_inductor/resnet/README.md (+2)

@@ -1,6 +1,8 @@
 This example uses AOTInductor to compile the Resnet50 into an so file which is then executed using libtorch.
 The handler C++ source code for this examples can be found [here](src).

+**Note**: Due to an issue in PyTorch 2.2.1, the AOTInductor model cannot be placed on a specific GPU through the API. This issue is resolved in the PyTorch 2.3 nightlies. Please install the nightlies if you want to run multiple model workers on different GPUs.
+
 ### Setup
 1. Follow the instructions in [README.md](../../../../cpp/README.md) to build the TorchServe C++ backend.
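To make the resnet README's one-line description concrete, here is a minimal load-and-run sketch against the CPU runner, the same class the bert handler above casts to. This is an illustration, not code from this commit; the .so path and input shape are placeholders.

#include <torch/csrc/inductor/aoti_runner/model_container_runner_cpu.h>
#include <torch/torch.h>
#include <iostream>

int main() {
  // Load the AOTInductor-compiled ResNet50 shared object (placeholder path).
  torch::inductor::AOTIModelContainerRunnerCpu runner("/path/to/resnet50.so");

  // One dummy image batch in NCHW layout.
  std::vector<torch::Tensor> inputs = {torch::randn({1, 3, 224, 224})};
  auto outputs = runner.run(inputs);

  std::cout << outputs[0].sizes() << std::endl;  // e.g. [1, 1000] class logits
  return 0;
}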
