
Commit 89c5389

udaij12 and agunapal authored
Adding mps support to base handler and regression test (#3048)
* adding mps support to base handler and regression test
* fixed method
* mps support
* fix format
* changes to detection
* testing x86
* adding m1 check
* adding test cases
* adding test workflow
* modifiying tests
* removing python tests
* remove workflow
* removing test config file
* adding docs
* fixing spell check
* lint fix

---------

Co-authored-by: Ankith Gunapal <[email protected]>
1 parent 8450a2e commit 89c5389

File tree

6 files changed: +349 -0 lines changed

docs/apple_silicon_support.md

+129
@@ -0,0 +1,129 @@
# Apple Silicon Support

## What is supported
* TorchServe CI jobs now include M1 hardware in order to ensure support; see the GitHub [documentation](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories) on M1 hosted runners.
    - [Regression Tests](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu.yml)
    - [Regression binaries Test](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu_binaries.yml)
* For [Docker](https://docs.docker.com/desktop/install/mac-install/), ensure Docker for Apple silicon is installed, then follow the [setup steps](https://github.com/pytorch/serve/tree/master/docker).

## Experimental Support

* For GPU jobs on Apple Silicon, [MPS](https://pytorch.org/docs/master/notes/mps.html) is now auto-detected and enabled. To prevent TorchServe from using MPS, users have to set `deviceType: "cpu"` in model-config.yaml.
* This is an experimental feature and NOT ALL models are guaranteed to work.
* Number of GPUs now reports GPUs on Apple Silicon

### Testing
* [Pytests](https://github.com/pytorch/serve/tree/master/test/pytest/test_device_config.py) that check for MPS on macOS M1 devices
* Models that have been tested and work: Resnet-18, Densenet161, Alexnet
* Models that have been tested and DO NOT work: MNIST

#### Example Resnet-18 Using MPS On Mac M1 Pro
```
serve % torchserve --start --model-store model_store_gen --models resnet-18=resnet-18.mar --ncs

Torchserve version: 0.10.0
Number of GPUs: 16
Number of CPUs: 10
Max heap size: 8192 M
Python executable: /Library/Frameworks/Python.framework/Versions/3.11/bin/python3.11
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store:
Initial Models: resnet-18=resnet-18.mar
Log dir:
Metrics dir:
Netty threads: 0
Netty client threads: 0
Default workers per model: 16
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store:
CPP log config: N/A
Model config: N/A
2024-04-08T14:18:02,380 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2024-04-08T14:18:02,391 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: resnet-18.mar
2024-04-08T14:18:02,699 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model resnet-18
2024-04-08T14:18:02,699 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model resnet-18 loaded.
2024-04-08T14:18:02,699 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: resnet-18, count: 16
...
...
serve % curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg
...
{
  "tabby": 0.40966302156448364,
  "tiger_cat": 0.3467046618461609,
  "Egyptian_cat": 0.1300288736820221,
  "lynx": 0.02391958422958851,
  "bucket": 0.011532187461853027
}
...
```
#### Conda Example

```
(myenv) serve % pip list | grep torch
torch                     2.2.1
torchaudio                2.2.1
torchdata                 0.7.1
torchtext                 0.17.1
torchvision               0.17.1
(myenv3) serve % conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver
(myenv3) serve % pip list | grep torch
torch                     2.2.1
torch-model-archiver      0.10.0b20240312
torch-workflow-archiver   0.2.12b20240312
torchaudio                2.2.1
torchdata                 0.7.1
torchserve                0.10.0b20240312
torchtext                 0.17.1
torchvision               0.17.1
(myenv3) serve % torchserve --start --ncs --models densenet161.mar --model-store ./model_store_gen/
Torchserve version: 0.10.0
Number of GPUs: 0
Number of CPUs: 10
Max heap size: 8192 M
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Initial Models: densenet161.mar
Netty threads: 0
Netty client threads: 0
Default workers per model: 10
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
CPP log config: N/A
Model config: N/A
System metrics command: default
...
2024-03-12T15:58:54,702 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model densenet161 loaded.
2024-03-12T15:58:54,702 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: densenet161, count: 10
Model server started.
...
(myenv3) serve % curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg
{
  "tabby": 0.46661922335624695,
  "tiger_cat": 0.46449029445648193,
  "Egyptian_cat": 0.0661405548453331,
  "lynx": 0.001292439759708941,
  "plastic_bag": 0.00022909720428287983
}
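The MPS auto-detection described under "Experimental Support" above depends on the local PyTorch build exposing a working MPS backend. A minimal sketch (not TorchServe code; the `check_mps` helper is made up for illustration) of how one might verify that on an M1 machine before serving a model:

```python
# Standalone sanity check, assuming PyTorch >= 1.12 with MPS support built in.
import torch


def check_mps() -> bool:
    # is_built(): the wheel was compiled with MPS; is_available(): this macOS/GPU can use it now.
    if torch.backends.mps.is_built() and torch.backends.mps.is_available():
        x = torch.ones(2, 2, device="mps")         # allocate directly on the Apple GPU
        return bool((x + x).sum().item() == 8.0)   # run a tiny op on MPS and verify the result
    return False


print("MPS usable:", check_mps())
```

If MPS should not be used even when this check passes, the doc above notes that setting `deviceType: "cpu"` in model-config.yaml keeps that model's workers on CPU.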

frontend/server/src/main/java/org/pytorch/serve/util/ConfigManager.java

+24
@@ -5,9 +5,11 @@
 import io.netty.handler.ssl.SslContext;
 import io.netty.handler.ssl.SslContextBuilder;
 import io.netty.handler.ssl.util.SelfSignedCertificate;
+import java.io.BufferedReader;
 import java.io.File;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.InputStreamReader;
 import java.lang.reflect.Field;
 import java.lang.reflect.Type;
 import java.net.InetAddress;
@@ -835,6 +837,28 @@ private static int getAvailableGpu() {
             for (String id : ids) {
                 gpuIds.add(Integer.parseInt(id));
             }
+        } else if (System.getProperty("os.name").startsWith("Mac")) {
+            Process process = Runtime.getRuntime().exec("system_profiler SPDisplaysDataType");
+            int ret = process.waitFor();
+            if (ret != 0) {
+                return 0;
+            }
+
+            BufferedReader reader =
+                    new BufferedReader(new InputStreamReader(process.getInputStream()));
+            String line;
+            while ((line = reader.readLine()) != null) {
+                if (line.contains("Chipset Model:") && !line.contains("Apple M1")) {
+                    return 0;
+                }
+                if (line.contains("Total Number of Cores:")) {
+                    String[] parts = line.split(":");
+                    if (parts.length >= 2) {
+                        return (Integer.parseInt(parts[1].trim()));
+                    }
+                }
+            }
+            throw new AssertionError("Unexpected response.");
         } else {
             Process process =
                     Runtime.getRuntime().exec("nvidia-smi --query-gpu=index --format=csv");
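As the Java above shows, on macOS the frontend now counts GPUs by running `system_profiler SPDisplaysDataType` and reading the `Total Number of Cores:` line, returning 0 for non-Apple-Silicon chipsets. A rough Python restatement of that parsing, for illustration only; the authoritative logic is the `getAvailableGpu()` change in this diff:

```python
# Illustrative Python equivalent of the macOS branch added to getAvailableGpu().
import subprocess


def apple_gpu_core_count() -> int:
    result = subprocess.run(
        ["system_profiler", "SPDisplaysDataType"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return 0
    for line in result.stdout.splitlines():
        # A non-Apple-Silicon chipset means no MPS GPU, mirroring the Java check.
        if "Chipset Model:" in line and "Apple M1" not in line:
            return 0
        if "Total Number of Cores:" in line:
            return int(line.split(":", 1)[1].strip())
    return 0  # the Java version raises AssertionError here; 0 keeps the sketch simple


if __name__ == "__main__":
    print(apple_gpu_core_count())
```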

frontend/server/src/test/java/org/pytorch/serve/util/ConfigManagerTest.java

+14
@@ -105,4 +105,18 @@ public void testNoWorkflowState() throws ReflectiveOperationException, IOExcepti
                 workingDir + "/frontend/archive/src/test/resources/models",
                 configManager.getWorkflowStore());
     }
+
+    @Test
+    public void testNumGpuM1() throws ReflectiveOperationException, IOException {
+        System.setProperty("tsConfigFile", "src/test/resources/config_test_env.properties");
+        ConfigManager.Arguments args = new ConfigManager.Arguments();
+        args.setModels(new String[] {"noop_v0.1"});
+        args.setSnapshotDisabled(true);
+        ConfigManager.init(args);
+        ConfigManager configManager = ConfigManager.getInstance();
+        String arch = System.getProperty("os.arch");
+        if (arch.equals("aarch64")) {
+            Assert.assertTrue(configManager.getNumberOfGpu() > 0);
+        }
+    }
 }

test/pytest/test_device_config.py

+168
@@ -0,0 +1,168 @@
import os
import platform
import shutil
import tempfile
from pathlib import Path
from unittest.mock import patch

import pytest
import requests
import test_utils
from model_archiver import ModelArchiverConfig

CURR_FILE_PATH = Path(__file__).parent
REPO_ROOT_DIR = CURR_FILE_PATH.parent.parent
ROOT_DIR = os.path.join(tempfile.gettempdir(), "workspace")
REPO_ROOT = os.path.join(os.path.dirname(os.path.abspath(__file__)), "../../")
data_file_zero = os.path.join(REPO_ROOT, "test/pytest/test_data/0.png")
config_file = os.path.join(REPO_ROOT, "test/resources/config_token.properties")
mnist_scriptes_py = os.path.join(REPO_ROOT, "examples/image_classifier/mnist/mnist.py")

HANDLER_PY = """
from ts.torch_handler.base_handler import BaseHandler

class deviceHandler(BaseHandler):

    def initialize(self, context):
        super().initialize(context)
        assert self.get_device().type == "mps"
"""

MODEL_CONFIG_YAML = """
#frontend settings
# TorchServe frontend parameters
minWorkers: 1
batchSize: 4
maxWorkers: 4
"""

MODEL_CONFIG_YAML_GPU = """
#frontend settings
# TorchServe frontend parameters
minWorkers: 1
batchSize: 4
maxWorkers: 4
deviceType: "gpu"
"""

MODEL_CONFIG_YAML_CPU = """
#frontend settings
# TorchServe frontend parameters
minWorkers: 1
batchSize: 4
maxWorkers: 4
deviceType: "cpu"
"""


@pytest.fixture(scope="module")
def model_name():
    yield "mnist"


@pytest.fixture(scope="module")
def work_dir(tmp_path_factory, model_name):
    return Path(tmp_path_factory.mktemp(model_name))


@pytest.fixture(scope="module")
def model_config_name(request):
    def get_config(param):
        if param == "cpu":
            return MODEL_CONFIG_YAML_CPU
        elif param == "gpu":
            return MODEL_CONFIG_YAML_GPU
        else:
            return MODEL_CONFIG_YAML

    return get_config(request.param)


@pytest.fixture(scope="module", name="mar_file_path")
def create_mar_file(work_dir, model_archiver, model_name, model_config_name):
    mar_file_path = work_dir.joinpath(model_name + ".mar")

    model_config_yaml_file = work_dir / "model_config.yaml"
    model_config_yaml_file.write_text(model_config_name)

    model_py_file = work_dir / "model.py"

    model_py_file.write_text(mnist_scriptes_py)

    handler_py_file = work_dir / "handler.py"
    handler_py_file.write_text(HANDLER_PY)

    config = ModelArchiverConfig(
        model_name=model_name,
        version="1.0",
        serialized_file=None,
        model_file=mnist_scriptes_py,  # model_py_file.as_posix(),
        handler=handler_py_file.as_posix(),
        extra_files=None,
        export_path=work_dir,
        requirements_file=None,
        runtime="python",
        force=False,
        archive_format="default",
        config_file=model_config_yaml_file.as_posix(),
    )

    with patch("archiver.ArgParser.export_model_args_parser", return_value=config):
        model_archiver.generate_model_archive()

    assert mar_file_path.exists()

    yield mar_file_path.as_posix()

    # Clean up files

    mar_file_path.unlink(missing_ok=True)

    # Clean up files


@pytest.fixture(scope="module", name="model_name")
def register_model(mar_file_path, model_store, torchserve):
    """
    Register the model in torchserve
    """
    shutil.copy(mar_file_path, model_store)

    file_name = Path(mar_file_path).name

    model_name = Path(file_name).stem

    params = (
        ("model_name", model_name),
        ("url", file_name),
        ("initial_workers", "1"),
        ("synchronous", "true"),
        ("batch_size", "1"),
    )

    test_utils.reg_resp = test_utils.register_model_with_params(params)

    yield model_name

    test_utils.unregister_model(model_name)


@pytest.mark.skipif(platform.machine() != "arm64", reason="Skip on Mac M1")
@pytest.mark.parametrize("model_config_name", ["gpu"], indirect=True)
def test_m1_device(model_name, model_config_name):
    response = requests.get(f"http://localhost:8081/models/{model_name}")
    assert response.status_code == 200, "Describe Failed"


@pytest.mark.skipif(platform.machine() != "arm64", reason="Skip on Mac M1")
@pytest.mark.parametrize("model_config_name", ["cpu"], indirect=True)
def test_m1_device_cpu(model_name, model_config_name):
    response = requests.get(f"http://localhost:8081/models/{model_name}")
    assert response.status_code == 404, "Describe Worked"


@pytest.mark.skipif(platform.machine() != "arm64", reason="Skip on Mac M1")
@pytest.mark.parametrize("model_config_name", ["default"], indirect=True)
def test_m1_device_default(model_name, model_config_name):
    response = requests.get(f"http://localhost:8081/models/{model_name}")
    assert response.status_code == 200, "Describe Failed"

ts/torch_handler/base_handler.py

+12
@@ -144,11 +144,15 @@ def initialize(self, context):
             self.model_yaml_config = context.model_yaml_config

         properties = context.system_properties
+
         if torch.cuda.is_available() and properties.get("gpu_id") is not None:
             self.map_location = "cuda"
             self.device = torch.device(
                 self.map_location + ":" + str(properties.get("gpu_id"))
             )
+        elif torch.backends.mps.is_available() and properties.get("gpu_id") is not None:
+            self.map_location = "mps"
+            self.device = torch.device("mps")
         elif XLA_AVAILABLE:
             self.device = xm.xla_device()
         else:
@@ -524,3 +528,11 @@ def describe_handle(self):
         # pylint: disable=unnecessary-pass
         pass
         # pylint: enable=unnecessary-pass
+
+    def get_device(self):
+        """Get device
+
+        Returns:
+            string : self device
+        """
+        return self.device
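With this change, `BaseHandler.initialize` resolves the worker device in the order CUDA, then MPS, then XLA, then CPU, and the new `get_device()` accessor exposes the result to custom handlers (which is exactly what the pytest handler above asserts). A condensed, self-contained sketch of that ordering, assuming no `torch_xla` is installed; the authoritative logic is the diff above:

```python
# Condensed restatement of the device-selection order in BaseHandler.initialize.
import torch

XLA_AVAILABLE = False  # stand-in for the module-level torch_xla import flag in base_handler.py


def resolve_device(gpu_id):
    if torch.cuda.is_available() and gpu_id is not None:
        return torch.device(f"cuda:{gpu_id}")   # NVIDIA GPU assigned by the frontend
    if torch.backends.mps.is_available() and gpu_id is not None:
        return torch.device("mps")              # new: Apple Silicon GPU via MPS
    if XLA_AVAILABLE:
        import torch_xla.core.xla_model as xm   # only importable when torch_xla is installed
        return xm.xla_device()
    return torch.device("cpu")                  # default fallback


print(resolve_device(gpu_id=0))  # expected: device(type='mps') on an M1 Mac without CUDA
```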

ts_scripts/spellcheck_conf/wordlist.txt

+2
@@ -1216,3 +1216,5 @@ libomp
 rpath
 venv
 TorchInductor
+Pytests
+deviceType
