
Commit 70c500c

Add AMD documentation
1 parent 5e180e6 commit 70c500c

8 files changed: +138 −46 lines

CONTRIBUTING.md

+18 −25
@@ -11,18 +11,7 @@ Your contributions will fall into two categories:
 - Search for your issue here: https://github.com/pytorch/serve/issues (look for the "good first issue" tag if you're a first time contributor)
 - Pick an issue and comment on the task that you want to work on this feature.
 - To ensure your changes doesn't break any of the existing features run the sanity suite as follows from serve directory:
-- Install dependencies (if not already installed)
-  For CPU
-
-  ```bash
-  python ts_scripts/install_dependencies.py --environment=dev
-  ```
-
-  For GPU
-  ```bash
-  python ts_scripts/install_dependencies.py --environment=dev --cuda=cu121
-  ```
-  > Supported cuda versions as cu121, cu118, cu117, cu116, cu113, cu111, cu102, cu101, cu92
+- [Install dependencies](#Install-TorchServe-for-development) (if not already installed)
 - Install `pre-commit` to your Git flow:
 ```bash
 pre-commit install
@@ -60,26 +49,30 @@ pytest -k test/pytest/test_mnist_template.py
 
 If you plan to develop with TorchServe and change some source code, you must install it from source code.
 
-Ensure that you have `python3` installed, and the user has access to the site-packages or `~/.local/bin` is added to the `PATH` environment variable.
+1. Clone the repository, including third-party modules, with `git clone --recurse-submodules --remote-submodules [email protected]:pytorch/serve.git`
+2. Ensure that you have `python3` installed, and the user has access to the site-packages or `~/.local/bin` is added to the `PATH` environment variable.
+3. Run the following script from the top of the source directory. NOTE: This script force re-installs `torchserve`, `torch-model-archiver` and `torch-workflow-archiver` if existing installations are found
 
-Run the following script from the top of the source directory.
+#### For Debian Based Systems/MacOS
 
-NOTE: This script force re-installs `torchserve`, `torch-model-archiver` and `torch-workflow-archiver` if existing installations are found
+```
+python ./ts_scripts/install_dependencies.py --environment=dev
+python ./ts_scripts/install_from_src.py --environment=dev
+```
+##### Installing Dependencies for Accelerator Support
+Use the optional `--rocm` or `--cuda` flag with `install_dependencies.py` for installing accelerator specific dependencies.
 
-#### For Debian Based Systems/ MacOS
-
-```
-python ./ts_scripts/install_dependencies.py --environment=dev
-python ./ts_scripts/install_from_src.py --environment=dev
-```
+Possible values are
+- rocm: `rocm61`, `rocm60`
+- cuda: `cu111`, `cu102`, `cu101`, `cu92`
 
-Use `--cuda` flag with `install_dependencies.py` for installing cuda version specific dependencies. Possible values are `cu111`, `cu102`, `cu101`, `cu92`
+For example `python ./ts_scripts/install_dependencies.py --environment=dev --rocm=rocm61`
 
-#### For Windows
+#### For Windows
 
-Refer to the documentation [here](docs/torchserve_on_win_native.md).
+Refer to the documentation [here](docs/torchserve_on_win_native.md).
 
-For information about the model archiver, see [detailed documentation](model-archiver/README.md).
+For information about the model archiver, see [detailed documentation](model-archiver/README.md).
 
 ### What to Contribute?

README.md

+8 −2
@@ -22,7 +22,10 @@ curl http://127.0.0.1:8080/predictions/bert -T input.txt
 
 ```bash
 # Install dependencies
-# cuda is optional
+python ./ts_scripts/install_dependencies.py
+
+# Include dependencies for accelerator support with the relevant optional flags
+python ./ts_scripts/install_dependencies.py --rocm=rocm61
 python ./ts_scripts/install_dependencies.py --cuda=cu121
 
 # Latest release
@@ -36,7 +39,10 @@ pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archi
 
 ```bash
 # Install dependencies
-# cuda is optional
+python ./ts_scripts/install_dependencies.py
+
+# Include dependencies for accelerator support with the relevant optional flags
+python ./ts_scripts/install_dependencies.py --rocm=rocm61
 python ./ts_scripts/install_dependencies.py --cuda=cu121
 
 # Latest release

docs/contents.rst

+6 −2
@@ -16,9 +16,7 @@
    model_zoo
    request_envelopes
    server
-   nvidia_mps
    snapshot
-   intel_extension_for_pytorch <https://github.com/pytorch/serve/tree/master/examples/intel_extension_for_pytorch>
    torchserve_on_win_native
    torchserve_on_wsl
    use_cases
@@ -27,6 +25,12 @@
    Security
    FAQs
 
+.. toctree::
+   :maxdepth: 0
+   :caption: Hardware Support:
+
+   hardware_support/hardware_support
+
 .. toctree::
    :maxdepth: 0
    :caption: Service APIs:

docs/hardware_support/amd_support.md

+81 (new file)
@@ -0,0 +1,81 @@
# AMD Support

TorchServe can be run on any combination of operating system and device that is
[supported by ROCm](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility.html).

## Supported Versions of ROCm

The current stable `major.patch` version of ROCm and the previous patch version will be supported. For example, versions `N.2` and `N.1`, where `N` is the current major version.

## Installation

- Make sure you have **python >= 3.8 installed** on your system.
- Clone the repo:

  ```bash
  git clone [email protected]:pytorch/serve.git
  ```

- `cd` into the cloned folder:

  ```bash
  cd serve
  ```

- Create a virtual environment for Python:

  ```bash
  python -m venv venv
  ```

- Activate the virtual environment. If you use another shell (fish, csh, PowerShell), source the corresponding activation script from `venv/bin/`:

  ```bash
  source venv/bin/activate
  ```

- Install the dependencies needed for ROCm support:

  ```bash
  python ./ts_scripts/install_dependencies.py --rocm=rocm61
  python ./ts_scripts/install_from_src.py
  ```

- Enable `amd-smi` in the Python virtual environment:

  ```bash
  sudo chown -R $USER:$USER /opt/rocm/share/amd_smi/
  pip install -e /opt/rocm/share/amd_smi/
  ```
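
As a quick sanity check, you can exercise both interfaces. This is a sketch, not part of the install steps proper: it assumes the editable `amdsmi` install above succeeded and that the `amd-smi` CLI shipped with ROCm is on your `PATH`.

```bash
# Sketch: check that the amd-smi CLI and the amdsmi Python bindings both see your accelerators
amd-smi list   # should print one entry per AMD accelerator
python -c "import amdsmi; amdsmi.amdsmi_init(); print(len(amdsmi.amdsmi_get_processor_handles()), 'accelerator(s) visible'); amdsmi.amdsmi_shut_down()"
```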

### Selecting Accelerators Using `HIP_VISIBLE_DEVICES`

If you have multiple accelerators on the system where you are running TorchServe, you can select which accelerators should be visible to TorchServe
by setting the environment variable `HIP_VISIBLE_DEVICES` to a string of 0-indexed, comma-separated integers representing the ids of the accelerators.

If you have 8 accelerators but only want TorchServe to see the last four of them, run `export HIP_VISIBLE_DEVICES=4,5,6,7`.

> ℹ️ **Not setting** `HIP_VISIBLE_DEVICES` will cause TorchServe to use all available accelerators on the system it is running on.

> ⚠️ You can run into trouble if you set `HIP_VISIBLE_DEVICES` to an empty string,
> e.g. `export HIP_VISIBLE_DEVICES=` or `export HIP_VISIBLE_DEVICES=""`.
> Use `unset HIP_VISIBLE_DEVICES` if you want to remove its effect.

> ⚠️ Setting both `CUDA_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES` may cause unintended behaviour and should be avoided.
> Doing so may cause an exception in the future.
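
One way to sanity-check the setting, shown here as a sketch: ROCm builds of PyTorch report HIP devices through the `torch.cuda` API, so the count below reflects what a worker process will see.

```bash
# Sketch: confirm how many accelerators a ROCm build of PyTorch will see
export HIP_VISIBLE_DEVICES=4,5,6,7
python -c "import torch; print(torch.cuda.device_count())"  # expect 4 on an 8-accelerator system
```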

## Docker

**In Development**

`Dockerfile.rocm` provides preliminary ROCm support for TorchServe.

Building and running `dev-image`:

```bash
docker build --file docker/Dockerfile.rocm --target dev-image \
  -t torch-serve-dev-image-rocm --build-arg USE_ROCM_VERSION=rocm62 \
  --build-arg BUILD_FROM_SRC=true .

docker run -it --rm --device=/dev/kfd --device=/dev/dri torch-serve-dev-image-rocm bash
```
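(`/dev/kfd` is the ROCm compute driver interface and `/dev/dri` holds the GPU render nodes; passing both through is the standard way to expose AMD accelerators to a container.)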
## Example Usage

After installing TorchServe with the required dependencies for ROCm, you should be ready to serve your model.

For a simple example, refer to `serve/examples/image_classifier/mnist/`.
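
As a rough end-to-end sketch of that example (the file names and flags below assume the MNIST example layout in the `serve` repository; adjust paths to your checkout):

```bash
# Sketch: archive and serve the bundled MNIST example, then send one test image
torch-model-archiver --model-name mnist --version 1.0 \
  --model-file examples/image_classifier/mnist/mnist.py \
  --serialized-file examples/image_classifier/mnist/mnist_cnn.pt \
  --handler examples/image_classifier/mnist/mnist_handler.py
mkdir -p model_store && mv mnist.mar model_store/

torchserve --start --ncs --model-store model_store --models mnist=mnist.mar
curl http://127.0.0.1:8080/predictions/mnist -T examples/image_classifier/mnist/test_data/0.png
torchserve --stop
```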

docs/apple_silicon_support.md → docs/hardware_support/apple_silicon_support.md (renamed)

+17 −17 (whitespace-only changes: trailing spaces stripped; changed lines are shown once below)
@@ -1,19 +1,19 @@
 # Apple Silicon Support
 
 ## What is supported
 * TorchServe CI jobs now include M1 hardware in order to ensure support, [documentation](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories) on github M1 hardware.
     - [Regression Tests](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu.yml)
     - [Regression binaries Test](https://github.com/pytorch/serve/blob/master/.github/workflows/regression_tests_cpu_binaries.yml)
 * For [Docker](https://docs.docker.com/desktop/install/mac-install/) ensure Docker for Apple silicon is installed then follow [setup steps](https://github.com/pytorch/serve/tree/master/docker)
 
 ## Experimental Support
 
 * For GPU jobs on Apple Silicon, [MPS](https://pytorch.org/docs/master/notes/mps.html) is now auto detected and enabled. To prevent TorchServe from using MPS, users have to set `deviceType: "cpu"` in model-config.yaml.
 * This is an experimental feature and NOT ALL models are guaranteed to work.
 * Number of GPUs now reports GPUs on Apple Silicon
 
 ### Testing
 * [Pytests](https://github.com/pytorch/serve/tree/master/test/pytest/test_device_config.py) that checks for MPS on MacOS M1 devices
 * Models that have been tested and work: Resnet-18, Densenet161, Alexnet
 * Models that have been tested and DO NOT work: MNIST

@@ -31,10 +31,10 @@ Config file: N/A
 Inference address: http://127.0.0.1:8080
 Management address: http://127.0.0.1:8081
 Metrics address: http://127.0.0.1:8082
 Model Store:
 Initial Models: resnet-18=resnet-18.mar
 Log dir:
 Metrics dir:
 Netty threads: 0
 Netty client threads: 0
 Default workers per model: 16
@@ -48,7 +48,7 @@ Custom python dependency for model allowed: false
 Enable metrics API: true
 Metrics mode: LOG
 Disable system metrics: false
 Workflow Store:
 CPP log config: N/A
 Model config: N/A
 024-04-08T14:18:02,380 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
@@ -69,17 +69,17 @@ serve % curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_cla
6969
}
7070
...
7171
```
72-
#### Conda Example
72+
#### Conda Example
7373

7474
```
75-
(myenv) serve % pip list | grep torch
75+
(myenv) serve % pip list | grep torch
7676
torch 2.2.1
7777
torchaudio 2.2.1
7878
torchdata 0.7.1
7979
torchtext 0.17.1
8080
torchvision 0.17.1
8181
(myenv3) serve % conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver
82-
(myenv3) serve % pip list | grep torch
82+
(myenv3) serve % pip list | grep torch
8383
torch 2.2.1
8484
torch-model-archiver 0.10.0b20240312
8585
torch-workflow-archiver 0.2.12b20240312
@@ -119,11 +119,11 @@ System metrics command: default
 2024-03-12T15:58:54,702 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: densenet161, count: 10
 Model server started.
 ...
 (myenv3) serve % curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg
 {
   "tabby": 0.46661922335624695,
   "tiger_cat": 0.46449029445648193,
   "Egyptian_cat": 0.0661405548453331,
   "lynx": 0.001292439759708941,
   "plastic_bag": 0.00022909720428287983
 }
docs/hardware_support/hardware_support.rst

+8 (new file)
@@ -0,0 +1,8 @@
.. toctree::
   :caption: Hardware Support:

   amd_support
   apple_silicon_support
   linux_aarch64
   nvidia_mps
   Intel Extension for PyTorch <https://github.com/pytorch/serve/tree/master/examples/intel_extension_for_pytorch>
docs/linux_aarch64.md → docs/hardware_support/linux_aarch64.md

File renamed without changes.

docs/nvidia_mps.md → docs/hardware_support/nvidia_mps.md

File renamed without changes.
