site-src/guides/index.md (+1 −3)
@@ -18,7 +18,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
Requirements: a Hugging Face access token that grants access to the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf).
1. CPU-based model server (not using GPUs).
-   Requirements: a Hugging Face access token that grants access to the model [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
+   The sample uses the model [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
Choose one of these options and follow the steps below. Please do not deploy both, as the deployments have the same name and will override each other.
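Either option depends on a Hugging Face token that can actually reach the chosen model, so it can be worth verifying access before deploying anything. A minimal sketch, assuming `HF_TOKEN` holds your token and using the public Hugging Face Hub model endpoint (substitute `meta-llama/Llama-2-7b-hf` for the GPU option); a 200 status indicates access, while 401/403 suggests the token lacks permission:

```bash
# Sketch: confirm the token can reach the model before deploying.
# Assumes HF_TOKEN is exported in the current shell.
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $HF_TOKEN" \
  https://huggingface.co/api/models/Qwen/Qwen2.5-1.5B-Instruct
```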
@@ -34,10 +34,8 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
#### CPU-Based Model Server
-Create a Hugging Face secret to download the model [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). Ensure that the token grants access to this model.
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
-kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Qwen
```
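For reference, the deploy step above typically amounts to applying the sample manifest, as in the sketch below. The manifest URL is an assumption based on the layout of the gateway-api-inference-extension repository; use the path given by the version of the quickstart you are following.

```bash
# Sketch: apply the sample CPU-based vLLM deployment.
# The manifest URL is an assumption; substitute the one from your quickstart.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml

# Watch the deployment roll out. Model download and startup on CPU can take
# several minutes.
kubectl get pods -w
```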