
Instance segmentation model Mask-RT-DETR-L inference issue #3548

Open

flysssss opened this issue Mar 7, 2025 · 28 comments
flysssss commented Mar 7, 2025

Checklist:

Describe the problem

According to the official tutorial, GPU inference time for the Mask-RT-DETR-L model, measured on an NVIDIA Tesla T4 with FP32 precision, is 46.5059 ms.

Reproduction machine: V100
Image resolution: 1920x1080

Inference code:

from paddlex import create_model
import os
import time

# model = create_model("./output/best_model/inference")
model = create_model("Mask-RT-DETR-L")
path = './test/'
for filename in os.listdir(path):
    print(filename)
    img_path = path + filename
    name = filename.split('.')
    file_name = name[0]
    for i in range(20):
        start = time.time()
        output = model.predict(img_path, batch_size=1)
        for res in output:
            res.print()                 # print the structured prediction output
            res.save_to_img("./res/")   # save the visualized result image
            res.save_to_json("./res/")  # save the structured prediction output
        end = time.time()
        cost = end - start
        print('took {:.5f} s'.format(cost))

Each image takes about 8-9 s. Both the official model and my own fine-tuned model currently take 8-9 s, GPU memory usage is around 3 GB, and it is not CPU inference.
Is the long latency caused by:
1. The large input image resolution?
2. The high-performance inference plugin not being enabled?
The gap between 8-9 s and 46 ms is quite large.
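Note that the loop above times not only model.predict but also res.print, res.save_to_img, and res.save_to_json, and its first iteration includes one-time model warm-up, so it overstates pure inference latency. A minimal benchmarking sketch, assuming the same create_model API as above and that predict yields results lazily (the sample image path is hypothetical):

import time
from paddlex import create_model

model = create_model("Mask-RT-DETR-L")
img_path = "./test/example.jpg"  # hypothetical sample image

# Warm-up: the first calls include one-time graph building and memory allocation.
for _ in range(3):
    list(model.predict(img_path, batch_size=1))

# Time prediction only; consume the results but skip printing and saving.
runs = 20
start = time.time()
for _ in range(runs):
    list(model.predict(img_path, batch_size=1))
avg = (time.time() - start) / runs
print("average per image: {:.1f} ms".format(avg * 1000))

Averaging repeated runs on the same image separates model latency from the I/O and visualization cost.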


flysssss commented Mar 7, 2025

Environment (official Docker image):
docker run --gpus all --name paddlex -v $PWD:/paddle --shm-size=8g --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.0.0b1-paddlepaddle3.0.0b1-gpu-cuda11.8-cudnn8.6-trt8.5 /bin/bash


flysssss commented Mar 7, 2025

With the high-performance inference plugin enabled, latency drops noticeably.
The following CLI invocation currently works:
paddlex --pipeline ./instance_segmentation.yaml --input ./test/ --device gpu:0 --use_hpip --serial_number xxxxxxxx
But written as an API call it fails with:
TypeError: _SingleModelPipeline.__init__() got an unexpected keyword argument 'serial_number'
The inference code is:

from paddlex import create_pipeline
import os, time

pipeline = create_pipeline(
    pipeline="./instance_segmentation.yaml",
    use_hpip=True,
    serial_number="xxxxxxxxx",
)

The current paddlex version is 3.0.0b1:
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/whl/paddlex-3.0.0b1-py3-none-any.whl

I saw this in the official docs changelog:

1. Pipeline CLI inference
   Added:
   Inference hyperparameters, which depend on the specific pipeline; see the pipeline docs. For example, the image classification pipeline supports a --topk parameter to specify the top-n results to return.
   Removed:
   --serial_number; high-performance inference no longer requires a serial number.
2. create_pipeline()
   Removed:
   The serial_number field of the high-performance-inference hpi_params parameter; high-performance inference no longer requires a serial number.
   No longer supported:
   Setting pipeline inference hyperparameters; these must all be configured in the pipeline config file (e.g. batch_size, thresholds).

Why does create_pipeline in 3.0.0b1 not accept a serial_number argument, and how should the serial number be passed?


flysssss commented Mar 7, 2025

For the inference example docs matching paddlex 3.0.0b1, how exactly should the create_pipeline() arguments be formatted? There is no concrete interface documentation either.
How do I run high-performance inference on the upgraded 3.0.0rc0 version, and which arguments does create_pipeline() actually accept?


flysssss commented Mar 7, 2025

PaddleX High-Performance Inference Guide:

https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/pipeline_deploy/high_performance_inference.md

from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="image_classification",
    use_hpip=True,
    serial_number="{serial_number}",
)

output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")

This example currently works on neither 3.0.0b1 nor 3.0.0rc0. With use_hpip enabled, please provide a usable list of create_pipeline arguments.


flysssss commented Mar 7, 2025

One more question: is the plan for 3.0.0rc0 to delete the "open-source pipeline deployment SDK serial number management" step entirely, so that use_hpip high-performance inference can be run directly?


flysssss commented Mar 7, 2025

pipeline = create_pipeline(
    pipeline="./instance_segmentation.yaml",
    use_hpip=True,
    device="gpu:3",
    hpi_params={"serial_number": "xxxx-D532-xxxx-863B"},
)

On 3.0.0b1 the example above enables hpip high-performance acceleration. Given that 3.0.0rc0
removes the serial_number field of the high-performance-inference hpi_params parameter (a serial number is no longer required),
how should high-performance inference be run there?

zhang-prog (Collaborator) commented:

All of your questions are answered in the high-performance inference documentation for 3.0rc0. A serial number is no longer needed: you do not pass hpi_params={"serial_number": "xxxx-D532-xxxx-863B"} anymore, and the docs contain examples you can follow.
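For reference, based on the changelog excerpt quoted earlier (serial_number removed in 3.0.0rc0), the call presumably reduces to something like the sketch below; treat it as an illustration rather than the authoritative rc0 signature:

from paddlex import create_pipeline

# 3.0.0rc0 style: use_hpip alone enables high-performance inference;
# no serial_number / hpi_params={"serial_number": ...} is required anymore.
pipeline = create_pipeline(
    pipeline="./instance_segmentation.yaml",
    device="gpu:0",
    use_hpip=True,
)

for res in pipeline.predict("./test/example.jpg"):  # hypothetical input image
    res.save_to_img("./res/")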

flysssss (Author) commented:

@zhang-prog I reinstalled the image following the docs, but inference still fails.
Step 1:
docker run --gpus all --name paddlex -v $PWD:/paddle --shm-size=8g --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.0.0rc0-paddlepaddle3.0.0rc0-gpu-cuda11.8-cudnn8.6-trt8.5 /bin/bash

Step 2:
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cublas-cu11==11.11.3.6; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-cupti-cu11==11.8.87; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-nvrtc-cu11==11.8.89; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-runtime-cu11==11.8.89; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cudnn-cu11==8.9.6.50; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cufft-cu11==10.9.0.58; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-curand-cu11==10.3.0.86; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cusolver-cu11==11.4.1.48; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cusparse-cu11==11.7.5.86; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-nccl-cu11==2.19.3; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-nvtx-cu11==11.8.86; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlex 3.0.0rc0 requires albumentations==1.4.10, but you have albumentations 1.3.1 which is incompatible.
paddlex 3.0.0rc0 requires opencv-python-headless==4.10.0.84, but you have opencv-python-headless 4.6.0.66 which is incompatible.

Installed the libraries that were reported as conflicting the first time.

Step 3:
paddlex --install hpi-gpu

Step 4:
pip install erniebot -i https://pypi.tuna.tsinghua.edu.cn/simple some-package

Step 5:

pip install erniebot_agent -i https://pypi.tuna.tsinghua.edu.cn/simple some-package

Step 6:
Run inference; it fails:

λ ys-ai-GPU-6 /paddle/44seg/PaddleX python test.py
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvcaffe_parser.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvcaffe_parser.so.
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvinfer_plugin.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvinfer_plugin.so.
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvinfer.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvinfer.so.
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvonnxparser.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvonnxparser.so.
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvparsers.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvparsers.so.
Traceback (most recent call last):
  File "/paddle/44seg/PaddleX/test.py", line 39, in <module>
    pipeline = create_pipeline(
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/__init__.py", line 119, in create_pipeline
    return create_pipeline_from_config(
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/__init__.py", line 94, in create_pipeline_from_config
    pipeline = BasePipeline.get(pipeline_name)(
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/base.py", line 39, in patched___init__
    ret = ctx.run(init_func, self, *args, **kwargs)
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/base.py", line 39, in patched___init__
    ret = ctx.run(init_func, self, *args, **kwargs)
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/single_model_pipeline.py", line 22, in __init__
    self._build_predictor(model)
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/single_model_pipeline.py", line 26, in _build_predictor
    self.model = self._create(model)
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/base.py", line 71, in _create
    return create_predictor(
  File "/paddle/44seg/PaddleX/paddlex/inference/models/__init__.py", line 78, in create_predictor
    return _create_hp_predictor(
  File "/paddle/44seg/PaddleX/paddlex/inference/models/__init__.py", line 46, in _create_hp_predictor
    raise RuntimeError(
RuntimeError: The PaddleX HPI plugin is not properly installed, and the high-performance model inference features are not available.

flysssss (Author) commented:

One more training question: validation inference during training is very slow; can the high-performance inference framework be enabled during training?

zhang-prog (Collaborator) commented:

paddlepaddle-gpu 3.0.0rc1 requires nvidia-cublas-cu11==11.11.3.6; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-cupti-cu11==11.8.87; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-nvrtc-cu11==11.8.89; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-runtime-cu11==11.8.89; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
......

These messages can be ignored. After creating a new container from the paddlex3.0.0rc0 image, just run paddlex --install hpi-gpu directly. I suggest creating a fresh container and trying again.

zhang-prog (Collaborator) commented:

High-performance inference applies only to the inference stage; using it during training is not currently supported.

flysssss (Author) commented:

@zhang-prog Thanks for the help. The error was because the script was being run from the old-version code path; after switching directories and using the new instance_segmentation.yaml, high-performance inference works normally.

However, a segmentation fault still occurs after 3 images are inferred.
Input image resolution: 1920x1080

Only Paddle model is detected. Paddle model will be used by default.
Backend: paddle_infer
Backend config: cpu_num_threads=8 enable_mkldnn=True enable_trt=False trt_dynamic_shapes={'im_shape': [[1, 2], [1, 2], [8, 2]], 'image': [[1, 3, 640, 640], [1, 3, 640, 640], [8, 3, 640, 640]], 'scale_factor': [[1, 2], [1, 2], [8, 2]]} trt_dynamic_shape_input_data={'im_shape': [[640.0, 640.0], [640.0, 640.0], [640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0]], 'scale_factor': [[2.0, 2.0], [1.0, 1.0], [0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67]]} trt_precision='FP32' enable_log_info=False
[INFO] ultra_infer/vision/common/processors/transform.cc(44)::FuseNormalizeCast Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(91)::FuseNormalizeHWC2CHW Normalize and HWC2CHW are fused to NormalizeAndPermute in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(157)::FuseNormalizeColorConvert BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(28)::BuildOption Will inference_precision float32
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0311 08:39:32.403381 5694 gpu_resources.cc:119] Please NOTE: device: 3, GPU Compute Capability: 7.0, Driver API Version: 11.8, Runtime API Version: 11.8
W0311 08:39:32.405112 5694 gpu_resources.cc:164] device: 3, cuDNN Version: 8.6.
[INFO] ultra_infer/runtime/runtime.cc(265)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
{'res': {'input_path': './test2/LD20250306141656_0307_sweetpotato_shrimp_yiwu.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'sweetpotato', 'score': 0.977288544178009, 'coordinate': [648.0316162109375, 605.0195922851562, 1212.0203857421875, 962.4086303710938]}, {'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9665425419807434, 'coordinate': [1005.0983276367188, 800.2238159179688, 1222.9437255859375, 1001.8690795898438]}], 'masks': '...'}}
{'res': {'input_path': './test2/LD20250306141730_0307_sweetpotato_shrimp_yiwu.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9753938913345337, 'coordinate': [643.3614501953125, 605.8696899414062, 1213.97119140625, 962.4130249023438]}, {'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9663092494010925, 'coordinate': [1006.2348022460938, 800.6912841796875, 1223.3809814453125, 1001.9779663085938]}], 'masks': '...'}}
{'res': {'input_path': './test2/LD20250306141751_0307_sweetpotato_shrimp_yiwu.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9743572473526001, 'coordinate': [651.3355102539062, 602.456787109375, 1203.0601806640625, 970.91943359375]}, {'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9699466824531555, 'coordinate': [1244.004150390625, 876.125, 1483.8446044921875, 1045.2181396484375]}], 'masks': '...'}}
Segmentation fault (core dumped)
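Side note: when a run dies with nothing but Segmentation fault (core dumped), Python's built-in faulthandler module can at least show which Python frame was active when the native crash happened. A minimal sketch, reusing the pipeline call from this thread (the input directory is the one from the log above):

import faulthandler
faulthandler.enable()  # dump the Python traceback on SIGSEGV and similar fatal signals

from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="./instance_segmentation.yaml",
    device="gpu:0",
    use_hpip=True,
)
for res in pipeline.predict("./test2/"):
    res.print()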

flysssss (Author) commented:

With the log switch turned on, the detailed output is as follows:

λ ys-ai-gpu03 /paddle/03test/test paddlex --pipeline instance_segmentation.yaml --input ./test2/LD20250306142040_0307_sweetpotato_shrimp_yiwu.jpg --device gpu:0 --use_hpip
Only Paddle model is detected. Paddle model will be used by default.
Backend: paddle_infer
Backend config: cpu_num_threads=8 enable_mkldnn=True enable_trt=False trt_dynamic_shapes={'im_shape': [[1, 2], [1, 2], [8, 2]], 'image': [[1, 3, 640, 640], [1, 3, 640, 640], [8, 3, 640, 640]], 'scale_factor': [[1, 2], [1, 2], [8, 2]]} trt_dynamic_shape_input_data={'im_shape': [[640.0, 640.0], [640.0, 640.0], [640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0]], 'scale_factor': [[2.0, 2.0], [1.0, 1.0], [0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67]]} trt_precision='FP32' enable_log_info=True
[INFO] ultra_infer/vision/common/processors/transform.cc(44)::FuseNormalizeCast Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(91)::FuseNormalizeHWC2CHW Normalize and HWC2CHW are fused to NormalizeAndPermute in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(157)::FuseNormalizeColorConvert BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(28)::BuildOption Will inference_precision float32
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(338)::InitFromPaddle Finish paddle inference config with summary as:
[INFO]
+--------------------------+------------------------------------------+
| Option | Value |
+--------------------------+------------------------------------------+
| model_file | best_model/inference/inference.pdmodel |
| params_file | best_model/inference/inference.pdiparams |
+--------------------------+------------------------------------------+
| cpu_math_thread | 8 |
| enable_mkldnn | true |
| mkldnn_cache_capacity | 10 |
+--------------------------+------------------------------------------+
| use_gpu | true |
| use_cutlass | false |
| gpu_device_id | 0 |
| enable_gpu_mixed | 0 |
| mixed_precision_mode | fp32 |
| memory_pool_init_size | 100MB |
| use_external_stream | false |
| thread_local_stream | false |
| use_tensorrt | false |
+--------------------------+------------------------------------------+
| use_xpu | false |
+--------------------------+------------------------------------------+
| use_cinn_compiler | false |
| save_optimized_model | false |
| ir_optim | true |
| ir_debug | false |
| use_optimized_model | false |
| memory_optim | true |
| enable_profile | false |
| enable_log | true |
| collect_shape_range_info | false |
+--------------------------+------------------------------------------+

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0311 10:53:54.710409 42149 analysis_predictor.cc:2057] Ir optimization is turned off, no ir pass will be executed.
--- Running analysis [ir_graph_build_pass]
I0311 10:53:54.741750 42149 executor.cc:183] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [save_optimized_model_pass]
--- Running analysis [ir_graph_to_program_pass]
I0311 10:53:55.211637 42149 analysis_predictor.cc:2146] ======= ir optimization completed =======
--- Running PIR pass [add_shadow_output_after_dead_parameter_pass]
--- Running PIR pass [delete_quant_dequant_linear_op_pass]
--- Running PIR pass [delete_weight_dequant_linear_op_pass]
--- Running PIR pass [map_op_to_another_pass]
I0311 10:53:55.312294 42149 print_statistics.cc:50] --- detected [27] subgraphs!
--- Running PIR pass [identity_op_clean_pass]
I0311 10:53:55.323707 42149 print_statistics.cc:50] --- detected [12] subgraphs!
--- Running PIR pass [silu_fuse_pass]
I0311 10:53:55.337136 42149 print_statistics.cc:50] --- detected [19] subgraphs!
--- Running PIR pass [conv2d_bn_fuse_pass]
--- Running PIR pass [conv2d_add_act_fuse_pass]
--- Running PIR pass [conv2d_add_fuse_pass]
I0311 10:53:55.353192 42149 print_statistics.cc:50] --- detected [13] subgraphs!
--- Running PIR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running PIR pass [fused_rotary_position_embedding_pass]
--- Running PIR pass [multihead_matmul_fuse_pass]
--- Running PIR pass [matmul_add_act_fuse_pass]
I0311 10:53:55.446043 42149 print_statistics.cc:50] --- detected [133] subgraphs!
--- Running PIR pass [fc_elementwise_layernorm_fuse_pass]
--- Running PIR pass [add_norm_fuse_pass]
I0311 10:53:55.487458 42149 print_statistics.cc:50] --- detected [20] subgraphs!
--- Running PIR pass [group_norm_silu_fuse_pass]
--- Running PIR pass [matmul_scale_fuse_pass]
--- Running PIR pass [matmul_transpose_fuse_pass]
--- Running PIR pass [transpose_flatten_concat_fuse_pass]
--- Running PIR pass [remove_redundant_transpose_pass]
--- Running PIR pass [horizontal_fuse_pass]
I0311 10:53:55.502025 42149 print_statistics.cc:50] --- detected [1] subgraphs!
--- Running PIR pass [common_subexpression_elimination_pass]
I0311 10:53:55.512122 42149 print_statistics.cc:50] --- detected [558] subgraphs!
--- Running PIR pass [params_sync_among_devices_pass]
I0311 10:53:55.575439 42149 print_statistics.cc:50] --- detected [721] subgraphs!
--- Running PIR pass [constant_folding_pass]
I0311 10:53:55.577561 42149 pir_interpreter.cc:1586] New Executor is Running ...
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0311 10:53:55.578130 42149 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.8, Runtime API Version: 11.8
W0311 10:53:55.579555 42149 gpu_resources.cc:164] device: 0, cuDNN Version: 8.6.
I0311 10:53:55.580062 42149 pir_interpreter.cc:1610] pir interpreter is running by multi-thread mode ...
I0311 10:53:55.741286 42149 print_statistics.cc:44] --- detected [160, 2088] subgraphs!
--- Running PIR pass [dead_code_elimination_pass]
I0311 10:53:55.743919 42149 print_statistics.cc:50] --- detected [91] subgraphs!
--- Running PIR pass [replace_fetch_with_shadow_output_pass]
I0311 10:53:55.745721 42149 print_statistics.cc:50] --- detected [3] subgraphs!
--- Running PIR pass [remove_shadow_feed_pass]
I0311 10:53:55.782218 42149 print_statistics.cc:50] --- detected [3] subgraphs!
--- Running PIR pass [inplace_pass]
I0311 10:53:56.059289 42149 print_statistics.cc:50] --- detected [325] subgraphs!
I0311 10:53:56.060429 42149 analysis_predictor.cc:1142] ======= pir optimization completed =======
[INFO] ultra_infer/runtime/runtime.cc(265)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
I0311 10:53:56.906312 42149 pir_interpreter.cc:1607] pir interpreter is running by trace mode ...
Segmentation fault (core dumped)

With enable_trt=True, model conversion fails:

Only Paddle model is detected. Paddle model will be used by default.
Backend: paddle_infer
Backend config: cpu_num_threads=8 enable_mkldnn=True enable_trt=True trt_dynamic_shapes={'im_shape': [[1, 2], [1, 2], [8, 2]], 'image': [[1, 3, 640, 640], [1, 3, 640, 640], [8, 3, 640, 640]], 'scale_factor': [[1, 2], [1, 2], [8, 2]]} trt_dynamic_shape_input_data={'im_shape': [[640.0, 640.0], [640.0, 640.0], [640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0]], 'scale_factor': [[2.0, 2.0], [1.0, 1.0], [0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67]]} trt_precision='FP32' enable_log_info=False
[INFO] ultra_infer/vision/common/processors/transform.cc(44)::FuseNormalizeCast Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(91)::FuseNormalizeHWC2CHW Normalize and HWC2CHW are fused to NormalizeAndPermute in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(157)::FuseNormalizeColorConvert BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(28)::BuildOption Will inference_precision float32
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(67)::BuildOption Will try to use tensorrt inference with Paddle Backend.
[WARNING] ultra_infer/runtime/backends/paddle/paddle_backend.cc(79)::BuildOption Detect that tensorrt cache file has been set to best_model/inference/trt_serialized.trt, but while enable paddle2trt, please notice that the cache file will save to the directory where paddle model saved.
[WARNING] ultra_infer/runtime/backends/paddle/paddle_backend.cc(173)::BuildOption Currently, Paddle-TensorRT does not support the new IR, and the old IR will be used.
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(288)::InitFromPaddle Start generating shape range info file.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0311 10:28:32.021153 382 analysis_config.cc:1475] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
I0311 10:28:35.095312 382 analysis_predictor.cc:2057] Ir optimization is turned off, no ir pass will be executed.
--- Running analysis [ir_graph_build_pass]
I0311 10:28:35.124984 382 executor.cc:183] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0311 10:28:35.302951 382 ir_params_sync_among_devices_pass.cc:50] Sync params from CPU to GPU
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0311 10:28:35.304216 382 gpu_resources.cc:119] Please NOTE: device: 3, GPU Compute Capability: 7.0, Driver API Version: 11.8, Runtime API Version: 11.8
W0311 10:28:35.305624 382 gpu_resources.cc:164] device: 3, cuDNN Version: 8.6.
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [save_optimized_model_pass]
--- Running analysis [ir_graph_to_program_pass]
I0311 10:28:37.701107 382 analysis_predictor.cc:2146] ======= ir optimization completed =======
I0311 10:28:37.713642 382 naive_executor.cc:211] --- skip [feed], feed -> scale_factor
I0311 10:28:37.713673 382 naive_executor.cc:211] --- skip [feed], feed -> image
I0311 10:28:37.713681 382 naive_executor.cc:211] --- skip [feed], feed -> im_shape
I0311 10:28:37.724295 382 naive_executor.cc:211] --- skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
I0311 10:28:37.724318 382 naive_executor.cc:211] --- skip [save_infer_model/scale_1.tmp_0], fetch -> fetch
I0311 10:28:37.724325 382 naive_executor.cc:211] --- skip [save_infer_model/scale_2.tmp_0], fetch -> fetch
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(551)::GetDynamicShapeFromOption im_shape: the max shape = [8, 2], the min shape = [1, 2], the opt shape = [1, 2]
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(551)::GetDynamicShapeFromOption image: the max shape = [8, 3, 640, 640], the min shape = [1, 3, 640, 640], the opt shape = [1, 3, 640, 640]
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(551)::GetDynamicShapeFromOption scale_factor: the max shape = [8, 2], the min shape = [1, 2], the opt shape = [1, 2]
W0311 10:28:37.766763 382 analysis_predictor.cc:2646] When collecting shapes, it is recommended to run multiple loops to obtain more accurate shape information.
I0311 10:28:37.766825 382 program_interpreter.cc:243] New Executor is Running.
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(321)::InitFromPaddle Finish generating shape range info file.
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(323)::InitFromPaddle Start loading shape range info file best_model/inference/shape_range_info.pbtxt to set TensorRT dynamic shape.
W0311 10:29:15.523947 382 place.cc:253] The paddle::PlaceType::kCPU/kGPU is deprecated since version 2.3, and will be removed in version 2.4! Please use Tensor::is_cpu()/is_gpu() method to determine the type of place.
E0311 10:33:30.730705 382 helper.h:131] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 1) [Constant] + (Unnamed Layer* 5) [Shuffle]...cast (Output: save_infer_model/scale_2.tmp_0_subgraph_4394)]}.)
E0311 10:33:30.741441 382 helper.h:131] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
Traceback (most recent call last):
  File "/paddle/03test/test/test.py", line 39, in <module>
    pipeline = create_pipeline(
  File "/root/PaddleX/paddlex/inference/pipelines/__init__.py", line 155, in create_pipeline
    pipeline = BasePipeline.get(pipeline_name)(
  File "/root/PaddleX/paddlex/inference/pipelines/instance_segmentation/pipeline.py", line 49, in __init__
    self.instance_segmentation_model = self.create_model(
  File "/root/PaddleX/paddlex/inference/pipelines/base.py", line 86, in create_model
    model = create_predictor(
  File "/root/PaddleX/paddlex/inference/models/__init__.py", line 102, in create_predictor
    return _create_hp_predictor(
  File "/root/PaddleX/paddlex/inference/models/__init__.py", line 66, in _create_hp_predictor
    predictor = HPPredictor.get(model_name)(
  File "/usr/local/lib/python3.10/dist-packages/paddlex_hpi/models/instance_segmentation.py", line 46, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/paddlex_hpi/models/base.py", line 67, in __init__
    self._ui_model = self.build_ui_model()
  File "/usr/local/lib/python3.10/dist-packages/paddlex_hpi/models/base.py", line 102, in build_ui_model
    return self._build_ui_model(option)
  File "/usr/local/lib/python3.10/dist-packages/paddlex_hpi/models/instance_segmentation.py", line 61, in _build_ui_model
    model = ui.vision.detection.PaddleDetectionModel(
  File "/usr/local/lib/python3.10/dist-packages/ultra_infer/vision/detection/ppdet/__init__.py", line 900, in __init__
    self._model = C.vision.detection.PaddleDetectionModel(
RuntimeError:


C++ Traceback (most recent call last):

0   paddle_infer::CreatePredictor(paddle::AnalysisConfig const&)
1   paddle_infer::Predictor::Predictor(paddle::AnalysisConfig const&)
2   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
3   paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&)
4   paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&)
5   paddle::AnalysisPredictor::OptimizeInferenceProgram()
6   paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument*)
7   paddle::inference::analysis::IRPassManager::Apply(std::unique_ptr<paddle::framework::ir::Graph, std::default_delete<paddle::framework::ir::Graph>>)
8   paddle::framework::ir::Pass::Apply(paddle::framework::ir::Graph*) const
9   paddle::inference::analysis::TensorRtSubgraphPass::ApplyImpl(paddle::framework::ir::Graph*) const
10  paddle::inference::analysis::TensorRtSubgraphPass::CreateTensorRTOp(paddle::framework::ir::Node*, paddle::framework::ir::Graph*, std::vector<std::string, std::allocator<std::string>> const&, std::vector<std::string, std::allocator<std::string>>*, bool) const
11  common::enforce::GetCurrentTraceBackString[abi:cxx11]


Error Message Summary:

FatalError: Build TensorRT serialized network failed! Please recheck you configurations related to paddle-TensorRT.
[Hint: ihost_memory_ should not be null.] (at /paddle/Paddle/paddle/fluid/inference/tensorrt/engine.cc:434)

zhang-prog (Collaborator) commented:

Which instance segmentation model are you using? Please provide the model name.


flysssss commented Mar 11, 2025

@zhang-prog The model is RT-DETR-L. It should be easy to reproduce: with the official general instance segmentation pipeline instance_segmentation (which should use the RT-DETR-S model), pass a folder of 1920*1080 test images as input and enable high-performance inference, and the segmentation fault appears. I tried it again this afternoon and it reproduces every time. With high-performance inference disabled, all images infer normally.

zhang-prog (Collaborator) commented:

Hi, you are probably using Mask-RT-DETR-S? That model is not in our supported list; please try other types of instance segmentation models.

Image

flysssss (Author) commented:

@zhang-prog Hi, the error first appeared with Mask-RT-DETR-L, because that is the model I fine-tuned on my private dataset. Then I tried the official general instance segmentation pipeline and hit the same error. So is the whole Mask-RT-DETR series currently unsupported by the high-performance inference framework? After deleting the images that trigger the segmentation fault, the remaining 50 infer normally and very quickly: about 100 ms per image, up from roughly 2 s per image with ordinary inference.

zhang-prog (Collaborator) commented:

Are you saying that with the Mask-RT-DETR-L model and high-performance inference, only the 1920*1080 images fail while all other images infer normally?

flysssss (Author) commented:

Are you saying that with the Mask-RT-DETR-L model and high-performance inference, only the 1920*1080 images fail while all other images infer normally?

Of the 60 1920*1080 test images, 10 hit segmentation faults; after deleting those, the remaining 50 infer normally. Could this be related to my having trained the Mask-RT-DETR-L model on version 3.0.0b1?

flysssss (Author) commented:

The training set is also entirely 1920*1080, while the Mask-RT-DETR-L input size should be 640*640. Comparing the JSON output for the same image with high-performance inference on and off, I found that where ordinary inference outputs a box coordinate of 0, high-performance inference produces a negative value of about -1.3.
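If the slightly negative coordinates are the only discrepancy, one defensive workaround is to clip boxes to the image bounds in post-processing. A minimal sketch over the boxes structure shown in the logs above (the clip_boxes helper is ours, not a PaddleX API), which does not address the segmentation fault itself:

def clip_boxes(boxes, width, height):
    """Clamp [x1, y1, x2, y2] box coordinates into the image rectangle."""
    for box in boxes:
        x1, y1, x2, y2 = box["coordinate"]
        box["coordinate"] = [
            min(max(x1, 0.0), float(width)),
            min(max(y1, 0.0), float(height)),
            min(max(x2, 0.0), float(width)),
            min(max(y2, 0.0), float(height)),
        ]
    return boxes

# e.g. for the 1920x1080 images in this thread:
# res["boxes"] = clip_boxes(res["boxes"], 1920, 1080)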

flysssss (Author) commented:

The model's inference.yml file (the &id001 anchor defined on trt_dynamic_shapes is reused via *id001 under tensorrt, so both backend configs share the same dynamic shapes):

mode: paddle
draw_threshold: 0.5
metric: COCO
use_dynamic_shape: false
Global:
  model_name: Mask-RT-DETR-L
arch: DETR
min_subgraph_size: 3
mask: true
Preprocess:
- interp: 2
  keep_ratio: false
  target_size: [640, 640]
  type: Resize
- mean: [0.0, 0.0, 0.0]
  norm_type: none
  std: [1.0, 1.0, 1.0]
  type: NormalizeImage
- type: Permute
label_list:
- sweetpotato
- shrimp
- chiffon
Hpi:
  backend_configs:
    paddle_infer:
      enable_log_info: True
      trt_dynamic_shapes: &id001
        im_shape: [[1, 2], [1, 2], [8, 2]]
        image: [[1, 3, 640, 640], [1, 3, 640, 640], [8, 3, 640, 640]]
        scale_factor: [[1, 2], [1, 2], [8, 2]]
      trt_dynamic_shape_input_data:
        im_shape: [[640, 640], [640, 640], [640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640]]
        scale_factor: [[2, 2], [1, 1], [0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67]]
    tensorrt:
      dynamic_shapes: *id001

zhang-prog (Collaborator) commented:

OK, I'll try to reproduce and locate this on my side. You are currently on the 3.0rc branch using the Mask-RT-DETR-L model you trained yourself on 3.0b1, right?

flysssss (Author) commented:

OK, I'll try to reproduce and locate this on my side. You are currently on the 3.0rc branch using the Mask-RT-DETR-L model you trained yourself on 3.0b1, right?

Yes.
Also, on the 3.0b1 branch, running the high-performance inference framework with online activation hits the same problem, namely the one in issue #3569.

zhang-prog (Collaborator) commented:

Hi, on the 3.0rc branch I ran a high-performance inference test of the Mask-RT-DETR-L model on a 2560x1600 image, and inference completed normally.

The commands were:

docker run --gpus all --name paddlex_test -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.0.0rc0-paddlepaddle3.0.0rc0-gpu-cuda11.8-cudnn8.6-trt8.5 /bin/bash
paddlex --install hpi-gpu
vim paddlex/configs/pipelines/instance_segmentation.yaml  # change Mask-RT-DETR-S to Mask-RT-DETR-L
paddlex --pipeline instance_segmentation --input https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/application/semantic_segmentation/makassaridn-road_demo.png --device gpu:0 --use_hpip

A screenshot of the run:

Image

Could you upload the image that triggers the error so I can try inferring it? If that still succeeds, the problem is most likely with the model itself, since 3.0b1 models are a bit old and there are quite a few uncertain factors in between.

flysssss (Author) commented:


Image
Hi, this image causes a segmentation fault with the official Mask-RT-DETR-L model.

flysssss (Author) commented:

Image

flysssss (Author) commented:

On the 3.0rc version I retrained a Mask-RT-DETR-H model for 1 epoch; it also hits a segmentation fault with high-performance inference, and infers normally with it disabled.

zhang-prog (Collaborator) commented:

Hi, I did reproduce the error with this 1920x1080 image and am investigating. In the meantime, you can train and run inference with other model series.
