
Instance segmentation model Mask-RT-DETR-L inference issue #3548

Open

flysssss opened this issue Mar 7, 2025 · 28 comments
flysssss commented Mar 7, 2025

Checklist:

Describe the problem

According to the official tutorial, GPU inference time for the Mask-RT-DETR-L model, measured on an NVIDIA Tesla T4 with FP32 precision, is 46.5059 ms.

Reproduction machine: V100
Image resolution: 1920x1080

Inference code:

from paddlex import create_model
import os
import time

# model = create_model("./output/best_model/inference")
model = create_model("Mask-RT-DETR-L")
path = './test/'
for filename in os.listdir(path):
    print(filename)
    img_path = path + filename
    name = filename.split('.')
    file_name = name[0]
    for i in range(20):
        start = time.time()
        output = model.predict(img_path, batch_size=1)
        for res in output:
            res.print()                 # print the structured prediction output
            res.save_to_img("./res/")   # save the visualized result image
            res.save_to_json("./res/")  # save the structured prediction output
        end = time.time()
        cost = end - start
        print('took {:.5f} s'.format(cost))

Each image takes about 8-9 s. Both the official model and my own fine-tuned model currently take 8-9 s, GPU memory usage is around 3 GB, and it is not CPU inference.
Is the long latency caused by:
1. The large input image resolution?
2. The high-performance inference plugin not being enabled?
The gap between 8-9 s and 46 ms is quite large.
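Note that the loop above times not only model.predict but also res.print, res.save_to_img, and res.save_to_json, and its first iteration includes one-time model warm-up, so it overstates pure inference latency. A minimal benchmarking sketch, assuming the same create_model API as above and that predict yields results lazily (the sample image path is hypothetical):

import time
from paddlex import create_model

model = create_model("Mask-RT-DETR-L")
img_path = "./test/example.jpg"  # hypothetical sample image

# Warm-up: the first calls include one-time graph building and memory allocation.
for _ in range(3):
    list(model.predict(img_path, batch_size=1))

# Time prediction only; consume the results but skip printing and saving.
runs = 20
start = time.time()
for _ in range(runs):
    list(model.predict(img_path, batch_size=1))
avg = (time.time() - start) / runs
print("average per image: {:.1f} ms".format(avg * 1000))

Averaging repeated runs on the same image separates model latency from the I/O and visualization cost.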


flysssss commented Mar 7, 2025

Environment (official Docker image):
docker run --gpus all --name paddlex -v $PWD:/paddle --shm-size=8g --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.0.0b1-paddlepaddle3.0.0b1-gpu-cuda11.8-cudnn8.6-trt8.5 /bin/bash


flysssss commented Mar 7, 2025

With the high-performance inference plugin enabled, latency drops noticeably.
The following CLI invocation currently works:
paddlex --pipeline ./instance_segmentation.yaml --input ./test/ --device gpu:0 --use_hpip --serial_number xxxxxxxx
But written as an API call it fails with:
TypeError: _SingleModelPipeline.__init__() got an unexpected keyword argument 'serial_number'
The inference code is:

from paddlex import create_pipeline
import os, time

pipeline = create_pipeline(
    pipeline="./instance_segmentation.yaml",
    use_hpip=True,
    serial_number="xxxxxxxxx",
)

The current paddlex version is 3.0.0b1:
pip install https://paddle-model-ecology.bj.bcebos.com/paddlex/whl/paddlex-3.0.0b1-py3-none-any.whl

I saw this in the official docs changelog:

1. Pipeline CLI inference
   Added:
   Inference hyperparameters, which depend on the specific pipeline; see the pipeline docs. For example, the image classification pipeline supports a --topk parameter to specify the top-n results to return.
   Removed:
   --serial_number; high-performance inference no longer requires a serial number.
2. create_pipeline()
   Removed:
   The serial_number field of the high-performance-inference hpi_params parameter; high-performance inference no longer requires a serial number.
   No longer supported:
   Setting pipeline inference hyperparameters; these must all be configured in the pipeline config file (e.g. batch_size, thresholds).

Why does create_pipeline in 3.0.0b1 not accept a serial_number argument, and how should the serial number be passed?


flysssss commented Mar 7, 2025

For the inference example docs matching paddlex 3.0.0b1, how exactly should the create_pipeline() arguments be formatted? There is no concrete interface documentation either.
How do I run high-performance inference on the upgraded 3.0.0rc0 version, and which arguments does create_pipeline() actually accept?


flysssss commented Mar 7, 2025

PaddleX High-Performance Inference Guide:

https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-beta1/docs/pipeline_deploy/high_performance_inference.md

from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="image_classification",
    use_hpip=True,
    serial_number="{serial_number}",
)

output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")

This example currently works on neither 3.0.0b1 nor 3.0.0rc0. With use_hpip enabled, please provide a usable list of create_pipeline arguments.


flysssss commented Mar 7, 2025

One more question: is the plan for 3.0.0rc0 to delete the "open-source pipeline deployment SDK serial number management" step entirely, so that use_hpip high-performance inference can be run directly?


flysssss commented Mar 7, 2025

pipeline = create_pipeline(
    pipeline="./instance_segmentation.yaml",
    use_hpip=True,
    device="gpu:3",
    hpi_params={"serial_number": "xxxx-D532-xxxx-863B"},
)

On 3.0.0b1 the example above enables hpip high-performance acceleration. Given that 3.0.0rc0
removes the serial_number field of the high-performance-inference hpi_params parameter (a serial number is no longer required),
how should high-performance inference be run there?

zhang-prog (Collaborator) commented:

All of your questions are answered in the high-performance inference documentation for 3.0rc0. A serial number is no longer needed: you do not pass hpi_params={"serial_number": "xxxx-D532-xxxx-863B"} anymore, and the docs contain examples you can follow.
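For reference, based on the changelog excerpt quoted earlier (serial_number removed in 3.0.0rc0), the call presumably reduces to something like the sketch below; treat it as an illustration rather than the authoritative rc0 signature:

from paddlex import create_pipeline

# 3.0.0rc0 style: use_hpip alone enables high-performance inference;
# no serial_number / hpi_params={"serial_number": ...} is required anymore.
pipeline = create_pipeline(
    pipeline="./instance_segmentation.yaml",
    device="gpu:0",
    use_hpip=True,
)

for res in pipeline.predict("./test/example.jpg"):  # hypothetical input image
    res.save_to_img("./res/")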

flysssss (Author) commented:

@zhang-prog I reinstalled the image following the docs, but inference still fails.
Step 1:
docker run --gpus all --name paddlex -v $PWD:/paddle --shm-size=8g --network=host -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.0.0rc0-paddlepaddle3.0.0rc0-gpu-cuda11.8-cudnn8.6-trt8.5 /bin/bash

Step 2:
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cublas-cu11==11.11.3.6; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-cupti-cu11==11.8.87; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-nvrtc-cu11==11.8.89; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-runtime-cu11==11.8.89; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cudnn-cu11==8.9.6.50; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cufft-cu11==10.9.0.58; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-curand-cu11==10.3.0.86; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cusolver-cu11==11.4.1.48; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cusparse-cu11==11.7.5.86; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-nccl-cu11==2.19.3; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-nvtx-cu11==11.8.86; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlex 3.0.0rc0 requires albumentations==1.4.10, but you have albumentations 1.3.1 which is incompatible.
paddlex 3.0.0rc0 requires opencv-python-headless==4.10.0.84, but you have opencv-python-headless 4.6.0.66 which is incompatible.

Installed the libraries that were reported as conflicting the first time.

Step 3:
paddlex --install hpi-gpu

Step 4:
pip install erniebot -i https://pypi.tuna.tsinghua.edu.cn/simple some-package

Step 5:

pip install erniebot_agent -i https://pypi.tuna.tsinghua.edu.cn/simple some-package

Step 6:
Run inference; it fails:

λ ys-ai-GPU-6 /paddle/44seg/PaddleX python test.py
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvcaffe_parser.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvcaffe_parser.so.
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvinfer_plugin.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvinfer_plugin.so.
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvinfer.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvinfer.so.
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvonnxparser.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvonnxparser.so.
INFO:root:Create a symbolic link pointing to /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvparsers.so.8 named /usr/local/lib/python3.10/dist-packages/ultra_infer/libs/third_libs/tensorrt/lib/libnvparsers.so.
Traceback (most recent call last):
  File "/paddle/44seg/PaddleX/test.py", line 39, in <module>
    pipeline = create_pipeline(
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/__init__.py", line 119, in create_pipeline
    return create_pipeline_from_config(
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/__init__.py", line 94, in create_pipeline_from_config
    pipeline = BasePipeline.get(pipeline_name)(
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/base.py", line 39, in patched___init__
    ret = ctx.run(init_func, self, *args, **kwargs)
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/base.py", line 39, in patched___init__
    ret = ctx.run(init_func, self, *args, **kwargs)
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/single_model_pipeline.py", line 22, in __init__
    self._build_predictor(model)
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/single_model_pipeline.py", line 26, in _build_predictor
    self.model = self._create(model)
  File "/paddle/44seg/PaddleX/paddlex/inference/pipelines/base.py", line 71, in _create
    return create_predictor(
  File "/paddle/44seg/PaddleX/paddlex/inference/models/__init__.py", line 78, in create_predictor
    return _create_hp_predictor(
  File "/paddle/44seg/PaddleX/paddlex/inference/models/__init__.py", line 46, in _create_hp_predictor
    raise RuntimeError(
RuntimeError: The PaddleX HPI plugin is not properly installed, and the high-performance model inference features are not available.

flysssss (Author) commented:

One more training question: validation inference during training is very slow; can the high-performance inference framework be enabled during training?

zhang-prog (Collaborator) commented:

paddlepaddle-gpu 3.0.0rc1 requires nvidia-cublas-cu11==11.11.3.6; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-cupti-cu11==11.8.87; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-nvrtc-cu11==11.8.89; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
paddlepaddle-gpu 3.0.0rc1 requires nvidia-cuda-runtime-cu11==11.8.89; platform_system == "Linux" and platform_machine == "x86_64", which is not installed.
......

These messages can be ignored. After creating a new container from the paddlex3.0.0rc0 image, just run paddlex --install hpi-gpu directly. I suggest creating a fresh container and trying again.

zhang-prog (Collaborator) commented:

High-performance inference applies only to the inference stage; using it during training is not currently supported.

flysssss (Author) commented:

@zhang-prog Thanks for the help. The error was because the script was being run from the old-version code path; after switching directories and using the new instance_segmentation.yaml, high-performance inference works normally.

However, a segmentation fault still occurs after 3 images are inferred.
Input image resolution: 1920x1080

Only Paddle model is detected. Paddle model will be used by default.
Backend: paddle_infer
Backend config: cpu_num_threads=8 enable_mkldnn=True enable_trt=False trt_dynamic_shapes={'im_shape': [[1, 2], [1, 2], [8, 2]], 'image': [[1, 3, 640, 640], [1, 3, 640, 640], [8, 3, 640, 640]], 'scale_factor': [[1, 2], [1, 2], [8, 2]]} trt_dynamic_shape_input_data={'im_shape': [[640.0, 640.0], [640.0, 640.0], [640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0]], 'scale_factor': [[2.0, 2.0], [1.0, 1.0], [0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67]]} trt_precision='FP32' enable_log_info=False
[INFO] ultra_infer/vision/common/processors/transform.cc(44)::FuseNormalizeCast Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(91)::FuseNormalizeHWC2CHW Normalize and HWC2CHW are fused to NormalizeAndPermute in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(157)::FuseNormalizeColorConvert BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(28)::BuildOption Will inference_precision float32
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0311 08:39:32.403381 5694 gpu_resources.cc:119] Please NOTE: device: 3, GPU Compute Capability: 7.0, Driver API Version: 11.8, Runtime API Version: 11.8
W0311 08:39:32.405112 5694 gpu_resources.cc:164] device: 3, cuDNN Version: 8.6.
[INFO] ultra_infer/runtime/runtime.cc(265)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
{'res': {'input_path': './test2/LD20250306141656_0307_sweetpotato_shrimp_yiwu.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'sweetpotato', 'score': 0.977288544178009, 'coordinate': [648.0316162109375, 605.0195922851562, 1212.0203857421875, 962.4086303710938]}, {'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9665425419807434, 'coordinate': [1005.0983276367188, 800.2238159179688, 1222.9437255859375, 1001.8690795898438]}], 'masks': '...'}}
{'res': {'input_path': './test2/LD20250306141730_0307_sweetpotato_shrimp_yiwu.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9753938913345337, 'coordinate': [643.3614501953125, 605.8696899414062, 1213.97119140625, 962.4130249023438]}, {'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9663092494010925, 'coordinate': [1006.2348022460938, 800.6912841796875, 1223.3809814453125, 1001.9779663085938]}], 'masks': '...'}}
{'res': {'input_path': './test2/LD20250306141751_0307_sweetpotato_shrimp_yiwu.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9743572473526001, 'coordinate': [651.3355102539062, 602.456787109375, 1203.0601806640625, 970.91943359375]}, {'cls_id': 0, 'label': 'sweetpotato', 'score': 0.9699466824531555, 'coordinate': [1244.004150390625, 876.125, 1483.8446044921875, 1045.2181396484375]}], 'masks': '...'}}
Segmentation fault (core dumped)
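Side note: when a run dies with nothing but Segmentation fault (core dumped), Python's built-in faulthandler module can at least show which Python frame was active when the native crash happened. A minimal sketch, reusing the pipeline call from this thread (the input directory is the one from the log above):

import faulthandler
faulthandler.enable()  # dump the Python traceback on SIGSEGV and similar fatal signals

from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="./instance_segmentation.yaml",
    device="gpu:0",
    use_hpip=True,
)
for res in pipeline.predict("./test2/"):
    res.print()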

flysssss (Author) commented:

With the log switch turned on, the detailed output is as follows:

λ ys-ai-gpu03 /paddle/03test/test paddlex --pipeline instance_segmentation.yaml --input ./test2/LD20250306142040_0307_sweetpotato_shrimp_yiwu.jpg --device gpu:0 --use_hpip
Only Paddle model is detected. Paddle model will be used by default.
Backend: paddle_infer
Backend config: cpu_num_threads=8 enable_mkldnn=True enable_trt=False trt_dynamic_shapes={'im_shape': [[1, 2], [1, 2], [8, 2]], 'image': [[1, 3, 640, 640], [1, 3, 640, 640], [8, 3, 640, 640]], 'scale_factor': [[1, 2], [1, 2], [8, 2]]} trt_dynamic_shape_input_data={'im_shape': [[640.0, 640.0], [640.0, 640.0], [640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0]], 'scale_factor': [[2.0, 2.0], [1.0, 1.0], [0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67]]} trt_precision='FP32' enable_log_info=True
[INFO] ultra_infer/vision/common/processors/transform.cc(44)::FuseNormalizeCast Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(91)::FuseNormalizeHWC2CHW Normalize and HWC2CHW are fused to NormalizeAndPermute in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(157)::FuseNormalizeColorConvert BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(28)::BuildOption Will inference_precision float32
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(338)::InitFromPaddle Finish paddle inference config with summary as:
[INFO]
+--------------------------+------------------------------------------+
| Option | Value |
+--------------------------+------------------------------------------+
| model_file | best_model/inference/inference.pdmodel |
| params_file | best_model/inference/inference.pdiparams |
+--------------------------+------------------------------------------+
| cpu_math_thread | 8 |
| enable_mkldnn | true |
| mkldnn_cache_capacity | 10 |
+--------------------------+------------------------------------------+
| use_gpu | true |
| use_cutlass | false |
| gpu_device_id | 0 |
| enable_gpu_mixed | 0 |
| mixed_precision_mode | fp32 |
| memory_pool_init_size | 100MB |
| use_external_stream | false |
| thread_local_stream | false |
| use_tensorrt | false |
+--------------------------+------------------------------------------+
| use_xpu | false |
+--------------------------+------------------------------------------+
| use_cinn_compiler | false |
| save_optimized_model | false |
| ir_optim | true |
| ir_debug | false |
| use_optimized_model | false |
| memory_optim | true |
| enable_profile | false |
| enable_log | true |
| collect_shape_range_info | false |
+--------------------------+------------------------------------------+

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0311 10:53:54.710409 42149 analysis_predictor.cc:2057] Ir optimization is turned off, no ir pass will be executed.
--- Running analysis [ir_graph_build_pass]
I0311 10:53:54.741750 42149 executor.cc:183] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [save_optimized_model_pass]
--- Running analysis [ir_graph_to_program_pass]
I0311 10:53:55.211637 42149 analysis_predictor.cc:2146] ======= ir optimization completed =======
--- Running PIR pass [add_shadow_output_after_dead_parameter_pass]
--- Running PIR pass [delete_quant_dequant_linear_op_pass]
--- Running PIR pass [delete_weight_dequant_linear_op_pass]
--- Running PIR pass [map_op_to_another_pass]
I0311 10:53:55.312294 42149 print_statistics.cc:50] --- detected [27] subgraphs!
--- Running PIR pass [identity_op_clean_pass]
I0311 10:53:55.323707 42149 print_statistics.cc:50] --- detected [12] subgraphs!
--- Running PIR pass [silu_fuse_pass]
I0311 10:53:55.337136 42149 print_statistics.cc:50] --- detected [19] subgraphs!
--- Running PIR pass [conv2d_bn_fuse_pass]
--- Running PIR pass [conv2d_add_act_fuse_pass]
--- Running PIR pass [conv2d_add_fuse_pass]
I0311 10:53:55.353192 42149 print_statistics.cc:50] --- detected [13] subgraphs!
--- Running PIR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running PIR pass [fused_rotary_position_embedding_pass]
--- Running PIR pass [multihead_matmul_fuse_pass]
--- Running PIR pass [matmul_add_act_fuse_pass]
I0311 10:53:55.446043 42149 print_statistics.cc:50] --- detected [133] subgraphs!
--- Running PIR pass [fc_elementwise_layernorm_fuse_pass]
--- Running PIR pass [add_norm_fuse_pass]
I0311 10:53:55.487458 42149 print_statistics.cc:50] --- detected [20] subgraphs!
--- Running PIR pass [group_norm_silu_fuse_pass]
--- Running PIR pass [matmul_scale_fuse_pass]
--- Running PIR pass [matmul_transpose_fuse_pass]
--- Running PIR pass [transpose_flatten_concat_fuse_pass]
--- Running PIR pass [remove_redundant_transpose_pass]
--- Running PIR pass [horizontal_fuse_pass]
I0311 10:53:55.502025 42149 print_statistics.cc:50] --- detected [1] subgraphs!
--- Running PIR pass [common_subexpression_elimination_pass]
I0311 10:53:55.512122 42149 print_statistics.cc:50] --- detected [558] subgraphs!
--- Running PIR pass [params_sync_among_devices_pass]
I0311 10:53:55.575439 42149 print_statistics.cc:50] --- detected [721] subgraphs!
--- Running PIR pass [constant_folding_pass]
I0311 10:53:55.577561 42149 pir_interpreter.cc:1586] New Executor is Running ...
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0311 10:53:55.578130 42149 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.8, Runtime API Version: 11.8
W0311 10:53:55.579555 42149 gpu_resources.cc:164] device: 0, cuDNN Version: 8.6.
I0311 10:53:55.580062 42149 pir_interpreter.cc:1610] pir interpreter is running by multi-thread mode ...
I0311 10:53:55.741286 42149 print_statistics.cc:44] --- detected [160, 2088] subgraphs!
--- Running PIR pass [dead_code_elimination_pass]
I0311 10:53:55.743919 42149 print_statistics.cc:50] --- detected [91] subgraphs!
--- Running PIR pass [replace_fetch_with_shadow_output_pass]
I0311 10:53:55.745721 42149 print_statistics.cc:50] --- detected [3] subgraphs!
--- Running PIR pass [remove_shadow_feed_pass]
I0311 10:53:55.782218 42149 print_statistics.cc:50] --- detected [3] subgraphs!
--- Running PIR pass [inplace_pass]
I0311 10:53:56.059289 42149 print_statistics.cc:50] --- detected [325] subgraphs!
I0311 10:53:56.060429 42149 analysis_predictor.cc:1142] ======= pir optimization completed =======
[INFO] ultra_infer/runtime/runtime.cc(265)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
I0311 10:53:56.906312 42149 pir_interpreter.cc:1607] pir interpreter is running by trace mode ...
Segmentation fault (core dumped)

With enable_trt=True, model conversion fails:

Only Paddle model is detected. Paddle model will be used by default.
Backend: paddle_infer
Backend config: cpu_num_threads=8 enable_mkldnn=True enable_trt=True trt_dynamic_shapes={'im_shape': [[1, 2], [1, 2], [8, 2]], 'image': [[1, 3, 640, 640], [1, 3, 640, 640], [8, 3, 640, 640]], 'scale_factor': [[1, 2], [1, 2], [8, 2]]} trt_dynamic_shape_input_data={'im_shape': [[640.0, 640.0], [640.0, 640.0], [640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0, 640.0]], 'scale_factor': [[2.0, 2.0], [1.0, 1.0], [0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67]]} trt_precision='FP32' enable_log_info=False
[INFO] ultra_infer/vision/common/processors/transform.cc(44)::FuseNormalizeCast Normalize and Cast are fused to Normalize in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(91)::FuseNormalizeHWC2CHW Normalize and HWC2CHW are fused to NormalizeAndPermute in preprocessing pipeline.
[INFO] ultra_infer/vision/common/processors/transform.cc(157)::FuseNormalizeColorConvert BGR2RGB and NormalizeAndPermute are fused to NormalizeAndPermute with swap_rb=1
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(28)::BuildOption Will inference_precision float32
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(67)::BuildOption Will try to use tensorrt inference with Paddle Backend.
[WARNING] ultra_infer/runtime/backends/paddle/paddle_backend.cc(79)::BuildOption Detect that tensorrt cache file has been set to best_model/inference/trt_serialized.trt, but while enable paddle2trt, please notice that the cache file will save to the directory where paddle model saved.
[WARNING] ultra_infer/runtime/backends/paddle/paddle_backend.cc(173)::BuildOption Currently, Paddle-TensorRT does not support the new IR, and the old IR will be used.
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(288)::InitFromPaddle Start generating shape range info file.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0311 10:28:32.021153 382 analysis_config.cc:1475] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
I0311 10:28:35.095312 382 analysis_predictor.cc:2057] Ir optimization is turned off, no ir pass will be executed.
--- Running analysis [ir_graph_build_pass]
I0311 10:28:35.124984 382 executor.cc:183] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0311 10:28:35.302951 382 ir_params_sync_among_devices_pass.cc:50] Sync params from CPU to GPU
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0311 10:28:35.304216 382 gpu_resources.cc:119] Please NOTE: device: 3, GPU Compute Capability: 7.0, Driver API Version: 11.8, Runtime API Version: 11.8
W0311 10:28:35.305624 382 gpu_resources.cc:164] device: 3, cuDNN Version: 8.6.
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [save_optimized_model_pass]
--- Running analysis [ir_graph_to_program_pass]
I0311 10:28:37.701107 382 analysis_predictor.cc:2146] ======= ir optimization completed =======
I0311 10:28:37.713642 382 naive_executor.cc:211] --- skip [feed], feed -> scale_factor
I0311 10:28:37.713673 382 naive_executor.cc:211] --- skip [feed], feed -> image
I0311 10:28:37.713681 382 naive_executor.cc:211] --- skip [feed], feed -> im_shape
I0311 10:28:37.724295 382 naive_executor.cc:211] --- skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
I0311 10:28:37.724318 382 naive_executor.cc:211] --- skip [save_infer_model/scale_1.tmp_0], fetch -> fetch
I0311 10:28:37.724325 382 naive_executor.cc:211] --- skip [save_infer_model/scale_2.tmp_0], fetch -> fetch
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(551)::GetDynamicShapeFromOption im_shape: the max shape = [8, 2], the min shape = [1, 2], the opt shape = [1, 2]
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(551)::GetDynamicShapeFromOption image: the max shape = [8, 3, 640, 640], the min shape = [1, 3, 640, 640], the opt shape = [1, 3, 640, 640]
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(551)::GetDynamicShapeFromOption scale_factor: the max shape = [8, 2], the min shape = [1, 2], the opt shape = [1, 2]
W0311 10:28:37.766763 382 analysis_predictor.cc:2646] When collecting shapes, it is recommended to run multiple loops to obtain more accurate shape information.
I0311 10:28:37.766825 382 program_interpreter.cc:243] New Executor is Running.
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(321)::InitFromPaddle Finish generating shape range info file.
[INFO] ultra_infer/runtime/backends/paddle/paddle_backend.cc(323)::InitFromPaddle Start loading shape range info file best_model/inference/shape_range_info.pbtxt to set TensorRT dynamic shape.
W0311 10:29:15.523947 382 place.cc:253] The paddle::PlaceType::kCPU/kGPU is deprecated since version 2.3, and will be removed in version 2.4! Please use Tensor::is_cpu()/is_gpu() method to determine the type of place.
E0311 10:33:30.730705 382 helper.h:131] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 1) [Constant] + (Unnamed Layer* 5) [Shuffle]...cast (Output: save_infer_model/scale_2.tmp_0_subgraph_4394)]}.)
E0311 10:33:30.741441 382 helper.h:131] 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
Traceback (most recent call last):
  File "/paddle/03test/test/test.py", line 39, in <module>
    pipeline = create_pipeline(
  File "/root/PaddleX/paddlex/inference/pipelines/__init__.py", line 155, in create_pipeline
    pipeline = BasePipeline.get(pipeline_name)(
  File "/root/PaddleX/paddlex/inference/pipelines/instance_segmentation/pipeline.py", line 49, in __init__
    self.instance_segmentation_model = self.create_model(
  File "/root/PaddleX/paddlex/inference/pipelines/base.py", line 86, in create_model
    model = create_predictor(
  File "/root/PaddleX/paddlex/inference/models/__init__.py", line 102, in create_predictor
    return _create_hp_predictor(
  File "/root/PaddleX/paddlex/inference/models/__init__.py", line 66, in _create_hp_predictor
    predictor = HPPredictor.get(model_name)(
  File "/usr/local/lib/python3.10/dist-packages/paddlex_hpi/models/instance_segmentation.py", line 46, in __init__
    super().__init__(
  File "/usr/local/lib/python3.10/dist-packages/paddlex_hpi/models/base.py", line 67, in __init__
    self._ui_model = self.build_ui_model()
  File "/usr/local/lib/python3.10/dist-packages/paddlex_hpi/models/base.py", line 102, in build_ui_model
    return self._build_ui_model(option)
  File "/usr/local/lib/python3.10/dist-packages/paddlex_hpi/models/instance_segmentation.py", line 61, in _build_ui_model
    model = ui.vision.detection.PaddleDetectionModel(
  File "/usr/local/lib/python3.10/dist-packages/ultra_infer/vision/detection/ppdet/__init__.py", line 900, in __init__
    self._model = C.vision.detection.PaddleDetectionModel(
RuntimeError:


C++ Traceback (most recent call last):

0   paddle_infer::CreatePredictor(paddle::AnalysisConfig const&)
1   paddle_infer::Predictor::Predictor(paddle::AnalysisConfig const&)
2   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
3   paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&)
4   paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&)
5   paddle::AnalysisPredictor::OptimizeInferenceProgram()
6   paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument*)
7   paddle::inference::analysis::IRPassManager::Apply(std::unique_ptr<paddle::framework::ir::Graph, std::default_delete<paddle::framework::ir::Graph>>)
8   paddle::framework::ir::Pass::Apply(paddle::framework::ir::Graph*) const
9   paddle::inference::analysis::TensorRtSubgraphPass::ApplyImpl(paddle::framework::ir::Graph*) const
10  paddle::inference::analysis::TensorRtSubgraphPass::CreateTensorRTOp(paddle::framework::ir::Node*, paddle::framework::ir::Graph*, std::vector<std::string, std::allocator<std::string>> const&, std::vector<std::string, std::allocator<std::string>>*, bool) const
11  common::enforce::GetCurrentTraceBackString[abi:cxx11]


Error Message Summary:

FatalError: Build TensorRT serialized network failed! Please recheck you configurations related to paddle-TensorRT.
[Hint: ihost_memory_ should not be null.] (at /paddle/Paddle/paddle/fluid/inference/tensorrt/engine.cc:434)

zhang-prog (Collaborator) commented:

Which instance segmentation model are you using? Please provide the model name.


flysssss commented Mar 11, 2025

@zhang-prog The model is RT-DETR-L. It should be easy to reproduce: with the official general instance segmentation pipeline instance_segmentation (which should use the RT-DETR-S model), pass a folder of 1920*1080 test images as input and enable high-performance inference, and the segmentation fault appears. I tried it again this afternoon and it reproduces every time. With high-performance inference disabled, all images infer normally.

zhang-prog (Collaborator) commented:

Hi, you are probably using Mask-RT-DETR-S? That model is not in our supported list; please try other types of instance segmentation models.

Image

flysssss (Author) commented:

@zhang-prog Hi, the error first appeared with Mask-RT-DETR-L, because that is the model I fine-tuned on my private dataset. Then I tried the official general instance segmentation pipeline and hit the same error. So is the whole Mask-RT-DETR series currently unsupported by the high-performance inference framework? After deleting the images that trigger the segmentation fault, the remaining 50 infer normally and very quickly: about 100 ms per image, up from roughly 2 s per image with ordinary inference.

zhang-prog (Collaborator) commented:

Are you saying that with the Mask-RT-DETR-L model and high-performance inference, only the 1920*1080 images fail while all other images infer normally?

flysssss (Author) commented:

Are you saying that with the Mask-RT-DETR-L model and high-performance inference, only the 1920*1080 images fail while all other images infer normally?

Of the 60 1920*1080 test images, 10 hit segmentation faults; after deleting those, the remaining 50 infer normally. Could this be related to my having trained the Mask-RT-DETR-L model on version 3.0.0b1?

flysssss (Author) commented:

The training set is also entirely 1920*1080, while the Mask-RT-DETR-L input size should be 640*640. Comparing the JSON output for the same image with high-performance inference on and off, I found that where ordinary inference outputs a box coordinate of 0, high-performance inference produces a negative value of about -1.3.
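If the slightly negative coordinates are the only discrepancy, one defensive workaround is to clip boxes to the image bounds in post-processing. A minimal sketch over the boxes structure shown in the logs above (the clip_boxes helper is ours, not a PaddleX API), which does not address the segmentation fault itself:

def clip_boxes(boxes, width, height):
    """Clamp [x1, y1, x2, y2] box coordinates into the image rectangle."""
    for box in boxes:
        x1, y1, x2, y2 = box["coordinate"]
        box["coordinate"] = [
            min(max(x1, 0.0), float(width)),
            min(max(y1, 0.0), float(height)),
            min(max(x2, 0.0), float(width)),
            min(max(y2, 0.0), float(height)),
        ]
    return boxes

# e.g. for the 1920x1080 images in this thread:
# res["boxes"] = clip_boxes(res["boxes"], 1920, 1080)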

flysssss (Author) commented:

The model's inference.yml file (the &id001 anchor defined on trt_dynamic_shapes is reused via *id001 under tensorrt, so both backend configs share the same dynamic shapes):

mode: paddle
draw_threshold: 0.5
metric: COCO
use_dynamic_shape: false
Global:
  model_name: Mask-RT-DETR-L
arch: DETR
min_subgraph_size: 3
mask: true
Preprocess:
- interp: 2
  keep_ratio: false
  target_size: [640, 640]
  type: Resize
- mean: [0.0, 0.0, 0.0]
  norm_type: none
  std: [1.0, 1.0, 1.0]
  type: NormalizeImage
- type: Permute
label_list:
- sweetpotato
- shrimp
- chiffon
Hpi:
  backend_configs:
    paddle_infer:
      enable_log_info: True
      trt_dynamic_shapes: &id001
        im_shape: [[1, 2], [1, 2], [8, 2]]
        image: [[1, 3, 640, 640], [1, 3, 640, 640], [8, 3, 640, 640]]
        scale_factor: [[1, 2], [1, 2], [8, 2]]
      trt_dynamic_shape_input_data:
        im_shape: [[640, 640], [640, 640], [640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640, 640]]
        scale_factor: [[2, 2], [1, 1], [0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67, 0.67]]
    tensorrt:
      dynamic_shapes: *id001

zhang-prog (Collaborator) commented:

OK, I'll try to reproduce and locate this on my side. You are currently on the 3.0rc branch using the Mask-RT-DETR-L model you trained yourself on 3.0b1, right?

flysssss (Author) commented:

OK, I'll try to reproduce and locate this on my side. You are currently on the 3.0rc branch using the Mask-RT-DETR-L model you trained yourself on 3.0b1, right?

Yes.
Also, on the 3.0b1 branch, running the high-performance inference framework with online activation hits the same problem, namely the one in issue #3569.

zhang-prog (Collaborator) commented:

Hi, on the 3.0rc branch I ran a high-performance inference test of the Mask-RT-DETR-L model on a 2560x1600 image, and inference completed normally.

The commands were:

docker run --gpus all --name paddlex_test -it ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlex/paddlex:paddlex3.0.0rc0-paddlepaddle3.0.0rc0-gpu-cuda11.8-cudnn8.6-trt8.5 /bin/bash
paddlex --install hpi-gpu
vim paddlex/configs/pipelines/instance_segmentation.yaml  # change Mask-RT-DETR-S to Mask-RT-DETR-L
paddlex --pipeline instance_segmentation --input https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/application/semantic_segmentation/makassaridn-road_demo.png --device gpu:0 --use_hpip

A screenshot of the run:

Image

Could you upload the image that triggers the error so I can try inferring it? If that still succeeds, the problem is most likely with the model itself, since 3.0b1 models are a bit old and there are quite a few uncertain factors in between.

flysssss (Author) commented:


Image
Hi, this image causes a segmentation fault with the official Mask-RT-DETR-L model.

flysssss (Author) commented:

Image

flysssss (Author) commented:

On the 3.0rc version I retrained a Mask-RT-DETR-H model for 1 epoch; it also hits a segmentation fault with high-performance inference, and infers normally with it disabled.

zhang-prog (Collaborator) commented:

Hi, I did reproduce the error with this 1920x1080 image and am investigating. In the meantime, you can train and run inference with other model series.
