PaddleOCR-VL Model Deployment Notes

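WARNING

Rant

PaddleOCR's documentation was a mess in 2021, and in 2026 it is still a goddamn mess. Dependency conflicts, version incompatibilities, docs that don't match the actual code: all of it cost me a solid 4 hours. I'm writing this post so that whoever comes next steps in fewer holes, because counting on Baidu to update the docs is a lost cause.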

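Architecture

Deploying PaddleOCR-VL requires two separate services working together:

vLLM backend service (port 8118): handles high-performance inference for the VL model
PaddleX Serving frontend service (port 8000): handles document preprocessing and layout analysis, and calls the vLLM backend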

User request → PaddleX Serving (8000) → vLLM Backend (8118)
                  ↓
             Return OCR result

Prerequisites

A Linux server (this post uses Ubuntu)
An NVIDIA GPU (CUDA support required)
Enough GPU memory (8GB+ recommended)
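To check the GPU memory prerequisite before installing anything, a small Python sketch along these lines may help. It assumes `nvidia-smi` is on the PATH; the function names and the exact 8 GiB threshold are illustrative choices, not part of any Paddle tooling.

```python
import subprocess

# The "8GB+" suggestion from the prerequisites, expressed in MiB.
MIN_VRAM_MIB = 8 * 1024


def parse_memory_totals(smi_output: str) -> list[int]:
    """Parse one MiB value per line, as printed by the query below."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_gpu_memory_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of each GPU, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_totals(out)


def gpus_with_enough_vram(totals_mib: list[int], minimum: int = MIN_VRAM_MIB) -> list[int]:
    """Return the indices of GPUs whose total memory meets the minimum."""
    return [i for i, total in enumerate(totals_mib) if total >= minimum]
```

Calling `gpus_with_enough_vram(query_gpu_memory_mib())` on the server lists the usable device indices. Some cards report slightly less than their nominal size, so the threshold may need loosening.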

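Step 1: Install the uv package manager

Why uv? Because pip is painfully slow at resolving a dependency hell like Paddle's; uv at least lets you spend a little less time waiting in despair.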

# Download and install uv
# ⚠️ Replace: version 0.9.27 can be swapped for the latest release as needed
wget https://speed.oo9.dpdns.org/gh/astral-sh/uv/releases/download/0.9.27/uv-x86_64-unknown-linux-gnu.tar.gz
tar -zxf uv-x86_64-unknown-linux-gnu.tar.gz
mv uv-x86_64-unknown-linux-gnu/* /usr/local/bin

# Create the working directory
mkdir -p /paddle
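Since the version number is the part most likely to change, here is a rough sketch of how the download URL is put together. The helper name is made up for illustration, and it assumes the mirror used above simply prefixes the GitHub release path, which is what the URL in the command suggests.

```python
GITHUB_BASE = "https://github.com/"
# Mirror prefix taken from the wget command above (assumed to mirror GitHub paths).
MIRROR_BASE = "https://speed.oo9.dpdns.org/gh/"


def uv_release_url(
    version: str,
    base: str = GITHUB_BASE,
    target: str = "x86_64-unknown-linux-gnu",
) -> str:
    """Build the release tarball URL for a given uv version and target triple."""
    return f"{base}astral-sh/uv/releases/download/{version}/uv-{target}.tar.gz"
```

`uv_release_url("0.9.27", MIRROR_BASE)` reproduces the URL used above; dropping the second argument points at GitHub directly.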

Step 2: Deploy the vLLM backend service (port 8118)

This service must be started first, because PaddleX Serving depends on it.

cd /paddle

# Create a dedicated virtual environment for vLLM
# ⚠️ Replace: adjust the Python version as needed; 3.10-3.12 recommended
uv venv .venv_vllm --python 3.12
source .venv_vllm/bin/activate

# Install pip first (yes, even inside a uv environment, because later scripts depend on it)
uv pip install pip

# Install PaddleOCR
uv pip install -U "paddleocr[doc-parser]"

# Install Flash Attention (key to faster inference)
# ⚠️ Replace: pick the wheel that matches your CUDA and PyTorch versions
# This one is for CUDA 12.8 + PyTorch 2.8 + Python 3.12
# For other versions, look in https://github.com/mjun0812/flash-attention-prebuild-wheels/releases
uv pip install https://speed.oo9.dpdns.org/gh/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.8.2+cu128torch2.8-cp312-cp312-linux_x86_64.whl

# Install the vLLM dependencies
paddleocr install_genai_server_deps vllm

# Install transformers (the version matters, don't change it casually)
uv pip install transformers==4.57.6

# Install the Python development headers (needed to compile some dependencies)
# ⚠️ Replace: adjust python3.12-dev to match your Python version
apt update && apt install -y python3.12-dev

# Start the vLLM service
# ⚠️ Replace:
#   - --model_name → available models: PaddleOCR-VL-0.9B, PaddleOCR-VL-2B, etc.
#   - --port 8118 → vLLM backend port; must match the setting in PaddleOCR-VL.yaml
paddlex_genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --port 8118
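Because the frontend depends on this backend, it can be handy to wait until port 8118 actually accepts connections before moving on. The following is a minimal sketch using only the standard library; the function names and default timings are my own. Note that an open port only shows the process is listening, not that the model has necessarily finished loading.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(
    host: str,
    port: int,
    deadline_s: float = 600.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll host:port until it accepts connections or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if port_is_open(host, port):
            return True
        time.sleep(interval_s)
    return False
```

For example, `wait_for_port("127.0.0.1", 8118)` blocks for up to ten minutes and returns `True` as soon as the backend is reachable.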

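WARNING

The Flash Attention pitfall

Flash Attention's prebuilt wheels have strict requirements on the CUDA, PyTorch, and Python versions. If any of them mismatch, either the install fails or you get a segfault at runtime.

Baidu's docs? Nonexistent. They just tell you to pip install flash-attn, then leave you to compile for 2 hours, and it may still fail at the end.

My advice: go straight to mjun0812's prebuilt wheel repository and find the matching version.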
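To make the version matching concrete, here is a sketch that assembles the wheel filename to look for. The naming pattern is inferred from the single filename used in Step 2, so other releases in that repository may deviate from it; the helper itself is hypothetical.

```python
import sys


def flash_attn_wheel_name(
    fa_version: str,
    cuda: str,
    torch: str,
    python: tuple[int, int] = sys.version_info[:2],
) -> str:
    """Build a wheel filename following the pattern seen in Step 2.

    cuda and torch are dotted versions such as "12.8" and "2.8".
    """
    cu = "cu" + cuda.replace(".", "")       # "12.8" -> "cu128"
    cp = f"cp{python[0]}{python[1]}"        # (3, 12) -> "cp312"
    return f"flash_attn-{fa_version}+{cu}torch{torch}-{cp}-{cp}-linux_x86_64.whl"
```

With `("2.8.2", "12.8", "2.8", (3, 12))` this yields the filename from Step 2. If no asset with the computed name exists on the releases page, that combination was probably never built.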

Step 3: Deploy the PaddleX Serving frontend service (port 8000)

Open a new terminal and deploy the frontend service.

cd /paddle

# Create a dedicated virtual environment for PaddleX (do not share it with the vLLM environment!)
# ⚠️ Replace: adjust the Python version as needed
uv venv .venv_paddle --python 3.12
source .venv_paddle/bin/activate

# Install the GPU build of PaddlePaddle
# ⚠️ Replace: cu126 means CUDA 12.6; pick the package that matches your CUDA version
# Options: cu118 (CUDA 11.8), cu120 (CUDA 12.0), cu126 (CUDA 12.6)
uv pip install paddlepaddle-gpu==3.2.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# Install PaddleOCR
uv pip install -U "paddleocr[doc-parser]" pip

# Install the PaddleX Serving component
paddlex --install serving

# Start the service
# ⚠️ Replace:
#   - gpu:0 → your GPU device index; use gpu:0,1 for multiple cards
#   - --port 8000 → the port you want
paddlex --serve --device gpu:0 --pipeline PaddleOCR-VL --port 8000

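NOTE

About the CUDA version

Check your CUDA version with nvcc --version or nvidia-smi. Note that nvcc reports the installed CUDA toolkit, while nvidia-smi reports the highest CUDA version the driver supports, so the two can differ.

Baidu's docs will never tell you any of this; you have to dig up the matching install command on the PaddlePaddle website yourself. And the install commands on the website frequently don't match the actual package names. Thanks a lot.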
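As a sketch of the mapping from CUDA version to package index, the helper below builds the `-i` URL used in Step 3. The list of tags is just the three named in the Step 3 comment; what the live index actually offers may differ, so treat it as an assumption.

```python
PADDLE_INDEX_BASE = "https://www.paddlepaddle.org.cn/packages/stable/"
# Tags listed in the Step 3 comment; the real index may offer a different set.
KNOWN_TAGS = ("cu118", "cu120", "cu126")


def cuda_tag(cuda_version: str) -> str:
    """Turn a dotted CUDA version such as "12.6" into a tag such as "cu126"."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def paddle_index_url(cuda_version: str) -> str:
    """Return the package index URL for a CUDA version, or raise if unknown."""
    tag = cuda_tag(cuda_version)
    if tag not in KNOWN_TAGS:
        raise ValueError(f"no known PaddlePaddle index for {tag}; pick one of {KNOWN_TAGS}")
    return f"{PADDLE_INDEX_BASE}{tag}/"
```

If your exact CUDA version has no matching tag, the function raises instead of guessing; which other build is compatible is something to verify on the PaddlePaddle site.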

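Step 4: Configure PaddleOCR-VL.yaml (optional)

If you need custom settings (for example, adjusting batch_size or the thresholds), you can overwrite the default configuration file.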

# ⚠️ Replace: adjust python3.12 in the path to match your Python version
# With a uv venv, the path looks like: /paddle/.venv_paddle/lib/python3.12/site-packages/paddlex/configs/pipelines/
cp /root/PaddleOCR-VL.yaml /paddle/.venv_paddle/lib/python3.12/site-packages/paddlex/configs/pipelines/
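If you would rather not hand-edit that path, the sketch below derives it in two ways: from the venv location and Python version, or from wherever `paddlex` is actually installed in the current interpreter. Both helper names are invented for this example, and the `configs/pipelines` layout is taken from the path shown above.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def pipelines_config_dir(venv: str, python_version: str) -> Path:
    """Build the pipelines config directory for a venv and a version like "3.12"."""
    return (
        Path(venv) / "lib" / f"python{python_version}" / "site-packages"
        / "paddlex" / "configs" / "pipelines"
    )


def installed_pipelines_dir() -> Optional[Path]:
    """Locate the same directory via the installed paddlex package, if any."""
    spec = importlib.util.find_spec("paddlex")
    if spec is None or spec.origin is None:
        return None  # paddlex is not installed in this interpreter
    return Path(spec.origin).parent / "configs" / "pipelines"
```

Run `installed_pipelines_dir()` with the `.venv_paddle` interpreter to get the copy target without guessing the Python version.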

The configuration file contents are as follows:

# ⚠️ Replace: the port in server_url must match the vLLM service port

pipeline_name: PaddleOCR-VL

batch_size: 64

use_queues: True

use_doc_preprocessor: True
use_layout_detection: True
use_chart_recognition: False
format_block_content: False

SubModules:
  LayoutDetection:
    module_name: layout_detection
    model_name: PP-DocLayoutV2
    model_dir: null
    batch_size: 8
    threshold:
      0: 0.5 # abstract
      1: 0.5 # algorithm
      2: 0.5 # aside_text
      3: 0.5 # chart
      4: 0.5 # content
      5: 0.4 # formula
      6: 0.4 # doc_title
      7: 0.5 # figure_title
      8: 0.5 # footer
      9: 0.5 # footer
      10: 0.5 # footnote
      11: 0.5 # formula_number
      12: 0.5 # header
      13: 0.5 # header
      14: 0.5 # image
      15: 0.4 # formula
      16: 0.5 # number
      17: 0.4 # paragraph_title
      18: 0.5 # reference
      19: 0.5 # reference_content
      20: 0.45 # seal
      21: 0.5 # table
      22: 0.4 # text
      23: 0.4 # text
      24: 0.5 # vision_footnote
    layout_nms: True
    layout_unclip_ratio: [1.0, 1.0]
    layout_merge_bboxes_mode:
      0: "union" # abstract
      1: "union" # algorithm
      2: "union" # aside_text
      3: "large" # chart
      4: "union" # content
      5: "large" # display_formula
      6: "large" # doc_title
      7: "union" # figure_title
      8: "union" # footer
      9: "union" # footer
      10: "union" # footnote
      11: "union" # formula_number
      12: "union" # header
      13: "union" # header
      14: "union" # image
      15: "large" # inline_formula
      16: "union" # number
      17: "large" # paragraph_title
      18: "union" # reference
      19: "union" # reference_content
      20: "union" # seal
      21: "union" # table
      22: "union" # text
      23: "union" # text
      24: "union" # vision_footnote
  VLRecognition:
    module_name: vl_recognition
    model_name: PaddleOCR-VL-0.9B
    model_dir: null
    batch_size: 2048
    genai_config:
      backend: vllm-server
      # ⚠️ Replace: this port must match the vLLM service port from Step 2
      server_url: http://127.0.0.1:8118/v1

SubPipelines:
  DocPreprocessor:
    pipeline_name: doc_preprocessor
    batch_size: 8
    use_doc_orientation_classify: True
    use_doc_unwarping: True
    SubModules:
      DocOrientationClassify:
        module_name: doc_text_orientation
        model_name: PP-LCNet_x1_0_doc_ori
        model_dir: null
        batch_size: 8
      DocUnwarping:
        module_name: image_unwarping
        model_name: UVDoc
        model_dir: null

Serving:
  extra:
    max_num_input_imgs: null
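The one setting that silently breaks everything is a server_url port that differs from the vLLM port. A quick way to catch that is sketched below; it uses a regular expression rather than a YAML parser to stay dependency-free, and the function name is my own.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def server_url_port(yaml_text: str) -> Optional[int]:
    """Extract the port from the first uncommented server_url entry, if any."""
    match = re.search(r"^\s*server_url:\s*(\S+)", yaml_text, flags=re.MULTILINE)
    if match is None:
        return None
    url = match.group(1).strip("'\"")  # tolerate a quoted URL
    return urlsplit(url).port
```

Read the YAML file into a string, pass it in, and compare the result with the `--port` value given to `paddlex_genai_server` (8118 in this post).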

Step 5: Test the service

Once both services are running, you can test them with the following Python code:

import base64
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

# ⚠️ Replace: your PaddleX Serving address and port
API_URL = "http://localhost:8000/layout-parsing"

# ⚠️ Replace: path to your PDF file
with open("xxxx.pdf", "rb") as f:
    data = f.read()

# The service takes the file content as a Base64-encoded string
content = base64.b64encode(data).decode("ascii")

def test_ocr(file_data, file_type=0):
    """Send one layout-parsing request (file_type: 0 = PDF, 1 = image)."""
    headers = {
        "Content-Type": "application/json"
    }
    payload = {
        "file": file_data,
        "fileType": file_type,
        "useDocOrientationClassify": True,
        "useLayoutDetection": True,
        "useDocUnwarping": True,
        "useChartRecognition": True,
        "repetitionPenalty": 1,
        "temperature": 0.1,
        "topP": 1,
        "minPixels": 147384,
        "maxPixels": 2822400,
        "layoutNms": True,
        "visualize": False
    }
    response = requests.post(API_URL, json=payload, headers=headers)
    # Raise on HTTP errors so failed requests reach the except branch below
    response.raise_for_status()
    return response.json()

# Concurrency test: 10 requests spread over 4 worker threads
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(test_ocr, content, 0) for _ in range(10)]
    for future in as_completed(futures):
        try:
            result = future.result()
            print("OCR request succeeded")
        except Exception as e:
            print(f"OCR request failed: {e}")
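The test above only reports success or failure. To actually look at the recognized text, something like the sketch below could be applied to the returned JSON. The field names (`errorCode`, `errorMsg`, `result`, `layoutParsingResults`, `markdown`, `text`) are my assumption about the serving response layout and are not confirmed anywhere in this post, so verify them against a real response first.

```python
def extract_markdown(response_body: dict) -> list[str]:
    """Pull the Markdown text of each page out of a layout-parsing response.

    Assumes the body looks like:
    {"errorCode": 0, "errorMsg": "...",
     "result": {"layoutParsingResults": [{"markdown": {"text": "..."}}]}}
    """
    if response_body.get("errorCode", 0) != 0:
        raise RuntimeError(f"OCR request failed: {response_body.get('errorMsg')}")
    pages = response_body["result"]["layoutParsingResults"]
    return [page["markdown"]["text"] for page in pages]
```

Calling it on the value returned by `test_ocr` gives one Markdown string per page, assuming the layout holds.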

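Summary of what to replace for your setup

Parameter	Description	Example value
uv version	Version of the uv package manager	`0.9.27`
Python version	Python version of the virtual environments	`3.12`
CUDA version (PaddlePaddle)	CUDA version of the PaddlePaddle package	`cu126` (CUDA 12.6)
Flash Attention wheel	URL of the prebuilt FA package	Choose by CUDA/PyTorch/Python version
python-dev package name	Python development headers package	`python3.12-dev`
vLLM port	vLLM backend service port	`8118`
PaddleX Serving port	Frontend service port (exposed externally)	`8000`
GPU device index	GPU(s) to use	`gpu:0` or `gpu:0,1`
Model name	PaddleOCR-VL model version	`PaddleOCR-VL-0.9B`
YAML config path	Location of PaddleOCR-VL.yaml	Adjust to your Python environment path
server_url	vLLM service address in the YAML	`http://127.0.0.1:8118/v1`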
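Several of these values have to agree with each other (the vLLM port shows up in the launch command and in server_url). One way to keep them in sync is to derive everything from a single settings object, sketched below. The class is purely illustrative; the command-line flags are the ones used in Steps 2 and 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploySettings:
    """The tunable values from the table above, with this post's defaults."""

    model_name: str = "PaddleOCR-VL-0.9B"
    vllm_port: int = 8118
    serving_port: int = 8000
    device: str = "gpu:0"

    def vllm_command(self) -> list[str]:
        """Launch command for the vLLM backend (Step 2)."""
        return ["paddlex_genai_server", "--model_name", self.model_name,
                "--backend", "vllm", "--port", str(self.vllm_port)]

    def serving_command(self) -> list[str]:
        """Launch command for the PaddleX Serving frontend (Step 3)."""
        return ["paddlex", "--serve", "--device", self.device,
                "--pipeline", "PaddleOCR-VL", "--port", str(self.serving_port)]

    def server_url(self) -> str:
        """Value for genai_config.server_url in PaddleOCR-VL.yaml (Step 4)."""
        return f"http://127.0.0.1:{self.vllm_port}/v1"
```

Changing `vllm_port` in one place then updates both the backend command and the URL the frontend should point at.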

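Finally

If you followed this post and it still won't run, the versions have most likely changed again. Go yell at Baidu; they won't fix the docs anyway.

Good luck.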