Overview
This article describes how to deploy the Milvus (v2.5.19) vector database with Docker, reverse-proxy its gRPC endpoint through Nginx Proxy Manager, and put it behind Cloudflare's CDN.

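Why go to all this trouble? Mainly because my Oracle Cloud server in the Singapore West region was sitting idle and I wanted to put it to use. I recently started using Claude Code for vibe coding and came across a project called claude-context. I had long been frustrated that AI agents of this kind are not good at gathering context, and this project appears to address that with something like semantic retrieval (though not LSP) combined with vector search. Since it uses Milvus as its vector database, I decided to deploy Milvus and try it out.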

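This article covers only Milvus Standalone mode. A cluster deployment is too complex and I have no use for one yet. If that changes, I will update this tutorial.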

Components

Nginx Proxy Manager: reverse-proxies the Milvus gRPC port.
Milvus: a production-grade vector database that supports efficient vector search.

Deployment

Deploying Milvus

Basic configuration

Create the working directory and switch into it:

mkdir -p ~/docker_data/milvus && cd ~/docker_data/milvus

Download the milvus.yaml configuration file for Milvus:

wget https://raw.githubusercontent.com/milvus-io/milvus/v2.5.19/configs/milvus.yaml -O milvus.yaml
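
It is probably safest to take the config file from the same release tag as the Milvus image you plan to run. As a minimal sketch, the helpers below build the download URL from a version tag and check that the downloaded file contains at least one top-level YAML key. The function names `milvus_config_url` and `count_top_level_keys` are my own illustrative choices and are not part of Milvus.

```shell
#!/bin/sh
# Build the raw.githubusercontent.com URL of milvus.yaml for a release tag.
# (Hypothetical helper; the URL pattern is the one used by the wget command above.)
milvus_config_url() {
  printf 'https://raw.githubusercontent.com/milvus-io/milvus/%s/configs/milvus.yaml\n' "$1"
}

# Count top-level YAML keys (etcd:, minio:, proxy:, ...) in a file.
# An empty file, for example after a failed download, yields 0.
count_top_level_keys() {
  grep -cE '^[A-Za-z][A-Za-z0-9_]*:' "$1"
}

# Example (needs network access):
#   wget "$(milvus_config_url v2.5.19)" -O milvus.yaml
#   [ "$(count_top_level_keys milvus.yaml)" -gt 0 ] && echo "milvus.yaml looks like a config file"
```

Keeping the tag in one place makes it harder for the config and the image version to drift apart.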

Edit milvus.yaml to set some basic options: nano milvus.yaml (press Ctrl+W to search).
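
If you would rather not edit interactively, simple changes such as the MinIO address and port that appear in the minio block of the listing below (172.17.0.1 and 65010) can be scripted. This is a rough sketch, assuming a POSIX shell and awk. `show_section` and `set_section_value` are hypothetical helper names, and the approach only handles plain `key: value` lines indented by exactly two spaces.

```shell
#!/bin/sh
# Print one top-level YAML block (e.g. "minio") without comments or blank lines.
show_section() {
  awk -v sec="$2" '
    $0 ~ ("^" sec ":")            { inside = 1; print; next }  # block header
    inside && /^[A-Za-z]/         { exit }                     # next top-level key
    inside && !/^[ \t]*(#|$)/     { print }                    # settings only
  ' "$1"
}

# Set KEY to VALUE inside one top-level block, keeping any trailing comment.
# Usage: set_section_value FILE SECTION KEY VALUE
set_section_value() {
  awk -v sec="$2" -v key="$3" -v val="$4" '
    $0 ~ ("^" sec ":")            { inside = 1; print; next }
    inside && /^[A-Za-z]/         { inside = 0 }               # left the block
    inside && $0 ~ ("^  " key ":") {
      comment = ""
      if (match($0, / #.*$/)) comment = substr($0, RSTART)     # keep "# ..." tail
      print "  " key ": " val comment
      next
    }
    { print }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example:
#   set_section_value milvus.yaml minio address 172.17.0.1
#   set_section_value milvus.yaml minio port 65010
#   show_section milvus.yaml minio
```

Scoping the replacement to one block matters because keys such as `address` and `port` occur in several sections (for example under both minio and pulsar).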


# Licensed to the LF AI & Data foundation under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Related configuration of etcd, used to store Milvus metadata & service discovery.
etcd:
  # Endpoints used to access etcd service. You can change this parameter as the endpoints of your own etcd cluster.
  # Environment variable: ETCD_ENDPOINTS
  # etcd preferentially acquires valid address from environment variable ETCD_ENDPOINTS when Milvus is started.
  endpoints: localhost:2379
  # Root prefix of the key to where Milvus stores data in etcd.
  # It is recommended to change this parameter before starting Milvus for the first time.
  # To share an etcd instance among multiple Milvus instances, consider changing this to a different value for each Milvus instance before you start them.
  # Set an easy-to-identify root path for Milvus if etcd service already exists.
  # Changing this for an already running Milvus instance may result in failures to read legacy data.
  rootPath: by-dev
  # Sub-prefix of the key to where Milvus stores metadata-related information in etcd.
  # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
  # It is recommended to change this parameter before starting Milvus for the first time.
  metaSubPath: meta
  # Sub-prefix of the key to where Milvus stores timestamps in etcd.
  # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
  # It is recommended not to change this parameter if there is no specific reason.
  kvSubPath: kv
  log:
    level: info # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
    # path is one of:
    #  - "default" as os.Stderr,
    #  - "stderr" as os.Stderr,
    #  - "stdout" as os.Stdout,
    #  - file path to append server logs to.
    # please adjust in embedded Milvus: /tmp/milvus/logs/etcd.log
    path: stdout
  ssl:
    enabled: false # Whether to support ETCD secure connection mode
    tlsCert: /path/to/etcd-client.pem # path to your cert file
    tlsKey: /path/to/etcd-client-key.pem # path to your key file
    tlsCACert: /path/to/ca.pem # path to your CACert file
    # TLS min version
    # Optional values: 1.0, 1.1, 1.2, 1.3。
    # We recommend using version 1.2 and above.
    tlsMinVersion: 1.3
  requestTimeout: 10000 # Etcd operation timeout in milliseconds
  use:
    embed: false # Whether to enable embedded Etcd (an in-process EtcdServer).
  data:
    dir: default.etcd # Embedded Etcd only. please adjust in embedded Milvus: /tmp/milvus/etcdData/
  auth:
    enabled: false # Whether to enable authentication
    userName:  # username for etcd authentication
    password:  # password for etcd authentication

metastore:
  type: etcd # Default value: etcd, Valid values: [etcd, tikv]
  snapshot:
    ttl: 86400 # snapshot ttl in seconds
    reserveTime: 3600 # snapshot reserve time in seconds

# Related configuration of tikv, used to store Milvus metadata.
# Notice that when TiKV is enabled for metastore, you still need to have etcd for service discovery.
# TiKV is a good option when the metadata size requires better horizontal scalability.
tikv:
  endpoints: 127.0.0.1:2389 # Note that the default pd port of tikv is 2379, which conflicts with etcd.
  rootPath: by-dev # The root path where data is stored in tikv
  metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath
  kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath
  requestTimeout: 10000 # ms, tikv request timeout
  snapshotScanSize: 256 # batch size of tikv snapshot scan
  ssl:
    enabled: false # Whether to support TiKV secure connection mode
    tlsCert:  # path to your cert file
    tlsKey:  # path to your key file
    tlsCACert:  # path to your CACert file

localStorage:
  # Local path to where vector data are stored during a search or a query to avoid repetitve access to MinIO or S3 service.
  # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
  # It is recommended to change this parameter before starting Milvus for the first time.
  path: /var/lib/milvus/data/

# Related configuration of MinIO/S3/GCS or any other service supports S3 API, which is responsible for data persistence for Milvus.
# We refer to the storage service as MinIO/S3 in the following description for simplicity.
minio:
  # IP address of MinIO or S3 service.
  # Environment variable: MINIO_ADDRESS
  # minio.address and minio.port together generate the valid access to MinIO or S3 service.
  # MinIO preferentially acquires the valid IP address from the environment variable MINIO_ADDRESS when Milvus is started.
  # Default value applies when MinIO or S3 is running on the same network with Milvus.
  address: 172.17.0.1
  port: 65010 # Port of MinIO or S3 service.
  # Access key ID that MinIO or S3 issues to user for authorized access.
  # Environment variable: MINIO_ACCESS_KEY_ID or minio.accessKeyID
  # minio.accessKeyID and minio.secretAccessKey together are used for identity authentication to access the MinIO or S3 service.
  # This configuration must be set identical to the environment variable MINIO_ACCESS_KEY_ID, which is necessary for starting MinIO or S3.
  # The default value applies to MinIO or S3 service that started with the default docker-compose.yml file.
  accessKeyID: xxxxx
  # Secret key used to encrypt the signature string and verify the signature string on server. It must be kept strictly confidential and accessible only to the MinIO or S3 server and users.
  # Environment variable: MINIO_SECRET_ACCESS_KEY or minio.secretAccessKey
  # minio.accessKeyID and minio.secretAccessKey together are used for identity authentication to access the MinIO or S3 service.
  # This configuration must be set identical to the environment variable MINIO_SECRET_ACCESS_KEY, which is necessary for starting MinIO or S3.
  # The default value applies to MinIO or S3 service that started with the default docker-compose.yml file.
  secretAccessKey: xxxxxx
  useSSL: false # Switch value to control if to access the MinIO or S3 service through SSL.
  ssl:
    tlsCACert: /path/to/public.crt # path to your CACert file
  # Name of the bucket where Milvus stores data in MinIO or S3.
  # Milvus 2.0.0 does not support storing data in multiple buckets.
  # Bucket with this name will be created if it does not exist. If the bucket already exists and is accessible, it will be used directly. Otherwise, there will be an error.
  # To share an MinIO instance among multiple Milvus instances, consider changing this to a different value for each Milvus instance before you start them. For details, see Operation FAQs.
  # The data will be stored in the local Docker if Docker is used to start the MinIO service locally. Ensure that there is sufficient storage space.
  # A bucket name is globally unique in one MinIO or S3 instance.
  bucketName: milvus
  # Root prefix of the key to where Milvus stores data in MinIO or S3.
  # It is recommended to change this parameter before starting Milvus for the first time.
  # To share an MinIO instance among multiple Milvus instances, consider changing this to a different value for each Milvus instance before you start them. For details, see Operation FAQs.
  # Set an easy-to-identify root key prefix for Milvus if etcd service already exists.
  # Changing this for an already running Milvus instance may result in failures to read legacy data.
  rootPath: files
  # Whether to useIAM role to access S3/GCS instead of access/secret keys
  # For more information, refer to
  # aws: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
  # gcp: https://cloud.google.com/storage/docs/access-control/iam
  # aliyun (ack): https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/use-rrsa-to-enforce-access-control
  # aliyun (ecs): https://www.alibabacloud.com/help/en/elastic-compute-service/latest/attach-an-instance-ram-role
  useIAM: false
  # Cloud Provider of S3. Supports: "aws", "gcp", "aliyun".
  # Cloud Provider of Google Cloud Storage. Supports: "gcpnative".
  # You can use "aws" for other cloud provider supports S3 API with signature v4, e.g.: minio
  # You can use "gcp" for other cloud provider supports S3 API with signature v2
  # You can use "aliyun" for other cloud provider uses virtual host style bucket
  # You can use "gcpnative" for the Google Cloud Platform provider. Uses service account credentials
  # for authentication.
  # When useIAM enabled, only "aws", "gcp", "aliyun" is supported for now
  cloudProvider: aws
  # The JSON content contains the gcs service account credentials.
  # Used only for the "gcpnative" cloud provider.
  gcpCredentialJSON:
  # Custom endpoint for fetch IAM role credentials. when useIAM is true & cloudProvider is "aws".
  # Leave it empty if you want to use AWS default endpoint
  iamEndpoint:
  logLevel: fatal # Log level for aws sdk log. Supported level:  off, fatal, error, warn, info, debug, trace
  region:  # Specify minio storage system location region
  useVirtualHost: false # Whether use virtual host mode for bucket
  requestTimeoutMs: 10000 # minio timeout for request time in milliseconds
  # The maximum number of objects requested per batch in minio ListObjects rpc,
  # 0 means using oss client by default, decrease these configration if ListObjects timeout
  listObjectsMaxKeys: 0

# Milvus supports four MQ: rocksmq(based on RockDB), natsmq(embedded nats-server), Pulsar and Kafka.
# You can change your mq by setting mq.type field.
# If you don't set mq.type field as default, there is a note about enabling priority if we config multiple mq in this file.
# 1. standalone(local) mode: rocksmq(default) > natsmq > Pulsar > Kafka
# 2. cluster mode:  Pulsar(default) > Kafka (rocksmq and natsmq is unsupported in cluster mode)
mq:
  # Default value: "default"
  # Valid values: [default, pulsar, kafka, rocksmq, natsmq]
  type: default
  enablePursuitMode: true # Default value: "true"
  pursuitLag: 10 # time tick lag threshold to enter pursuit mode, in seconds
  pursuitBufferSize: 8388608 # pursuit mode buffer size in bytes
  pursuitBufferTime: 60 # pursuit mode buffer time in seconds
  mqBufSize: 16 # MQ client consumer buffer length
  dispatcher:
    mergeCheckInterval: 0.1 # the interval time(in seconds) for dispatcher to check whether to merge
    targetBufSize: 16 # the lenth of channel buffer for targe
    maxTolerantLag: 3 # Default value: "3", the timeout(in seconds) that target sends msgPack

# Related configuration of pulsar, used to manage Milvus logs of recent mutation operations, output streaming log, and provide log publish-subscribe services.
pulsar:
  # IP address of Pulsar service.
  # Environment variable: PULSAR_ADDRESS
  # pulsar.address and pulsar.port together generate the valid access to Pulsar.
  # Pulsar preferentially acquires the valid IP address from the environment variable PULSAR_ADDRESS when Milvus is started.
  # Default value applies when Pulsar is running on the same network with Milvus.
  address: localhost
  port: 6650 # Port of Pulsar service.
  webport: 80 # Web port of of Pulsar service. If you connect direcly without proxy, should use 8080.
  # The maximum size of each message in Pulsar. Unit: Byte.
  # By default, Pulsar can transmit at most 2MB of data in a single message. When the size of inserted data is greater than this value, proxy fragments the data into multiple messages to ensure that they can be transmitted correctly.
  # If the corresponding parameter in Pulsar remains unchanged, increasing this configuration will cause Milvus to fail, and reducing it produces no advantage.
  maxMessageSize: 2097152
  # Pulsar can be provisioned for specific tenants with appropriate capacity allocated to the tenant.
  # To share a Pulsar instance among multiple Milvus instances, you can change this to an Pulsar tenant rather than the default one for each Milvus instance before you start them. However, if you do not want Pulsar multi-tenancy, you are advised to change msgChannel.chanNamePrefix.cluster to the different value.
  tenant: public
  namespace: default # A Pulsar namespace is the administrative unit nomenclature within a tenant.
  requestTimeout: 60 # pulsar client global request timeout in seconds
  enableClientMetrics: false # Whether to register pulsar client metrics into milvus metrics path.

# If you want to enable kafka, needs to comment the pulsar configs
# kafka:
#   brokerList: localhost:9092
#   saslUsername:
#   saslPassword:
#   saslMechanisms:
#   securityProtocol:
#   ssl:
#     enabled: false # whether to enable ssl mode
#     tlsCert:  # path to client's public key (PEM) used for authentication
#     tlsKey:  # path to client's private key (PEM) used for authentication
#     tlsCaCert:  # file or directory path to CA certificate(s) for verifying the broker's key
#     tlsKeyPassword:  # private key passphrase for use with ssl.key.location and set_ssl_cert(), if any
#   readTimeout: 10

rocksmq:
  # Prefix of the key to where Milvus stores data in RocksMQ.
  # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
  # It is recommended to change this parameter before starting Milvus for the first time.
  # Set an easy-to-identify root key prefix for Milvus if etcd service already exists.
  path: /var/lib/milvus/rdb_data
  lrucacheratio: 0.06 # rocksdb cache memory ratio
  rocksmqPageSize: 67108864 # The maximum size of messages in each page in RocksMQ. Messages in RocksMQ are checked and cleared (when expired) in batch based on this parameters. Unit: Byte.
  retentionTimeInMinutes: 4320 # The maximum retention time of acked messages in RocksMQ. Acked messages in RocksMQ are retained for the specified period of time and then cleared. Unit: Minute.
  retentionSizeInMB: 8192 # The maximum retention size of acked messages of each topic in RocksMQ. Acked messages in each topic are cleared if their size exceed this parameter. Unit: MB.
  compactionInterval: 86400 # Time interval to trigger rocksdb compaction to remove deleted data. Unit: Second
  compressionTypes: 0,0,7,7,7 # compaction compression type, only support use 0,7. 0 means not compress, 7 will use zstd. Length of types means num of rocksdb level.

# natsmq configuration.
# more detail: https://docs.nats.io/running-a-nats-service/configuration
natsmq:
  server:
    port: 4222 # Listening port of the NATS server.
    storeDir: /var/lib/milvus/nats # Directory to use for JetStream storage of nats
    maxFileStore: 17179869184 # Maximum size of the 'file' storage
    maxPayload: 8388608 # Maximum number of bytes in a message payload
    maxPending: 67108864 # Maximum number of bytes buffered for a connection Applies to client connections
    initializeTimeout: 4000 # waiting for initialization of natsmq finished
    monitor:
      trace: false # If true enable protocol trace log messages
      debug: false # If true enable debug log messages
      logTime: true # If set to false, log without timestamps.
      logFile: /tmp/milvus/logs/nats.log # Log file path relative to .. of milvus binary if use relative path
      logSizeLimit: 536870912 # Size in bytes after the log file rolls over to a new one
    retention:
      maxAge: 4320 # Maximum age of any message in the P-channel
      maxBytes:  # How many bytes the single P-channel may contain. Removing oldest messages if the P-channel exceeds this size
      maxMsgs:  # How many message the single P-channel may contain. Removing oldest messages if the P-channel exceeds this limit

# Related configuration of rootCoord, used to handle data definition language (DDL) and data control language (DCL) requests
rootCoord:
  dmlChannelNum: 16 # The number of DML-Channels to create at the root coord startup.
  # The maximum number of partitions in each collection.
  # New partitions cannot be created if this parameter is set as 0 or 1.
  # Range: [0, INT64MAX]
  maxPartitionNum: 1024
  # The minimum row count of a segment required for creating index.
  # Segments with smaller size than this parameter will not be indexed, and will be searched with brute force.
  minSegmentSizeToEnableIndex: 1024
  enableActiveStandby: false
  maxDatabaseNum: 64 # Maximum number of database
  maxGeneralCapacity: 65536 # upper limit for the sum of of product of partitionNumber and shardNumber
  gracefulStopTimeout: 5 # seconds. force stop node without graceful stop
  ip:  # TCP/IP address of rootCoord. If not specified, use the first unicastable address
  port: 53100 # TCP port of rootCoord
  grpc:
    serverMaxSendSize: 536870912 # The maximum size of each RPC request that the rootCoord can send, unit: byte
    serverMaxRecvSize: 268435456 # The maximum size of each RPC request that the rootCoord can receive, unit: byte
    clientMaxSendSize: 268435456 # The maximum size of each RPC request that the clients on rootCoord can send, unit: byte
    clientMaxRecvSize: 536870912 # The maximum size of each RPC request that the clients on rootCoord can receive, unit: byte

# Related configuration of proxy, used to validate client requests and reduce the returned results.
proxy:
  timeTickInterval: 200 # The interval at which proxy synchronizes the time tick, unit: ms.
  healthCheckTimeout: 3000 # ms, the interval that to do component healthy check
  msgStream:
    timeTick:
      bufSize: 512 # The maximum number of messages can be buffered in the timeTick message stream of the proxy when producing messages.
  maxNameLength: 255 # The maximum length of the name or alias that can be created in Milvus, including the collection name, collection alias, partition name, and field name.
  maxFieldNum: 64 # The maximum number of field can be created when creating in a collection. It is strongly DISCOURAGED to set maxFieldNum >= 64.
  maxVectorFieldNum: 4 # The maximum number of vector fields that can be specified in a collection. Value range: [1, 10].
  maxShardNum: 16 # The maximum number of shards can be created when creating in a collection.
  maxDimension: 32768 # The maximum number of dimensions of a vector can have when creating in a collection.
  # Whether to produce gin logs.\n
  # please adjust in embedded Milvus: false
  ginLogging: true
  ginLogSkipPaths: / # skip url path for gin log
  maxTaskNum: 1024 # The maximum number of tasks in the task queue of the proxy.
  ddlConcurrency: 16 # The concurrent execution number of DDL at proxy.
  dclConcurrency: 16 # The concurrent execution number of DCL at proxy.
  mustUsePartitionKey: false # switch for whether proxy must use partition key for the collection
  # maximum number of result entries, typically Nq * TopK * GroupSize.
  # It costs additional memory and time to process a large number of result entries.
  # If the number of result entries exceeds this limit, the search will be rejected.
  # Disabled if the value is less or equal to 0.
  maxResultEntries: -1
  accessLog:
    enable: false # Whether to enable the access log feature.
    minioEnable: false # Whether to upload local access log files to MinIO. This parameter can be specified when proxy.accessLog.filename is not empty.
    localPath: /tmp/milvus_access # The local folder path where the access log file is stored. This parameter can be specified when proxy.accessLog.filename is not empty.
    filename:  # The name of the access log file. If you leave this parameter empty, access logs will be printed to stdout.
    maxSize: 64 # The maximum size allowed for a single access log file. If the log file size reaches this limit, a rotation process will be triggered. This process seals the current access log file, creates a new log file, and clears the contents of the original log file. Unit: MB.
    rotatedTime: 0 # The maximum time interval allowed for rotating a single access log file. Upon reaching the specified time interval, a rotation process is triggered, resulting in the creation of a new access log file and sealing of the previous one. Unit: seconds
    remotePath: access_log/ # The path of the object storage for uploading access log files.
    remoteMaxTime: 0 # The time interval allowed for uploading access log files. If the upload time of a log file exceeds this interval, the file will be deleted. Setting the value to 0 disables this feature.
    formatters:
      base:
        format: "[$time_now] [ACCESS] <$user_name: $user_addr> $method_name [status: $method_status] [code: $error_code] [sdk: $sdk_version] [msg: $error_msg] [traceID: $trace_id] [timeCost: $time_cost]"
      query:
        format: "[$time_now] [ACCESS] <$user_name: $user_addr> $method_name [status: $method_status] [code: $error_code] [sdk: $sdk_version] [msg: $error_msg] [traceID: $trace_id] [timeCost: $time_cost] [database: $database_name] [collection: $collection_name] [partitions: $partition_name] [expr: $method_expr] [params: $query_params]"
        methods: "Query, Delete"
      search:
        format: "[$time_now] [ACCESS] <$user_name: $user_addr> $method_name [status: $method_status] [code: $error_code] [sdk: $sdk_version] [msg: $error_msg] [traceID: $trace_id] [timeCost: $time_cost] [database: $database_name] [collection: $collection_name] [partitions: $partition_name] [expr: $method_expr] [nq: $nq] [params: $search_params]"
        methods: "HybridSearch, Search"
    cacheSize: 0 # Size of log of write cache, in byte. (Close write cache if size was 0)
    cacheFlushInterval: 3 # time interval of auto flush write cache, in seconds. (Close auto flush if interval was 0)
  connectionCheckIntervalSeconds: 120 # the interval time(in seconds) for connection manager to scan inactive client info
  connectionClientInfoTTLSeconds: 86400 # inactive client info TTL duration, in seconds
  maxConnectionNum: 10000 # the max client info numbers that proxy should manage, avoid too many client infos
  gracefulStopTimeout: 30 # seconds. force stop node without graceful stop
  slowQuerySpanInSeconds: 5 # query whose executed time exceeds the `slowQuerySpanInSeconds` can be considered slow, in seconds.
  queryNodePooling:
    size: 10 # the size for shardleader(querynode) client pool
  http:
    enabled: true # Whether to enable the http server
    debug_mode: false # Whether to enable http server debug mode
    port:  # high-level restful api
    acceptTypeAllowInt64: true # high-level restful api, whether http client can deal with int64
    enablePprof: true # Whether to enable pprof middleware on the metrics port
    enableWebUI: true # Whether to enable setting the WebUI middleware on the metrics port
  ip:  # TCP/IP address of proxy. If not specified, use the first unicastable address
  port: 19530 # TCP port of proxy
  internalPort: 19529
  grpc:
    serverMaxSendSize: 268435456 # The maximum size of each RPC request that the proxy can send, unit: byte
    serverMaxRecvSize: 67108864 # The maximum size of each RPC request that the proxy can receive, unit: byte
    clientMaxSendSize: 268435456 # The maximum size of each RPC request that the clients on proxy can send, unit: byte
    clientMaxRecvSize: 67108864 # The maximum size of each RPC request that the clients on proxy can receive, unit: byte

# Related configuration of queryCoord, used to manage topology and load balancing for the query nodes, and handoff from growing segments to sealed segments.
queryCoord:
  taskMergeCap: 1
  taskExecutionCap: 256
  # Switch value to control if to automatically replace a growing segment with the corresponding indexed sealed segment when the growing segment reaches the sealing threshold.
  # If this parameter is set false, Milvus simply searches the growing segments with brute force.
  autoHandoff: true
  autoBalance: true # Switch value to control if to automatically balance the memory usage among query nodes by distributing segment loading and releasing operations evenly.
  autoBalanceChannel: true # Enable auto balance channel
  balancer: ScoreBasedBalancer # auto balancer used for segments on queryNodes
  globalRowCountFactor: 0.1 # the weight used when balancing segments among queryNodes
  scoreUnbalanceTolerationFactor: 0.05 # the least value for unbalanced extent between from and to nodes when doing balance
  reverseUnBalanceTolerationFactor: 1.3 # the largest value for unbalanced extent between from and to nodes after doing balance
  overloadedMemoryThresholdPercentage: 90 # The threshold of memory usage (in percentage) in a query node to trigger the sealed segment balancing.
  balanceIntervalSeconds: 60 # The interval at which query coord balances the memory usage among query nodes.
  memoryUsageMaxDifferencePercentage: 30 # The threshold of memory usage difference (in percentage) between any two query nodes to trigger the sealed segment balancing.
  rowCountFactor: 0.4 # the row count weight used when balancing segments among queryNodes
  segmentCountFactor: 0.4 # the segment count weight used when balancing segments among queryNodes
  globalSegmentCountFactor: 0.1 # the segment count weight used when balancing segments among queryNodes
  # the channel count weight used when balancing channels among queryNodes,
  #             A higher value reduces the likelihood of assigning channels from the same collection to the same QueryNode. Set to 1 to disable this feature.
  collectionChannelCountFactor: 10
  segmentCountMaxSteps: 50 # segment count based plan generator max steps
  rowCountMaxSteps: 50 # segment count based plan generator max steps
  randomMaxSteps: 10 # segment count based plan generator max steps
  growingRowCountWeight: 4 # the memory weight of growing segment row count
  delegatorMemoryOverloadFactor: 0.1 # the factor of delegator overloaded memory
  balanceCostThreshold: 0.001 # the threshold of balance cost, if the difference of cluster's cost after executing the balance plan is less than this value, the plan will not be executed
  checkSegmentInterval: 1000
  checkChannelInterval: 1000
  checkBalanceInterval: 300
  autoBalanceInterval: 3000 # the interval for triggerauto balance
  checkIndexInterval: 10000
  channelTaskTimeout: 60000 # 1 minute
  segmentTaskTimeout: 120000 # 2 minute
  distPullInterval: 500
  heartbeatAvailableInterval: 10000 # 10s, Only QueryNodes which fetched heartbeats within the duration are available
  loadTimeoutSeconds: 600
  distRequestTimeout: 5000 # the request timeout for querycoord fetching data distribution from querynodes, in milliseconds
  heatbeatWarningLag: 5000 # the lag value for querycoord report warning when last heatbeat is too old, in milliseconds
  checkHandoffInterval: 5000
  enableActiveStandby: false
  checkInterval: 1000
  checkHealthInterval: 3000 # 3s, the interval when query coord try to check health of query node
  checkHealthRPCTimeout: 2000 # 100ms, the timeout of check health rpc to query node
  brokerTimeout: 5000 # 5000ms, querycoord broker rpc timeout
  collectionRecoverTimes: 3 # if collection recover times reach the limit during loading state, release it
  observerTaskParallel: 16 # the parallel observer dispatcher task number
  checkAutoBalanceConfigInterval: 10 # the interval of check auto balance config
  checkNodeSessionInterval: 60 # the interval(in seconds) of check querynode cluster session
  gracefulStopTimeout: 5 # seconds. force stop node without graceful stop
  enableStoppingBalance: true # whether enable stopping balance
  channelExclusiveNodeFactor: 4 # the least node number for enable channel's exclusive mode
  collectionObserverInterval: 200 # the interval of collection observer
  checkExecutedFlagInterval: 100 # the interval of check executed flag to force to pull dist
  updateCollectionLoadStatusInterval: 5 # 5m, max interval of updating collection loaded status for check health
  cleanExcludeSegmentInterval: 60 # the time duration of clean pipeline exclude segment which used for filter invalid data, in seconds
  ip:  # TCP/IP address of queryCoord. If not specified, use the first unicastable address
  port: 19531 # TCP port of queryCoord
  grpc:
    serverMaxSendSize: 536870912 # The maximum size of each RPC request that the queryCoord can send, unit: byte
    serverMaxRecvSize: 268435456 # The maximum size of each RPC request that the queryCoord can receive, unit: byte
    clientMaxSendSize: 268435456 # The maximum size of each RPC request that the clients on queryCoord can send, unit: byte
    clientMaxRecvSize: 536870912 # The maximum size of each RPC request that the clients on queryCoord can receive, unit: byte

# Related configuration of queryNode, used to run hybrid search between vector and scalar data.
406
queryNode:
407
  stats:
408
    publishInterval: 1000 # The interval that query node publishes the node statistics information, including segment status, cpu usage, memory usage, health status, etc. Unit: ms.
409
  segcore:
410
    knowhereThreadPoolNumRatio: 4 # The number of threads in knowhere's thread pool. If disk is enabled, the pool size will multiply with knowhereThreadPoolNumRatio([1, 32]).
411
    chunkRows: 128 # Row count by which Segcore divides a segment into chunks.
412
    interimIndex:
413
      # Whether to create a temporary index for growing segments and sealed segments not yet indexed, improving search performance.
414
      # Milvus will eventually seals and indexes all segments, but enabling this optimizes search performance for immediate queries following data insertion.
415
      # This defaults to true, indicating that Milvus creates temporary index for growing segments and the sealed segments that are not indexed upon searches.
416
      enableIndex: true
417
      nlist: 128 # interim index nlist, recommend to set sqrt(chunkRows), must smaller than chunkRows/8
418
      nprobe: 16 # nprobe to search small index, based on your accuracy requirement, must smaller than nlist
419
      subDim: 4 # interim index sub dim, recommend to (subDim % vector dim == 0)
420
      refineRatio: 4.5 # interim index parameters, should set to be >= 1.0
421
      indexBuildRatio: 0.1 # the ratio of building interim index rows count with max row count of a flush segment, should set to be < 1.0
422
      refineQuantType: NONE # Data representation of SCANN_DVR index, options: 'NONE', 'FLOAT16', 'BFLOAT16' and 'UINT8'
423
      refineWithQuant: true # whether to use refineQuantType to refine for faster but loss a little precision
424
      denseVectorIndexType: IVF_FLAT_CC # Dense vector intermin index type
425
      memExpansionRate: 1.15 # extra memory needed by building interim index
426
      buildParallelRate: 0.5 # the ratio of building interim index parallel matched with cpu num
427
    multipleChunkedEnable: true # Enable multiple chunked search
428
    deleteDumpBatchSize: 10000 # Batch size for delete snapshot dump in segcore.
429
    knowhereScoreConsistency: false # Enable knowhere strong consistency score computation logic
430
    jsonKeyStatsCommitInterval: 200 # the commit interval for the JSON key Stats to commit
  loadMemoryUsageFactor: 1 # The multiply factor of calculating the memory usage while loading segments
  enableDisk: false # enable querynode load disk index, and search on disk index
  maxDiskUsagePercentage: 95
  cache:
    memoryLimit: 2147483648 # 2 GB, 2 * 1024 *1024 *1024
    readAheadPolicy: willneed # The read ahead policy of chunk cache, options: `normal, random, sequential, willneed, dontneed`
    # options: async, sync, disable.
    # Specifies the necessity for warming up the chunk cache.
    # 1. If set to "sync" or "async" the original vector data will be synchronously/asynchronously loaded into the
    # chunk cache during the load process. This approach has the potential to substantially reduce query/search latency
    # for a specific duration post-load, albeit accompanied by a concurrent increase in disk usage;
    # 2. If set to "disable" original vector data will only be loaded into the chunk cache during search/query.
    warmup: disable
  mmap:
    vectorField: false # Enable mmap for loading vector data
    vectorIndex: false # Enable mmap for loading vector index
    scalarField: false # Enable mmap for loading scalar data
    scalarIndex: false # Enable mmap for loading scalar index
    chunkCache: true # Enable mmap for chunk cache (raw vector retrieving).
    # Enable memory mapping (mmap) to optimize the handling of growing raw data.
    # By activating this feature, the memory overhead associated with newly added or modified data will be significantly minimized.
    # However, this optimization may come at the cost of a slight increase in query latency for the affected data segments.
    growingMmapEnabled: false
    fixedFileSizeForMmapAlloc: 1 # tmp file size for mmap chunk manager
    maxDiskUsagePercentageForMmapAlloc: 50 # disk percentage used in mmap chunk manager
  lazyload:
    enabled: false # Enable lazyload for loading data
    waitTimeout: 30000 # max wait timeout duration in milliseconds before start to do lazyload search and retrieve
    requestResourceTimeout: 5000 # max timeout in milliseconds for waiting request resource for lazy load, 5s by default
    requestResourceRetryInterval: 2000 # retry interval in milliseconds for waiting request resource for lazy load, 2s by default
    maxRetryTimes: 1 # max retry times for lazy load, 1 by default
    maxEvictPerRetry: 1 # max evict count for lazy load, 1 by default
  indexOffsetCacheEnabled: false # enable index offset cache for some scalar indexes, now is just for bitmap index, enable this param can improve performance for retrieving raw data from index
  grouping:
    enabled: true
    maxNQ: 1000
    topKMergeRatio: 20
  scheduler:
    receiveChanSize: 10240
    unsolvedQueueSize: 10240
    # maxReadConcurrentRatio is the concurrency ratio of read task (search task and query task).
    # Max read concurrency would be the value of hardware.GetCPUNum * maxReadConcurrentRatio.
    # It defaults to 2.0, which means max read concurrency would be the value of hardware.GetCPUNum * 2.
    # Max read concurrency must be greater than or equal to 1, and less than or equal to hardware.GetCPUNum * 100.
    # (0, 100]
    maxReadConcurrentRatio: 1
    cpuRatio: 10 # ratio used to estimate read task cpu usage.
    maxTimestampLag: 86400
    scheduleReadPolicy:
      # fifo: A FIFO queue support the schedule.
      # user-task-polling:
      #         The user's tasks will be polled one by one and scheduled.
      #         Scheduling is fair on task granularity.
      #         The policy is based on the username for authentication.
      #         And an empty username is considered the same user.
      #         When there are no multi-users, the policy decays into FIFO
      name: fifo
      taskQueueExpire: 60 # Control how long (many seconds) that queue retains since queue is empty
      enableCrossUserGrouping: false # Enable Cross user grouping when using user-task-polling policy. (Disable it if user's task can not merge each other)
      maxPendingTaskPerUser: 1024 # Max pending task per user in scheduler
  levelZeroForwardPolicy: FilterByBF # delegator level zero deletion forward policy, possible option["FilterByBF", "RemoteLoad"]
  streamingDeltaForwardPolicy: FilterByBF # delegator streaming deletion forward policy, possible option["FilterByBF", "Direct"]
  forwardBatchSize: 4194304 # the batch size delegator uses for forwarding stream delete in loading procedure
  exprCache:
    enabled: false # enable expression result cache
    capacityBytes: 268435456 # max capacity in bytes for expression result cache
  dataSync:
    flowGraph:
      maxQueueLength: 16 # The maximum size of task queue cache in flow graph in query node.
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
  enableSegmentPrune: false # use partition stats to prune data in search/query on shard delegator
  queryStreamBatchSize: 4194304 # return min batch size of stream query
  queryStreamMaxBatchSize: 134217728 # return max batch size of stream query
  bloomFilterApplyParallelFactor: 4 # parallel factor when to apply pk to bloom filter, default to 4*CPU_CORE_NUM
  workerPooling:
    size: 10 # the size for worker querynode client pool
  idfOracle:
    enableDisk: true
    writeConcurrency: 4
  ip:  # TCP/IP address of queryNode. If not specified, use the first unicastable address
  port: 21123 # TCP port of queryNode
  grpc:
    serverMaxSendSize: 536870912 # The maximum size of each RPC request that the queryNode can send, unit: byte
    serverMaxRecvSize: 268435456 # The maximum size of each RPC request that the queryNode can receive, unit: byte
    clientMaxSendSize: 268435456 # The maximum size of each RPC request that the clients on queryNode can send, unit: byte
    clientMaxRecvSize: 536870912 # The maximum size of each RPC request that the clients on queryNode can receive, unit: byte

indexCoord:
  bindIndexNodeMode:
    enable: false
    address: localhost:22930
    withCred: false
    nodeID: 0
  segment:
    minSegmentNumRowsToEnableIndex: 1024 # It's a threshold. When the segment num rows is less than this value, the segment will not be indexed

indexNode:
  scheduler:
    buildParallel: 1
  ip:  # TCP/IP address of indexNode. If not specified, use the first unicastable address
  port: 21121 # TCP port of indexNode
  grpc:
    serverMaxSendSize: 536870912 # The maximum size of each RPC request that the indexNode can send, unit: byte
    serverMaxRecvSize: 268435456 # The maximum size of each RPC request that the indexNode can receive, unit: byte
    clientMaxSendSize: 268435456 # The maximum size of each RPC request that the clients on indexNode can send, unit: byte
    clientMaxRecvSize: 536870912 # The maximum size of each RPC request that the clients on indexNode can receive, unit: byte

dataCoord:
  channel:
    watchTimeoutInterval: 300 # Timeout on watching channels (in seconds). Datanode tickler update watch progress will reset timeout timer.
    legacyVersionWithoutRPCWatch: 2.4.1 # Datanodes <= this version are considered as legacy nodes, which doesn't have rpc based watch(). This is only used during rolling upgrade where legacy nodes won't get new channels
    balanceSilentDuration: 300 # The duration after which the channel manager start background channel balancing
    balanceInterval: 360 # The interval with which the channel manager check dml channel balance status
    checkInterval: 1 # The interval in seconds with which the channel manager advances channel states
    notifyChannelOperationTimeout: 5 # Timeout notifying channel operations (in seconds).
  segment:
    maxSize: 1024 # The maximum size of a segment, unit: MB. datacoord.segment.maxSize and datacoord.segment.sealProportion together determine if a segment can be sealed.
    diskSegmentMaxSize: 2048 # Maximum size of a segment in MB for collection which has Disk index
    sealProportion: 0.12 # The minimum proportion to datacoord.segment.maxSize to seal a segment. datacoord.segment.maxSize and datacoord.segment.sealProportion together determine if a segment can be sealed.
    sealProportionJitter: 0.1 # segment seal proportion jitter ratio, default value 0.1(10%), if seal proportion is 12%, with jitter=0.1, the actual applied ratio will be 10.8~12%
    assignmentExpiration: 2000 # Expiration time of the segment assignment, unit: ms
    allocLatestExpireAttempt: 200 # The time attempting to alloc latest lastExpire from rootCoord after restart
    maxLife: 86400 # The max lifetime of segment in seconds, 24*60*60
    # If a segment didn't accept dml records in maxIdleTime and the size of segment is greater than
    # minSizeFromIdleToSealed, Milvus will automatically seal it.
    # The max idle time of segment in seconds, 10*60.
    maxIdleTime: 600
    minSizeFromIdleToSealed: 16 # The min size in MB of segment which can be idle from sealed.
    # The max number of binlog (which is equal to the binlog file num of primary key) for one segment,
    # the segment will be sealed if the number of binlog file reaches to max value.
    maxBinlogFileNumber: 32
    smallProportion: 0.5 # The segment is considered as "small segment" when its # of rows is smaller than
    # (smallProportion * segment max # of rows).
    # A compaction will happen on small segments if the segment after compaction will have
    compactableProportion: 0.85
    # over (compactableProportion * segment max # of rows) rows.
    # MUST BE GREATER THAN OR EQUAL TO <smallProportion>!!!
    # During compaction, the size of segment # of rows is able to exceed segment max # of rows by (expansionRate-1) * 100%.
    expansionRate: 1.25
  sealPolicy:
    channel:
      # The size threshold in MB, if the total size of growing segments of each shard
      # exceeds this threshold, the largest growing segment will be sealed.
      growingSegmentsMemSize: 4096
      # If the total entry number of l0 logs of each shard
      # exceeds this threshold, the earliest growing segments will be sealed.
      blockingL0EntryNum: 5000000
      # The size threshold in MB, if the total size of l0 logs of each shard
      # exceeds this threshold, the earliest growing segments will be sealed.
      blockingL0SizeInMB: 64
  autoUpgradeSegmentIndex: false # whether auto upgrade segment index to index engine's version
  forceRebuildSegmentIndex: false # force rebuild segment index to specify index engine's version
  # if param forceRebuildSegmentIndex is enabled, the vector index will be rebuilt to align with targetVecIndexVersion.
  # if param forceRebuildSegmentIndex is not enabled, the newly created vector index will be aligned with the newer one of index engine's version and targetVecIndexVersion.
  # if param targetVecIndexVersion is not set, the default value is -1, which means no target vec index version, then the vector index will be aligned with index engine's version
  targetVecIndexVersion: -1
  segmentFlushInterval: 2 # the minimal interval duration(unit: Seconds) between flushing operation on same segment
  # Switch value to control if to enable segment compaction.
  # Compaction merges small-size segments into a large segment, and clears the entities deleted beyond the retention duration of Time Travel.
  enableCompaction: true
  compaction:
    # Switch value to control if to enable automatic segment compaction during which data coord locates and merges compactable segments in the background.
    # This configuration takes effect only when dataCoord.enableCompaction is set as true.
    enableAutoCompaction: true
    indexBasedCompaction: true
    # compaction task prioritizer, options: [default, level, mix].
    # default is FIFO.
    # level is prioritized by level: L0 compactions first, then mix compactions, then clustering compactions.
    # mix is prioritized by level: mix compactions first, then L0 compactions, then clustering compactions.
    taskPrioritizer: default
    taskQueueCapacity: 100000 # compaction task queue size
    rpcTimeout: 10
    maxParallelTaskNum: 10
    dropTolerance: 86400 # Compaction task will be cleaned after finish longer than this time(in seconds)
    gcInterval: 1800 # The time interval in seconds for compaction gc
    scheduleInterval: 500 # The time interval in milliseconds for scheduling compaction tasks. If the configuration setting is below 100ms, it will be adjusted upwards to 100ms
    mix:
      triggerInterval: 60 # The time interval in seconds to trigger mix compaction
    levelzero:
      triggerInterval: 10 # The time interval in seconds for trigger L0 compaction
      forceTrigger:
        minSize: 8388608 # The minimum size in bytes to force trigger a LevelZero Compaction, default as 8MB
        maxSize: 67108864 # The maximum size in bytes to force trigger a LevelZero Compaction, default as 64MB
        deltalogMinNum: 10 # The minimum number of deltalog files to force trigger a LevelZero Compaction
        deltalogMaxNum: 30 # The maximum number of deltalog files to force trigger a LevelZero Compaction, default as 30
    expiry:
      tolerance: -1 # tolerant duration in hours for expiry data, negative value means disable force expiry compaction
    single:
      ratio:
        threshold: 0.2 # The ratio threshold of a segment to trigger a single compaction, default as 0.2
      deltalog:
        maxsize: 16777216 # The deltalog size of a segment to trigger a single compaction, default as 16MB
        maxnum: 200 # The deltalog count of a segment to trigger a compaction, default as 200
      expiredlog:
        maxsize: 10485760 # The expired log size of a segment to trigger a compaction, default as 10MB
    clustering:
      enable: true # Enable clustering compaction
      autoEnable: false # Enable auto clustering compaction
      triggerInterval: 600 # clustering compaction trigger interval in seconds
      minInterval: 3600 # The minimum interval between clustering compaction executions of one collection, to avoid redundant compaction
      maxInterval: 259200 # If a collection hasn't been clustering compacted for longer than maxInterval, force compact
      newDataSizeThreshold: 512m # If new data size is larger than newDataSizeThreshold, execute clustering compaction
      preferSegmentSizeRatio: 0.8
      maxSegmentSizeRatio: 1
      maxTrainSizeRatio: 0.8 # max data size ratio in Kmeans train, if larger than it, will down sampling to meet this limit
      maxCentroidsNum: 10240 # maximum centroids number in Kmeans train
      minCentroidsNum: 16 # minimum centroids number in Kmeans train
      minClusterSizeRatio: 0.01 # minimum cluster size / avg size in Kmeans train
      maxClusterSizeRatio: 10 # maximum cluster size / avg size in Kmeans train
      maxClusterSize: 5g # maximum cluster size in Kmeans train
  syncSegmentsInterval: 300 # The time interval for regularly syncing segments
  index:
    memSizeEstimateMultiplier: 2 # When the memory size is not setup by index procedure, multiplier to estimate the memory size of index data
  enableGarbageCollection: true # Switch value to control if to enable garbage collection to clear the discarded data in MinIO or S3 service.
  gc:
    interval: 3600 # The interval at which data coord performs garbage collection, unit: second.
    missingTolerance: 86400 # The retention duration of the unrecorded binary log (binlog) files. Setting a reasonably large value for this parameter avoids erroneously deleting the newly created binlog files that lack metadata. Unit: second.
    dropTolerance: 10800 # The retention duration of the binlog files of the deleted segments before they are cleared, unit: second.
    removeConcurrent: 32 # number of concurrent goroutines to remove dropped s3 objects
    scanInterval: 168 # orphan file (file on oss but has not been registered on meta) on object storage garbage collection scanning interval in hours
  enableActiveStandby: false
  brokerTimeout: 5000 # 5000ms, dataCoord broker rpc timeout
  autoBalance: true # Enable auto balance
  checkAutoBalanceConfigInterval: 10 # the interval of check auto balance config
  import:
    filesPerPreImportTask: 2 # The maximum number of files allowed per pre-import task.
    taskRetention: 10800 # The retention period in seconds for tasks in the Completed or Failed state.
    maxSizeInMBPerImportTask: 6144 # To prevent generating of small segments, we will re-group imported files. This parameter represents the sum of file sizes in each group (each ImportTask).
    scheduleInterval: 2 # The interval for scheduling import, measured in seconds.
    checkIntervalHigh: 2 # The interval for checking import, measured in seconds, is set to a high frequency for the import checker.
    checkIntervalLow: 120 # The interval for checking import, measured in seconds, is set to a low frequency for the import checker.
    maxImportFileNumPerReq: 1024 # The maximum number of files allowed per single import request.
    maxImportJobNum: 1024 # Maximum number of import jobs that are executing or pending.
    waitForIndex: true # Indicates whether the import operation waits for the completion of index building.
  gracefulStopTimeout: 5 # seconds. force stop node without graceful stop
  slot:
    clusteringCompactionUsage: 16 # slot usage of clustering compaction job.
    mixCompactionUsage: 8 # slot usage of mix compaction job.
    l0DeleteCompactionUsage: 8 # slot usage of l0 compaction job.
    indexTaskSlotUsage: 64 # slot usage of index task per 512mb
    statsTaskSlotUsage: 8 # slot usage of stats task per 512mb
    analyzeTaskSlotUsage: 65535 # slot usage of analyze task
  jsonStatsTriggerCount: 10 # jsonkey stats task count per trigger
  jsonStatsTriggerInterval: 10 # jsonkey task interval per trigger
  enabledJSONKeyStatsInSort: false # Indicates whether to enable JSON key stats task with sort
  jsonKeyStatsMemoryBudgetInTantivy: 16777216 # the memory budget for the JSON index In Tantivy, the unit is bytes
  ip:  # TCP/IP address of dataCoord. If not specified, use the first unicastable address
  port: 13333 # TCP port of dataCoord
  grpc:
    serverMaxSendSize: 536870912 # The maximum size of each RPC request that the dataCoord can send, unit: byte
    serverMaxRecvSize: 268435456 # The maximum size of each RPC request that the dataCoord can receive, unit: byte
    clientMaxSendSize: 268435456 # The maximum size of each RPC request that the clients on dataCoord can send, unit: byte
    clientMaxRecvSize: 536870912 # The maximum size of each RPC request that the clients on dataCoord can receive, unit: byte

dataNode:
  dataSync:
    flowGraph:
      maxQueueLength: 16 # Maximum length of task queue in flowgraph
      maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
    maxParallelSyncMgrTasks: 256 # The max concurrent sync task number of datanode sync mgr globally
    skipMode:
      enable: true # Support skip some timetick message to reduce CPU usage
      skipNum: 4 # Consume one for every n records skipped
      coldTime: 60 # Turn on skip mode after there are only timetick msg for x seconds
  segment:
    # The maximum size of each binlog file in a segment buffered in memory. Binlog files whose size exceeds this value are then flushed to MinIO or S3 service.
    # Unit: Byte
    # Setting this parameter too small causes the system to store a small amount of data too frequently. Setting it too large increases the system's demand for memory.
    insertBufSize: 16777216
    deleteBufBytes: 16777216 # Max buffer size in bytes to flush del for a single channel, default as 16MB
    syncPeriod: 600 # The period to sync segments if buffer is not empty.
  memory:
    forceSyncEnable: true # Set true to force sync if memory usage is too high
    forceSyncSegmentNum: 1 # number of segments to sync, segments with top largest buffer will be synced.
    checkInterval: 3000 # the interval to check datanode memory usage, in milliseconds
    forceSyncWatermark: 0.5 # memory watermark for standalone, upon reaching this watermark, segments will be synced.
  timetick:
    interval: 500
  channel:
    # specify the size of global work pool of all channels
    # if this parameter <= 0, will set it as the maximum number of CPUs that can be executing
    # suggest to set it bigger on large collection numbers to avoid blocking
    workPoolSize: -1
    # specify the size of global work pool for channel checkpoint updating
    # if this parameter <= 0, will set it as 10
    updateChannelCheckpointMaxParallel: 10
    updateChannelCheckpointInterval: 60 # the interval duration(in seconds) for datanode to update channel checkpoint of each channel
    updateChannelCheckpointRPCTimeout: 20 # timeout in seconds for UpdateChannelCheckpoint RPC call
    maxChannelCheckpointsPerPRC: 128 # The maximum number of channel checkpoints per UpdateChannelCheckpoint RPC.
    channelCheckpointUpdateTickInSeconds: 10 # The frequency, in seconds, at which the channel checkpoint updater executes updates.
  import:
    concurrencyPerCPUCore: 4 # The execution concurrency unit for import/pre-import tasks per CPU core.
    maxImportFileSizeInGB: 16 # The maximum file size (in GB) for an import file, where an import file refers to either a Row-Based file or a set of Column-Based files.
    readBufferSizeInMB: 64 # The insert buffer size (in MB) during import.
    readDeleteBufferSizeInMB: 16 # The delete buffer size (in MB) during import.
  compaction:
    levelZeroBatchMemoryRatio: 0.5 # The minimal memory ratio of free memory for level zero compaction executing in batch mode
    levelZeroMaxBatchSize: -1 # Max batch size refers to the max number of L1/L2 segments in a batch when executing L0 compaction. Default to -1, any value that is less than 1 means no limit. Valid range: >= 1.
    useMergeSort: false # Whether to enable mergeSort mode when performing mixCompaction.
    maxSegmentMergeSort: 30 # The maximum number of segments to be merged in mergeSort mode.
  gracefulStopTimeout: 1800 # seconds. force stop node without graceful stop
  slot:
    slotCap: 16 # The maximum number of tasks(e.g. compaction, importing) allowed to run concurrently on a datanode
  clusteringCompaction:
    memoryBufferRatio: 0.3 # The ratio of memory buffer of clustering compaction. Data larger than threshold will be flushed to storage.
    workPoolSize: 8 # worker pool size for one clustering compaction job.
  bloomFilterApplyParallelFactor: 4 # parallel factor when to apply pk to bloom filter, default to 4*CPU_CORE_NUM
  storage:
    deltalog: json # deltalog format, options: [json, parquet]
  ip:  # TCP/IP address of dataNode. If not specified, use the first unicastable address
  port: 21124 # TCP port of dataNode
  grpc:
    serverMaxSendSize: 536870912 # The maximum size of each RPC request that the dataNode can send, unit: byte
    serverMaxRecvSize: 268435456 # The maximum size of each RPC request that the dataNode can receive, unit: byte
    clientMaxSendSize: 268435456 # The maximum size of each RPC request that the clients on dataNode can send, unit: byte
    clientMaxRecvSize: 536870912 # The maximum size of each RPC request that the clients on dataNode can receive, unit: byte

# This topic introduces the message channel-related configurations of Milvus.
msgChannel:
  chanNamePrefix:
    # Root name prefix of the channel when a message channel is created.
    # It is recommended to change this parameter before starting Milvus for the first time.
    # To share a Pulsar instance among multiple Milvus instances, consider changing this to a name rather than the default one for each Milvus instance before you start them.
    cluster: by-dev
    # Sub-name prefix of the message channel where the root coord publishes time tick messages.
    # The complete channel name prefix is ${msgChannel.chanNamePrefix.cluster}-${msgChannel.chanNamePrefix.rootCoordTimeTick}
    # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
    # It is recommended to change this parameter before starting Milvus for the first time.
    rootCoordTimeTick: rootcoord-timetick
    # Sub-name prefix of the message channel where the root coord publishes its own statistics messages.
    # The complete channel name prefix is ${msgChannel.chanNamePrefix.cluster}-${msgChannel.chanNamePrefix.rootCoordStatistics}
    # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
    # It is recommended to change this parameter before starting Milvus for the first time.
    rootCoordStatistics: rootcoord-statistics
    # Sub-name prefix of the message channel where the root coord publishes Data Manipulation Language (DML) messages.
    # The complete channel name prefix is ${msgChannel.chanNamePrefix.cluster}-${msgChannel.chanNamePrefix.rootCoordDml}
    # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
    # It is recommended to change this parameter before starting Milvus for the first time.
    rootCoordDml: rootcoord-dml
    replicateMsg: replicate-msg
    # Sub-name prefix of the message channel where the query node publishes time tick messages.
    # The complete channel name prefix is ${msgChannel.chanNamePrefix.cluster}-${msgChannel.chanNamePrefix.queryTimeTick}
    # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
    # It is recommended to change this parameter before starting Milvus for the first time.
    queryTimeTick: queryTimeTick
    # Sub-name prefix of the message channel where the data coord publishes time tick messages.
    # The complete channel name prefix is ${msgChannel.chanNamePrefix.cluster}-${msgChannel.chanNamePrefix.dataCoordTimeTick}
    # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
    # It is recommended to change this parameter before starting Milvus for the first time.
    dataCoordTimeTick: datacoord-timetick-channel
    # Sub-name prefix of the message channel where the data coord publishes segment information messages.
    # The complete channel name prefix is ${msgChannel.chanNamePrefix.cluster}-${msgChannel.chanNamePrefix.dataCoordSegmentInfo}
    # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
    # It is recommended to change this parameter before starting Milvus for the first time.
    dataCoordSegmentInfo: segment-info-channel
  subNamePrefix:
    # Subscription name prefix of the data coord.
    # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
    # It is recommended to change this parameter before starting Milvus for the first time.
    dataCoordSubNamePrefix: dataCoord
    # Subscription name prefix of the data node.
    # Caution: Changing this parameter after using Milvus for a period of time will affect your access to old data.
    # It is recommended to change this parameter before starting Milvus for the first time.
    dataNodeSubNamePrefix: dataNode

# Configures the system log output.
log:
  # Milvus log level. Option: debug, info, warn, error, panic, and fatal.
  # It is recommended to use debug level under test and development environments, and info level in production environment.
  level: info
  file:
    # Root path to the log files.
    # The default value is set empty, indicating to output log files to standard output (stdout) and standard error (stderr).
    # If this parameter is set to a valid local path, Milvus writes and stores log files in this path.
    # Set this parameter as the path that you have permission to write.
    rootPath:
    maxSize: 300 # The maximum size of a log file, unit: MB.
    maxAge: 10 # The maximum retention time before a log file is automatically cleared, unit: day. The minimum value is 1.
    maxBackups: 20 # The maximum number of log files to back up, unit: day. The minimum value is 1.
  format: text # Milvus log format. Option: text and JSON
  stdout: true # Stdout enable or not

grpc:
  log:
    level: WARNING
  gracefulStopTimeout: 3 # second, time to wait graceful stop finish
  client:
    compressionEnabled: false
    dialTimeout: 200
    keepAliveTime: 10000
    keepAliveTimeout: 20000
    maxMaxAttempts: 10
    initialBackoff: 0.2
    maxBackoff: 10
    backoffMultiplier: 2
    minResetInterval: 1000
    maxCancelError: 32
    minSessionCheckInterval: 200

830
# Configure external tls.
831
tls:
832
  serverPemPath: configs/cert/server.pem
833
  serverKeyPath: configs/cert/server.key
834
  caPemPath: configs/cert/ca.pem
835

836
# Configure internal tls.
837
internaltls:
838
  serverPemPath: configs/cert/server.pem
839
  serverKeyPath: configs/cert/server.key
840
  caPemPath: configs/cert/ca.pem
841
  sni: localhost # The server name indication (SNI) for internal TLS, should be the same as the name provided by the certificates ref: https://en.wikipedia.org/wiki/Server_Name_Indication
842

843
common:
  defaultPartitionName: _default # Name of the default partition when a collection is created
  defaultIndexName: _default_idx # Name of the index when it is created with name unspecified
  entityExpiration: -1 # Entity expiration in seconds, CAUTION -1 means never expire
  indexSliceSize: 16 # Index slice size in MB
  threadCoreCoefficient:
    highPriority: 10 # This parameter specify how many times the number of threads is the number of cores in high priority pool
    middlePriority: 5 # This parameter specify how many times the number of threads is the number of cores in middle priority pool
    lowPriority: 1 # This parameter specify how many times the number of threads is the number of cores in low priority pool
    chunkCache: 10 # This parameter specify how many times the number of threads is the number of cores in chunk cache pool
  buildIndexThreadPoolRatio: 0.75
  DiskIndex:
    MaxDegree: 56
    SearchListSize: 100
    PQCodeBudgetGBRatio: 0.125
    BuildNumThreadsRatio: 1
    SearchCacheBudgetGBRatio: 0.1
    LoadNumThreadRatio: 8
    BeamWidthRatio: 4
  gracefulTime: 5000 # milliseconds. it represents the interval (in ms) by which the request arrival time needs to be subtracted in the case of Bounded Consistency.
  gracefulStopTimeout: 1800 # seconds. it will force quit the server if the graceful stop process is not completed during this time.
  storageType: remote # please adjust in embedded Milvus: local, available values are [local, remote, opendal], value minio is deprecated, use remote instead
  # Default value: auto
  # Valid values: [auto, avx512, avx2, avx, sse4_2]
  # This configuration is only used by querynode and indexnode, it selects CPU instruction set for Searching and Index-building.
  simdType: auto
  # This parameter controls the write mode of the local disk, which is used to write temporary data downloaded from remote storage.
  # Currently, only QueryNode uses 'common.diskWrite*' parameters. Support for other components will be added in the future.
  # The options include 'direct' and 'buffered'. The default value is 'buffered'.
  diskWriteMode: buffered
  # Disk write buffer size in KB, only used when disk write mode is 'direct', default is 64KB.
  # Current valid range is [4, 65536]. If the value is not aligned to 4KB, it will be rounded up to the nearest multiple of 4KB.
  diskWriteBufferSizeKb: 64
  # This parameter controls the number of writer threads used for disk write operations. The valid range is [0, hardware_concurrency].
  # It is designed to limit the maximum concurrency of disk write operations to reduce the impact on disk read performance.
  # For example, if you want to limit the maximum concurrency of disk write operations to 1, you can set this parameter to 1.
  # The default value is 0, which means the caller will perform write operations directly without using an additional writer thread pool.
  # In this case, the maximum concurrency of disk write operations is determined by the caller's thread pool size.
  diskWriteNumThreads: 0
  diskWriteRateLimiter:
    refillPeriodUs: 100000 # refill period in microseconds if disk rate limiter is enabled, default is 100000us (100ms)
    avgKBps: 262144 # average kilobytes per second if disk rate limiter is enabled, default is 262144KB/s (256MB/s)
    maxBurstKBps: 524288 # max burst kilobytes per second if disk rate limiter is enabled, default is 524288KB/s (512MB/s)
    # amplification ratio for high priority tasks if disk rate limiter is enabled, value <= 0 means ratio limit is disabled.
    # The ratio is the multiplication factor of the configured bandwidth.
    # For example, if the rate limit is 100KB/s, and the high priority ratio is 2, then the high priority tasks will be limited to 200KB/s.
    highPriorityRatio: -1
    middlePriorityRatio: -1 # amplification ratio for middle priority tasks if disk rate limiter is enabled, value <= 0 means ratio limit is disabled
    lowPriorityRatio: -1 # amplification ratio for low priority tasks if disk rate limiter is enabled, value <= 0 means ratio limit is disabled
  security:
    authorizationEnabled: true
    # The superusers will ignore some system check processes,
    # like the old password verification when updating the credential
    superUsers: root
    # default password for root user. The maximum length is 72 characters.
    # Large numeric passwords require double quotes to avoid yaml parsing precision issues.
    defaultRootPassword: "xxxxxx"
    rootShouldBindRole: false # Whether the root user should bind a role when the authorization is enabled.
    enablePublicPrivilege: true # Whether to enable public privilege
    rbac:
      overrideBuiltInPrivilegeGroups:
        enabled: false # Whether to override build-in privilege groups
      cluster:
        readonly:
          privileges: ListDatabases,SelectOwnership,SelectUser,DescribeResourceGroup,ListResourceGroups,ListPrivilegeGroups # Cluster level readonly privileges
        readwrite:
          privileges: ListDatabases,SelectOwnership,SelectUser,DescribeResourceGroup,ListResourceGroups,ListPrivilegeGroups,FlushAll,TransferNode,TransferReplica,UpdateResourceGroups # Cluster level readwrite privileges
        admin:
          privileges: ListDatabases,SelectOwnership,SelectUser,DescribeResourceGroup,ListResourceGroups,ListPrivilegeGroups,FlushAll,TransferNode,TransferReplica,UpdateResourceGroups,BackupRBAC,RestoreRBAC,CreateDatabase,DropDatabase,CreateOwnership,DropOwnership,ManageOwnership,CreateResourceGroup,DropResourceGroup,UpdateUser,RenameCollection,CreatePrivilegeGroup,DropPrivilegeGroup,OperatePrivilegeGroup # Cluster level admin privileges
      database:
        readonly:
          privileges: ShowCollections,DescribeDatabase # Database level readonly privileges
        readwrite:
          privileges: ShowCollections,DescribeDatabase,AlterDatabase # Database level readwrite privileges
        admin:
          privileges: ShowCollections,DescribeDatabase,AlterDatabase,CreateCollection,DropCollection # Database level admin privileges
      collection:
        readonly:
          privileges: Query,Search,IndexDetail,GetFlushState,GetLoadState,GetLoadingProgress,HasPartition,ShowPartitions,DescribeCollection,DescribeAlias,GetStatistics,ListAliases # Collection level readonly privileges
        readwrite:
          privileges: Query,Search,IndexDetail,GetFlushState,GetLoadState,GetLoadingProgress,HasPartition,ShowPartitions,DescribeCollection,DescribeAlias,GetStatistics,ListAliases,Load,Release,Insert,Delete,Upsert,Import,Flush,Compaction,LoadBalance,CreateIndex,DropIndex,CreatePartition,DropPartition # Collection level readwrite privileges
        admin:
          privileges: Query,Search,IndexDetail,GetFlushState,GetLoadState,GetLoadingProgress,HasPartition,ShowPartitions,DescribeCollection,DescribeAlias,GetStatistics,ListAliases,Load,Release,Insert,Delete,Upsert,Import,Flush,Compaction,LoadBalance,CreateIndex,DropIndex,CreatePartition,DropPartition,CreateAlias,DropAlias # Collection level admin privileges
    internaltlsEnabled: false
    tlsMode: 0
  session:
    ttl: 30 # ttl value when session granting a lease to register service
    retryTimes: 30 # retry times when session sending etcd requests
  locks:
    metrics:
      enable: false # whether gather statistics for metrics locks
    threshold:
      info: 500 # minimum milliseconds for printing durations in info level
      warn: 1000 # minimum milliseconds for printing durations in warn level
    maxWLockConditionalWaitTime: 600 # maximum seconds for waiting wlock conditional
  storage:
    scheme: s3
    enablev2: false
  # Whether to disable the internal time messaging mechanism for the system.
  # If disabled (set to false), the system will not allow DML operations, including insertion, deletion, queries, and searches.
  # This helps Milvus-CDC synchronize incremental data
  ttMsgEnabled: true
  traceLogMode: 0 # trace request info
  bloomFilterSize: 100000 # bloom filter initial size
  bloomFilterType: BlockedBloomFilter # bloom filter type, support BasicBloomFilter and BlockedBloomFilter
  maxBloomFalsePositive: 0.001 # max false positive rate for bloom filter
  bloomFilterApplyBatchSize: 1000 # batch size when to apply pk to bloom filter
  collectionReplicateEnable: false # Whether to enable collection replication.
  usePartitionKeyAsClusteringKey: false # if true, do clustering compaction and segment prune on partition key field
  useVectorAsClusteringKey: false # if true, do clustering compaction and segment prune on vector field
  enableVectorClusteringKey: false # if true, enable vector clustering key and vector clustering compaction
  localRPCEnabled: false # enable local rpc for internal communication when mix or standalone mode.
  sync:
    taskPoolReleaseTimeoutSeconds: 60 # The maximum time to wait for the task to finish and release resources in the pool
  enabledOptimizeExpr: true # Indicates whether to enable optimize expr
  enabledJSONKeyStats: false # Indicates sealedsegment whether to enable JSON key stats
  enabledGrowingSegmentJSONKeyStats: false # Indicates growingsegment whether to enable JSON key stats
  enableConfigParamTypeCheck: true # Indicates whether to enable config param type check
  clusterID: 0 # cluster id

# QuotaConfig, configurations of Milvus quota and limits.
# By default, we enable:
#   1. TT protection;
#   2. Memory protection.
#   3. Disk quota protection.
# You can enable:
#   1. DML throughput limitation;
#   2. DDL, DQL qps/rps limitation;
#   3. DQL Queue length/latency protection;
#   4. DQL result rate protection;
# If necessary, you can also manually force to deny RW requests.
quotaAndLimits:
  enabled: true # `true` to enable quota and limits, `false` to disable.
  # quotaCenterCollectInterval is the time interval that quotaCenter
  # collects metrics from Proxies, Query cluster and Data cluster.
  # seconds, (0 ~ 65536)
  quotaCenterCollectInterval: 3
  forceDenyAllDDL: false # true to force deny all DDL requests, false to allow.
  limits:
    allocRetryTimes: 15 # retry times when delete alloc forward data from rate limit failed
    allocWaitInterval: 1000 # retry wait duration when delete alloc forward data rate failed, in millisecond
    complexDeleteLimitEnable: false # whether complex delete check forward data by limiter
    maxCollectionNum: 65536
    maxCollectionNumPerDB: 65536 # Maximum number of collections per database.
    maxInsertSize: -1 # maximum size of a single insert request, in bytes, -1 means no limit
    maxResourceGroupNumOfQueryNode: 1024 # maximum number of resource groups of query nodes
    maxGroupSize: 10 # maximum size for one single group when doing search group by
  ddl:
    enabled: false # Whether DDL request throttling is enabled.
    # Maximum number of collection-related DDL requests per second.
    # Setting this item to 10 indicates that Milvus processes no more than 10 collection-related DDL requests per second, including collection creation requests, collection drop requests, collection load requests, and collection release requests.
    # To use this setting, set quotaAndLimits.ddl.enabled to true at the same time.
    collectionRate: -1
    # Maximum number of partition-related DDL requests per second.
    # Setting this item to 10 indicates that Milvus processes no more than 10 partition-related requests per second, including partition creation requests, partition drop requests, partition load requests, and partition release requests.
    # To use this setting, set quotaAndLimits.ddl.enabled to true at the same time.
    partitionRate: -1
    db:
      collectionRate: -1 # qps of db level , default no limit, rate for CreateCollection, DropCollection, LoadCollection, ReleaseCollection
      partitionRate: -1 # qps of db level, default no limit, rate for CreatePartition, DropPartition, LoadPartition, ReleasePartition
  indexRate:
    enabled: false # Whether index-related request throttling is enabled.
    # Maximum number of index-related requests per second.
    # Setting this item to 10 indicates that Milvus processes no more than 10 partition-related requests per second, including index creation requests and index drop requests.
    # To use this setting, set quotaAndLimits.indexRate.enabled to true at the same time.
    max: -1
    db:
      max: -1 # qps of db level, default no limit, rate for CreateIndex, DropIndex
  flushRate:
    enabled: true # Whether flush request throttling is enabled.
    # Maximum number of flush requests per second.
    # Setting this item to 10 indicates that Milvus processes no more than 10 flush requests per second.
    # To use this setting, set quotaAndLimits.flushRate.enabled to true at the same time.
    max: -1
    collection:
      max: 0.1 # qps, default no limit, rate for flush at collection level.
    db:
      max: -1 # qps of db level, default no limit, rate for flush
  compactionRate:
    enabled: false # Whether manual compaction request throttling is enabled.
    # Maximum number of manual-compaction requests per second.
    # Setting this item to 10 indicates that Milvus processes no more than 10 manual-compaction requests per second.
    # To use this setting, set quotaAndLimits.compaction.enabled to true at the same time.
    max: -1
    db:
      max: -1 # qps of db level, default no limit, rate for manualCompaction
  dbRate:
    enabled: false # Whether DB request throttling is enabled
    # Maximum number of db-related requests per second.
    # Setting this item to 10 indicates that Milvus processes no more than 10 db-related requests per second, including db creation/drop/alter requests.
    # To use this setting, set quotaAndLimits.dbRate.enabled to true at the same time.
    #
    max: -1
  dml:
    enabled: false # Whether DML request throttling is enabled.
    insertRate:
      # Highest data insertion rate per second.
      # Setting this item to 5 indicates that Milvus only allows data insertion at the rate of 5 MB/s.
      # To use this setting, set quotaAndLimits.dml.enabled to true at the same time.
      max: -1
      db:
        max: -1 # MB/s, default no limit
      collection:
        # Highest data insertion rate per collection per second.
        # Setting this item to 5 indicates that Milvus only allows data insertion to any collection at the rate of 5 MB/s.
        # To use this setting, set quotaAndLimits.dml.enabled to true at the same time.
        max: -1
      partition:
        max: -1 # MB/s, default no limit
    upsertRate:
      max: -1 # MB/s, default no limit
      db:
        max: -1 # MB/s, default no limit
      collection:
        max: -1 # MB/s, default no limit
      partition:
        max: -1 # MB/s, default no limit
    deleteRate:
      # Highest data deletion rate per second.
      # Setting this item to 0.1 indicates that Milvus only allows data deletion at the rate of 0.1 MB/s.
      # To use this setting, set quotaAndLimits.dml.enabled to true at the same time.
      max: -1
      db:
        max: -1 # MB/s, default no limit
      collection:
        # Highest data deletion rate per second.
        # Setting this item to 0.1 indicates that Milvus only allows data deletion from any collection at the rate of 0.1 MB/s.
        # To use this setting, set quotaAndLimits.dml.enabled to true at the same time.
        max: -1
      partition:
        max: -1 # MB/s, default no limit
    bulkLoadRate:
      max: -1 # MB/s, default no limit, not support yet. TODO: limit bulkLoad rate
      db:
        max: -1 # MB/s, default no limit, not support yet. TODO: limit db bulkLoad rate
      collection:
        max: -1 # MB/s, default no limit, not support yet. TODO: limit collection bulkLoad rate
      partition:
        max: -1 # MB/s, default no limit, not support yet. TODO: limit partition bulkLoad rate
  dql:
    enabled: false # Whether DQL request throttling is enabled.
    searchRate:
      # Maximum number of vectors to search per second.
      # Setting this item to 100 indicates that Milvus only allows searching 100 vectors per second no matter whether these 100 vectors are all in one search or scattered across multiple searches.
      # To use this setting, set quotaAndLimits.dql.enabled to true at the same time.
      max: -1
      db:
        max: -1 # vps (vectors per second), default no limit
      collection:
        # Maximum number of vectors to search per collection per second.
        # Setting this item to 100 indicates that Milvus only allows searching 100 vectors per second per collection no matter whether these 100 vectors are all in one search or scattered across multiple searches.
        # To use this setting, set quotaAndLimits.dql.enabled to true at the same time.
        max: -1
      partition:
        max: -1 # vps (vectors per second), default no limit
    queryRate:
      # Maximum number of queries per second.
      # Setting this item to 100 indicates that Milvus only allows 100 queries per second.
      # To use this setting, set quotaAndLimits.dql.enabled to true at the same time.
      max: -1
      db:
        max: -1 # qps, default no limit
      collection:
        # Maximum number of queries per collection per second.
        # Setting this item to 100 indicates that Milvus only allows 100 queries per collection per second.
        # To use this setting, set quotaAndLimits.dql.enabled to true at the same time.
        max: -1
      partition:
        max: -1 # qps, default no limit
  limitWriting:
    # forceDeny false means dml requests are allowed (except for some
    # specific conditions, such as memory of nodes to water marker), true means always reject all dml requests.
    forceDeny: false
    ttProtection:
      enabled: false
      # maxTimeTickDelay indicates the backpressure for DML Operations.
      # DML rates would be reduced according to the ratio of time tick delay to maxTimeTickDelay,
      # if time tick delay is greater than maxTimeTickDelay, all DML requests would be rejected.
      # seconds
      maxTimeTickDelay: 300
    memProtection:
      # When memory usage > memoryHighWaterLevel, all dml requests would be rejected;
      # When memoryLowWaterLevel < memory usage < memoryHighWaterLevel, reduce the dml rate;
      # When memory usage < memoryLowWaterLevel, no action.
      enabled: true
      dataNodeMemoryLowWaterLevel: 0.85 # (0, 1], memoryLowWaterLevel in DataNodes
      dataNodeMemoryHighWaterLevel: 0.95 # (0, 1], memoryHighWaterLevel in DataNodes
      queryNodeMemoryLowWaterLevel: 0.85 # (0, 1], memoryLowWaterLevel in QueryNodes
      queryNodeMemoryHighWaterLevel: 0.95 # (0, 1], memoryHighWaterLevel in QueryNodes
    growingSegmentsSizeProtection:
      # No action will be taken if the growing segments size is less than the low watermark.
      # When the growing segments size exceeds the low watermark, the dml rate will be reduced,
      # but the rate will not be lower than minRateRatio * dmlRate.
      enabled: false
      minRateRatio: 0.5
      lowWaterLevel: 0.2
      highWaterLevel: 0.4
    diskProtection:
      enabled: true # When the total file size of object storage is greater than `diskQuota`, all dml requests would be rejected;
      diskQuota: -1 # MB, (0, +inf), default no limit
      diskQuotaPerDB: -1 # MB, (0, +inf), default no limit
      diskQuotaPerCollection: -1 # MB, (0, +inf), default no limit
      diskQuotaPerPartition: -1 # MB, (0, +inf), default no limit
    l0SegmentsRowCountProtection:
      enabled: false # switch to enable l0 segment row count quota
      lowWaterLevel: 30000000 # l0 segment row count quota, low water level
      highWaterLevel: 50000000 # l0 segment row count quota, high water level
    deleteBufferRowCountProtection:
      enabled: false # switch to enable delete buffer row count quota
      lowWaterLevel: 32768 # delete buffer row count quota, low water level
      highWaterLevel: 65536 # delete buffer row count quota, high water level
    deleteBufferSizeProtection:
      enabled: false # switch to enable delete buffer size quota
      lowWaterLevel: 134217728 # delete buffer size quota, low water level
      highWaterLevel: 268435456 # delete buffer size quota, high water level
  limitReading:
    # forceDeny false means dql requests are allowed (except for some
    # specific conditions, such as collection has been dropped), true means always reject all dql requests.
    forceDeny: false

trace:
  # trace exporter type, default is stdout,
  # optional values: ['noop','stdout', 'jaeger', 'otlp']
  exporter: noop
  # fraction of traceID based sampler,
  # optional values: [0, 1]
  # Fractions >= 1 will always sample. Fractions < 0 are treated as zero.
  sampleFraction: 0
  jaeger:
    url:  # when exporter is jaeger should set the jaeger's URL
  otlp:
    endpoint:  # example: "127.0.0.1:4317" for grpc, "127.0.0.1:4318" for http
    method:  # otlp export method, acceptable values: ["grpc", "http"],  using "grpc" by default
    secure: true
  initTimeoutSeconds: 10 # segcore initialization timeout in seconds, preventing otlp grpc hangs forever

#when using GPU indexing, Milvus will utilize a memory pool to avoid frequent memory allocation and deallocation.
#here, you can set the size of the memory occupied by the memory pool, with the unit being MB.
#note that there is a possibility of Milvus crashing when the actual memory demand exceeds the value set by maxMemSize.
#if initMemSize and MaxMemSize both set zero,
#milvus will automatically initialize half of the available GPU memory,
#maxMemSize will the whole available GPU memory.
gpu:
  initMemSize: 2048 # Gpu Memory Pool init size
  maxMemSize: 4096 # Gpu Memory Pool Max size

# Any configuration related to the streaming node server.
streamingNode:
  ip:  # TCP/IP address of streamingNode. If not specified, use the first unicastable address
  port: 22222 # TCP port of streamingNode
  grpc:
    serverMaxSendSize: 268435456 # The maximum size of each RPC request that the streamingNode can send, unit: byte
    serverMaxRecvSize: 268435456 # The maximum size of each RPC request that the streamingNode can receive, unit: byte
    clientMaxSendSize: 268435456 # The maximum size of each RPC request that the clients on streamingNode can send, unit: byte
    clientMaxRecvSize: 268435456 # The maximum size of each RPC request that the clients on streamingNode can receive, unit: byte

# Any configuration related to the streaming service.
streaming:
  walBalancer:
    # The interval of balance task trigger at background, 1 min by default.
    # It's ok to set it into duration string, such as 30s or 1m30s, see time.ParseDuration
    triggerInterval: 1m
    # The initial interval of balance task trigger backoff, 50 ms by default.
    # It's ok to set it into duration string, such as 30s or 1m30s, see time.ParseDuration
    backoffInitialInterval: 50ms
    backoffMultiplier: 2 # The multiplier of balance task trigger backoff, 2 by default
  walBroadcaster:
    concurrencyRatio: 1 # The concurrency ratio based on number of CPU for wal broadcaster, 1 by default.
  txn:
    defaultKeepaliveTimeout: 10s # The default keepalive timeout for wal txn, 10s by default

# Any configuration related to the knowhere vector search engine
knowhere:
  enable: true # When enable this configuration, the index parameters defined following will be automatically populated as index parameters, without requiring user input.
  DISKANN:
    build:
      max_degree: 56 # Maximum degree of the Vamana graph
      pq_code_budget_gb_ratio: 0.125 # Size limit on the PQ code (compared with raw data)
      search_cache_budget_gb_ratio: 0.1 # Ratio of cached node numbers to raw data
      search_list_size: 100 # Size of the candidate list during building graph
    search:
      beam_width_ratio: 4 # Ratio between the maximum number of IO requests per search iteration and CPU number
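
The settings in this file that turn on authentication sit under `common.security`: `authorizationEnabled: true`, `superUsers: root`, and a `defaultRootPassword`. If you would rather not hunt for them in nano, the following is a minimal Python sketch that applies the same three values with regular expressions. The helper names `set_yaml_scalar` and `enable_auth` are made up for this example, the sample input is only an illustrative fragment, and the sketch assumes each key appears once in the file as an uncommented `key: value` line. Any trailing comment on a rewritten line is dropped.

```python
import json
import re


def set_yaml_scalar(text: str, key: str, value: str) -> str:
    """Rewrite the first uncommented `key:` line, keeping its indentation."""
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}:).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)} {value}", text, count=1)
    if count == 0:
        raise KeyError(f"{key} not found in config")
    return new_text


def enable_auth(text: str, root_password: str) -> str:
    """Apply the three common.security values used in this post."""
    text = set_yaml_scalar(text, "authorizationEnabled", "true")
    text = set_yaml_scalar(text, "superUsers", "root")
    # json.dumps produces a double-quoted string, which YAML also accepts;
    # quoting keeps numeric-looking passwords from being parsed as numbers.
    text = set_yaml_scalar(text, "defaultRootPassword", json.dumps(root_password))
    return text


# Illustrative fragment only; the real milvus.yaml is much longer.
sample = (
    "common:\n"
    "  security:\n"
    "    authorizationEnabled: false\n"
    "    superUsers:\n"
    "    defaultRootPassword: Milvus\n"
)
patched = enable_auth(sample, "xxxxxx")
print(patched)
```

To use it on the real file, read `milvus.yaml` into a string, pass it through `enable_auth`, and write the result back. Keep a copy of the original first.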

Edit docker-compose.yaml

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.18
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ./etcd:/etcd
    command: etcd -advertise-client-urls=http://etcd:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.5.19
    command: ["milvus", "run", "standalone"]
    security_opt:
    - seccomp:unconfined
    environment:
      - ETCD_ENDPOINTS=etcd:2379
      - TIMEZONE=Asia/Shanghai
    volumes:
      - ./milvus.yaml:/milvus/configs/milvus.yaml
      - ./milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "65011:19530" # gRPC
      - "65012:9091" # HTTP management panel (possibly a bug: it is still reachable without credentials even after a username and password are set; if you know how to fix this, please leave a comment)
    depends_on:
      - "etcd"

Start Milvus
```
docker-compose up -d
```
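
The compose file gives the `standalone` container a health check against `/healthz` on port 9091 with a 90-second start period, and maps that port to 65012 on the host. The same endpoint can be polled from the host to confirm that Milvus has finished starting. The following is a rough Python sketch using only the standard library. The function names `http_ok` and `wait_until_healthy`, and the default attempt count and delay, are assumptions chosen for this example.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and non-2xx answers all count as "not ready".
        return False


def wait_until_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 3.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

On the Docker host, `wait_until_healthy("http://localhost:65012/healthz", attempts=40)` would wait up to about two minutes, which covers the 90-second start period. The `probe` parameter exists only so the retry loop can be exercised without a running server.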

Configure Nginx Proxy Manager to reverse-proxy gRPC

Configure it as shown:

# The first directive (the include line) can be removed; it lets NPM obtain the real client IP when the Cloudflare CDN is in front
# If you need it, configure it as described at https://blog.useforall.com/posts/nginx-proxy-manager-get-real-client-ip-a-unified-solution
include /data/nginx/custom/cloudflare_ips.conf;
underscores_in_headers on;
location / {
    # Verify that the request is gRPC (optional but recommended)
    if ($content_type !~ "application/grpc") {
        return 404;
    }

    # Timeout and keepalive settings
    grpc_read_timeout 300s;
    grpc_send_timeout 300s;
    grpc_socket_keepalive on;

    grpc_pass grpc://172.17.0.1:65011;
}
access_log off;
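
The `if ($content_type !~ "application/grpc")` guard in this configuration is a case-sensitive, unanchored regex test on the `Content-Type` request header, and anything that fails it gets a 404. To make clear which requests pass, here is a small Python sketch that mirrors that test. It is an illustration of the matching rule only, not of how Nginx evaluates it internally, and the function name `passes_grpc_check` is made up for this example.

```python
import re
from typing import Optional

# Same pattern as in the Nginx config; `!~` there is a case-sensitive regex match.
GRPC_CONTENT_TYPE = re.compile(r"application/grpc")


def passes_grpc_check(content_type: Optional[str]) -> bool:
    """Return True if the request would be proxied rather than answered with 404."""
    if not content_type:
        # A missing header is an empty string in Nginx, which does not match.
        return False
    return GRPC_CONTENT_TYPE.search(content_type) is not None


for value in ["application/grpc", "application/grpc+proto",
              "application/grpc-web+proto", "application/json", ""]:
    print(repr(value), passes_grpc_check(value))
```

Because the pattern is unanchored, variants such as `application/grpc+proto` and `application/grpc-web+proto` also pass, while ordinary browser traffic and scanners sending other content types are rejected.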

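Enable the gRPC toggle in Cloudflare
Note
For some people this option may show Join Beta instead; click it to join. Some of my domains could enable gRPC directly, while others required joining the beta, so it depends on your account. After joining the beta you will see:
```
Thanks for your interest! You will be able to enable gRPC support once you have been admitted to the beta.
```
How long admission takes is unclear; it could be a few minutes or several hours.
Finally, follow the official tutorial to connect to the Milvus database.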
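
As a starting point for that last step, here is a hedged Python sketch of connecting through the proxy with `pymilvus`. It assumes the Nginx Proxy Manager host is served over TLS on port 443 under a hypothetical domain `milvus.example.com`, and that the `root` user and the `defaultRootPassword` from `milvus.yaml` are in use. The helper names are invented for this example, and `pymilvus` is imported lazily so the URI and token helpers can be tried without it installed.

```python
def milvus_uri(host: str, port: int = 443, tls: bool = True) -> str:
    """Build the connection URI; an https scheme asks the client for a TLS channel."""
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{port}"


def milvus_token(user: str, password: str) -> str:
    """Milvus accepts credentials as a single 'user:password' token."""
    return f"{user}:{password}"


def connect(host: str, user: str, password: str):
    """Open a client through the reverse proxy (requires `pip install pymilvus`)."""
    from pymilvus import MilvusClient  # imported here so the helpers above stay dependency-free

    return MilvusClient(uri=milvus_uri(host), token=milvus_token(user, password))


# Hypothetical values; replace them with your own domain and password.
print(milvus_uri("milvus.example.com"))
print(milvus_token("root", "xxxxxx"))
```

Calling `connect("milvus.example.com", "root", "xxxxxx").list_collections()` is a quick way to confirm that authentication and the gRPC path through Cloudflare and Nginx Proxy Manager both work. For a direct test that bypasses the proxy, `milvus_uri("172.17.0.1", port=65011, tls=False)` points at the mapped gRPC port from the compose file.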
