DOC-12485 prevent bucket from running out of space #3811
base: release/8.0
Changes from all commits
8f0d840
f923219
cf954eb
8143e97
e4256fa
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -122,7 +122,7 @@ The listed alerts are as follows. | |||||
| The auto-failover system stops auto-failover when the maximum number of spare nodes available has been reached. | ||||||
| `auto_failover_maximum_reached` | ||||||
|
||||||
| Node wasn't auto-failed-over as other nodes are down at the same time | ||||||
| Node was not auto-failed-over as other nodes are down at the same time | ||||||
| Auto-failover does not take place if there is already a node down. | ||||||
| `auto_failover_other_nodes_down` | ||||||
|
||||||
|
@@ -202,17 +202,30 @@ The size of the change history may need to be increased. | |||||
For information, on establishing change-history size, see xref:rest-api:rest-bucket-create.adoc[Creating and Editing Buckets]. | ||||||
| `history_size_warning` | ||||||
|
||||||
| Low Indexer Residence Percentage | ||||||
| Approaching Indexer low resident percentage | ||||||
| Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`. | ||||||
|
||||||
| `indexer_low_resident_percentage` | ||||||
|
||||||
a| [#memcached-alert] | ||||||
Memcached connection threshold exceeded. | ||||||
| Trigger an alert if the number of `system` or `user` connections used by the data service exceeds a configurable percentage of the available connections{blank}xref:#memcached-alert-foonote[^1^]. | ||||||
For information on setting the `memcached` alert thresholds, see xref:rest-api:rest-cluster-email-notifications.adoc#setting-memcache-alert-threshold[Setting alerts]. | ||||||
For information about setting the `memcached` alert thresholds, see xref:rest-api:rest-cluster-email-notifications.adoc#setting-memcache-alert-threshold[Setting alerts]. | ||||||
| `memcached_connections` | ||||||
|
||||||
| Rebalance stage appears stuck | ||||||
| An ongoing KV or index rebalance has not made progress during the timeout period set by the `stuckRebalanceThresholdIndex` and `stuckRebalanceThresholdKV` alert limits. | ||||||
The default value for the timeout period is 1800 seconds (30 minutes). | ||||||
| `stuck_rebalance` | ||||||
|
||||||
| Disk usage is within 10% of maximum for data service mutations | ||||||
| The used disk space on the filesystem containing the Data Service storage path is within 10% of the configured limit. | ||||||
This limit is set either through the Advanced Data Settings in the Couchbase Server Web Console, or by using the `/settings/resourceManagement` REST API endpoint. | ||||||
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information. | ||||||
| `disk_guardrail` | ||||||
|
||||||
| Index has diverging replicas | ||||||
| The indexer has detected inconsistencies between an index and its replicas. | ||||||
| `indexer_diverging_replicas` | ||||||
|
||||||
|=== | ||||||
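The `disk_guardrail` row above describes an alert that fires when disk usage is within 10% of the configured limit. As a rough sketch of that arithmetic, assuming that "within 10%" means 10 percentage points below the configured maximum (an interpretation, not something the table states), a hypothetical helper might look like this:

```shell
# Hypothetical helper: prints the disk-usage percentage at which the
# disk_guardrail alert would be expected, given the configured maximum.
# Assumption: "within 10%" means 10 percentage points below the maximum.
disk_guardrail_alert_threshold() {
  maximum="$1"
  threshold=$((maximum - 10))
  # Never report a negative percentage for very low limits.
  if [ "$threshold" -lt 0 ]; then
    threshold=0
  fi
  echo "$threshold"
}
```

Under that assumption, `disk_guardrail_alert_threshold 85` prints `75` for a limit of 85%.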
|
||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -156,35 +156,54 @@ For information, see xref:learn:clusters-and-availability/rebalance.adoc#limitin | |||||
|
||||||
[#data-settings] | ||||||
=== Data Settings | ||||||
The fields that appear when you expand the *Advanced Data Settings* section let you control filesystem use limits and I/O thread allocation. | ||||||
|
||||||
The settings in this area control the numbers of threads that are allocated _per node_ by Couchbase Server to the _reading_ and _writing_ of data, respectively. | ||||||
The maximum thread-allocation to each is _64_, the minimum _4_. | ||||||
image::manage-settings/data-settings.png["The Data Settings panel",align=center] | ||||||
|
||||||
A high thread-allocation may improve performance on systems whose hardware-resources are commensurately supportive (for example, where the number of CPU cores is high). | ||||||
In particular, a high number of _writer_ threads on such systems may significantly optimize the performance of _durable writes_: see xref:learn:data/durability.adoc[Durability], for information. | ||||||
*Prevent writes to buckets when storage becomes <number>% full* controls whether Couchbase Server prevents the filesystem containing the data path from becoming full. | ||||||
Review comment: "whether Couchbase Server prevents the filesystem containing the data path from becoming full."
||||||
This option is off by default. | ||||||
When selected, Couchbase Server prevents writes to buckets when the filesystem fills to the percent you set in the *% full* field. | ||||||
The default value for this field is 85%. | ||||||
|
||||||
Note, however, that a high thread-allocation might _impair_ some aspects of system-performance on less appropriately resourced nodes. | ||||||
Consequently, changes to the default thread-allocation should not be made to production systems without prior testing. | ||||||
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information. | ||||||
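To make the behavior of this option concrete, the following sketch compares a filesystem's used percentage against the limit. It is an approximation under stated assumptions: `would_block_writes` is a hypothetical helper, writes are assumed to stop once usage is at or above the limit, and Couchbase Server's own measurement of disk usage may differ from what `df` reports.

```shell
# Hypothetical helper: succeeds (exit status 0) when writes would be
# expected to stop, that is, when the limit is enabled and the used
# percentage has reached it.
would_block_writes() {
  enabled="$1"      # true or false, as in the "enabled" setting
  used_pct="$2"     # current used percentage of the filesystem
  limit_pct="$3"    # value of the "% full" field
  [ "$enabled" = "true" ] && [ "$used_pct" -ge "$limit_pct" ]
}

# Example: read the used percentage for a data path with POSIX df.
# The path below is a placeholder for your own data storage path.
# used=$(df -P /path/to/data | awk 'NR==2 { sub("%", "", $5); print $5 }')
# would_block_writes true "$used" 85 && echo "writes would be blocked"
```

Because the option is off by default, the helper reports that writes continue whenever `enabled` is `false`, regardless of how full the filesystem is.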
|
||||||
Left-clicking on the *Advanced Data Settings* tab displays radio buttons for *Reader Thread Settings* and *Writer Thread Settings*: | ||||||
The *Reader Thread Settings* and *Writer Thread Settings* options let you control the number of threads the Data Service uses on each node to read and write data. | ||||||
Allocating more threads can improve performance. | ||||||
In particular, adding more writer threads can improve durable write performance. | ||||||
|
||||||
See xref:learn:data/durability.adoc[] for more information. | ||||||
However, setting the number of threads too high can reduce performance if the node is not capable of handling the additional threads. | ||||||
|
||||||
image::manage-settings/data-settings.png["The Data Settings panel",548,align=center] | ||||||
Both *Reader Thread Settings* and *Writer Thread Settings* offer the same options: | ||||||
|
||||||
Each group has the same, three radio buttons, which are as follows: | ||||||
Default:: | ||||||
Couchbase Server sets the number of threads to a balanced value suitable for most workloads. | ||||||
|
||||||
* *Default*. | ||||||
The number of threads allocated is set to a balanced value which is reasonable for most workloads. | ||||||
Disk i/o optimized:: | ||||||
Couchbase Server sets the number of threads equal to the number of CPU cores on the node. | ||||||
For buckets using the Magma storage engine, consider using this setting for the following conditions: | ||||||
+ | ||||||
-- | ||||||
For Writes:: | ||||||
+ | ||||||
* When reducing the latency of durable writes is more important to you than write throughput. | ||||||
* For write-intensive workloads where you want greater throughput and you find the SSD is not saturated using the default setting. | ||||||
|
||||||
* *Disk i/o optimized*. | ||||||
The number of threads allocated is equal to the number of CPU cores for the node. + | ||||||
In order to get maximum performance from Magma for disk-oriented workloads, it is recommended to set the Writer Threads to 'Disk i/o optimized'. This setting will ensure there are enough threads to sustain high write rates. + | ||||||
To Learn more about the Magma Storage Engine, see xref:learn:buckets-memory-and-storage/storage-engines.adoc#storage-engine-magma[Storage Engines -- Magma Storage Engine]. | ||||||
For Reads:: | ||||||
+ | ||||||
* When you have low memory data residency, use this option for better throughput and latency. | ||||||
* When your data is on a high-latency virtualized storage device such as EBS volumes on the cloud. | ||||||
In this case, a larger I/O queue depth helps saturate the disk IOPS/bandwidth. | ||||||
|
||||||
* *Fixed value*. | ||||||
The number of threads allocated is equal to the value selected from the pull-down menu. | ||||||
For more details, see xref:learn:buckets-memory-and-storage/storage-engines.adoc#storage-engine-magma[Magma]. | ||||||
-- | ||||||
|
||||||
Fixed value:: | ||||||
When you select this option, a field appears in which you can select the number of threads to use. | ||||||
+ | ||||||
NOTE: A good rule of thumb is to set each of readers and writers equal to the queue depth of the underlying IO subsystem (i.e. readers = queue_depth and writers = queue_depth). + | ||||||
However, for best performance it is recommended to benchmark with different settings and pick the one that best meets the throughput and latency requirements in your environment. | ||||||
NOTE: As a guideline, set the number of reader and writer threads equal to the queue depth of your IO subsystem (for example, readers = queue_depth and writers = queue_depth). | ||||||
For best performance, benchmark different settings and choose the one that meets your throughput and latency requirements. | ||||||
|
||||||
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#threading[Threading] for more information about reader and writer threads. | ||||||
|
||||||
[#query-settings] | ||||||
=== Query Settings | ||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,166 @@ | ||||||
= Set Data Disk Use Limits | ||||||
:description: You can have Couchbase Server stop writing to the data storage path when it is a specific percentage full. This option helps prevent the data path from running out of disk space and making recovery difficult. | ||||||
:keywords: storage, disk usage limits, disk space, data storage path | ||||||
|
||||||
|
||||||
[abstract] | ||||||
{description} | ||||||
|
||||||
== Description | ||||||
|
||||||
Allowing any filesystem on a node to become full can cause errors. | ||||||
If the filesystem containing the data storage path becomes full, recovery can be difficult. | ||||||
This endpoint allows you to set a limit on the percentage of disk space that can be used by the data storage path. | ||||||
When the data storage path reaches this limit, Couchbase Server stops writing to it. | ||||||
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information. | ||||||
|
||||||
== HTTP Methods | ||||||
|
||||||
This API endpoint supports the following methods: | ||||||
|
||||||
* <<#get-settings>> | ||||||
* <<#set-usage-limit>> | ||||||
|
||||||
|
||||||
[[get-settings]] | ||||||
== Get Data Disk Use Limits | ||||||
|
||||||
Use this endpoint to get the current data disk use limit settings. | ||||||
|
||||||
.Get Limit Settings | ||||||
---- | ||||||
GET /settings/resourceManagement | ||||||
---- | ||||||
|
||||||
=== curl Syntax | ||||||
|
||||||
[source,bash] | ||||||
---- | ||||||
curl -u $USER:$PASSWORD -X GET \ | ||||||
'http://{HOST}:{PORT}/settings/resourceManagement' | ||||||
---- | ||||||
|
||||||
.Path Parameters | ||||||
:priv-link: get-privs | ||||||
include::partial$user-pw-host-port-params.adoc[] | ||||||
|
||||||
[[get-privs]] | ||||||
=== Required Privileges | ||||||
|
||||||
You must have at least one of the following roles: | ||||||
|
||||||
|
||||||
* xref:learn:security/roles.adoc#full-admin[Full Admin] | ||||||
* xref:learn:security/roles.adoc#cluster-admin[Cluster Admin] | ||||||
* xref:learn:security/roles.adoc#local-user-security-admin[Local User Admin] | ||||||
* xref:learn:security/roles.adoc#security-admin[Security Admin] | ||||||
|
||||||
|
||||||
=== Responses | ||||||
|
||||||
`200 OK`:: | ||||||
Returns a JSON object containing the current data disk use limit settings. | ||||||
See <<get-settings-example>> for the schema of the output. | ||||||
|
||||||
`403 Forbidden`:: | ||||||
Returned if the user does not have one of the roles listed in <<get-privs>>. | ||||||
|
||||||
[#get-settings-example] | ||||||
=== Examples | ||||||
|
||||||
The following gets the current settings for data disk use limits: | ||||||
|
||||||
[source,bash] | ||||||
---- | ||||||
curl -u Administrator:password \ | ||||||
-X GET 'http://127.0.0.1:8091/settings/resourceManagement' | jq | ||||||
|
||||||
---- | ||||||
|
||||||
The JSON returned by this command shows the current settings for data disk use limits: | ||||||
|
||||||
[source,json] | ||||||
---- | ||||||
{ | ||||||
"diskUsage": { | ||||||
"enabled": false, | ||||||
"maximum": 85 | ||||||
} | ||||||
} | ||||||
---- | ||||||
|
||||||
The result shows that the disk usage limit is not enabled, and the maximum disk usage is set to 85% (the default). | ||||||
|
||||||
|
||||||
[[set-usage-limit]] | ||||||
== Set Data Disk Use Limits | ||||||
Use this endpoint to set the data disk use limit settings. | ||||||
|
||||||
.Set Limits | ||||||
---- | ||||||
POST /settings/resourceManagement | ||||||
---- | ||||||
|
||||||
=== curl Syntax | ||||||
|
||||||
[source,bash] | ||||||
---- | ||||||
curl -u $USER:$PASSWORD -X POST \ | ||||||
'http://{HOST}:{PORT}/settings/resourceManagement' \ | ||||||
-H 'Content-Type: application/json' \ | ||||||
-d '{"diskUsage": {"enabled": [true|false], "maximum": <integer>}}' | ||||||
---- | ||||||
|
||||||
.Path Parameters | ||||||
:priv-link: set-privs | ||||||
include::partial$user-pw-host-port-params.adoc[] | ||||||
|
||||||
.Data Parameters | ||||||
|
||||||
`enabled` (Boolean):: | ||||||
If `true`, enables the data disk use limit. If `false`, disables the data disk use limit. | ||||||
|
||||||
`maximum` (integer):: | ||||||
The maximum percentage of disk space that can be used by the data storage path. | ||||||
If the data storage path reaches this limit, Couchbase Server stops writing to it. | ||||||
This value must be between 1 and 100. | ||||||
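The range check on `maximum` can be applied before a request is sent. The following is a minimal sketch rather than part of any Couchbase tooling: `build_disk_usage_payload` is a hypothetical helper name, and it assumes only the request body shape shown in the curl syntax above.

```shell
# Hypothetical helper: validates the arguments, then prints the JSON body
# for POST /settings/resourceManagement.
build_disk_usage_payload() {
  enabled="$1"
  maximum="$2"

  # `enabled` must be the literal true or false.
  case "$enabled" in
    true|false) ;;
    *) echo "enabled must be true or false" >&2; return 1 ;;
  esac

  # Reject empty values, non-digits, leading zeros, and 4+ digit numbers.
  case "$maximum" in
    ''|*[!0-9]*|0*|????*)
      echo "maximum must be an integer between 1 and 100" >&2
      return 1 ;;
  esac
  if [ "$maximum" -gt 100 ]; then
    echo "maximum must be an integer between 1 and 100" >&2
    return 1
  fi

  printf '{"diskUsage": {"enabled": %s, "maximum": %s}}\n' "$enabled" "$maximum"
}
```

The output can then be passed to curl, for example with `-d "$(build_disk_usage_payload true 90)"`, so that an out-of-range value fails locally instead of being sent to the server.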
|
||||||
[[set-privs]] | ||||||
=== Required Privileges | ||||||
|
||||||
You must have at least one of the following roles: | ||||||
|
||||||
* xref:learn:security/roles.adoc#full-admin[Full Admin] | ||||||
* xref:learn:security/roles.adoc#cluster-admin[Cluster Admin] | ||||||
* xref:learn:security/roles.adoc#security-admin[Security Admin] | ||||||
|
||||||
=== Responses | ||||||
|
||||||
`200 OK`:: | ||||||
Returns a JSON object containing the current data disk use limit settings. | ||||||
See <<set-limit-example>> for the schema of the output. | ||||||
|
||||||
`403 Forbidden`:: | ||||||
Returned if the user does not have one of the roles listed in <<set-privs>>. | ||||||
|
||||||
[#set-limit-example] | ||||||
=== Examples | ||||||
|
||||||
The following example enables data disk use limits and sets the maximum disk usage to 90%: | ||||||
|
||||||
[source,bash] | ||||||
---- | ||||||
curl -u Administrator:password \ | ||||||
-X POST 'http://127.0.0.1:8091/settings/resourceManagement' \ | ||||||
-H 'Content-Type: application/json' \ | ||||||
-d '{"diskUsage": {"enabled": true, "maximum": 90}}' | jq | ||||||
---- | ||||||
|
||||||
The JSON returned by this command shows the new settings for data disk use limits: | ||||||
|
||||||
[source,json] | ||||||
---- | ||||||
{ | ||||||
"diskUsage": { | ||||||
"enabled": true, | ||||||
"maximum": 90 | ||||||
} | ||||||
} | ||||||
---- |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -69,6 +69,7 @@ curl -X POST http://<ip-address-or-domain-name:8091>/settings/alerts/limits | |||||
-d certExpirationDays = <integer> | ||||||
-d historyWarningThreshold=<integer> | ||||||
-d lowIndexerResidentPerc=<integer> | ||||||
-d maxDataDiskUsedPerc=<integer> | ||||||
-d maxDiskUsedPerc=<integer> | ||||||
-d maxIndexerRamPerc=<integer> | ||||||
-d maxOverheadPerc=<integer> | ||||||
|
@@ -77,7 +78,8 @@ curl -X POST http://<ip-address-or-domain-name:8091>/settings/alerts/limits | |||||
-d memoryCriticalThreshold=<integer> | ||||||
-d memcachedSystemConnectionWarningThreshold=<integer> | ||||||
-d memcachedUserConnectionWarningThreshold=<integer> | ||||||
|
||||||
-d stuckRebalanceThresholdIndex=<integer> | ||||||
-d stuckRebalanceThresholdKV=<integer> | ||||||
|
||||||
curl -X POST http://<ip-address-or-domain-name>:8091/settings/alert/sendTestEmail | ||||||
-u <username>:<password> | ||||||
|
@@ -152,6 +154,15 @@ See xref:rest-api:rest-bucket-create.adoc[Creating and Editing Buckets], for inf | |||||
Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, which is the value of `lowIndexerResidentPerc`. | ||||||
The default value is `10`. | ||||||
|
||||||
* `maxDataDiskUsedPerc`. | ||||||
The percentage of disk space used that will trigger an alert on the filesystem containing the data service, index service, or the `ns_log` or `audit_log` storage paths. | ||||||
This alert warns you that the disk is becoming full. | ||||||
It occurs even if data disk usage limits are not enabled. | ||||||
The value must be an integer between `1` and `100`, which is the percentage of disk space used. | ||||||
It defaults to `90`. | ||||||
Review comment: It actually defaults to 75%. Also, if the data disk limit is enabled, then it will ignore the configured threshold and use 10% less than the enforcement threshold.
||||||
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information. | ||||||
|
||||||
[[maxdatadiskusedperc]] | ||||||
* `maxDiskUsedPerc`, `maxIndexerRamPerc`, and `maxOverheadPerc`. | ||||||
The maximum percentages for disk usage, memory consumption by the Index Service, and overhead. | ||||||
Values must be between `0` and `100`. | ||||||
|
@@ -173,6 +184,12 @@ NOTE: If the node exceeds 90% of the available system connections, then please c | |||||
|
||||||
* `memcachedUserConnectionWarningThreshold`. Trigger the `xref:manage:manage-settings/configure-alerts.adoc#memcached-alert[memcached_connections]` alert if the number of `user` connections in use exceeds the given percentage of connections available. (E.g., if this value is set to `90`, the system will trigger an alert if the number of user connections used by the data service exceeds 90% of the available connections.) | ||||||
|
||||||
* `stuckRebalanceThresholdIndex` and `stuckRebalanceThresholdKV`. | ||||||
Sets the timeout thresholds after which an index rebalance or a KV (data) rebalance is considered stuck. | ||||||
|
||||||
If this period elapses and no progress has been made, Couchbase Server triggers an alert. | ||||||
The value must be an integer that represents a number of seconds. | ||||||
The default value is `1800` seconds (30 minutes). | ||||||
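Because the stuck-rebalance thresholds are expressed in seconds, it can be convenient to derive them from minutes. The sketch below is illustrative only: `stuck_rebalance_args` is a hypothetical helper, and whether the endpoint accepts these two parameters on their own, without the other limits, is an assumption to verify against your cluster.

```shell
# Hypothetical helper: converts minutes to seconds and prints the
# form-encoded parameters for POST /settings/alerts/limits.
stuck_rebalance_args() {
  index_minutes="$1"
  kv_minutes="$2"
  printf 'stuckRebalanceThresholdIndex=%s&stuckRebalanceThresholdKV=%s\n' \
    "$((index_minutes * 60))" "$((kv_minutes * 60))"
}

# Example use (placeholders as in the syntax section above):
# curl -X POST -u <username>:<password> \
#   http://<ip-address-or-domain-name>:8091/settings/alerts/limits \
#   -d "$(stuck_rebalance_args 30 30)"
```

Calling `stuck_rebalance_args 30 30` yields `1800` for both parameters, which matches the documented default of 30 minutes.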
|
||||||
== Responses | ||||||
|
||||||
A successful call returns `200 OK`. | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
|
||
`USER`:: | ||
The name of a user who has one of the roles listed in <<{priv-link}>>. | ||
|
||
`PASSWORD`:: | ||
The password for the `user`. | ||
|
||
`HOST`:: | ||
Hostname or IP address of a Couchbase Server. | ||
|
||
`PORT`:: | ||
Port number for the REST API. | ||
Defaults are 8091 for unencrypted and 18091 for encrypted connections.
Review comment: This suggests that there wasn't already an alert for this, which there is: https://docs.couchbase.com/server/current/manage/manage-settings/configure-alerts.html#:~:text=Disk%20space%20used%20for%20persistent%20storage%20has%20reached%20at%20least%2090%25%20of%20capacity
The new alert is lower and specific to the data disk.