DOC-12485 prevent bucket from running out of space #3811

Open
wants to merge 5 commits into
base: release/8.0
1 change: 1 addition & 0 deletions modules/ROOT/nav.adoc
@@ -360,6 +360,7 @@ include::cli:partial$cbcli/nav.adoc[]
**** xref:rest-api:rest-manage-cluster-connections.adoc[Managing Cluster Connections]
**** xref:rest-api:rest-set-up-alternate-address.adoc[Managing Alternate Addresses]
**** xref:rest-api:rest-cluster-email-notifications.adoc[Setting Alerts]
**** xref:rest-api:disk-usage-limits.adoc[]

*** xref:rest-api:rest-status-and-events-overview.adoc[Status and Events]
**** xref:rest-api:rest-get-cluster-tasks.adoc[Getting Cluster Tasks]
8 changes: 8 additions & 0 deletions modules/introduction/partials/new-features-80.adoc
@@ -107,6 +107,14 @@ The metric includes the first 32 characters sent by any clients up to the first
and limits the number of metrics to 100.
Additional information sent by clients at connection time can be found in the logs.

[#section-new-feature-800-disk-limits]
https://jira.issues.couchbase.com/browse/MB-59113[MB-59113] Prevent buckets from causing nodes to run out of disk space::
You can configure Couchbase Server to prevent writes to buckets from consuming all of the disk space in a node.
You set a minimum amount of space every node must have free in the filesystem used by the data service.
If a node has less free space than this limit, Couchbase Server prevents writes to buckets.
Even if you do not set this limit, Couchbase Server now alerts you when a node starts to run out of disk space.
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information.


[#section-new-feature-800-XDCR]
=== XDCR
235 changes: 152 additions & 83 deletions modules/learn/pages/buckets-memory-and-storage/storage-settings.adoc

Large diffs are not rendered by default.

Binary file modified modules/manage/assets/images/manage-settings/data-settings.png
19 changes: 16 additions & 3 deletions modules/manage/pages/manage-settings/configure-alerts.adoc
@@ -122,7 +122,7 @@ The listed alerts are as follows.
| The auto-failover system stops auto-failover when the maximum number of spare nodes available has been reached.
| `auto_failover_maximum_reached`

| Node wasn't auto-failed-over as other nodes are down at the same time
| Node was not auto-failed-over as other nodes are down at the same time
| Auto-failover does not take place if there is already a node down.
| `auto_failover_other_nodes_down`

@@ -202,17 +202,30 @@ The size of the change history may need to be increased.
For information, on establishing change-history size, see xref:rest-api:rest-bucket-create.adoc[Creating and Editing Buckets].
| `history_size_warning`

| Low Indexer Residence Percentage
| Approaching Indexer low resident percentage
| Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`.
Suggested change
| Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`.
| Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, the default for which is `10`%.

| `indexer_low_resident_percentage`

a| [#memcached-alert]
Memcached connection threshold exceeded.
| Trigger an alert if the number of `system` or `user` connections used by the data service exceeds a configurable percentage of the available connections{blank}xref:#memcached-alert-foonote[^1^].
For information on setting the `memcached` alert thresholds, see xref:rest-api:rest-cluster-email-notifications.adoc#setting-memcache-alert-threshold[Setting alerts].
For information about setting the `memcached` alert thresholds, see xref:rest-api:rest-cluster-email-notifications.adoc#setting-memcache-alert-threshold[Setting alerts].
| `memcached_connections`

| Rebalance stage appears stuck
| An ongoing KV or index rebalance has not made progress during the timeout period set by the `stuckRebalanceThresholdIndex` and `stuckRebalanceThresholdKV` alert limits.
The default value for the timeout period is 1800 seconds (30 minutes).
| `stuck_rebalance`

| Disk usage is within 10% of maximum for data service mutations
| The used disk space on a filesystem containing the Data Service storage path is within 10% of the configured limit.
This limit is set either through the Advanced Data Settings in the Couchbase Server Web Console, or by using the `/settings/resourceManagement` REST API endpoint.
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information.
| `disk_guardrail`

| Index has diverging replicas
| The indexer has detected inconsistencies between an index and its replicas.
| `indexer_diverging_replicas`

|===
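
As a rough illustration of the `disk_guardrail` condition, the following sketch treats "within 10%" as 10 percentage points below the configured limit.
That reading is an assumption, and the function names are hypothetical; the server's actual calculation may differ.

[source,bash]
----
# Print the usage percentage at which the warning would start,
# ASSUMING "within 10%" means 10 percentage points below the limit.
guardrail_warning_threshold() {
  maximum=$1
  threshold=$((maximum - 10))
  # Clamp at 0 so very low limits do not yield a negative percentage.
  if [ "$threshold" -lt 0 ]; then
    threshold=0
  fi
  echo "$threshold"
}

# Succeed (exit 0) when the used percentage is inside the warning zone.
in_warning_zone() {
  used=$1
  maximum=$2
  [ "$used" -ge "$(guardrail_warning_threshold "$maximum")" ]
}
----

Under this assumption, a limit of 85 would place the start of the warning zone at 75.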

57 changes: 38 additions & 19 deletions modules/manage/pages/manage-settings/general-settings.adoc
@@ -156,35 +156,54 @@ For information, see xref:learn:clusters-and-availability/rebalance.adoc#limitin

[#data-settings]
=== Data Settings
The fields that appear when you expand the *Advanced Data Settings* section let you control filesystem use limits and I/O thread allocation.

The settings in this area control the numbers of threads that are allocated _per node_ by Couchbase Server to the _reading_ and _writing_ of data, respectively.
The maximum thread-allocation to each is _64_, the minimum _4_.
image::manage-settings/data-settings.png["The Data Settings panel",align=center]

A high thread-allocation may improve performance on systems whose hardware-resources are commensurately supportive (for example, where the number of CPU cores is high).
In particular, a high number of _writer_ threads on such systems may significantly optimize the performance of _durable writes_: see xref:learn:data/durability.adoc[Durability], for information.
*Prevent writes to buckets when storage becomes <number>% full* controls whether Couchbase Server prevents the filesystem containing the data path from becoming full.
Review comment (Contributor):
"whether Couchbase Server prevents the filesystem containing the data path from becoming full."
This is too strongly worded. We can't prevent the filesystem from becoming full, so let's be careful not to imply that we can.

This option is off by default.
When selected, Couchbase Server prevents writes to buckets when the filesystem fills to the percent you set in the *% full* field.
The default value for this field is 85%.

Note, however, that a high thread-allocation might _impair_ some aspects of system-performance on less appropriately resourced nodes.
Consequently, changes to the default thread-allocation should not be made to production systems without prior testing.
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information.

Left-clicking on the *Advanced Data Settings* tab displays radio buttons for *Reader Thread Settings* and *Writer Thread Settings*:
The *Reader Thread Settings* and *Writer Thread Settings* options let you control the number of threads the Data Service uses on each node to read and write data.
Allocating more threads can improve performance.
In particular, adding more writer threads can improve durable write performance,.
Suggested change
In particular, adding more writer threads can improve durable write performance,.
In particular, adding more writer threads can improve durable write performance.

See xref:learn:data/durability.adoc[] for more information.
However, setting the number of threads too high can reduce performance if the node is not capable of handling the additional threads.

image::manage-settings/data-settings.png["The Data Settings panel",548,align=center]
Both *Reader Thread Settings* and *Writer Thread Settings* offer the same options:

Each group has the same, three radio buttons, which are as follows:
Default::
Couchbase Server sets the number of threads to a balanced value suitable for most workloads.

* *Default*.
The number of threads allocated is set to a balanced value which is reasonable for most workloads.
Disk i/o optimized::
Couchbase Server sets the number of threads equal to the number of CPU cores on the node.
For buckets using the Magma storage engine, consider using this setting for the following conditions:
+
--
For Writes::
+
* When reducing the latency of durable writes is more important to you than write throughput.
* For write-intensive workloads where you want greater throughput and you find the SSD is not saturated using the default setting.

* *Disk i/o optimized*.
The number of threads allocated is equal to the number of CPU cores for the node. +
In order to get maximum performance from Magma for disk-oriented workloads, it is recommended to set the Writer Threads to 'Disk i/o optimized'. This setting will ensure there are enough threads to sustain high write rates. +
To Learn more about the Magma Storage Engine, see xref:learn:buckets-memory-and-storage/storage-engines.adoc#storage-engine-magma[Storage Engines -- Magma Storage Engine].
For Reads::
+
* When you have low memory data residency, use this option for better throughput and latency.
* When your data is on a high-latency virtualized storage device such as EBS volumes on the cloud.
In this case, a larger I/O queue depth helps saturate the disk IOPS/bandwidth.

* *Fixed value*.
The number of threads allocated is equal to the value selected from the pull-down menu.
For more details, see xref:learn:buckets-memory-and-storage/storage-engines.adoc#storage-engine-magma[Magma].
--

Fixed value::
When you select this option, a field appears in which you can select the number of threads to use.
+
NOTE: A good rule of thumb is to set each of readers and writers equal to the queue depth of the underlying IO subsystem (i.e. readers = queue_depth and writers = queue_depth). +
However, for best performance it is recommended to benchmark with different settings and pick the one that best meets the throughput and latency requirements in your environment.
NOTE: As a guideline, set the number of reader and writer threads equal to the queue depth of your IO subsystem (for example, readers = queue_depth and writers = queue_depth).
For best performance, benchmark different settings and choose the one that meets your throughput and latency requirements.

See xref:learn:buckets-memory-and-storage/storage-settings.adoc#threading[Threading] for more information about reader and writer threads.

[#query-settings]
=== Query Settings
166 changes: 166 additions & 0 deletions modules/rest-api/pages/disk-usage-limits.adoc
@@ -0,0 +1,166 @@
= Set Data Disk Use Limits
:description: You can have Couchbase Server stop writing to the data storage path when it is a specific percentage full. This option helps prevent the data path from running out of disk space and making recovery difficult.
:keywords: storage, disk usage limits, disk space, data storage path


[abstract]
{description}

== Description

Allowing any filesystem on a node to become full can cause errors.
If the filesystem containing the data storage path becomes full, recovery can be difficult.
This endpoint allows you to set a limit on the percentage of disk space that can be used by the data storage path.
When the data storage path reaches this limit, Couchbase Server stops writing to it.
See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information.
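
The comparison described above can be illustrated with a short shell sketch.
It does not show how Couchbase Server enforces the limit internally; it only shows the arithmetic, assuming that the capacity percentage reported by `df -P` is the figure compared against the limit.
The function names and the example data path are hypothetical.

[source,bash]
----
# Print the used percentage (without the % sign) of the filesystem holding a path.
# Assumes POSIX `df -P` output, where column 5 is the capacity.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage has reached or passed the limit.
reached_limit() {
  used=$1
  maximum=$2
  [ "$used" -ge "$maximum" ]
}

# Hypothetical usage: check an assumed data path against an 85% limit.
# data_path=/opt/couchbase/var/lib/couchbase/data
# if reached_limit "$(disk_used_percent "$data_path")" 85; then
#   echo "writes to buckets would be blocked"
# fi
----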

== HTTP Methods

This API endpoint supports the following methods:

* <<#get-settings>>
* <<#set-usage-limit>>


[[get-settings]]
== Get Data Disk Use Limits

Use this endpoint to get the current data disk use limit settings.

.Get Limit Settings
----
GET /settings/resourceManagement
----

=== curl Syntax

[source,bash]
----
curl -u $USER:$PASSWORD -X GET \
'http://{HOST}:{PORT}/settings/resourceManagement'
----

.Path Parameters
:priv-link: get-privs
include::partial$user-pw-host-port-params.adoc[]

[[get-privs]]
=== Required Privileges

You must have at least on one of the following roles:
Suggested change
You must have at least on one of the following roles:
You must have at least one of the following roles:


* xref:learn:security/roles.adoc#full-admin[Full Admin]
* xref:learn:security/roles.adoc#cluster-admin[Cluster Admin]
* xref:learn:security/roles.adoc#local-user-security-admin[Local User Admin]
* xref:learn:security/roles.adoc#security-admin[Security Admin]


=== Responses

`200 OK`::
Returns a JSON object containing the current data disk use limit settings.
See <<get-settings-example>> for the schema of the output.

`403 Forbidden`::
Returned if the user does not have one of the roles listed in <<get-privs>>.

[#get-settings-example]
=== Examples

The following gets the current settings for data disk use limits:

[source,bash]
----
curl -u Administrator:password \
-X GET 'http://127.0.0.1:8091//settings/resourceManagement' | jq
----

Suggested change
-X GET 'http://127.0.0.1:8091//settings/resourceManagement' | jq
-X GET 'http://127.0.0.1:8091/settings/resourceManagement' | jq

The JSON returned by this command shows the current settings for data disk use limits:

[source,json]
----
{
"diskUsage": {
"enabled": false,
"maximum": 85
}
}
----

The result shows that the disk usage limit is not enabled, and the maximum disk usage is set to 85% (the default).


[[set-usage-limit]]
== Set Data Disk Use Limits
Use this endpoint to set the data disk use limit settings.

.Set Limits
----
POST /settings/resourceManagement
----

=== curl Syntax

[source,bash]
----
curl -u $USER:$PASSWORD -X POST \
'http://{HOST}:{PORT}/settings/resourceManagement' \
-H 'Content-Type: application/json' \
-d '{"diskUsage": {"enabled": [true|false], "maximum": <integer>}}'
----

.Path Parameters
:priv-link: set-privs
include::partial$user-pw-host-port-params.adoc[]

.Data Parameters

`enabled` (Boolean)::
If `true`, enables the data disk use limit. If `false`, disables the data disk use limit.

`maximum` (integer)::
The maximum percentage of disk space that can be used by the data storage path.
If the data storage path reaches this limit, Couchbase Server stops writing to it.
This value must be between 1 and 100.
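
The parameter rules above can be checked on the client before the request is sent.
The following is a minimal sketch with a hypothetical helper name; it validates the two values as described and prints the JSON body, but it is not part of any Couchbase tool.

[source,bash]
----
# Print the JSON body for POST /settings/resourceManagement.
# Returns non-zero if the arguments break the documented rules.
build_disk_usage_payload() {
  enabled=$1
  maximum=$2
  case "$enabled" in
    true|false) ;;
    *) echo "enabled must be true or false" >&2; return 1 ;;
  esac
  case "$maximum" in
    ''|*[!0-9]*) echo "maximum must be an integer" >&2; return 1 ;;
  esac
  if [ "$maximum" -lt 1 ] || [ "$maximum" -gt 100 ]; then
    echo "maximum must be between 1 and 100" >&2
    return 1
  fi
  printf '{"diskUsage": {"enabled": %s, "maximum": %s}}\n' "$enabled" "$maximum"
}

# Hypothetical usage:
# curl -u "$USER:$PASSWORD" -X POST \
#   "http://$HOST:$PORT/settings/resourceManagement" \
#   -H 'Content-Type: application/json' \
#   -d "$(build_disk_usage_payload true 90)"
----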

[[set-privs]]
=== Required Privileges

You must have at least one of the following roles:

* xref:learn:security/roles.adoc#full-admin[Full Admin]
* xref:learn:security/roles.adoc#cluster-admin[Cluster Admin]
* xref:learn:security/roles.adoc#security-admin[Security Admin]

=== Responses

`200 OK`::
Returns a JSON object containing the current data disk use limit settings.
See <<set-limit-example>> for the schema of the output.

`403 Forbidden`::
Returned if the user does not have one of the roles listed in <<set-privs>>.

[#set-limit-example]
=== Examples

The following example enables data disk use limits and sets the maximum disk usage to 90%:

[source,bash]
----
curl -u Administrator:password \
-X POST 'http://127.0.0.1:8091/settings/resourceManagement' \
-H "Content-Type: application/json" \
-d '{"diskUsage": {"enabled": true, "maximum": 90}}' | jq
----

The JSON returned by this command shows the new settings for data disk use limits:

[source,json]
----
{
"diskUsage": {
"enabled": true,
"maximum": 90
}
}
----
19 changes: 18 additions & 1 deletion modules/rest-api/pages/rest-cluster-email-notifications.adoc
@@ -69,6 +69,7 @@ curl -X POST http://<ip-address-or-domain-name:8091>/settings/alerts/limits
-d certExpirationDays = <integer>
-d historyWarningThreshold=<integer>
-d lowIndexerResidentPerc=<integer>
-d maxDataDiskUsedPerc=<integer>
-d maxDiskUsedPerc=<integer>
-d maxIndexerRamPerc=<integer>
-d maxOverheadPerc=<integer>
@@ -77,7 +78,8 @@ curl -X POST http://<ip-address-or-domain-name:8091>/settings/alerts/limits
-d memoryCriticalThreshold=<integer>
-d memcachedSystemConnectionWarningThreshold=<integer>
-d memcachedUserConnectionWarningThreshold=<integer>

-d stuckRebalanceThresholdIndex=<integer>
-d stuckRebalanceThresholdKV=<integer>

curl -X POST http://<ip-address-or-domain-name>:8091/settings/alert/sendTestEmail
-u <username>:<password>
@@ -152,6 +154,15 @@ See xref:rest-api:rest-bucket-create.adoc[Creating and Editing Buckets], for inf
Warns that the Index Service is, on a given node, occupying a percentage of available memory that is below an established threshold, which is the value of `lowIndexerResidentPerc`.
The default value is `10`.

* `maxDataDiskUsedPerc`.
The percentage of disk space used that will trigger an alert on the filesystem containing the data service, index service, or the `ns_log` or `audit_log` storage paths.
This alert warns you that the disk is becoming full.
It occurs even if data disk usage limits are not enabled.
The value must be an integer between `1` and `100`, which is the percentage of disk space used.
It defaults to `90`.
Review comment (Contributor):
It actually defaults to 75%. Also, if the data disk limit is enabled, then it will ignore the configured threshold and use 10% less than the enforcement threshold.

See xref:learn:buckets-memory-and-storage/storage-settings.adoc#filesystem-free-space-and-usage-limits[Filesystem Free Space and Usage Limits] for more information.

[[maxdatadiskusedperc]]
* `maxDiskUsedPerc`, `maxIndexerRamPerc`, and `maxOverheadPerc`.
The maximum percentages for disk usage, memory consumption by the Index Service, and overhead.
Values must be between `0` and `100`.
@@ -173,6 +184,12 @@ NOTE: If the node exceeds 90% of the available system connections, then please c

* `memcachedUserConnectionWarningThreshold`. Trigger the `xref:manage:manage-settings/configure-alerts.adoc#memcached-alert[memcached_connections]` alert if the number of `user` connections in use exceeds the given percentage of connections available. (E.g., if this value is set to `90`, the system will trigger an alert if the number of user connections used by the data service exceeds 90% of the available connections.)

* `stuckRebalanceThresholdIndex` and `stuckRebalanceThresholdKV`.
Sets the timeout threshold for an index rebalance and a data operation to be considered stuck.
Suggested change
Sets the timeout threshold for an index rebalance and a data operation to be considered stuck.
Sets the timeout threshold for a data or index service rebalance to make no identified progress to be considered stuck.

If this period elapses and no progress has been made, Couchbase Server triggers an alert.
The value must be an integer that represents a number of seconds.
The default value is `1800` seconds (30 minutes).
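
Because both thresholds are expressed in seconds, it can help to convert from minutes before calling the endpoint.
The helper below is a hypothetical convenience, not part of any Couchbase tool; it only builds a form-encoded body from the two parameter names listed above.

[source,bash]
----
# Build a form-encoded body for the stuck-rebalance thresholds
# from values given in minutes (index first, then KV).
stuck_rebalance_args() {
  index_minutes=$1
  kv_minutes=$2
  printf 'stuckRebalanceThresholdIndex=%s&stuckRebalanceThresholdKV=%s\n' \
    "$((index_minutes * 60))" "$((kv_minutes * 60))"
}

# Hypothetical usage against a local node:
# curl -u Administrator:password -X POST \
#   'http://127.0.0.1:8091/settings/alerts/limits' \
#   -d "$(stuck_rebalance_args 30 30)"
----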

== Responses

A successful call returns `200 OK`.
13 changes: 13 additions & 0 deletions modules/rest-api/partials/user-pw-host-port-params.adoc
@@ -0,0 +1,13 @@

`USER`::
The name of a user who has one of the roles listed in <<{priv-link}>>.

`PASSWORD`::
The password for the `user`.

`HOST`::
Hostname or IP address of a Couchbase Server.

`PORT`::
Port number for the REST API.
Defaults are 8091 for unencrypted and 18091 for encrypted connections.