Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds 'Backup, high availability, and resilience tools' landing page #301

Merged
merged 16 commits into from
Feb 5, 2025
Merged
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 33 additions & 13 deletions deploy-manage/tools.md
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you still planning on changing this url?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes, but I’d prefer to create a new issue and PR for it to ensure that all links in other content remain safe. I think we can treat this as a secondary priority for now until we complete the base migration work - what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Original file line number Diff line number Diff line change
Expand Up @@ -5,28 +5,48 @@ mapped_pages:

# Backup, high availability, and resilience tools [high-availability]

Your data is important to you. Keeping it safe and available is important to Elastic. Sometimes your cluster may experience hardware failure or a power loss. To help you plan for this, {{es}} offers a number of features to achieve high availability despite failures. Depending on your deployment type, you might need to provision servers in different zones or configure external repositories to meet your organization’s availability needs.
Elasticsearch provides comprehensive tools to safeguard data, ensure continuous availability, and maintain resilience. These tools are designed to support disaster recovery strategies, enabling businesses to protect critical information and minimize downtime, and maintain high availability in case of unexpected failures.

* **[Design for resilience](production-guidance/availability-and-resilience.md)**
For strategies to design resilient clusters, see **[Availability and resilience](production-guidance/availability-and-resilience.md)**.

Distributed systems like Elasticsearch are designed to keep working even if some of their components have failed. An Elasticsearch cluster can continue operating normally if some of its nodes are unavailable or disconnected, as long as there are enough well-connected nodes to take over the unavailable node’s responsibilities.
::::{note}
Snapshot and restore, as well as cross-cluster replication, are not currently available in serverless deployments, but they are planned for future releases. For more information, see **[LINK-TODO]**.
::::

If you’re designing a smaller cluster, you might focus on making your cluster resilient to single-node failures. Designers of larger clusters must also consider cases where multiple nodes fail at the same time.
## Snapshots

* **[Cross-cluster replication](tools/cross-cluster-replication.md)**
Snapshots in Elasticsearch are point-in-time backups that include your cluster's data, settings, and overall state. They capture all the information necessary to restore your cluster to a specific moment in time, making them essential for protecting data, recovering from unexpected issues, and transferring data between clusters. Snapshots are a reliable way to ensure the safety of your data and maintain the continuity of your operations.

To effectively distribute read and write operations across nodes, the nodes in a cluster need good, reliable connections to each other. To provide better connections, you typically co-locate the nodes in the same data center or nearby data centers.
You can perform the following tasks to manage snapshots and snapshot repositories:

Co-locating nodes in a single location exposes you to the risk of a single outage taking your entire cluster offline. To maintain high availability, you can prepare a second cluster that can take over in case of disaster by implementing {{ccr}} (CCR).
- **[Register a repository](tools/snapshot-and-restore/manage-snapshot-repositories.md):** Configure storage repositories (for example, S3, Azure, Google Cloud) to store snapshots. The way that you register repositories differs depending on your deployment method:
- **[Elastic Cloud Hosted](tools/snapshot-and-restore/elastic-cloud-hosted.md):** Deployments come with a preconfigured S3 repository for automatic backups, simplifying the setup process. You can also register external repositories, such as Azure, and Google Cloud, for more flexibility.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- **[Elastic Cloud Hosted](tools/snapshot-and-restore/elastic-cloud-hosted.md):** Deployments come with a preconfigured S3 repository for automatic backups, simplifying the setup process. You can also register external repositories, such as Azure, and Google Cloud, for more flexibility.
- **[Elastic Cloud Hosted](tools/snapshot-and-restore/elastic-cloud-hosted.md):** Deployments come with a preconfigured S3 repository for automatic backups, simplifying the setup process. You can also register external repositories, such as Azure and Google Cloud, for more flexibility.

- **[Elastic Cloud Enterprise](tools/snapshot-and-restore/cloud-enterprise.md):** Repository configuration is managed through the Elastic Cloud Enterprise user interface and automatically linked to deployments.
- **[Elastic Cloud on Kubernetes](tools/snapshot-and-restore/cloud-on-k8s.md) and [self-managed](tools/snapshot-and-restore/self-managed.md) deployments:** Repositories must be configured manually.

CCR provides a way to automatically synchronize indices from a leader cluster to a follower cluster. This cluster could be in a different data center or even a different content from the leader cluster. If the primary cluster fails, the secondary cluster can take over.
- **[Create snapshots](tools/snapshot-and-restore/create-snapshots.md):** Manually or automatically create backups of your cluster.
- **[Restore a snapshot](tools/snapshot-and-restore/restore-snapshot.md):** Recover indices, data streams, or the entire cluster to revert to a previous state. You can choose to restore specific parts of a snapshot, such as a single index, or perform a full restore.

::::{tip}
You can also use CCR to create secondary clusters to serve read requests in geo-proximity to your users.
::::
To reduce storage costs for infrequently accessed data while maintaining access, you can also create **[searchable snapshots](tools/snapshot-and-restore/searchable-snapshots.md)**.

* **[Snapshots](tools/snapshot-and-restore.md)**
::::{note}
Snapshot configurations vary across Elastic Cloud Hosted, Elastic Cloud Enterprise (ECE), Elastic Cloud on Kubernetes (ECK), and self-managed deployments.
::::
Comment on lines +39 to +41
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note is duplicative so we can remove it

Suggested change
::::{note}
Snapshot configurations vary across Elastic Cloud Hosted, Elastic Cloud Enterprise (ECE), Elastic Cloud on Kubernetes (ECK), and self-managed deployments.
::::

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agreed, I think we should remove it except if there was any specific need (and in such case the note would need to explain its point, as it's currently vague).


Take snapshots of your cluster that can be restored in case of failure.
## Cross-cluster replication (CCR)

**[Cross-cluster replication (CCR)](tools/cross-cluster-replication.md)** is a feature in Elasticsearch that allows you to replicate data in real time from a leader cluster to one or more follower clusters. This replication ensures that data is synchronized across clusters, providing continuity, redundancy, and enhanced data accessibility.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this content show up on a child page?

To effectively distribute read and write operations across nodes, the nodes in a cluster need good, reliable connections to each other. To provide better connections, you typically co-locate the nodes in the same data center or nearby data centers.

Co-locating nodes in a single location exposes you to the risk of a single outage taking your entire cluster offline. To maintain high availability, you can prepare a second cluster that can take over in case of disaster by implementing {{ccr}} (CCR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was planning to move this to the CCR landing page.


CCR provides a way to automatically synchronize indices from a leader cluster to a follower cluster. This cluster could be in a different data center or even a different continent from the leader cluster. If the primary cluster fails, the secondary cluster can take over.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second para feels a lot like the one above it. let's blend them

Suggested change
**[Cross-cluster replication (CCR)](tools/cross-cluster-replication.md)** is a feature in Elasticsearch that allows you to replicate data in real time from a leader cluster to one or more follower clusters. This replication ensures that data is synchronized across clusters, providing continuity, redundancy, and enhanced data accessibility.
CCR provides a way to automatically synchronize indices from a leader cluster to a follower cluster. This cluster could be in a different data center or even a different continent from the leader cluster. If the primary cluster fails, the secondary cluster can take over.
**[Cross-cluster replication (CCR)](tools/cross-cluster-replication.md)** is a feature in Elasticsearch that allows you to replicate data in real time from a leader cluster to one or more follower clusters. This cluster could be in a different data center or even a different continent from the leader cluster.
This replication ensures that data is synchronized across clusters, providing continuity, redundancy, and enhanced data accessibility. For example, if the primary cluster fails, the secondary cluster can take over.


::::{note}
CCR relies on **[Remote clusters](remote-clusters.md)** functionality to establish and manage connections between the leader and the follower clusters.
::::

You can perform the following tasks to manage cross-cluster replication:

- **[Set up CCR](tools/cross-cluster-replication/set-up-cross-cluster-replication.md):** Configure leader and follower clusters for data replication. To enable CCR, you need to configure **[Remote clusters](remote-clusters.md)**.
- **[Manage CCR](tools/cross-cluster-replication/manage-cross-cluster-replication.md):** Monitor and manage replicated indices.
- **[Automate replication](tools/cross-cluster-replication/manage-auto-follow-patterns.md):** Use auto-follow patterns to automatically replicate newly created indices.
- **Failover clusters:** Configure **[uni-directional](tools/cross-cluster-replication/uni-directional-disaster-recovery.md)** or **[bi-directional](tools/cross-cluster-replication/bi-directional-disaster-recovery.md)** CCR for redundancy and disaster recovery.
- **[Cluster upgrade considerations when using CCR](upgrade.md):** Maintain cluster performance and security by upgrading your Elasticsearch clusters. Use the tools and procedures suited to your deployment type: Elastic Cloud Hosted, ECE, ECK, or self-managed.
Loading