Commit
Signed-off-by: Monokaix <changxuzheng@huawei.com>
Showing 25 changed files with 3,140 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
+++ | ||
title = "Introduction" | ||
|
||
date = 2024-09-29 | ||
lastmod = 2024-09-29 | ||
|
||
draft = false # Is this a draft? true/false | ||
toc = true # Show table of contents? true/false | ||
type = "docs" # Do not modify. | ||
|
||
# Add menu entry to sidebar. | ||
[menu.v1-10-0] | ||
parent = "home" | ||
weight = 1 | ||
+++ | ||
|
||
## What is Volcano | ||
Volcano is a cloud native system for high-performance workloads, which has been accepted by [Cloud Native Computing Foundation (CNCF)](https://www.cncf.io/) as its first and only official container batch scheduling project. Volcano supports popular computing frameworks such as [Spark](https://spark.apache.org/), [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [Flink](https://flink.apache.org/), [Argo](https://argoproj.github.io/), [MindSpore](https://www.mindspore.cn/en), and [PaddlePaddle](https://www.paddlepaddle.org.cn/). Volcano also supports scheduling of computing resources on different architectures, such as x86, Arm, and Kunpeng. | ||
|
||
## Why Volcano | ||
Job scheduling and management have become increasingly complex and critical for high-performance batch computing. Common requirements are as follows: | ||
|
||
* Support for diverse scheduling algorithms | ||
* More efficient scheduling | ||
* Non-intrusive support for mainstream computing frameworks | ||
* Support for multi-architecture computing | ||
|
||
Volcano is designed to cater to these requirements. In addition, Volcano inherits the design of Kubernetes APIs, allowing you to easily run applications that require high-performance computing on Kubernetes. | ||
## Features | ||
### Rich scheduling policies | ||
Volcano supports a variety of scheduling policies: | ||
|
||
* Gang scheduling | ||
* Fair-share scheduling | ||
* Queue scheduling | ||
* Preemption scheduling | ||
* Topology-based scheduling | ||
* Reclaim | ||
* Backfill | ||
* Resource reservation | ||
|
||
You can also configure plug-ins and actions to use custom scheduling policies. | ||
### Enhanced job management | ||
You can use enhanced job features of Volcano for high-performance computing: | ||
|
||
* Multi-pod jobs | ||
* Improved error handling | ||
* Indexed jobs | ||
|
||
### Multi-architecture computing | ||
Volcano can schedule computing resources from multiple architectures: | ||
|
||
* x86 | ||
* Arm | ||
* Kunpeng | ||
* Ascend | ||
* GPU | ||
|
||
### Faster scheduling | ||
Compared with existing queue schedulers, Volcano shortens the average scheduling delay through a series of optimizations. | ||
|
||
## Ecosystem | ||
Volcano allows you to use mainstream computing frameworks: | ||
|
||
* [Spark](https://spark.apache.org/) | ||
* [TensorFlow](https://www.tensorflow.org/) | ||
* [PyTorch](https://pytorch.org/) | ||
* [Flink](https://flink.apache.org/) | ||
* [Argo](https://argoproj.github.io/) | ||
* [MindSpore](https://www.mindspore.cn/en) | ||
* [PaddlePaddle](https://www.paddlepaddle.org.cn/) | ||
* [Open MPI](https://www.open-mpi.org/) | ||
* [Horovod](https://horovod.readthedocs.io/) | ||
* [MXNet](https://mxnet.apache.org/) | ||
* [Kubeflow](https://www.kubeflow.org/) | ||
* [KubeGene](https://github.com/volcano-sh/kubegene) | ||
* [Cromwell](https://cromwell.readthedocs.io/) | ||
|
||
Volcano has been commercially used as the infrastructure scheduling engine by companies and organizations. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
+++ | ||
title = "Actions" | ||
|
||
date = 2024-09-29 | ||
lastmod = 2024-09-29 | ||
|
||
draft = false # Is this a draft? true/false | ||
toc = true # Show table of contents? true/false | ||
type = "docs" # Do not modify. | ||
|
||
# Add menu entry to sidebar. | ||
linktitle = "Actions" | ||
[menu.v1-10-0] | ||
parent = "scheduler" | ||
weight = 2 | ||
+++ | ||
|
||
|
||
|
||
### Enqueue | ||
|
||
#### Overview | ||
|
||
The Enqueue action filters qualified jobs into the queue to be scheduled. When the minimum resource request of a Job cannot be met, scheduling individual pods of that Job is pointless: the pods cannot be scheduled because the "Gang" constraint is not satisfied. A Job's state changes from "Pending" to "Inqueue" only when its minimum resource request can be met. In general, the Enqueue action is an essential part of the scheduler configuration. | ||
|
||
#### Scenario | ||
|
||
The Enqueue action is the preparatory stage of the scheduling process. A job's state changes from "Pending" to "Inqueue" only when the cluster resources meet the job's minimum resource request. In this way, the Enqueue action prevents a large number of unschedulable pods from piling up in the cluster and improves scheduler performance in high-load scenarios where cluster resources may be insufficient, such as AI/MPI/HPC. | ||
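
The gating described above can be sketched roughly as follows. This is a simplified illustration, not Volcano's implementation: the `Job` class, the `enqueue` function, and the single CPU dimension are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_cpu: int            # hypothetical: minimum CPU (millicores) the whole job needs
    state: str = "Pending"


def enqueue(jobs, idle_cpu):
    """Move jobs from Pending to Inqueue while their minimum request still fits."""
    for job in jobs:
        if job.state == "Pending" and job.min_cpu <= idle_cpu:
            job.state = "Inqueue"
            idle_cpu -= job.min_cpu   # budget is now spoken for by this job
    return idle_cpu


jobs = [Job("train", 6000), Job("etl", 5000), Job("eval", 2000)]
remaining = enqueue(jobs, idle_cpu=8000)
states = [job.state for job in jobs]
print(states, remaining)
```

In this sketch, a job whose minimum request does not fit stays "Pending", so no pods are created for it and the later actions have less work to do.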
|
||
|
||
|
||
### Allocate | ||
|
||
#### Overview | ||
|
||
The Allocate action binds <task, node> pairs in two stages: pre-selection and further selection. PredicateFn filters out nodes that cannot be allocated, and NodeOrderFn scores the remaining nodes to find the one that fits best. The Allocate action is an essential step in the scheduling process; it handles the pods in the to-be-scheduled list that specify resource requests. | ||
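
As a rough illustration of the filter-then-score flow, the sketch below uses two hypothetical Python functions, `predicate` and `node_order`, standing in for PredicateFn and NodeOrderFn. The node fields and the toy scoring rule (prefer the node with the most idle resources left) are assumptions, not Volcano's actual plugins.

```python
def predicate(task, node):
    """Pre-selection: reject nodes that cannot hold the task."""
    return node["idle_cpu"] >= task["cpu"] and node["idle_mem"] >= task["mem"]


def node_order(task, node):
    """Further selection: toy score, higher means more resources left over."""
    return (node["idle_cpu"] - task["cpu"]) + (node["idle_mem"] - task["mem"])


def pick_node(task, nodes):
    feasible = [n for n in nodes if predicate(task, n)]
    if not feasible:
        return None                      # no node passed pre-selection
    return max(feasible, key=lambda n: node_order(task, n))


nodes = [
    {"name": "n1", "idle_cpu": 1, "idle_mem": 4},
    {"name": "n2", "idle_cpu": 4, "idle_mem": 8},
    {"name": "n3", "idle_cpu": 8, "idle_mem": 2},
    {"name": "n4", "idle_cpu": 6, "idle_mem": 16},
]
task = {"cpu": 2, "mem": 4}
best = pick_node(task, nodes)
print(best["name"])
```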
|
||
The Allocate action follows a commit mechanism. Satisfying a pod's scheduling request does not by itself trigger a binding for that pod; the binding also depends on whether the gang constraint of the pod's Job is satisfied. The pod is scheduled only if that gang constraint is met. | ||
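
One way to picture the commit mechanism is as a small transaction: placements are made tentatively and are kept only if enough tasks of the job were placed. The sketch below is an assumption-laden model (the `allocate_job` function, the `min_available` parameter name as used here, and the single CPU dimension are illustrative), not the scheduler's real code.

```python
def allocate_job(tasks, nodes_idle, min_available):
    """tasks: list of CPU requests; nodes_idle: dict of node name -> idle CPU."""
    tentative = dict(nodes_idle)          # work on a copy until the gang check passes
    placements = []
    for index, cpu in enumerate(tasks):
        for node, idle in tentative.items():
            if idle >= cpu:
                tentative[node] = idle - cpu
                placements.append((index, node))
                break
    if len(placements) >= min_available:
        nodes_idle.update(tentative)      # commit: gang constraint satisfied
        return placements
    return []                             # discard: nothing is bound


cluster_a = {"n1": 4, "n2": 2}
committed = allocate_job([2, 2, 2], cluster_a, min_available=3)

cluster_b = {"n1": 4}
discarded = allocate_job([2, 2, 2], cluster_b, min_available=3)
print(committed, cluster_a, discarded, cluster_b)
```

In the second call two of the three tasks would fit, but because the gang constraint is not met, none of them is bound and the node's idle resources are left untouched.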
|
||
#### Scenario | ||
|
||
In a cluster that mixes several kinds of workloads, the pre-selection stage of Allocate lets specific workloads (AI, big data, HPC, scientific computing) be filtered, sorted, and scheduled quickly and centrally according to their namespace. In a complex computing scenario such as TensorFlow or MPI, where a single job contains multiple tasks, the Allocate action traverses the allocation options of the tasks under the job to find the most appropriate node for each task. | ||
|
||
|
||
|
||
### Preempt | ||
|
||
#### Overview | ||
|
||
The Preempt action is the preemption step in the scheduling process and handles high-priority scheduling problems. It preempts resources between jobs in the same queue, or between tasks in the same job. | ||
|
||
#### Scenario | ||
|
||
- Preemption between jobs in the same queue: Multiple departments in a company share a cluster, and each department can be mapped to a Queue. Resources of different departments cannot be preempted from each other, which guarantees resource isolation between departments. In complex scheduling scenarios, basic resources (CPUs, disks, GPUs, memory, network bandwidth) are allocated based on services: computing-intensive scenarios, such as AI and high-performance scientific computing, require more computing resources, such as CPUs, GPUs, and memory, while big data scenarios, such as the Spark framework, have high requirements on disks. Different queues share the cluster's resources. If AI jobs preempt all CPU resources, jobs in the queues of other scenarios will starve. Therefore, queue-based resource allocation is used to keep services running. | ||
- Preemption between tasks in the same job: A Job usually contains multiple tasks. For example, in complex AI scenarios, a TF-job needs a parameter server and multiple workers, and preemption within the job supports preemption between those workers. | ||
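
The first scenario can be sketched as a victim-selection step that never looks outside the preemptor's queue. The `select_victims` function, the job fields, and the lowest-priority-first ordering are assumptions made for illustration only.

```python
def select_victims(preemptor, running, idle_cpu):
    """Pick lower-priority jobs in the preemptor's queue until its request fits."""
    candidates = sorted(
        (job for job in running
         if job["queue"] == preemptor["queue"]
         and job["priority"] < preemptor["priority"]),
        key=lambda job: job["priority"],          # evict the lowest priority first
    )
    victims, freed = [], idle_cpu
    for job in candidates:
        if freed >= preemptor["cpu"]:
            break
        victims.append(job["name"])
        freed += job["cpu"]
    # Evict nothing if even all candidates together would not free enough.
    return victims if freed >= preemptor["cpu"] else []


running = [
    {"name": "a", "queue": "ai", "priority": 1, "cpu": 2},
    {"name": "b", "queue": "ai", "priority": 5, "cpu": 3},
    {"name": "c", "queue": "bigdata", "priority": 0, "cpu": 8},
    {"name": "d", "queue": "ai", "priority": 20, "cpu": 4},
]
preemptor = {"name": "p", "queue": "ai", "priority": 10, "cpu": 4}
victims = select_victims(preemptor, running, idle_cpu=0)
print(victims)
```

Job `c` has the lowest priority overall but belongs to another queue, so it is never considered.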
|
||
### Reserve | ||
|
||
#### Overview | ||
|
||
This action has been deprecated since v1.2 and replaced by the SLA plugin. | ||
|
||
The Reserve action completes resource reservation by binding the selected target job to nodes. The Reserve action, the Elect action, and the Reservation plugin together make up the resource reservation mechanism. The Reserve action must be configured after the Allocate action. | ||
|
||
#### Scenario | ||
|
||
In practical applications, there are two common scenarios as follows: | ||
|
||
- When cluster resources are insufficient, suppose Job A and Job B are both waiting to be scheduled, and Job A either requests fewer resources than Job B or has a higher priority than Job B. Under the default scheduling policy, A is scheduled ahead of B. In the worst case, if jobs with higher priority or smaller resource requests keep joining the queue, B starves and may wait indefinitely. | ||
|
||
- When cluster resources are insufficient, suppose Jobs A and B are waiting to be scheduled. A has a lower priority than B but requests fewer resources. Under a scheduling policy centered on cluster throughput and resource utilization, A is scheduled first. In the worst case, B keeps starving. | ||
|
||
|
||
Therefore, a fair scheduling mechanism is needed that ensures a job is scheduled once its starvation, whatever the cause, reaches a critical state. Job reservation is such a mechanism. | ||
|
||
Resource reservation mechanisms need to consider node selection, the number of nodes, and how to lock nodes. Volcano reserves resources for target jobs by locking a node group: it selects a group of nodes that meet certain constraints and adds them to the node group. From the moment a node joins the group, it accepts no new jobs, and the total capacity of the nodes in the group meets the requirements of the target job. Note that target jobs can be scheduled anywhere in the cluster, while non-target jobs can only be scheduled on nodes outside the node group. | ||
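
A minimal model of node group locking might look like the sketch below. The function names and the largest-node-first selection rule are assumptions; the text above only says that the selected nodes "meet certain constraints".

```python
def lock_node_group(nodes, target_cpu):
    """nodes: dict of node name -> CPU capacity. Returns the locked group."""
    group, total = set(), 0
    # Assumed selection rule: take the largest nodes first.
    for name, cpu in sorted(nodes.items(), key=lambda item: item[1], reverse=True):
        if total >= target_cpu:
            break
        group.add(name)
        total += cpu
    return group if total >= target_cpu else set()


def schedulable_nodes(nodes, locked, is_target):
    """Target jobs may use any node; other jobs only nodes outside the group."""
    return set(nodes) if is_target else set(nodes) - locked


nodes = {"n1": 8, "n2": 4, "n3": 16}
locked = lock_node_group(nodes, target_cpu=20)
for_target = schedulable_nodes(nodes, locked, is_target=True)
for_others = schedulable_nodes(nodes, locked, is_target=False)
print(locked, for_target, for_others)
```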
|
||
### Backfill | ||
|
||
#### Overview | ||
|
||
The Backfill action is the backfill step in the scheduling process. It handles the pods in the to-be-scheduled list that do not specify resource requests. When scheduling a single pod, it traverses all nodes and schedules the pod to a node as soon as that node meets the pod's scheduling request. | ||
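
The traversal described above amounts to a first-fit loop over the nodes. In the sketch below, the `backfill` function, the pod and node fields, and the `node_is_ready` predicate are hypothetical stand-ins chosen for the example.

```python
def backfill(pods, nodes, predicate):
    """Bind each pod without resource requests to the first node that accepts it."""
    bindings = {}
    for pod in pods:
        if pod["requests"]:               # pods with requests are left to Allocate
            continue
        for node in nodes:
            if predicate(pod, node):
                bindings[pod["name"]] = node["name"]
                break                     # first fit: stop at the first match
    return bindings


def node_is_ready(pod, node):
    # Hypothetical predicate: only node readiness is checked here.
    return node["ready"]


nodes = [{"name": "n1", "ready": False}, {"name": "n2", "ready": True}]
pods = [
    {"name": "probe", "requests": {}},
    {"name": "train", "requests": {"cpu": 4}},
]
bindings = backfill(pods, nodes, node_is_ready)
print(bindings)
```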
|
||
#### Scenario | ||
|
||
In a cluster, the main resources are often occupied by "fat jobs", such as AI model training. The Backfill action allows the cluster to quickly schedule "small jobs" such as single AI model identification and small data volume communication. Backfill can improve cluster throughput and resource utilization. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
+++ | ||
title = "Architecture" | ||
|
||
date = 2024-09-29 | ||
lastmod = 2024-09-29 | ||
|
||
draft = false # Is this a draft? true/false | ||
toc = true # Show table of contents? true/false | ||
type = "docs" # Do not modify. | ||
|
||
# Add menu entry to sidebar. | ||
linktitle = "Architecture" | ||
[menu.v1-10-0] | ||
parent = "home" | ||
weight = 2 | ||
+++ | ||
|
||
## Overall Architecture | ||
|
||
|
||
{{<figure library="1" src="arch_1.png" title="Application scenarios of Volcano">}} | ||
|
||
|
||
Volcano is designed for high-performance workloads running on Kubernetes. It follows the design and mechanisms of Kubernetes. | ||
|
||
|
||
{{<figure library="1" src="arch_2.PNG" title="Volcano architecture">}} | ||
|
||
|
||
Volcano consists of four components: **scheduler**, **controllermanager**, **admission**, and **vcctl**. | ||
|
||
##### Scheduler | ||
Volcano Scheduler schedules jobs to the most suitable node based on actions and plug-ins. Volcano supplements Kubernetes to support multiple scheduling algorithms for jobs. | ||
|
||
##### ControllerManager (CM) | ||
Volcano CMs manage the lifecycle of Custom Resource Definitions (CRDs). You can use the **Queue CM**, **PodGroup CM**, and **VCJob CM**. | ||
|
||
##### Admission | ||
Volcano Admission is responsible for the CRD API validation. | ||
|
||
##### vcctl | ||
Volcano vcctl is the command line client for Volcano. |