Membership Module
The membership module was introduced on 08/01/2023 to replace worker registration on the master. It provides the capability to either
- use a static file to provide a preset list of workers for an Alluxio cluster
- use an etcd cluster as a distributed membership coordinator
MembershipManager is the module interface for the different implementations of membership management. There are currently 3 implementations:
- NOOP - NoOpMembershipManager: falls back to the old way of using the master for worker registration, which is still kept for regression and testing purposes.
- STATIC - StaticMembershipManager: uses a static config file (default file is $ALLUXIO_HOME/conf/workers) to configure the list of worker hostnames that form the Alluxio cluster. It does not provide membership capabilities such as tracking members joining or leaving, or member liveness. It is merely a simple quickstart way to spin up a DORA Alluxio cluster.
- ETCD - EtcdMembershipManager: uses a pre-configured standalone etcd cluster to manage worker membership. On first startup, a worker registers itself with etcd, and then maintains its liveness in etcd throughout its process lifetime. Through the EtcdMembershipManager module, either a client or a worker can get information about:
  a. Which workers are currently registered?
  b. Which workers are currently alive?
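The two questions above can be pictured as two sets of worker identifiers. The following is a minimal sketch of that idea, not Alluxio's implementation: the function name, the worker IDs, and the notion of a "lost" worker (registered but not currently alive) are assumptions made for illustration.

```python
def membership_summary(registered, alive):
    """Split worker IDs into registered, alive, and lost groups.

    Both arguments are iterables of worker identifiers. They are
    hypothetical inputs here; a real deployment would obtain them
    from the membership manager.
    """
    registered = set(registered)
    # Ignore any "alive" entry that was never registered.
    alive = set(alive) & registered
    return {
        "registered": sorted(registered),
        "alive": sorted(alive),
        # Assumption: a lost worker is registered but not alive.
        "lost": sorted(registered - alive),
    }


summary = membership_summary(
    registered=["worker-a", "worker-b", "worker-c"],
    alive=["worker-a", "worker-c"],
)
print(summary["lost"])
```

Keeping the registered set separate from the alive set is what would let a client tell a worker that is temporarily down apart from one that never joined the cluster.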
For NOOP, there is no need to configure anything; no MembershipManager module is used at all.
For STATIC, use a static file that follows the format of conf/workers (refer to: https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-a-Cluster.html?q=conf%2Fworkers#basic-setup), listing the hostnames of ALL workers, one per line. Then configure alluxio-site.properties with:
alluxio.worker.membership.manager.type=STATIC
alluxio.worker.static.config.file=<absolute_path_to_static_config_workerlist_file>
or just
alluxio.worker.membership.manager.type=STATIC
in which case conf/workers will be used. For example, to configure an Alluxio cluster with 2 workers, conf/workers would contain:
# List of Worker started on each of the machines listed below.
ec2-1-111-11-111.compute-1.amazonaws.com
ec2-2-222-22-222.compute-2.amazonaws.com
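To sanity-check a worker list before starting the cluster, a small script can read the file the same way a person would. This is a sketch under stated assumptions, not Alluxio's own parser: it assumes that lines starting with # are comments (as the example above suggests) and that blank lines are ignored; the function name is hypothetical.

```python
def parse_worker_list(text):
    """Return the worker hostnames found in a conf/workers-style file.

    Assumptions (not taken from Alluxio's source): lines beginning
    with '#' are comments, and blank lines are skipped.
    """
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hosts.append(line)
    return hosts


sample = """\
# List of Worker started on each of the machines listed below.
ec2-1-111-11-111.compute-1.amazonaws.com
ec2-2-222-22-222.compute-2.amazonaws.com
"""
workers = parse_worker_list(sample)
print(len(workers), "workers:", workers)
```

Because the static manager does not track members joining or leaving, a check like this is a cheap way to confirm that every expected hostname is present before workers are started.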
For ETCD, depending on the deployment environment (bare metal or K8s), users can set up the etcd cluster and the Alluxio cluster individually, or use helm install with Alluxio's k8s operator for a one-click install of both.
On bare metal, first set up an etcd cluster; refer to the etcd documentation here: https://etcd.io/docs/v3.4/op-guide/clustering/
For example, say we have a 3-node etcd setup:
Name | Address | Hostname |
---|---|---|
infra0 | 10.0.1.10 | infra0.example.com |
infra1 | 10.0.1.11 | infra1.example.com |
infra2 | 10.0.1.12 | infra2.example.com |
Configure alluxio-site.properties:
alluxio.worker.membership.manager.type=ETCD
alluxio.etcd.endpoints=http://infra0.example.com:2379,http://infra1.example.com:2379,http://infra2.example.com:2379
[NOTICE] Because the etcd membership module relies on etcd's high availability to provide the membership service, include ALL the etcd cluster nodes in the configuration (or at least all the initial ones, if new nodes have been bootstrapped into etcd later) so that the etcd membership module can redirect connections to the etcd leader automatically.
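Since every etcd node should appear in the endpoint list, generating the property line from the host inventory avoids leaving one out. The helper below is a hypothetical sketch: the function name is an assumption, while the http scheme and port 2379 are taken from the example configuration above.

```python
def etcd_endpoints_property(hosts, port=2379, scheme="http"):
    """Build the alluxio.etcd.endpoints line for alluxio-site.properties.

    The scheme and port defaults mirror the example configuration;
    adjust them if the etcd cluster uses TLS or a different client port.
    """
    endpoints = ",".join(f"{scheme}://{host}:{port}" for host in hosts)
    return f"alluxio.etcd.endpoints={endpoints}"


etcd_hosts = [
    "infra0.example.com",
    "infra1.example.com",
    "infra2.example.com",
]
property_line = etcd_endpoints_property(etcd_hosts)
print(property_line)
```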
After spinning up the Alluxio workers, use bin/alluxio fsadmin report nodestatus to check the status of worker registration.
On K8s, using the k8s operator, we can spin up a DORA Alluxio cluster along with the etcd cluster pod(s) with helm. (For prerequisites, refer to https://docs.google.com/document/d/1iiDZDNBTJWQ1WAJ-31aKDo9pL1DeTrvrvYUdd-YrTpI/edit#heading=h.1rc792noj716)
To pull the etcd dependency for the helm repo, run:
helm dependency update
To configure Alluxio with a single-pod etcd cluster, enable the etcd component in k8s-operator/deploy/charts/alluxio/config.yaml:
image: <docker_username>/<image-name>
imageTag: <tag>
dataset:
  path: <ufs path>
  credentials: # s3 as example. Leave it empty if not needed.
    aws.accessKeyId: xxxxxxxxxx
    aws.secretKey: xxxxxxxxxxxxxxx
etcd:
  enabled: true
Then, under k8s-operator/deploy/charts/alluxio/, run:
$ helm install <cluster name> -f config.yaml .
Afterwards, kubectl get pods will show:
[root@ip-172-31-24-66 alluxio]# kubectl get pods
NAME READY STATUS RESTARTS AGE
dora0802-alluxio-master-0 0/1 Running 0 3s
dora0802-alluxio-worker-6577bc9-s6njq 0/1 Running 0 3s
dora0802-etcd-0 0/1 Running 0 3s
To spin up a 3-node etcd cluster, simply add the replicaCount field to indicate the number of etcd instances:
etcd:
  enabled: true
  replicaCount: 3
The deployment will now have a 3-pod etcd cluster:
NAME READY STATUS RESTARTS AGE
dora0802-1-alluxio-master-0 1/1 Running 0 111m
dora0802-1-alluxio-worker-5fc8bd885-jk6pn 1/1 Running 0 111m
dora0802-1-etcd-0 1/1 Running 0 111m
dora0802-1-etcd-1 1/1 Running 0 111m
dora0802-1-etcd-2 1/1 Running 0 111m
If you would like to use etcdctl in the k8s environment, spin up an etcd client pod via:
$ kubectl run lucyetcd-client --restart='Never' --image docker.io/bitnami/etcd:3.5.9-debian-11-r24 --env ETCDCTL_ENDPOINTS="dora0802-1-etcd:2379" --namespace default --command -- sleep infinity