lucyge2022 edited this page Aug 2, 2023 · 17 revisions

Membership Module

The Membership module was introduced on 08/01/2023 and is intended to replace worker registration on the master. It provides the capability to either:

  1. use a static file to provide a preset list of workers for an Alluxio cluster
  2. use an etcd cluster as a distributed membership coordinator

Code structure

MembershipManager is the module interface for the different implementations of membership management. There are currently 3 implementations:

  1. NOOP - NoOpMembershipManager: falls back to the old way of using the master for worker registration. It is still kept for regression/testing purposes.
  2. STATIC - StaticMembershipManager: uses a static config file (default: $ALLUXIO_HOME/conf/workers) listing the hostnames of the workers that form the Alluxio cluster. It provides no real membership capability: it does not track members joining or leaving, nor member liveness. It is merely a simple quickstart way to spin up a DORA Alluxio cluster.
  3. ETCD - EtcdMembershipManager: uses a pre-configured standalone etcd cluster to manage worker membership. On first startup, a worker registers itself with etcd and then keeps its liveness updated in etcd throughout its process lifetime. Through the EtcdMembershipManager module, either a client or a worker can get information about:

a. What are the currently registered workers?

b. What are the currently alive workers?
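
The difference between "registered" and "alive" can be illustrated with a toy in-memory model. This is only a sketch, not the real EtcdMembershipManager: the class name, method names, and the heartbeat-with-expiry mechanism (loosely similar to a key kept alive with a TTL) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MembershipSketch:
    """Toy model of worker membership; hypothetical, not Alluxio code."""

    ttl_seconds: float = 10.0  # assumed heartbeat expiry, not an Alluxio default
    registered: Set[str] = field(default_factory=set)
    last_heartbeat: Dict[str, float] = field(default_factory=dict)

    def register(self, worker: str, now: float) -> None:
        # Registration is remembered permanently and counts as a first heartbeat.
        self.registered.add(worker)
        self.last_heartbeat[worker] = now

    def heartbeat(self, worker: str, now: float) -> None:
        # Only registered workers may refresh their liveness.
        if worker not in self.registered:
            raise KeyError(f"{worker} is not registered")
        self.last_heartbeat[worker] = now

    def all_members(self) -> Set[str]:
        # Question (a): every worker that has ever registered.
        return set(self.registered)

    def live_members(self, now: float) -> Set[str]:
        # Question (b): registered workers whose heartbeat has not expired.
        return {
            w for w in self.registered
            if now - self.last_heartbeat[w] <= self.ttl_seconds
        }

    def failed_members(self, now: float) -> Set[str]:
        # Registered but no longer alive.
        return self.all_members() - self.live_members(now)
```

A worker that registered but stopped sending heartbeats stays in all_members() and drops out of live_members() once the assumed TTL has elapsed.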

Deployment

- NOOP

No configuration is needed; no MembershipManager module is leveraged at all.

- STATIC

Use a static file that follows the format of conf/workers (refer to: https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-a-Cluster.html?q=conf%2Fworkers#basic-setup), with the hostname of each worker on its own line and ALL workers listed. Then configure alluxio-site.properties with:

alluxio.worker.membership.manager.type=STATIC
alluxio.worker.static.config.file=<absolute_path_to_static_config_workerlist_file>

or just

alluxio.worker.membership.manager.type=STATIC

in which case conf/workers will be used. For example, to configure an Alluxio cluster with 2 workers, conf/workers contains:

# An Alluxio worker will be started on each of the machines listed below.
ec2-1-111-11-111.compute-1.amazonaws.com
ec2-2-222-22-222.compute-2.amazonaws.com
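
To sanity-check a worker list before starting the cluster, a file in this format can be read with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and it assumes that `#` starts a comment and that blank lines are ignored, which matches the example above but is not confirmed to be how StaticMembershipManager parses the file.

```python
from typing import List


def parse_workers(text: str) -> List[str]:
    """Return the worker hostnames from conf/workers-style text (hypothetical helper)."""
    hosts = []
    for line in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


if __name__ == "__main__":
    sample = """\
# An Alluxio worker will be started on each of the machines listed below.
ec2-1-111-11-111.compute-1.amazonaws.com
ec2-2-222-22-222.compute-2.amazonaws.com
"""
    for host in parse_workers(sample):
        print(host)
```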

- ETCD

Depending on the deployment environment (bare metal or K8s), users can either set up the etcd cluster and the Alluxio cluster individually, or use helm install with Alluxio's K8s operator for a one-click install of both.

1) Bare Metal

Set up an etcd cluster; refer to the etcd documentation: https://etcd.io/docs/v3.4/op-guide/clustering/

For example, say we have a 3-node etcd setup:

Name     Address     Hostname
infra0   10.0.1.10   infra0.example.com
infra1   10.0.1.11   infra1.example.com
infra2   10.0.1.12   infra2.example.com

Configure alluxio-site.properties:

alluxio.worker.membership.manager.type=ETCD
alluxio.etcd.endpoints=http://infra0.example.com:2379,http://infra1.example.com:2379,http://infra2.example.com:2379

[NOTICE] As the etcd membership module relies on etcd's high availability to provide the membership service, include ALL the etcd cluster nodes in the configuration (or at least all the initial ones, if new nodes have been bootstrapped into etcd later) so that the module can redirect its connection to the etcd leader automatically.
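
One way to make sure every etcd node ends up in alluxio.etcd.endpoints is to generate the value from the node list and to check an existing value against it. The sketch below is illustrative only: both function names are hypothetical, and the `http` scheme and port 2379 are taken from the example above rather than from any Alluxio default.

```python
from typing import Iterable, List


def build_endpoints(hostnames: Iterable[str], port: int = 2379,
                    scheme: str = "http") -> str:
    """Join etcd node hostnames into a comma-separated endpoint list."""
    return ",".join(f"{scheme}://{host}:{port}" for host in hostnames)


def missing_endpoints(configured: str, hostnames: Iterable[str],
                      port: int = 2379, scheme: str = "http") -> List[str]:
    """Return the endpoints of known etcd nodes that are absent from `configured`."""
    have = {e.strip() for e in configured.split(",") if e.strip()}
    wanted = [f"{scheme}://{host}:{port}" for host in hostnames]
    return [endpoint for endpoint in wanted if endpoint not in have]


if __name__ == "__main__":
    nodes = ["infra0.example.com", "infra1.example.com", "infra2.example.com"]
    print("alluxio.worker.membership.manager.type=ETCD")
    print("alluxio.etcd.endpoints=" + build_endpoints(nodes))
```

An empty list from missing_endpoints means the configured value covers every node passed in.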

After spinning up the Alluxio workers, use bin/alluxio fsadmin report nodestatus to check the status of worker registration.

2) K8s

Using the K8s operator, we can spin up a DORA Alluxio cluster along with the etcd cluster pod(s) with helm. (For prerequisites, refer to https://docs.google.com/document/d/1iiDZDNBTJWQ1WAJ-31aKDo9pL1DeTrvrvYUdd-YrTpI/edit#heading=h.1rc792noj716)

To pull the etcd dependency for the helm repo, run:

helm dependency update

To configure Alluxio with a single-pod etcd cluster, enable the etcd component in k8s-operator/deploy/charts/alluxio/config.yaml:

image: <docker_username>/<image-name>
imageTag: <tag>
dataset:
  path: <ufs path>
  credentials: # s3 as example. Leave it empty if not needed.
    aws.accessKeyId: xxxxxxxxxx
    aws.secretKey: xxxxxxxxxxxxxxx
etcd:
  enabled: true

Then, under k8s-operator/deploy/charts/alluxio/, run:

$ helm install <cluster name> -f config.yaml .

Running kubectl get pods will then show:

[root@ip-172-31-24-66 alluxio]# kubectl get pods
NAME                                    READY   STATUS     RESTARTS   AGE
dora0802-alluxio-master-0               0/1     Running    0          3s
dora0802-alluxio-worker-6577bc9-s6njq   0/1     Running    0          3s
dora0802-etcd-0                         0/1     Running    0          3s

- To spin up a 3-node etcd cluster

Simply add the replicaCount field to indicate the number of etcd instances:

etcd:
  enabled: true
  replicaCount: 3

You will now have a 3-pod etcd cluster:

NAME                                        READY   STATUS    RESTARTS   AGE
dora0802-1-alluxio-master-0                 1/1     Running   0          111m
dora0802-1-alluxio-worker-5fc8bd885-jk6pn   1/1     Running   0          111m
dora0802-1-etcd-0                           1/1     Running   0          111m
dora0802-1-etcd-1                           1/1     Running   0          111m
dora0802-1-etcd-2                           1/1     Running   0          111m
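
Checking that the expected number of etcd pods is up can be scripted against the kubectl get pods listing. The following is a rough sketch: the function names are hypothetical, and it assumes the default column layout shown above and that etcd pod names contain `-etcd-`, as they do in these listings.

```python
from typing import Dict, List


def parse_pods(output: str) -> List[Dict[str, str]]:
    """Parse `kubectl get pods` text into one dict per pod, keyed by column name."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    # Skip anything before the header row (for example, a shell prompt line).
    start = next(i for i, ln in enumerate(lines) if ln.split()[0] == "NAME")
    header = lines[start].split()
    return [dict(zip(header, ln.split())) for ln in lines[start + 1:]]


def is_ready(pod: Dict[str, str]) -> bool:
    """A pod counts as ready when it is Running and READY reads n/n."""
    ready, total = pod["READY"].split("/")
    return pod["STATUS"] == "Running" and ready == total


def etcd_pods_ready(output: str, expected_replicas: int) -> bool:
    """True when exactly `expected_replicas` etcd pods are ready."""
    etcd = [p for p in parse_pods(output) if "-etcd-" in p["NAME"]]
    return sum(1 for p in etcd if is_ready(p)) == expected_replicas
```

With the listing above and replicaCount: 3, etcd_pods_ready(listing, 3) returns True; for the earlier listing, where the pods still show 0/1, it returns False.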

If you would like to use etcdctl in the K8s environment, spin up an etcd client pod via:

$ kubectl run lucyetcd-client --restart='Never' --image docker.io/bitnami/etcd:3.5.9-debian-11-r24 --env ETCDCTL_ENDPOINTS="dora0802-1-etcd:2379" --namespace default --command -- sleep infinity
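
The client pod command can be parameterised for other releases, and etcdctl can then be run inside the pod through kubectl exec. The sketch below only builds the argument lists; it does not contact a cluster. The helper names are hypothetical, and deriving the endpoint as `<release>-etcd:2379` is an assumption based on the example above (release dora0802-1, endpoint dora0802-1-etcd:2379).

```python
from typing import List


def etcd_client_run_command(
    pod_name: str,
    release: str,
    namespace: str = "default",
    image: str = "docker.io/bitnami/etcd:3.5.9-debian-11-r24",
    port: int = 2379,
) -> List[str]:
    """Build the `kubectl run` argv for an idle etcd client pod."""
    # Assumption: the etcd service is named "<release>-etcd", as in the example.
    endpoints = f"{release}-etcd:{port}"
    return [
        "kubectl", "run", pod_name,
        "--restart=Never",
        "--image", image,
        "--env", f"ETCDCTL_ENDPOINTS={endpoints}",
        "--namespace", namespace,
        "--command", "--", "sleep", "infinity",
    ]


def etcdctl_exec_command(pod_name: str, *etcdctl_args: str,
                         namespace: str = "default") -> List[str]:
    """Build the `kubectl exec` argv that runs etcdctl inside the client pod."""
    return ["kubectl", "exec", pod_name, "--namespace", namespace,
            "--", "etcdctl", *etcdctl_args]


if __name__ == "__main__":
    print(" ".join(etcd_client_run_command("lucyetcd-client", "dora0802-1")))
    # `etcdctl member list` lists the members of the etcd cluster.
    print(" ".join(etcdctl_exec_command("lucyetcd-client", "member", "list")))
```

Each list can be passed to subprocess.run(...) on a machine where kubectl is configured for the cluster.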
