Knox, HA, HAProxy

stanislawbartkowski edited this page Nov 14, 2020 · 6 revisions

Introduction

Apache Knox Gateway does not ship with a ready-to-use HA solution. More than one Knox server can be installed, but the client software must specify the hostname of the Knox server it connects to, and if that server fails, the client has to be switched manually to another Knox server. There is an easy way to overcome this problem.

Solution

The solution is HAProxy. Install more than one Knox server and place an HAProxy server between the client software and the Hadoop cluster. HAProxy not only provides automatic failover but also acts as a load balancer, spreading requests across multiple Knox Gateways under heavy traffic.

Docker HAProxy

The easiest way is to use Docker HAProxy.

Prepare Dockerfile

FROM haproxy:1.7
COPY haproxy.cfg /usr/local/etc/haproxy/haproxy.cfg

Prepare haproxy.cfg

The Knox service is exposed over HTTPS, so HAProxy should be configured for SSL pass-through (mode tcp): the encrypted traffic is forwarded unchanged and TLS is terminated by Knox itself.

global
        daemon
        maxconn 1024
        pidfile /var/run/haproxy.pid

defaults
        balance roundrobin
        timeout client 60s
        timeout connect 60s
        timeout server 60s

#---------------------------------------------------------------------
# round robin balancing between Knox Gateways
#---------------------------------------------------------------------

frontend knox_ha
    bind *:8443
    mode tcp
    default_backend knox_ha

backend knox_ha
    balance roundrobin
    mode tcp
    server <hostname 1> <Knox host IP 1>:8443  check
    server <hostname 2> <Knox host IP 2>:8443  check

Build Docker image

docker build -t my-haproxy .
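
Before starting the container, it can be useful to check that the haproxy.cfg copied into the image parses correctly. The sketch below wraps HAProxy's configuration check mode (`-c -f`) in a small function; the function name `check_haproxy_cfg` is made up for this page, and the image tag `my-haproxy` is assumed to be the one built above.

```shell
# Hypothetical helper: validate the haproxy.cfg baked into an image.
# $1 = image name (defaults to "my-haproxy", the tag used on this page).
check_haproxy_cfg() {
    # -c only checks the configuration and exits; -f names the file to check.
    docker run --rm "${1:-my-haproxy}" \
        haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg
}
```

Call it as `check_haproxy_cfg` after the build; a non-zero exit status suggests the configuration needs fixing before the container is started.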

Run Docker container

docker run -p 8443:8443 -d --name my-haproxy my-haproxy

Reach Knox Gateway through HAProxy

curl -i -k -X GET "https://<docker host>:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS"


HAProxy is at your service.

Test

Test environment

The test environment consists of two Knox Gateways: mdp1.sb.com and mdp2.sb.com. The simplest service to test is KNOX-WebHDFS.

Test Knox Gateways directly.

curl -i -k -u user1:secret -X GET "https://mdp1.sb.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS"
curl -i -k -u user1:secret -X GET "https://mdp2.sb.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS"
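
The two direct checks above can be condensed into a loop that prints only the HTTP status per gateway, which is easier to read than full responses. This is a minimal sketch: the function names `knox_status` and `check_gateways` are hypothetical, and the topology path is the `default` one used on this page.

```shell
# Print the HTTP status code returned for a WebHDFS LISTSTATUS request.
# $1 = base URL (for example https://mdp1.sb.com:8443), $2 = user:password
knox_status() {
    # -s hides progress, -k skips certificate verification (as in the
    # commands above), -o /dev/null drops the body, -w prints the status.
    curl -s -k -u "$2" -o /dev/null -w '%{http_code}' \
        "$1/gateway/default/webhdfs/v1/?op=LISTSTATUS"
}

# Report one "<base URL> <status>" line per gateway.
# $1 = user:password, remaining arguments = base URLs
check_gateways() {
    creds="$1"
    shift
    for base in "$@"; do
        printf '%s %s\n' "$base" "$(knox_status "$base" "$creds")"
    done
}
```

Example call: `check_gateways user1:secret https://mdp1.sb.com:8443 https://mdp2.sb.com:8443`. A healthy gateway is expected to answer with 200.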

Create haproxy.cfg configuration file

global
        daemon
        maxconn 1024
        pidfile /var/run/haproxy.pid

defaults
        balance roundrobin
        timeout client 60s
        timeout connect 60s
        timeout server 60s

#---------------------------------------------------------------------
# round robin balancing between Knox Gateways
#---------------------------------------------------------------------

frontend knox_ha
    bind *:8443
    mode tcp
    default_backend knox_ha

backend knox_ha
    balance roundrobin
    mode tcp
    server mdp1.sb.com 192.168.122.39:8443  check
    server mdp2.sb.com 192.168.122.42:8443  check

Create Docker image and container

docker build -t my-haproxy .
docker run -p 8443:8443 -d --name my-haproxy my-haproxy

Access Knox Gateway through HAProxy

curl -i -k -u user1:secret -X GET "https://localhost:8443/gateway/default/webhdfs/v1/user?op=LISTSTATUS"
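
Because HAProxy runs in tcp mode (SSL pass-through), the certificate seen on the proxy port should be the one presented by a Knox Gateway, not one owned by HAProxy. One way to check this, sketched below, is to compare certificate fingerprints. The function name `cert_fingerprint` is hypothetical, and the sketch assumes the `openssl` command-line tool is installed on the client.

```shell
# Hypothetical helper: print the SHA-256 fingerprint of the certificate
# presented at host:port ($1).
cert_fingerprint() {
    # s_client opens the TLS connection; </dev/null closes it right after
    # the handshake. x509 then extracts the fingerprint of the certificate.
    openssl s_client -connect "$1" </dev/null 2>/dev/null |
        openssl x509 -noout -fingerprint -sha256
}
```

Compare `cert_fingerprint localhost:8443` with `cert_fingerprint mdp1.sb.com:8443` and `cert_fingerprint mdp2.sb.com:8443`. The proxy value should match one of the gateways; if both gateways share a certificate, all three values will be identical.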

Identify the Knox node where HAProxy redirected the request

Log on to the mdp1.sb.com host and check /var/log/knox/gateway.log. Look for log entries showing that the request was served there. If no such entries are found, check the log on mdp2.sb.com.

2019-10-18 23:15:57,293 INFO  knox.gateway (KnoxLdapRealm.java:getUserDn(725)) - Computed userDn: cn=user1,ou=users,dc=centos7,dc=com using ldapSearch for principal: user1
2019-10-18 23:15:57,553 INFO  knox.gateway (KnoxLdapRealm.java:getUserDn(725)) - Computed userDn: cn=user1,ou=users,dc=centos7,dc=com using ldapSearch for principal: user1
2019-10-18 23:15:57,563 INFO  knox.gateway (KnoxLdapRealm.java:rolesFor(328)) - Computed roles/groups: [ldapusers] for principal: user1
2019-10-18 23:15:57,976 INFO  knox.gateway (AclsAuthorizationFilter.java:init(73)) - Initializing AclsAuthz Provider for: WEBHDFS
2019-10-18 23:15:57,977 INFO  knox.gateway (AclsAuthorizationFilter.java:doFilter(104)) - Access Granted: true
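
Scanning the log by eye gets tedious when the test is repeated. The sketch below prints the timestamp of the most recent "Access Granted: true" entry, so the two nodes can be compared quickly. The function name `last_granted` is hypothetical, and the sketch assumes the log line format shown above (date and time as the first two space-separated fields).

```shell
# Hypothetical helper: print the timestamp of the latest granted request.
# $1 = path to the Knox log (defaults to the path used on this page).
last_granted() {
    grep 'Access Granted: true' "${1:-/var/log/knox/gateway.log}" |
        tail -n 1 |
        cut -d' ' -f1,2
}
```

Run `last_granted` on each Knox host right after a request; the node whose timestamp matches the time of the request is the one that served it.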

Simulate Knox Gateway failure

Assume that requests are currently served by mdp1.sb.com. Stop the Knox service there, or simply kill the process, then call the Knox Gateway again.

curl -i -k -u user1:secret -X GET "https://localhost:8443/gateway/default/webhdfs/v1/user?op=LISTSTATUS"

The command should return the result again as though nothing had happened. Log on to mdp2.sb.com and examine the gateway.log file again to confirm that the request was forwarded there.
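
A single request shows only the end state. To watch the switch as it happens, the request can be repeated in a loop while the Knox process is being stopped. The following is a minimal sketch; the function name `poll_proxy` is hypothetical and the one-second pause is an arbitrary choice.

```shell
# Hypothetical helper: send N requests through HAProxy, one per second,
# and print "<request number> <HTTP status>" for each.
# $1 = number of requests, $2 = user:password, $3 = proxy base URL
poll_proxy() {
    i=1
    while [ "$i" -le "$1" ]; do
        code="$(curl -s -k -u "$2" -o /dev/null -w '%{http_code}' \
            "$3/gateway/default/webhdfs/v1/user?op=LISTSTATUS")"
        printf '%s %s\n' "$i" "$code"
        i=$((i + 1))
        sleep 1
    done
}
```

Start `poll_proxy 30 user1:secret https://localhost:8443` in one terminal and stop Knox on mdp1.sb.com in another. A request or two may fail until the HAProxy health check (the `check` keyword in the backend) marks the stopped server as down; after that the status should return to 200.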

Congratulations, automatic failover is active.