[SPARK-8338] Ganglia fails to run because of httpd.conf configuration mismatch #121
for node in $SLAVES $OTHER_MASTERS; do
ssh -t -t $SSH_OPTS root@$node "if ! rpm --quiet -q $GANGLIA_PACKAGES; then yum install -q -y $GANGLIA_PACKAGES; fi" & sleep 0.3
#Uninstalls older version of ganglia from other masters if it was reinstalled in AMI
ssh -t -t $SSH_OPTS root@$node "yum remove -q -y $OLD_GANGLIA_PACKAGES" & sleep 0.3
I tried this out today and ran into error messages of the form: Existing lock /var/run/yum.pid: another copy is running as pid 2758.
I think the problem is that while the yum remove is running we are also starting the yum install. A good fix here would be to not put the ssh in the background in the first line. We can probably also use pssh to parallelize these steps rather than have a loop. @nchammas did some work on installing pssh on the driver node, if I am not wrong.
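The suggestion above (keep the two yum steps from overlapping on a node) could look roughly like the sketch below. It is only an illustration: `build_ganglia_cmd`, `run_on_node`, `update_ganglia` and the `DRY_RUN` switch are hypothetical names, not part of spark-ec2, and the remove-then-install order is an assumption taken from the PR description. Each node runs its two yum steps in sequence inside a single ssh session, while different nodes still run in parallel.

```shell
# Build one remote command that removes the old packages and only then
# installs the new ones, so the two yum calls never overlap on a node.
build_ganglia_cmd() {
  local old_pkgs="$1"
  local new_pkgs="$2"
  printf 'yum remove -q -y %s; if ! rpm --quiet -q %s; then yum install -q -y %s; fi' \
    "$old_pkgs" "$new_pkgs" "$new_pkgs"
}

# DRY_RUN is a hypothetical switch (default 1): print the command instead
# of calling ssh, so the sketch can be tried without a cluster.
run_on_node() {
  local node="$1"
  local cmd="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "[$node] $cmd"
  else
    # SSH_OPTS is intentionally unquoted so it splits into separate options.
    ssh -t -t $SSH_OPTS "root@$node" "$cmd"
  fi
}

# Nodes run in parallel; the steps on each node run in sequence.
update_ganglia() {
  local cmd node
  cmd="$(build_ganglia_cmd "$OLD_GANGLIA_PACKAGES" "$GANGLIA_PACKAGES")"
  for node in "$@"; do
    run_on_node "$node" "$cmd" &
  done
  wait
}
```

With `DRY_RUN=0`, `update_ganglia $SLAVES $OTHER_MASTERS` would issue one ssh per node. This does not address other yum processes that may hold the lock independently of this script.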
Another minor comment: I see warning messages of the form
No Match for argument: httpd*
No Match for argument: php*
No Match for argument: ganglia
No Match for argument: ganglia
No Match for argument: ganglia-gmond*
No Match for argument: ganglia-gmetad*
No Match for argument: httpd*
No Match for argument: php*
No Match for argument: ganglia*
No Match for argument: ganglia*
No Match for argument: ganglia-gmond*
No Match for argument: ganglia-gmetad*
Is this expected?
@smartkiwi Apologies for the long delay in getting back on this. I had some inline comments.
@smartkiwi We are in the process of moving development of spark-ec2 to a new repository
Hi. When are you going to move the repo?
Cool - that should be fine. We have a PR open to change the repo for Spark 1.5 at apache/spark#7899 -- this PR will still be used for 1.4 and older versions and we can port this one over to the new repo as well.
I was looking into this change today. I see that the AMI already has ganglia 3.3 and httpd 2.2 preinstalled and functional. I've searched the commit history, and it looks like the root cause of the issue is the updated httpd configuration / package version. What was the reason for updating httpd? Was it something security related? I now feel that upgrading ganglia manually is a hack. The right way would be to build a new AMI.
So I propose to use this version (yum might still show a warning that it is waiting for the pid file to disappear) and open another PR to build a new AMI with ganglia installed.
Thanks @smartkiwi. Building a new AMI would be good, but that is pretty cumbersome, so I am not sure when that will happen. My comment earlier was to just use
@shivaram BTW the current AMI includes both ganglia and httpd preinstalled. How about we keep using them and do not try to install a newer version? This would mean rolling back a few configuration files to approximately their early 2014 versions.
My guess is that the EC2 launch process auto-updates some packages based on security requirements. I'm not sure about it though.
As for waiting for the earlier yum command (yum remove) to complete - I've tried that. I saw that once the first yum command (yum remove) finishes, there is some other yum process running, and that is what blocks the next command the script runs (yum install). Here are some debug messages from a slave (I saw similar ones on the master too):
As you can see, there is another yum process started - "/usr/bin/yum --security check-update". I suspect that it is triggered after the yum remove command exits. So fixing the yum "waiting for lock" message is rather tricky. I would leave it for now. As for pssh - it is doable. I suggest we wrap up this PR, and I'll submit another one that adds pssh later this week.
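For illustration only, one way a script might tolerate a lock held by another yum process is to poll for the lock file before running its own yum command. This is a minimal sketch and not something this PR does: `wait_for_yum_lock` is a hypothetical helper, the 60-second default timeout is an arbitrary assumption, and the only detail taken from the thread is the lock path `/var/run/yum.pid`. It does not close the race where a new yum process starts right after the check.

```shell
# wait_for_yum_lock [LOCKFILE] [TIMEOUT_SECONDS]
# Hypothetical helper: returns 0 once LOCKFILE no longer exists,
# or 1 if it is still present after TIMEOUT_SECONDS.
wait_for_yum_lock() {
  local lock="${1:-/var/run/yum.pid}"
  local timeout="${2:-60}"
  local waited=0
  while [ -e "$lock" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A remote command could then be written as `wait_for_yum_lock && yum install -q -y ...`, with the helper defined on the node or inlined into the ssh command string.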
This sounds unlikely to me - the update is triggered by ganglia/init.sh. I logged into the master instance before ganglia.sh ran and saw that it is already there. Before ganglia/init.sh run:
After ganglia/init.sh (from my branch) run:
I automated the process of making new spark-ec2 AMIs, but I never got around to pushing for it to be adopted upstream. After the repo migration is complete, perhaps we can revisit that issue if there is still interest in my work.
Yeah, we should definitely revisit that and create some new AMIs. The trouble is that somebody needs to own the Amazon EC2 account that has the AMIs, and I don't know of any way to share that credential across committers to spark-ec2. I wonder if we can do this using IAM etc., but that is a separate thread of discussion.
The following pull request contains a working solution
As described here, the AMI has ganglia preinstalled, but its version is too old; spark-ec2 has configuration for a newer one.
This PR uninstalls the old ganglia and related packages and installs the new one.
Tested - now it works.