[SPARK-8338] Ganglia fails to run because of httpd.conf configuration mismatch #121

Closed
wants to merge 7 commits into from

smartkiwi

As described here, the AMI has ganglia preinstalled, but its version is too old; spark-ec2 has configuration for a newer one.

This PR uninstalls the old ganglia and related packages and installs the new one.

Tested - now it works.

for node in $SLAVES $OTHER_MASTERS; do
ssh -t -t $SSH_OPTS root@$node "if ! rpm --quiet -q $GANGLIA_PACKAGES; then yum install -q -y $GANGLIA_PACKAGES; fi" & sleep 0.3
#Uninstalls older version of ganglia from other masters if it was reinstalled in AMI
ssh -t -t $SSH_OPTS root@$node "yum remove -q -y $OLD_GANGLIA_PACKAGES" & sleep 0.3

I tried this out today and I ran into error messages of the form Existing lock /var/run/yum.pid: another copy is running as pid 2758.

I think the problem is that while the yum remove is running we are also starting the yum install. A good fix here would be to not put the ssh in the background in the first line. Also, we can probably use pssh to parallelize these steps rather than have a loop. @nchammas did some work on installing pssh on the driver node, if I am not wrong.
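
The suggestion above can be sketched roughly as follows: both yum calls run back to back inside one ssh session per node, so they never compete for the lock, while the nodes themselves are still handled in parallel. This is only an illustration, not the code that was merged. `refresh_ganglia`, `run_remote_ssh` and the `RUN_REMOTE` hook are hypothetical names; `$SSH_OPTS`, `$GANGLIA_PACKAGES` and `$OLD_GANGLIA_PACKAGES` are assumed to be set as in the snippet under review.

```shell
#!/usr/bin/env bash
# Sketch: one ssh session per node runs "yum remove" and then "yum install",
# so the install only starts after the remove has finished on that node.

# RUN_REMOTE is a hypothetical hook that makes the loop easy to dry-run;
# by default it calls ssh the same way the snippet under review does.
RUN_REMOTE="${RUN_REMOTE:-run_remote_ssh}"

run_remote_ssh() {
  local node="$1"
  local cmd="$2"
  # $SSH_OPTS is left unquoted on purpose: it holds several ssh options.
  ssh -t -t $SSH_OPTS "root@$node" "$cmd"
}

refresh_ganglia() {
  # ";" rather than "&&": the install is attempted even if there was
  # nothing to remove. The package globs are expanded on the remote side.
  local cmd="yum remove -q -y $OLD_GANGLIA_PACKAGES; yum install -q -y $GANGLIA_PACKAGES"
  local node
  for node in "$@"; do
    # Only the session as a whole is backgrounded, not the individual steps.
    "$RUN_REMOTE" "$node" "$cmd" &
    sleep 0.3
  done
  wait  # block until every node has finished both steps
}
```

It would be called as `refresh_ganglia $SLAVES $OTHER_MASTERS`. Any other yum process that happens to be running on the node could still hold the lock, so this only removes the contention between the script's own two commands.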

Another minor comment: I see warning messages of the form

No Match for argument: httpd*
No Match for argument: php*
No Match for argument: ganglia
No Match for argument: ganglia
No Match for argument: ganglia-gmond*
No Match for argument: ganglia-gmetad*
No Match for argument: httpd*
No Match for argument: php*
No Match for argument: ganglia*
No Match for argument: ganglia*
No Match for argument: ganglia-gmond*
No Match for argument: ganglia-gmetad*

Is this expected?

@shivaram

shivaram commented Jul 6, 2015

@smartkiwi Apologies for the long delay in getting back on this. I had some inline comments.

@shivaram

shivaram commented Aug 4, 2015

@smartkiwi We are in the process of moving development of spark-ec2 to a new repository, amplab/spark-ec2. I'd like to either merge this if you have a chance to address the comments, or I could move the PR to the new repo by opening it there. Any thoughts?

@smartkiwi
Author

Hi. When are you going to move the repo?
I'll try to take a look tomorrow and address the comments so it can be merged.

@shivaram

shivaram commented Aug 4, 2015

Cool - that should be fine. We have a PR open to change the repo for Spark 1.5 at apache/spark#7899 -- this PR will still be used for 1.4 and older versions and we can port this one over to the new repo as well.

@smartkiwi
Author

I was looking into this change today.
I still see yum complaining about the pid file being present on both master and slaves.

I see that the AMI already has ganglia 3.3 and httpd 2.2 preinstalled and functional.

I've searched the commit history - it looks like the root cause of the issue is an updated httpd configuration / package version.

What was the reason for updating httpd? Was it something security related?

I now feel that upgrading ganglia manually is a hack. The right way would be to build a new AMI.

@smartkiwi
Author

So I propose to use this version (yum might still show a warning that it is waiting for the pid file to disappear) and open another PR to build a new AMI with ganglia installed.
I suspect that create_image.sh should work without changes, as ganglia would install the required dependencies.

@shivaram

shivaram commented Aug 5, 2015

Thanks @smartkiwi. Building a new AMI would be good, but that is pretty cumbersome, so I am not sure when that will happen. My earlier comment was to just use pssh or wait for the earlier yum command to finish, to avoid the waiting-for-lock error message. Is there any reason why those won't fix the warnings?
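
For reference, the pssh variant mentioned here could look something like the sketch below. It assumes pssh is installed on the master (on some distributions the binary is named `parallel-ssh`) and uses only its `-h` (hosts file), `-l` (user), `-t` (timeout, 0 meaning none) and `-i` (print output inline) options. `run_on_all_nodes` and `PSSH_BIN` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run one command on every node through pssh instead of an ssh loop.

# PSSH_BIN is a hypothetical override, e.g. for systems that ship the
# tool as "parallel-ssh".
PSSH_BIN="${PSSH_BIN:-pssh}"

run_on_all_nodes() {
  local cmd="$1"
  shift
  local hosts_file
  hosts_file="$(mktemp)"
  # pssh reads its targets from a file with one host per line.
  printf '%s\n' "$@" > "$hosts_file"
  "$PSSH_BIN" -h "$hosts_file" -l root -t 0 -i "$cmd"
  local status=$?
  rm -f "$hosts_file"
  return "$status"
}
```

A call such as `run_on_all_nodes "yum remove -q -y $OLD_GANGLIA_PACKAGES; yum install -q -y $GANGLIA_PACKAGES" $SLAVES $OTHER_MASTERS` would then run the two yum steps in order on each node. The ssh options carried in `$SSH_OPTS` are not forwarded in this sketch and would still need to be passed through to pssh.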

@smartkiwi
Author

@shivaram BTW the current AMI includes both ganglia and httpd preinstalled - how about we keep using them and do not try to install a newer version? This would mean rolling back a few configuration files to approximately their early 2014 versions.
Do you remember what the reason was to upgrade/change the configuration at that time?

@shivaram

shivaram commented Aug 5, 2015

My guess is that the EC2 launch process auto-updates some packages based on security requirements. I'm not sure about it though

@smartkiwi
Author

As for waiting for the earlier yum command (yum remove) to complete - I've tried that. I saw that once the first yum command (yum remove) finishes, there is some other yum process running. And that's what is blocking the next command the script runs (yum install).

Here are some debug messages from a slave (I saw similar ones on the master too):

++ for node in '$SLAVES' '$OTHER_MASTERS'
++ ssh -t -t -o StrictHostKeyChecking=no -o ConnectTimeout=5 root@ec2-54-159-7-111.compute-1.amazonaws.com 'yum remove -q -y httpd* php* ganglia* ganglia* ganglia-gmond* ganglia-gmetad*  2>&1 | grep -v '\''No Match for argument:'\''; ps -aef | grep yum; sleep 1 ; yum install -q -y httpd24-2.4* php56-5.6* ganglia-3.6* ganglia-web-3.5* ganglia-gmond-3.6* ganglia-gmetad-3.6*'
root      2216  2214  0 21:38 pts/0    00:00:00 bash -c yum remove -q -y httpd* php* ganglia* ganglia* ganglia-gmond* ganglia-gmetad*  2>&1 | grep -v 'No Match for argument:'; ps -aef | grep yum; sleep 1 ; yum install -q -y httpd24-2.4* php56-5.6* ganglia-3.6* ganglia-web-3.5* ganglia-gmond-3.6* ganglia-gmetad-3.6*
root      2287  2286  9 21:38 ?        00:00:00 /usr/bin/python2.7 /usr/bin/yum --security check-update
root      2290  2216  0 21:38 pts/0    00:00:00 grep yum
Connection to ec2-....compute-1.amazonaws.com closed.

As you can see, there is another yum process started - "/usr/bin/yum --security check-update". I suspect that it is triggered after the yum remove command exits.

So fixing the yum wait-for-lock message is rather tricky. I would leave it for now.
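
For illustration only, one way to wait for the lock explicitly would be to poll for the pid file before calling yum install. This is a sketch, not something this PR does: `wait_for_yum_lock` is a hypothetical helper, the default path is taken from the error message quoted earlier in the thread, and the 60-second default timeout is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Sketch: poll until the yum lock file is gone, or give up after a timeout.
# Usage: wait_for_yum_lock [lock_file] [timeout_seconds]
wait_for_yum_lock() {
  local lock_file="${1:-/var/run/yum.pid}"
  local timeout_secs="${2:-60}"
  local waited=0
  while [ -e "$lock_file" ]; do
    if [ "$waited" -ge "$timeout_secs" ]; then
      echo "yum lock $lock_file still present after ${timeout_secs}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

On a node it would be used as `wait_for_yum_lock && yum install -q -y $GANGLIA_PACKAGES`. There is still a race: the security check-update process could start between the check and the install, which is part of why the message is hard to eliminate entirely.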

As for pssh - it is doable. I suggest we wrap up this PR, and I'll submit another one that adds pssh later this week.

@smartkiwi
Author

"My guess is that the EC2 launch process auto-updates some packages based on security requirements. I'm not sure about it though."

This sounds unlikely to me - the update is triggered by ganglia/init.sh

I've logged into the master instance before ganglia.sh ran and see that ganglia is already there.
Let me try to start a cluster without even trying to install ganglia with yum. I suspect that the code in ganglia/init.sh was used before ganglia was preinstalled into the AMI.

Before the ganglia/init.sh run:

ganglia.x86_64                                                          3.3.7-5.2.amzn1                                  @amzn-main
ganglia-gmetad.x86_64                                                   3.3.7-5.2.amzn1                                  @amzn-main
ganglia-gmond.x86_64                                                    3.3.7-5.2.amzn1                                  @amzn-main
ganglia-web.x86_64                                                      3.3.7-5.2.amzn1                                  @amzn-main
httpd.x86_64                                                            2.2.29-1.5.amzn1                                 @amzn-main
httpd-tools.x86_64                                                      2.2.29-1.5.amzn1                                 @amzn-main
php.x86_64                                                              5.3.28-1.2.amzn1                                 @amzn-updates
php-ZendFramework.noarch                                                1.12.13-1.11.amzn1                               @amzn-updates
php-bcmath.x86_64                                                       5.3.28-1.2.amzn1                                 @amzn-updates
php-cli.x86_64                                                          5.3.28-1.2.amzn1                                 @amzn-updates
php-common.x86_64                                                       5.3.28-1.2.amzn1                                 @amzn-updates
php-gd.x86_64                                                           5.3.28-1.2.amzn1                                 @amzn-updates
php-process.x86_64                                                      5.3.28-1.2.amzn1                                 @amzn-updates
php-xml.x86_64                                                          5.3.28-1.2.amzn1                                 @amzn-update

After the ganglia/init.sh (from my branch) run:

Installed Packages
ganglia.x86_64                                                          3.6.0-3.11.amzn1                                 @amzn-main
ganglia-gmetad.x86_64                                                   3.6.0-3.11.amzn1                                 @amzn-main
ganglia-gmond.x86_64                                                    3.6.0-3.11.amzn1                                 @amzn-main
ganglia-web.x86_64                                                      3.5.10-3.11.amzn1                                @amzn-main
httpd24.x86_64                                                          2.4.12-1.60.amzn1                                @amzn-main
httpd24-tools.x86_64                                                    2.4.12-1.60.amzn1                                @amzn-main
php-ZendFramework.noarch                                                1.12.13-1.11.amzn1                               @amzn-updates
php56.x86_64                                                            5.6.10-1.115.amzn1                               @amzn-updates
php56-bcmath.x86_64                                                     5.6.10-1.115.amzn1                               @amzn-updates
php56-cli.x86_64                                                        5.6.10-1.115.amzn1                               @amzn-updates
php56-common.x86_64                                                     5.6.10-1.115.amzn1                               @amzn-updates
php56-gd.x86_64                                                         5.6.10-1.115.amzn1                               @amzn-updates
php56-jsonc.x86_64                                                      1.3.6-1.19.amzn1                                 @amzn-main
php56-process.x86_64                                                    5.6.10-1.115.amzn1                               @amzn-updates
php56-xml.x86_64                                                        5.6.10-1.115.amzn1                               @amzn-update

@nchammas

nchammas commented Aug 5, 2015

"Building a new AMI would be good, but that is pretty cumbersome, so I am not sure when that will happen."

I automated the process of making new spark-ec2 AMIs, but I never got around to pushing for it to be adopted upstream.

After the repo migration is complete, perhaps we can revisit that issue if there is still interest in my work.

@shivaram

shivaram commented Aug 5, 2015

Yeah, we should definitely revisit that and create some new AMIs. The trouble is that somebody needs to own the Amazon EC2 account that has the AMIs, and I don't know of any way to share that credential across committers to spark-ec2. I wonder if we can do this using IAM etc., but that is a separate thread of discussion.

@smartkiwi
Author

The following pull request contains a working solution:
#133

@smartkiwi smartkiwi closed this Aug 5, 2015