Differences
This shows you the differences between two versions of the page.
biomed-shifts:nagios [2016/02/01 14:37] fmichel [Paths and configuration] |
biomed-shifts:nagios [2016/02/01 14:44] fmichel [Paths and configuration] |
||
---|---|---|---|
Line 25: | Line 25: | ||
==== Paths and configuration ==== | ==== Paths and configuration ==== | ||
- | __Topology__ a VO feed is generated every day at 23h50 by script grid04.lal.in2p3.fr:/home/fmichel/vo-feed-biomed.py. The feed is created from the status of the GRIF top BDII, an EMI BDII with expiration delay set to 24 hours. | + | __Topology__: a VO feed is generated every day at 23h50 by script grid04.lal.in2p3.fr:/home/fmichel/vo-feed-biomed.py. The feed is created from the status of the GRIF top BDII, an EMI BDII with expiration delay set to 24 hours. |
The VO feed, biomed.xml, is then copied to a web server at 0h00 and used by Nagios to build the list of resources monitored. | The VO feed, biomed.xml, is then copied to a web server at 0h00 and used by Nagios to build the list of resources monitored. | ||
Line 38: | Line 38: | ||
* Actual code of probes: /usr/lib/python2.4/site-packages/gridmetrics | * Actual code of probes: /usr/lib/python2.4/site-packages/gridmetrics | ||
- | **Soft/Hard states vs. //max_check_attempts**//: http:<nowiki>//</nowiki>nagios.sourceforge.net/docs/nagioscore/3/en/statetypes.html | + | **Soft/Hard states vs. max_check_attempts**: http://nagios.sourceforge.net/docs/nagioscore/3/en/statetypes.html |
* normal_check_interval 60 | * normal_check_interval 60 | ||
* retry_check_interval 15 | * retry_check_interval 15 | ||
* max_check_attempts 4 | * max_check_attempts 4 | ||
- | ====== > each service is checked once an hour; if an error occurs, the check is retried every 15 min until it has failed 4 times => hard state = notification, except for passive checks (passive_host_checks_are_soft=0). | + | => each service is checked once an hour; if an error occurs, the check is retried every 15 min until it has failed 4 times => hard state = notification, except for passive checks (passive_host_checks_are_soft=0). |
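As a rough illustration of the timing above, the following Python sketch computes when the retries happen and how long a failing service stays in a soft state before it becomes hard. It assumes that the first failed check counts as attempt 1 and that the intervals are in minutes (the Nagios default `interval_length` of 60 seconds); the function names are hypothetical and are not part of Nagios.

```python
def retry_schedule(retry_interval_min, max_check_attempts):
    """Minutes after the first failed check at which each attempt runs.

    Attempt 1 is the failed regular check itself (offset 0); the remaining
    attempts are retries spaced retry_interval_min apart.
    """
    return [attempt * retry_interval_min for attempt in range(max_check_attempts)]


def minutes_until_hard_state(retry_interval_min, max_check_attempts):
    """Delay between the first failure and the attempt that makes the state hard."""
    return retry_schedule(retry_interval_min, max_check_attempts)[-1]


if __name__ == "__main__":
    # Values from the configuration listed above.
    print(retry_schedule(15, 4))
    print(minutes_until_hard_state(15, 4))
```

With retry_check_interval 15 and max_check_attempts 4, a notification is therefore sent about 45 minutes after the first failed check, provided that every retry fails as well.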
**Passive checks**: they are initiated and performed by external applications/processes. | **Passive checks**: they are initiated and performed by external applications/processes. | ||
Line 53: | Line 53: | ||
When the grid certificate of the user used to run the tests is renewed (once a year), copy the userkey.pem and usercert.pem files to .globus, as on any UI. To do so, follow these steps: | When the grid certificate of the user used to run the tests is renewed (once a year), copy the userkey.pem and usercert.pem files to .globus, as on any UI. To do so, follow these steps: | ||
- | 1. Copy the pem files to the gate machine grid11.lal.in2p3.fr: | + | 1. Copy the pem files to the gate machine grid11.lal.in2p3.fr: \\ |
- | + | <code> | |
- | '' | + | eval `ssh-agent` |
- | eval `ssh-agent`<br> | + | ssh-add |
- | ssh-add<br> | + | gsiscp -P 2222 user*.pem grid11.lal.in2p3.fr: |
- | gsiscp -P 2222 user*.pem grid11.lal.in2p3.fr:<br> | + | </code> |
- | '' | + | |
2. Then log into grid11.lal.in2p3.fr and copy the pem files to the Nagios box grid04: | 2. Then log into grid11.lal.in2p3.fr and copy the pem files to the Nagios box grid04: | ||
- | + | <code> | |
- | '' | + | gsissh -AX -p 2222 grid11.lal.in2p3.fr |
- | gsissh -AX -p 2222 grid11.lal.in2p3.fr<br> | + | scp user*pem fmichel@grid04.lal.in2p3.fr:/.globus |
- | scp user*pem fmichel@grid04.lal.in2p3.fr:/.globus<br> | + | </code> |
- | '' | + | |
3. Then test the new pem files: | 3. Then test the new pem files: | ||
- | + | <code> | |
- | '' | + | |
ssh fmichel@grid04.lal.in2p3.fr | ssh fmichel@grid04.lal.in2p3.fr | ||
voms-proxy-init --voms biomed | voms-proxy-init --voms biomed | ||
- | '' | + | </code> |
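Before testing the copied files, it can help to check their permissions, since grid middleware generally expects the private key to be accessible by its owner only (for example mode 400 or 600). The Python sketch below is an assumption-laden helper, not part of the documented procedure: the function name is hypothetical and the `~/.globus/userkey.pem` path follows the usual UI layout.

```python
import os
import stat


def key_permissions_ok(path):
    """Return True if no group or other permission bits are set on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    # Assumed location of the private key, as on a standard UI.
    key = os.path.expanduser("~/.globus/userkey.pem")
    if os.path.exists(key):
        print(key, "OK" if key_permissions_ok(key) else "too permissive")
    else:
        print(key, "not found")
```

If the check reports a too permissive key, restricting it with `chmod 400` before running voms-proxy-init is the usual remedy.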
==== Proxy certificate renewal ==== | ==== Proxy certificate renewal ==== | ||
Ssh to any UI or to the Nagios server: grid04.lal.in2p3.fr | Ssh to any UI or to the Nagios server: grid04.lal.in2p3.fr | ||
* Create a valid proxy certificate: | * Create a valid proxy certificate: | ||
- | > ''$ voms-proxy-init --voms biomed'' | + | ''$ voms-proxy-init --voms biomed'' |
* Renew the proxy | * Renew the proxy | ||
- | > ''$ myproxy-init --cred_lifetime 672 --credname NagiosRetrieve-grid04.lal.in2p3.fr-biomed --pshost myproxy.grif.fr --username nagios --regex_dn_match --retrievable_by_cert "/O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=grid04.lal.in2p3.fr"'' | + | <code> |
+ | $ myproxy-init --cred_lifetime 672 --credname NagiosRetrieve-grid04.lal.in2p3.fr-biomed --pshost myproxy.grif.fr --username nagios --regex_dn_match --retrievable_by_cert "/O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=grid04.lal.in2p3.fr" | ||
+ | </code> | ||
* Check the proxy: | * Check the proxy: | ||
- | > ''$ myproxy-info -l nagios -s myproxy.grif.fr'' | + | <code>$ myproxy-info -l nagios -s myproxy.grif.fr</code> |
* Test the proxy retrieval probe: | * Test the proxy retrieval probe: | ||
- | > Ssh to the Nagios server: grid04.lal.in2p3.fr | + | <code>Ssh to the Nagios server: grid04.lal.in2p3.fr |
- | > ''$ sudo su - nagios'' | + | $ sudo su - nagios |
- | > ''$ /usr/libexec/grid-monitoring/probes/hr.srce/refresh_proxy --myproxyuser nagios --cert /etc/nagios/globus/hostcert.pem --vo biomed --name NagiosRetrieve-grid04.lal.in2p3.fr-biomed -H myproxy.grif.fr --key /etc/nagios/globus/hostkey.pem -x /etc/nagios/globus/userproxy.pem-biomed'' | + | $ /usr/libexec/grid-monitoring/probes/hr.srce/refresh_proxy --myproxyuser nagios --cert /etc/nagios/globus/hostcert.pem --vo biomed --name NagiosRetrieve-grid04.lal.in2p3.fr-biomed -H myproxy.grif.fr --key /etc/nagios/globus/hostkey.pem -x /etc/nagios/globus/userproxy.pem-biomed |
+ | </code> | ||
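The `--cred_lifetime 672` option above is expressed in hours, so the credential stored on the MyProxy server is valid for 28 days and the renewal has to be repeated before then. The Python sketch below only illustrates this arithmetic; the function names and the 3-day safety margin are assumptions, not values taken from this page.

```python
from datetime import datetime, timedelta

CRED_LIFETIME_HOURS = 672  # value passed to --cred_lifetime above


def credential_expiry(stored_at, lifetime_hours=CRED_LIFETIME_HOURS):
    """Date and time at which the stored credential expires."""
    return stored_at + timedelta(hours=lifetime_hours)


def renewal_deadline(stored_at, margin_days=3, lifetime_hours=CRED_LIFETIME_HOURS):
    """Latest recommended renewal time, keeping an assumed safety margin."""
    return credential_expiry(stored_at, lifetime_hours) - timedelta(days=margin_days)


if __name__ == "__main__":
    stored = datetime.now()
    print("expires:    ", credential_expiry(stored))
    print("renew before:", renewal_deadline(stored))
```

The actual remaining lifetime is reported by the myproxy-info command shown above, which remains the authoritative check.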