biomed-shifts:nagios

Differences

This shows you the differences between two versions of the page.

biomed-shifts:nagios [2016/02/01 14:36]
fmichel
biomed-shifts:nagios [2016/02/01 17:22]
fmichel [Information for biomed shifters]
Line 19: Line 19:
  
 The figure below depicts important graphical elements in Nagios referring to downtimes and comments:
 +{{:​biomed-shifts:​nagios_comment-24224655.png?​direct&​550|}}
 +
  
-{{:​biomed-shifts:​nagios_comment-24224655.png?​600|}} 
 =====  Information for administrators  =====
  
 ====  Paths and configuration  ====
  
-<u>Topology</u>> a VO feed is generated every day at 23h50 by script grid04.lal.in2p3.fr:/home/fmichel/vo-feed-biomed.py. The feed is created from the status of the GRIF top BDII, an EMI BDII with expiration delay set to 24 hours.
+__Topology__: a VO feed is generated every day at 23h50 by script grid04.lal.in2p3.fr:/home/fmichel/vo-feed-biomed.py. The feed is created from the status of the GRIF top BDII, an EMI BDII with expiration delay set to 24 hours.
  
 The VO feed, biomed.xml, is then copied to a web server at 0h00 and used by Nagios to build the list of resources monitored.
Line 38: Line 39:
   *  Actual code of probes: /usr/lib/python2.4/site-packages/gridmetrics
  
-**Soft/Hard states vs. //max_check_attempts**//: http:<nowiki>//</nowiki>nagios.sourceforge.net/docs/nagioscore/3/en/statetypes.html
+**Soft/Hard states vs. max_check_attempts**: http://nagios.sourceforge.net/docs/nagioscore/3/en/statetypes.html
   *  normal_check_interval           60
   *  retry_check_interval            15
   *  max_check_attempts              4
-====== > each service is checked once an hour, if an error occurs, retry each 15 min until 4 times failed => hard state = notification, except for passive checks (passive_host_checks_are_soft=0).
+=> each service is checked once an hour, if an error occurs, retry each 15 min until 4 times failed => hard state = notification, except for passive checks (passive_host_checks_are_soft=0).
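The retry arithmetic above can be illustrated with a small Python sketch. It is not part of the Nagios configuration: the function name `check_timeline` is hypothetical, and the sketch assumes that every check fails and that checks run exactly on schedule (no latency).

```python
def check_timeline(retry_check_interval, max_check_attempts):
    """List (minutes after first failure, attempt number, state type)
    for a service whose checks keep failing.

    Assumption: the first failed check counts as attempt 1, and the state
    becomes HARD once the attempt number reaches max_check_attempts.
    """
    timeline = []
    for attempt in range(1, max_check_attempts + 1):
        offset = (attempt - 1) * retry_check_interval
        state = "HARD" if attempt == max_check_attempts else "SOFT"
        timeline.append((offset, attempt, state))
    return timeline


# Values taken from the settings listed above.
for offset, attempt, state in check_timeline(retry_check_interval=15,
                                             max_check_attempts=4):
    print("t+%d min: attempt %d -> %s" % (offset, attempt, state))
```

With these settings, the hard state (and therefore the notification) would be reached 45 minutes after the first failed check.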
  
 **Passive checks**: they are initiated and performed by external applications/processes.
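As a hedged illustration of how an external process might report a passive result, the Python sketch below formats a line for the standard Nagios external command `PROCESS_SERVICE_CHECK_RESULT`. The host name, service description and helper name are hypothetical placeholders, and writing the line to the Nagios command file is left out because the file location depends on the installation.

```python
import time


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        ts, host, service, return_code, output)


# Hypothetical host and service names, for illustration only.
line = passive_service_result("ce.example.org", "example-service", 2,
                              "job submission failed")
print(line, end="")
```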
Line 53: Line 54:
 When the grid certificate of the user who runs the tests is renewed (once a year), copy the userkey.pem and usercert.pem files to .globus, as on any UI. To do so, follow these steps:
  
-1. Copy the pem files to the gate machine grid11.lal.in2p3.fr:
-
-''
-eval `ssh-agent`<br>
-ssh-add<br>
-gsiscp -P 2222 user*.pem grid11.lal.in2p3.fr:<br>
-''
+1. Copy the pem files to the gate machine grid11.lal.in2p3.fr: \\
+<code>
+eval `ssh-agent`
+ssh-add
+gsiscp -P 2222 user*.pem grid11.lal.in2p3.fr:
+</code>
  
 2. Then log into grid11.lal.in2p3.fr and copy the pem files to the Nagios box grid4:
-
-''
-gsissh -AX -p 2222 grid11.lal.in2p3.fr<br>
-scp user*pem fmichel@grid04.lal.in2p3.fr:/.globus<br>
-''
+<code>
+gsissh -AX -p 2222 grid11.lal.in2p3.fr
+scp user*pem fmichel@grid04.lal.in2p3.fr:/.globus
+</code>
  
 3. Then test the new pem files:
-
-''
+<code>
 ssh fmichel@grid04.lal.in2p3.fr<br>
 voms-proxy-init --voms biomed
-''
+</code>
 ====  Proxy certificate renewal  ====
  
 SSH to any UI or to the Nagios server: grid04.lal.in2p3.fr
   *  Create a valid proxy certificate:
-''$ voms-proxy-init --voms biomed''
+     <code>$ voms-proxy-init --voms biomed</code>
   *  Renew the proxy
-''$ myproxy-init --cred_lifetime 672 --credname NagiosRetrieve-grid04.lal.in2p3.fr-biomed --pshost myproxy.grif.fr --username nagios --regex_dn_match --retrievable_by_cert "/O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=grid04.lal.in2p3.fr"''
+<code>
 +$ myproxy-init --cred_lifetime 672 --credname NagiosRetrieve-grid04.lal.in2p3.fr-biomed --pshost myproxy.grif.fr --username nagios --regex_dn_match --retrievable_by_cert "/​O=GRID-FR/​C=FR/​O=CNRS/​OU=LAL/​CN=grid04.lal.in2p3.fr"''​ 
 +</​code>​
  
   *  Check the proxy:
-''$ myproxy-info -l nagios -s myproxy.grif.fr''
+<code>$ myproxy-info -l nagios -s myproxy.grif.fr''</code>
  
   *  Test the proxy retrieval probe:
-> Ssh to the Nagios server: grid04.lal.in2p3.fr
-> ''$ sudo su - nagios''
-> ''$ /usr/libexec/grid-monitoring/probes/hr.srce/refresh_proxy  --myproxyuser nagios --cert /etc/nagios/globus/hostcert.pem --vo biomed --name NagiosRetrieve-grid04.lal.in2p3.fr-biomed -H myproxy.grif.fr --key /etc/nagios/globus/hostkey.pem -x /etc/nagios/globus/userproxy.pem-biomed''
+<code>Ssh to the Nagios server: grid04.lal.in2p3.fr
+$ sudo su - nagios
+$ /usr/libexec/grid-monitoring/probes/hr.srce/refresh_proxy  --myproxyuser nagios --cert /etc/nagios/globus/hostcert.pem --vo biomed --name NagiosRetrieve-grid04.lal.in2p3.fr-biomed -H myproxy.grif.fr --key /etc/nagios/globus/hostkey.pem -x /etc/nagios/globus/userproxy.pem-biomed
+</code>
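The `--cred_lifetime 672` value used with myproxy-init is expressed in hours, i.e. 28 days. A minimal Python sketch, assuming the credential is stored at a known time (the function name and the example date are illustrative only), shows when the stored credential would need to be renewed:

```python
from datetime import datetime, timedelta

CRED_LIFETIME_HOURS = 672  # value passed to --cred_lifetime above


def credential_expiry(stored_at, lifetime_hours=CRED_LIFETIME_HOURS):
    """Return the time at which a credential stored at `stored_at` expires."""
    return stored_at + timedelta(hours=lifetime_hours)


# Illustrative date only: a credential stored on 1 February 2016 at 17:22.
stored = datetime(2016, 2, 1, 17, 22)
print("lifetime in days:", CRED_LIFETIME_HOURS / 24)
print("renew before:", credential_expiry(stored))
```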
  
  • biomed-shifts/nagios.txt
  • Last modified: 2017/08/28 11:41
  • by fmichel