biomed-shifts:nagios

Differences

This shows you the differences between two versions of the page.

biomed-shifts:nagios [2016/02/01 14:30]
fmichel created
biomed-shifts:nagios [2016/02/01 17:21]
fmichel [Information for biomed shifters]
Line 1: Line 1:
-====== ​ Information for biomed shifters  ​======+======  biomed VO Nagios ====== 
 + 
 +=====  Information for biomed shifters ​ =====
  
 The biomed [[https://​grid16.lal.in2p3.fr/​nagios|Nagios box]] is hosted and maintained by site GRIF from the French NGI. The biomed [[https://​grid16.lal.in2p3.fr/​nagios|Nagios box]] is hosted and maintained by site GRIF from the French NGI.
Line 17: Line 19:
  
 The figure below depicts important graphical elements in Nagios referring to downtimes and comments: The figure below depicts important graphical elements in Nagios referring to downtimes and comments:
 +{{:​biomed-shifts:​nagios_comment-24224655.png?​direct|}}
  
-[[file:​nagios_comment.png]] +=====  Information for administrators ​ =====
-======  Information for administrators  ​======+
  
 ====  Paths and configuration ​ ==== ====  Paths and configuration ​ ====
  
-<​u>​Topology</​u>> ​a VO feed is generated every day at 23h50 by script grid04.lal.in2p3.fr:/​home/​fmichel/​vo-feed-biomed.py. The feed is created from the status of the GRIF top BDII, an EMI BDII with expiration delay set to 24 hours.+__Topology__: ​a VO feed is generated every day at 23h50 by script grid04.lal.in2p3.fr:/​home/​fmichel/​vo-feed-biomed.py. The feed is created from the status of the GRIF top BDII, an EMI BDII with expiration delay set to 24 hours.
  
 The VO feed, biomed.xml, is then copied to a web server at 0h00 and used by Nagios to build the list of resources monitored. The VO feed, biomed.xml, is then copied to a web server at 0h00 and used by Nagios to build the list of resources monitored.
Line 36: Line 38:
   *  Actual code of probes: /​usr/​lib/​python2.4/​site-packages/​gridmetrics   *  Actual code of probes: /​usr/​lib/​python2.4/​site-packages/​gridmetrics
  
-**Soft/Hard states vs. //max_check_attempts**//: http:<​nowiki>/​/</nowiki>nagios.sourceforge.net/​docs/​nagioscore/​3/​en/​statetypes.html+**Soft/Hard states vs. max_check_attempts**:​ http://​nagios.sourceforge.net/​docs/​nagioscore/​3/​en/​statetypes.html
   *  normal_check_interval ​          60   *  normal_check_interval ​          60
   *  retry_check_interval ​           15   *  retry_check_interval ​           15
   *  max_check_attempts ​             4   *  max_check_attempts ​             4
-====== > each service is checked once an hour; if an error occurs, the check is retried every 15 min until it has failed 4 times, at which point the service enters a hard state and a notification is sent, except for passive checks (passive_host_checks_are_soft=0).+=> each service is checked once an hour; if an error occurs, the check is retried every 15 min until it has failed 4 times, at which point the service enters a hard state and a notification is sent, except for passive checks (passive_host_checks_are_soft=0).
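
The timing implied by these three settings can be checked with a little arithmetic. The sketch below is illustrative only: the function name `minutes_to_hard_state` is hypothetical (it is not part of Nagios), and it assumes that the first failed check counts as attempt 1, so that only the remaining attempts are spaced by the retry interval.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): minutes between the first
# failed check and the hard state.
# Assumption: the first failure counts as attempt 1, so (max - 1)
# retries follow, each separated by the retry interval.
minutes_to_hard_state() {
    retry_interval=$1
    max_attempts=$2
    echo $(( (max_attempts - 1) * retry_interval ))
}

# Values taken from the configuration listed above.
minutes_to_hard_state 15 4
```

With retry_check_interval 15 and max_check_attempts 4 this prints 45: under the stated assumption, a failing service reaches the hard state 45 minutes after its first failed check.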
  
 **Passive checks**: they are initiated and performed by external applications/​processes. **Passive checks**: they are initiated and performed by external applications/​processes.
Line 51: Line 53:
 When the grid certificate of the user that runs the tests is renewed (once a year), copy the userkey.pem and usercert.pem files to .globus, as on any UI. To do so, follow these steps: When the grid certificate of the user that runs the tests is renewed (once a year), copy the userkey.pem and usercert.pem files to .globus, as on any UI. To do so, follow these steps:
  
-1. Copy the pem files to the gate machine grid11.lal.in2p3.fr:​ +1. Copy the pem files to the gate machine grid11.lal.in2p3.fr: ​\\ 
- +<​code>​ 
-''​ +eval `ssh-agent` 
-eval `ssh-agent`<br> +ssh-add 
-ssh-add<br> +gsiscp -P 2222 user*.pem grid11.lal.in2p3.fr:​ 
-gsiscp -P 2222 user*.pem grid11.lal.in2p3.fr:<​br> +</code>
-''​+
  
 2. Then log in to grid11.lal.in2p3.fr and copy the pem files to the Nagios box grid04: 2. Then log in to grid11.lal.in2p3.fr and copy the pem files to the Nagios box grid04:
- +<​code>​ 
-''​ +gsissh -AX -p 2222 grid11.lal.in2p3.fr 
-gsissh -AX -p 2222 grid11.lal.in2p3.fr<br> +scp user*pem fmichel@grid04.lal.in2p3.fr:/​.globus 
-scp user*pem fmichel@grid04.lal.in2p3.fr:/​.globus<​br> +</code>
-''​+
  
 3. Then test the new pem files: 3. Then test the new pem files:
- +<​code>​
-''​+
 ssh fmichel@grid04.lal.in2p3.fr ssh fmichel@grid04.lal.in2p3.fr
 voms-proxy-init --voms biomed voms-proxy-init --voms biomed
-''​+</​code>​ 
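
Before running voms-proxy-init on the copied files, it can help to confirm that the private key has the permissions grid tools usually expect. The sketch below is a convenience built on assumptions and is not part of the original procedure: the function name `check_pem_mode` is hypothetical, it relies on GNU `stat -c %a` as found on Linux hosts, and it assumes the common convention that userkey.pem must be accessible by its owner only (mode 400 or 600).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the private key is owner-only (400 or 600).
# Uses GNU stat; on non-GNU systems the stat options differ.
check_pem_mode() {
    mode=$(stat -c %a "$1") || return 2
    case "$mode" in
        400|600) return 0 ;;
        *) echo "$1 has mode $mode, expected 400 or 600" >&2; return 1 ;;
    esac
}

# Example: check the key in the default location, if it exists.
key="$HOME/.globus/userkey.pem"
if [ -f "$key" ] && check_pem_mode "$key"; then
    echo "userkey.pem permissions look fine"
fi
```

If the check fails, `chmod 400 ~/.globus/userkey.pem` is the usual remedy before retrying voms-proxy-init.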
 ====  Proxy certificate renewal ​ ==== ====  Proxy certificate renewal ​ ====
  
 Ssh to any UI or to the Nagios server: grid04.lal.in2p3.fr Ssh to any UI or to the Nagios server: grid04.lal.in2p3.fr
   *  Create a valid proxy certificate:​   *  Create a valid proxy certificate:​
-''​$ voms-proxy-init --voms biomed''​+     <​code>$ voms-proxy-init --voms biomed</​code>​
   *  Renew the proxy   *  Renew the proxy
-''​$ myproxy-init --cred_lifetime 672 --credname NagiosRetrieve-grid04.lal.in2p3.fr-biomed --pshost myproxy.grif.fr --username nagios --regex_dn_match --retrievable_by_cert "/​O=GRID-FR/​C=FR/​O=CNRS/​OU=LAL/​CN=grid04.lal.in2p3.fr"''​+<code> 
 +$ myproxy-init --cred_lifetime 672 --credname NagiosRetrieve-grid04.lal.in2p3.fr-biomed --pshost myproxy.grif.fr --username nagios --regex_dn_match --retrievable_by_cert "/O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=grid04.lal.in2p3.fr"
 +</​code>​
  
   *  Check the proxy:   *  Check the proxy:
-''$ myproxy-info -l nagios -s myproxy.grif.fr''+<code>$ myproxy-info -l nagios -s myproxy.grif.fr</code>
  
   *  Test the proxy retrieval probe:   *  Test the proxy retrieval probe:
-> Ssh to the Nagios server: grid04.lal.in2p3.fr +<code>Ssh to the Nagios server: grid04.lal.in2p3.fr 
-> ''​$ sudo su - nagios''​ +$ sudo su - nagios 
-> ''​$ /​usr/​libexec/​grid-monitoring/​probes/​hr.srce/​refresh_proxy ​ --myproxyuser nagios --cert /​etc/​nagios/​globus/​hostcert.pem --vo biomed --name NagiosRetrieve-grid04.lal.in2p3.fr-biomed -H myproxy.grif.fr --key /​etc/​nagios/​globus/​hostkey.pem -x /​etc/​nagios/​globus/​userproxy.pem-biomed''​ +$ /​usr/​libexec/​grid-monitoring/​probes/​hr.srce/​refresh_proxy ​ --myproxyuser nagios --cert /​etc/​nagios/​globus/​hostcert.pem --vo biomed --name NagiosRetrieve-grid04.lal.in2p3.fr-biomed -H myproxy.grif.fr --key /​etc/​nagios/​globus/​hostkey.pem -x /​etc/​nagios/​globus/​userproxy.pem-biomed 
 +</​code>​
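
The --cred_lifetime value passed to myproxy-init above is expressed in hours, so it is worth working out how often the renewal has to be repeated. The helpers below are a hypothetical sketch: the names `hours_to_days` and `renewal_due` are illustrative, and the way the remaining lifetime in seconds is obtained is deliberately left to the caller.

```shell
#!/bin/sh
# Hypothetical helpers for reasoning about the stored credential lifetime.

# Convert a lifetime in hours (as passed to --cred_lifetime) to whole days.
hours_to_days() {
    echo $(( $1 / 24 ))
}

# Succeed if fewer than threshold_days remain, given the seconds left.
# Obtaining the remaining seconds is up to the caller (assumption).
renewal_due() {
    seconds_left=$1
    threshold_days=$2
    [ "$seconds_left" -lt $(( threshold_days * 86400 )) ]
}

# Lifetime used in the myproxy-init command above.
hours_to_days 672
```

This prints 28: a credential stored with a lifetime of 672 hours lasts four weeks, so the myproxy-init step has to be repeated at least that often.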
  
  • biomed-shifts/nagios.txt
  • Last modified: 2017/08/28 11:41
  • by fmichel