Differences
This shows you the differences between two versions of the page.
| biomed-shifts:nagios [2016/02/01 13:34] – [Information for biomed shifters] fmichel | biomed-shifts:nagios [2017/08/28 09:41] (current) – fmichel | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== | + | ====== biomed |
| - | The biomed | + | <note warning> |
| - | It monitors SEs, CEs and WMSs of all sites that support the VO. It is the major reference for biomed support team members in term of resources monitoring. | + | |
| - | For monitoring results, go to Service Groups -> Summary -> service name (like SERVICE_SRM_V2, | + | ===== Information for biomed shifters |
| - | * [[https:// | + | |
| - | * [[https:// | + | |
| - | * [[https:// | + | |
| - | Clicking on the SE/CE/WMS host name gives information on the scheduled downtimes (host state information section). | + | The biomed [[https:// |
| - | Only critical problems (showing in red) may lead to ticket submission | + | |
| + | For monitoring results, go to Service Groups → Summary → service name (like SERVICE_SRM_V2, | ||
| + | |||
| + | * [[https:// | ||
| + | * [[https:// | ||
| + | * [[https:// | ||
| + | |||
| + | Clicking on the SE/CE/WMS host name gives information on the scheduled downtimes (host state information section). Only critical problems (showing in red) may lead to ticket submission | ||
| A description of Nagios probes is available from [[https:// | A description of Nagios probes is available from [[https:// | ||
| Line 16: | Line 19: | ||
| A [[https:// | A [[https:// | ||
| - | The figure below depicts important graphical elements in Nagios referring to downtimes and comments: | + | The figure below depicts important graphical elements in Nagios referring to downtimes and comments: |
| - | {{: | + | ===== Information for administrators ===== |
| - | ====== | + | |
| - | ==== Paths and configuration | + | ==== Paths and configuration ==== |
| - | < | + | __Topology__: |
| The VO feed, biomed.xml, is then copied to a web server at midnight (00:00) and used by Nagios to build the list of monitored resources. | The VO feed, biomed.xml, is then copied to a web server at midnight (00:00) and used by Nagios to build the list of monitored resources. | ||
| Line 29: | Line 31: | ||
| Consequently, | Consequently, | ||
| - | ==== Paths and configuration | + | ==== Paths and configuration ==== |
| - | * Documentation: | + | * Documentation: |
| - | * Configuration: | + | * Configuration: |
| - | * Probes path: / | + | * Probes path: / |
| - | * Actual code of probes: / | + | * Actual code of probes: / |
| - | **Soft/Hard states vs. //max_check_attempts**//: http:< | + | **Soft/Hard states vs. max_check_attempts**: |
| - | * normal_check_interval | + | |
| - | * retry_check_interval | + | |
| - | * max_check_attempts | + | |
| - | ====== > each service is checked once an hour, if an error occurs, retry each 15 min until 4 times failed => hard state = notification, | + | |
| - | **Passive checks**: they are initiated and performed by external applications/ | + | * normal_check_interval 60 |
| - | Passive check results are submitted to Nagios for processing. | + | * retry_check_interval 15 |
| + | * max_check_attempts 4 | ||
| + | |||
| + | ⇒ each service is checked once an hour; if an error occurs, the check is retried every 15 min until it has failed 4 times ⇒ hard state = notification, | ||
| + | |||
| + | **Passive checks**: they are initiated and performed by external applications/ | ||
| + | |||
| + | ==== Stop/start Nagios ==== | ||
| - | ==== Stop/start Nagios | ||
| As root, run: service nagios restart | As root, run: service nagios restart | ||
| - | ==== Changing the grid certificate | + | ==== Changing the grid certificate ==== |
| When the grid certificate of the user used to run tests is renewed (once a year), copy the userkey.pem and usercert.pem files to .globus, as on any UI. To do so, follow these steps: | When the grid certificate of the user used to run tests is renewed (once a year), copy the userkey.pem and usercert.pem files to .globus, as on any UI. To do so, follow these steps: | ||
| 1. Copy the pem files to the gate machine grid11.lal.in2p3.fr: | 1. Copy the pem files to the gate machine grid11.lal.in2p3.fr: | ||
| - | '' | + | < |
| - | eval `ssh-agent`<br> | + | eval `ssh-agent` |
| - | ssh-add<br> | + | ssh-add |
| - | gsiscp -P 2222 user*.pem grid11.lal.in2p3.fr:< | + | gsiscp -P 2222 user*.pem grid11.lal.in2p3.fr: |
| - | '' | + | </code> |
| 2. Then log in to grid11.lal.in2p3.fr and copy the pem files to the Nagios box grid04: | 2. Then log in to grid11.lal.in2p3.fr and copy the pem files to the Nagios box grid04: | ||
| - | '' | + | < |
| - | gsissh -AX -p 2222 grid11.lal.in2p3.fr<br> | + | gsissh -AX -p 2222 grid11.lal.in2p3.fr |
| - | scp user*pem fmichel@grid04.lal.in2p3.fr:/ | + | scp user*pem fmichel@grid04.lal.in2p3.fr:/ |
| - | '' | + | </code> |
| 3. Then test the new pem files: | 3. Then test the new pem files: | ||
| - | '' | + | < |
| - | ssh fmichel@grid04.lal.in2p3.fr<br> | + | ssh fmichel@grid04.lal.in2p3.fr |
| voms-proxy-init --voms biomed | voms-proxy-init --voms biomed | ||
| - | '' | + | </ |
| - | ==== Proxy certificate renewal | + | |
| + | ==== Proxy certificate renewal ==== | ||
| SSH to any UI or to the Nagios server: grid04.lal.in2p3.fr | SSH to any UI or to the Nagios server: grid04.lal.in2p3.fr | ||
| - | * Create a valid proxy certificate: | ||
| - | > '' | ||
| - | * Renew the proxy | ||
| - | > '' | ||
| - | * Check the proxy: | + | * Create a valid proxy certificate: |
| - | > '' | + | |
| + | < | ||
| + | $ voms-proxy-init --voms biomed | ||
| + | </ | ||
| + | |||
| + | * Renew the proxy | ||
| + | |||
| + | < | ||
| + | $ myproxy-init --cred_lifetime 672 --credname NagiosRetrieve-grid04.lal.in2p3.fr-biomed --pshost myproxy.grif.fr --username nagios --regex_dn_match --retrievable_by_cert "/ | ||
| + | </ | ||
| + | |||
| + | * Check the proxy: | ||
| + | |||
| + | <code> | ||
| + | $ myproxy-info -l nagios -s myproxy.grif.fr | ||
| + | </ | ||
| + | |||
| + | * Test the proxy retrieval probe: | ||
| - | * Test the proxy retrieval probe: | + | < |
| - | > Ssh to the Nagios server: grid04.lal.in2p3.fr | + | Ssh to the Nagios server: grid04.lal.in2p3.fr |
| - | > '' | + | $ sudo su - nagios |
| - | > '' | + | $ / |
| + | </ | ||
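
The soft/hard state settings listed under "Paths and configuration" (normal_check_interval 60, retry_check_interval 15, max_check_attempts 4) can be illustrated with a small timing sketch. It is not part of the Nagios setup described on this page: the `CheckPolicy` class and both function names are hypothetical, and it assumes that intervals are expressed in minutes and that the first failed check counts as attempt 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckPolicy:
    """Hypothetical container for the three settings quoted on this page."""
    normal_check_interval: int = 60  # minutes between checks in normal operation
    retry_check_interval: int = 15   # minutes between re-checks while in a soft state
    max_check_attempts: int = 4      # failed attempts needed to reach a hard state


def failure_timeline(policy, consecutive_failures):
    """Return (minute, attempt, state_type) for a run of consecutive failed checks.

    Assumption: the first failure happens at minute 0 and is attempt 1.
    """
    timeline = []
    minute = 0
    for n in range(1, consecutive_failures + 1):
        attempt = min(n, policy.max_check_attempts)
        hard = attempt == policy.max_check_attempts
        timeline.append((minute, attempt, "HARD" if hard else "SOFT"))
        # Soft states are re-checked at the retry interval; once the state is
        # hard, checks fall back to the normal interval.
        minute += policy.normal_check_interval if hard else policy.retry_check_interval
    return timeline


def minutes_until_hard_state(policy):
    """Delay between the first failed check and the hard state (notification)."""
    return (policy.max_check_attempts - 1) * policy.retry_check_interval


if __name__ == "__main__":
    policy = CheckPolicy()
    for minute, attempt, state_type in failure_timeline(policy, 5):
        print(f"t+{minute:3d} min  attempt {attempt}  {state_type}")
    print("hard state after", minutes_until_hard_state(policy), "min")
```

Under these assumptions, a service that starts failing reaches the hard state (and triggers a notification) three retry intervals after the first failed check, which is worth keeping in mind when deciding how soon a red entry warrants a ticket.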