biomed-shifts:argo

Differences

This shows you the differences between two versions of the page.

biomed-shifts:argo [2017/08/28 11:18]
fmichel created
biomed-shifts:argo [2017/08/29 09:38]
fmichel
Line 12: Line 12:
 Clicking on the SE/CE host name gives information on the scheduled downtimes (host state information section). <color red>Only critical problems (showing in red) may lead to ticket submission</color>.
  
-A description of ARGO probes is available from [[https://tomtools.cern.ch/confluence/display/SAMDOC/grid-monitoring-probes-org.sam|the SAM wiki]]. Source code can also be found from the [[https://svnweb.cern.ch/trac/sam/browser/trunk/probes/src/gridmetrics|CERN Trac server]] or directly from the [[http://svn.cern.ch/guest/sam/trunk/probes/src/gridmetrics/|SVN repository]].
+A [[https://argo-mon-biomed.cro-ngi.hr/nagios/cgi-bin/status.cgi?host=argo-mon-biomed.cro-ngi.hr|specific probe]] checks the status of ARGO itself, i.e. all critical processes for ARGO to run. It should be checked in case of suspicious behaviour.
  
-A [[https://grid16.lal.in2p3.fr/nagios/cgi-bin/status.cgi?host=grid02.lal.in2p3.fr&style=detail|specific probe]] checks the status of Nagios itself, i.e. all critical processes for Nagios to run. It should be checked in case of suspicious behaviour of Nagios.
-
-The figure below depicts important graphical elements in Nagios referring to downtimes and comments: {{:biomed-shifts:nagios_comment-24224655.png?direct&550}}
+The figure below depicts important graphical elements in ARGO referring to downtimes and comments: {{:biomed-shifts:nagios_comment-24224655.png?direct&450}}
  
 ===== Information for administrators =====
  
-==== Paths and configuration ====
+The ARGO instance is using the following POEM profile: https://poem.egi.eu/poem/admin/poem/profile/2/
  
-__Topology__: a VO feed is generated every day at 23h50 by script grid04.lal.in2p3.fr:/home/fmichel/vo-feed-biomed.py. The feed is created from the status of the GRIF top BDII, an EMI BDII with expiration delay set to 24 hours.
+**Probes documentation**: https://wiki.egi.eu/wiki/ROC_SAM_Tests
  
-The VO feed, biomed.xml, is then copied to a web server at 0h00 and used by Nagios to build the list of resources monitored.
+The topology is fetched from VAPOR:
+  * http://operations-portal.egi.eu/vapor/downloadLavoisier/option/xml/view/vapor_sites/param/vo=biomed
+  * http://operations-portal.egi.eu/vapor/downloadLavoisier/option/xml/view/vapor_endpoints/param/vo=biomed
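To get a first look at what the two VAPOR feeds above contain, a minimal Python sketch such as the following could be used. It only downloads a feed and counts the XML element names it finds. The element names themselves are not documented on this page, so the sketch makes no assumption about them; the helper names `fetch_feed` and `count_tags` are hypothetical, and the URL may no longer be reachable.

```python
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

# URL taken from the list above; it may have moved since this page was written.
VAPOR_SITES_URL = (
    "http://operations-portal.egi.eu/vapor/downloadLavoisier/"
    "option/xml/view/vapor_sites/param/vo=biomed"
)


def fetch_feed(url, timeout=30):
    """Download a feed and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def count_tags(xml_bytes):
    """Count how often each element name occurs in an XML document.

    The feed schema is not assumed here: the function simply walks
    every element, including the root, and tallies the tag names.
    """
    root = ET.fromstring(xml_bytes)
    return Counter(element.tag for element in root.iter())
```

Calling `count_tags(fetch_feed(VAPOR_SITES_URL))` would then give a quick overview of the feed structure before writing anything that depends on specific elements.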
  
-Consequently, the list of monitored resources is updated every day, avoiding to monitor decommissioned resources, but also with a delay of at least 24h to monitor resources that are down for just a few hours for instance.
-
-==== Paths and configuration ====
-
-  * Documentation: http:<nowiki>//</nowiki>library.nagios.com/library/products/nagioscore/manuals/
-  * Configuration: /etc/nagios: nagios.cfg, services.cfg, wlcg.d/<site name>/*.cfg
-  * Probes path: /usr/libexec/grid-monitoring/probes/org.sam/
-  * Actual code of probes: /usr/lib/python2.4/site-packages/gridmetrics
-
-**Soft/Hard states vs. max_check_attempts**: [[http://nagios.sourceforge.net/docs/nagioscore/3/en/statetypes.html|http://nagios.sourceforge.net/docs/nagioscore/3/en/statetypes.html]]
+**Soft/Hard states vs. max_check_attempts**: [[http://nagios.sourceforge.net/docs/nagioscore/4/en/statetypes.html|http://nagios.sourceforge.net/docs/nagioscore/4/en/statetypes.html]]
  
   * normal_check_interval 60
Line 44: Line 35:
  
 **Passive checks**: they are initiated and performed by external applications/processes. Passive check results are submitted to Nagios for processing.
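As an illustration of how an external process can submit such a result, the sketch below builds a `PROCESS_SERVICE_CHECK_RESULT` line in the Nagios external command format and appends it to the Nagios command file. This is a sketch under assumptions, not the mechanism ARGO itself uses: the helper names are hypothetical, the host and service names must match what the instance defines, and the command file location depends on the installation, so it is passed in as a parameter.

```python
import time

# Standard plugin return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
VALID_RETURN_CODES = (0, 1, 2, 3)


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one external command line carrying a passive service check result."""
    if return_code not in VALID_RETURN_CODES:
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if timestamp is None:
        timestamp = int(time.time())
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}".format(
        timestamp, host, service, return_code, output
    )


def submit_passive_result(command_file, line):
    """Append the command line to the Nagios command file (a named pipe).

    command_file is installation specific (assumption: the caller knows
    where the instance keeps it and has write access).
    """
    with open(command_file, "a") as pipe:
        pipe.write(line + "\n")
```

For example, `format_passive_result("se01.example.org", "hypothetical-service", 2, "CRITICAL: timeout")` yields a line that Nagios would process as a CRITICAL result for that host and service, provided both are defined and passive checks are enabled for them.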
- 
-==== Stop/start Nagios ==== 
- 
-As root, run: service nagios restart 
- 
-==== Changing the grid certificate ==== 
- 
-The grid certificate of the user running the tests is renewed once a year. When that happens, copy the new userkey.pem and usercert.pem files to .globus, as on any UI, by following these steps:
- 
-1. Copy the pem files to the gate machine grid11.lal.in2p3.fr:​ 
- 
-<​code>​ 
-eval `ssh-agent` 
-ssh-add 
-gsiscp -P 2222 user*.pem grid11.lal.in2p3.fr:​ 
-</​code>​ 
- 
-2. Then log into grid11.lal.in2p3.fr and copy the pem files to the Nagios box grid04:
-
-<code>
-gsissh -AX -p 2222 grid11.lal.in2p3.fr
-scp user*.pem fmichel@grid04.lal.in2p3.fr:.globus/
-</code>
- 
-3. Then test the new pem files: 
- 
-<​code>​ 
-ssh fmichel@grid04.lal.in2p3.fr 
-voms-proxy-init --voms biomed 
-</​code>​ 
- 
-==== Proxy certificate renewal ==== 
- 
-SSH to any UI or to the Nagios server: grid04.lal.in2p3.fr
- 
-  * Create a valid proxy certificate:​ 
- 
-<​code>​ 
-$ voms-proxy-init --voms biomed 
-</​code>​ 
- 
-  * Renew the proxy 
- 
-<​code>​ 
-$ myproxy-init --cred_lifetime 672 --credname NagiosRetrieve-grid04.lal.in2p3.fr-biomed --pshost myproxy.grif.fr --username nagios --regex_dn_match --retrievable_by_cert "/​O=GRID-FR/​C=FR/​O=CNRS/​OU=LAL/​CN=grid04.lal.in2p3.fr"​ 
-</​code>​ 
- 
-  * Check the proxy: 
- 
-<​code>​ 
-$ myproxy-info -l nagios -s myproxy.grif.fr 
-</​code>​ 
- 
-  * Test the proxy retrieval probe: 
- 
-<​code>​ 
-# First SSH to the Nagios server: grid04.lal.in2p3.fr
-$ sudo su - nagios 
-$ /​usr/​libexec/​grid-monitoring/​probes/​hr.srce/​refresh_proxy ​ --myproxyuser nagios --cert /​etc/​nagios/​globus/​hostcert.pem --vo biomed --name NagiosRetrieve-grid04.lal.in2p3.fr-biomed -H myproxy.grif.fr --key /​etc/​nagios/​globus/​hostkey.pem -x /​etc/​nagios/​globus/​userproxy.pem-biomed 
-</​code>​ 
- 
  
  • biomed-shifts/argo.txt
  • Last modified: 2017/08/29 09:38
  • by fmichel