biomed VO ARGO Mon

The biomed ARGO box is hosted and maintained by the Croatian NGI. It monitors the SEs and CEs of all sites that support the VO. It is the main reference for biomed support team members in terms of resource monitoring.

Probes documentation:

For monitoring results, go to Service Groups → Summary → service name (SERVICE_SRM_V2 or SERVICE_CREAM-CE), or bookmark direct links like these:

Clicking on the SE/CE host name gives information on the scheduled downtimes (host state information section). Only critical problems (shown in red) may lead to ticket submission.

A specific probe checks the status of ARGO itself, i.e. all the critical processes that ARGO needs in order to run. It should be checked in case of suspicious behaviour.

The figure below depicts important graphical elements in ARGO referring to downtimes and comments:

The ARGO instance is using the following POEM profile:

The topology is fetched from VAPOR:

Soft/Hard states vs. max_check_attempts:

  • normal_check_interval 60
  • retry_check_interval 15
  • max_check_attempts 4

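⇒ Each service is checked once an hour. If a check fails, it is retried every 15 minutes until it has failed 4 times in a row; the service then enters a hard state, which triggers a notification. Passive checks are the exception (passive_host_checks_are_soft=0): passive host check results are treated as hard states immediately.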
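The timing implied by these three settings can be worked through with a small simulation. This is only a sketch of the soft/hard logic described above, not Nagios code: the names CheckConfig and simulate are hypothetical, intervals are assumed to be in minutes, and the first failed check is assumed to count as attempt 1 of max_check_attempts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckConfig:
    # Values taken from the list above; units are assumed to be minutes.
    normal_check_interval: int = 60
    retry_check_interval: int = 15
    max_check_attempts: int = 4


def simulate(results, cfg=CheckConfig()):
    """Return a list of (minute, state) for a sequence of check results.

    results: iterable of booleans, True meaning the check returned OK.
    state is "OK", "SOFT" (failure still being retried) or "HARD"
    (failure confirmed, i.e. the point where a notification is sent).
    """
    timeline = []
    minute = 0
    failures = 0
    for ok in results:
        if ok:
            failures = 0
            state = "OK"
            delay = cfg.normal_check_interval
        else:
            failures = min(failures + 1, cfg.max_check_attempts)
            if failures >= cfg.max_check_attempts:
                # Confirmed problem: back to the normal schedule.
                state = "HARD"
                delay = cfg.normal_check_interval
            else:
                # Unconfirmed problem: retry sooner.
                state = "SOFT"
                delay = cfg.retry_check_interval
        timeline.append((minute, state))
        minute += delay
    return timeline


if __name__ == "__main__":
    # One OK check followed by a persistent failure.
    for minute, state in simulate([True, False, False, False, False]):
        print(f"t={minute:>3} min  {state}")
```

Under these assumptions, a failure first seen at t=60 becomes a hard state at t=105, so a notification follows the first failed check by 45 minutes (three retries of 15 minutes each).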

Passive checks: they are initiated and performed by external applications or processes, and their results are then submitted to Nagios for processing.
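As an illustration of what "submitted to Nagios" can look like, the sketch below formats a passive service result as a Nagios external command line (PROCESS_SERVICE_CHECK_RESULT). How the biomed ARGO box actually receives passive results is not described here; the function name, host name and service name are hypothetical, and writing the line to the Nagios command file is left out because its path depends on the installation.

```python
import time

# Standard Nagios plugin return codes.
VALID_RETURN_CODES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format a passive service check result as an external command line.

    Format: [<timestamp>] PROCESS_SERVICE_CHECK_RESULT;<host>;<service>;<code>;<output>
    """
    if return_code not in VALID_RETURN_CODES:
        raise ValueError(f"invalid return code: {return_code}")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Keep the result on a single line, as one command per line is expected.
    output = " ".join(str(output).splitlines())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}"


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    print(passive_service_result("se.example.org", "example-probe", 2,
                                 "CRITICAL - example failure"))
```

The external application decides when the check runs and what the result is; Nagios only processes the submitted line.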

  • biomed-shifts/argo.txt
  • Last modified: 2017/08/29 09:38
  • by fmichel