Differences
This shows you the differences between two versions of the page.
biomed-shifts:practices [2018/03/07 00:00] fmichel [Reproduce the problem] |
biomed-shifts:practices [2021/10/13 14:01] sorina [Identify the problems] |
||
---|---|---|---|
Line 15: | Line 15: | ||
* <text type="danger">**Check ARGO alarms**</text> concerning SEs, CEs. | * <text type="danger">**Check ARGO alarms**</text> concerning SEs, CEs. | ||
* <text type="danger">**Deal with full SEs, and resource decommissioning**</text> | * <text type="danger">**Deal with full SEs, and resource decommissioning**</text> | ||
- | * <text type="danger">**Report detected issues concerning ARGO box **</text> by assigning a team ticket to NGI_HR (Croatia). | + | * <text type="danger">**Report detected issues concerning ARGO box **</text> by assigning a team ticket to the dedicated ARGO support unit. |
<text type="danger">**Before submitting GGUS Team tickets**, have a **careful** look at the [[#Advices_about_ticket_submission|advice about ticket submission]]</text>. | <text type="danger">**Before submitting GGUS Team tickets**, have a **careful** look at the [[#Advices_about_ticket_submission|advice about ticket submission]]</text>. | ||
Line 70: | Line 70: | ||
==== Identify the problems ==== | ==== Identify the problems ==== | ||
- | ''lcg-cr'' and ''lcg-del'' should work on every SE of the VO. The ARGO box will help you identify faulty servers. You may use the following straight links: [[https://argo-mon-biomed.cro-ngi.hr/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_SRM&style=overview|SRM service group status]], or [[https://argo-mon-biomed.cro-ngi.hr/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_SRM&style=detail&servicestatustypes=16&sorttype=2&sortoption=6|Critical issues for service group SRM]] | + | |
+ | SRM probes used by the ARGO box: | ||
+ | - https://github.com/EGI-Foundation/nagios-plugins-srm | ||
+ | - based on the gfal2 library for the storage operations (gfal-copy, etc.) | ||
+ | - queries the BDII service to build the Storage URL to test, given the host name and the VO name | ||
+ | - a valid X.509 proxy certificate is needed to execute the probe (configured via the X509_USER_PROXY variable). | ||
__Reminder__: do **NOT** submit a ticket if the service is in **downtime** or is **not in proper production status**: check [[http://operations-portal.in2p3.fr/vapor/resources/GL2ResVO?VOfilter=biomed|VAPOR]] for the supporting resources or faulty resources. | __Reminder__: do **NOT** submit a ticket if the service is in **downtime** or is **not in proper production status**: check [[http://operations-portal.in2p3.fr/vapor/resources/GL2ResVO?VOfilter=biomed|VAPOR]] for the supporting resources or faulty resources. | ||
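The SRM probe notes above say that a valid proxy must be reachable through the X509_USER_PROXY variable. Before reproducing a storage alarm by hand, a shifter might confirm that this variable at least points to a usable file. The following is a minimal sketch under that assumption; the function name ''check_proxy_env'' is hypothetical and is not part of nagios-plugins-srm. It only checks that the file is present, not that the certificate is still valid.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of nagios-plugins-srm).
# Returns 0 when X509_USER_PROXY names a readable, non-empty file,
# 1 when the variable is unset or empty, 2 when the file is unusable.
check_proxy_env() {
    if [ -z "${X509_USER_PROXY:-}" ]; then
        echo "X509_USER_PROXY is not set" >&2
        return 1
    fi
    # -s: file exists and is not empty; -r: file is readable
    if [ ! -s "$X509_USER_PROXY" ] || [ ! -r "$X509_USER_PROXY" ]; then
        echo "proxy file missing, empty or unreadable: $X509_USER_PROXY" >&2
        return 2
    fi
    return 0
}
```

A return code of 0 only means the file exists. The remaining lifetime of the proxy still has to be confirmed with the usual proxy tools (for example voms-proxy-info) before running any gfal2 command.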
Line 103: | Line 109: | ||
Reproduce the problem by one of the two methods below. | Reproduce the problem by one of the two methods below. | ||
- | Download this {{:biomed-shifts:test.jdl|test JDL}}, rename it as test_ce_noreq.jdl and submit it to the concerned CE. Check the BDII (lcg-infosites) to get the full name of a queue on that CE and run the command: | + | Download this {{:biomed-shifts:test.jdl|test JDL}} (or {{:biomed-shifts:test2.jdl|this one}}, since the first one seems to fail), rename it to test_ce_noreq.jdl and submit it to the concerned CE. Check the BDII (lcg-infosites) to get the full name of a queue on that CE and run the command: |
<code> glite-ce-job-submit -a -r <CE hostname>:<port>/<queue_name> test_ce_noreq.jdl</code> | <code> glite-ce-job-submit -a -r <CE hostname>:<port>/<queue_name> test_ce_noreq.jdl</code> | ||
Then check the status and the output once the submit command has completed: | Then check the status and the output once the submit command has completed: |