biomed-shifts:practices

Differences

This shows you the differences between two versions of the page.

biomed-shifts:practices [2018/03/07 00:00]
fmichel [Reproduce the problem]
biomed-shifts:practices [2021/10/13 14:01]
sorina [Identify the problems]
Line 15: Line 15:
   * <text type="danger">**Check ARGO alarms**</text> concerning SEs, CEs.
   * <text type="danger">**Deal with full SEs, and resource decommissioning**</text>
-  * <text type="danger">**Report detected issues concerning ARGO box **</text> by assigning a team ticket to NGI_HR (Croatia).
+  * <text type="danger">**Report detected issues concerning ARGO box**</text> by assigning a team ticket to the dedicated ARGO support unit.
  
 <text type="danger">**Before submitting GGUS Team tickets**, have a **careful** look at the [[#Advices_about_ticket_submission|advice about ticket submission]]</text>.
Line 70: Line 70:
  
 ====  Identify the problems  ====
-''lcg-cr'' and ''lcg-del'' should work on every SE of the VO. The ARGO box will help you identify faulty servers. You may use the following straight links: [[https://argo-mon-biomed.cro-ngi.hr/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_SRM&style=overview|SRM service group status]] or [[https://argo-mon-biomed.cro-ngi.hr/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_SRM&style=detail&servicestatustypes=16&sorttype=2&sortoption=6|Critical issues for service group SRM]]
+
+SRM probes used by the ARGO box:
+https://github.com/EGI-Foundation/nagios-plugins-srm
+- based on the gfal2 library for the storage operations (gfal-copy, etc.)
+- queries the BDII service to build the Storage URL to test, given the host name and the VO name
+- a valid X.509 proxy certificate is needed to execute the probe (configured via the X509_USER_PROXY variable)
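When an SRM alarm has to be reproduced by hand, the same two prerequisites apply: a proxy file must be reachable and the storage operations go through the gfal2 tools. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the fallback path /tmp/x509up_u<uid> is the usual grid convention when X509_USER_PROXY is unset, and the copy-then-delete round trip is a simplified stand-in, not the probe's actual test sequence. The commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helpers for a manual SRM check; not part of nagios-plugins-srm.

# Resolve the proxy file: X509_USER_PROXY if set,
# otherwise the conventional default /tmp/x509up_u<uid>.
proxy_path() {
    echo "${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"
}

# Succeed only if the proxy file exists and is not empty.
# This does NOT check validity or VOMS attributes.
proxy_file_present() {
    [ -s "$(proxy_path)" ]
}

# Print (do not run) a gfal2 copy-then-delete round trip.
# $1 = absolute path of a local file
# $2 = storage URL to test (assumption: supplied by hand here,
#      whereas the real probe builds it from the BDII)
srm_roundtrip_commands() {
    printf 'gfal-copy file://%s %s\n' "$1" "$2"
    printf 'gfal-rm %s\n' "$2"
}
```

For instance, `srm_roundtrip_commands /tmp/t.txt srm://se.example.org/biomed/t.txt` prints the two commands for a made-up SE; the host and path are placeholders to be replaced with the Storage URL of the SE under investigation.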
  
 __Reminder__: do **NOT** submit a ticket if the service is in **downtime** or is **not in proper production status**: check [[http://operations-portal.in2p3.fr/vapor/resources/GL2ResVO?VOfilter=biomed|VAPOR]] for the supporting resources and the faulty resources.
Line 103: Line 109:
 Reproduce the problem by one of the two methods below.
  
-Download this {{:biomed-shifts:test.jdl|test JDL}}, rename it as test_ce_noreq.jdl and submit it to the concerned CE. Check the BDII (lcg-infosites) to get the full name of a queue on that CE and run the command:
+Download this {{:biomed-shifts:test.jdl|test JDL}} (or {{:biomed-shifts:test2.jdl|this one}}, since the first one seems to fail), rename it to test_ce_noreq.jdl and submit it to the CE concerned. Check the BDII (lcg-infosites) to get the full name of a queue on that CE and run the command:
 <code> glite-ce-job-submit -a -r <CE hostname>:<port>/<queue_name> test_ce_noreq.jdl</code>
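The `<CE hostname>:<port>/<queue_name>` argument is easy to mistype, so it can help to assemble it from its three parts. A minimal sketch follows; the function names are hypothetical, and the host, port and queue in the usage example are placeholders to be replaced with the values found in the BDII.

```shell
#!/bin/sh
# Hypothetical helpers; they only print the command, they do not submit anything.

# Build the endpoint string expected by the -r option.
# $1 = CE hostname, $2 = port, $3 = queue name
ce_endpoint() {
    printf '%s:%s/%s\n' "$1" "$2" "$3"
}

# Print the full submit command for review.
# $4 = JDL file (defaults to test_ce_noreq.jdl, the name used on this page)
submit_command() {
    printf 'glite-ce-job-submit -a -r %s %s\n' \
        "$(ce_endpoint "$1" "$2" "$3")" "${4:-test_ce_noreq.jdl}"
}
```

For example, `submit_command ce.example.org 8443 cream-pbs-biomed` prints the command line for a made-up CE, which can then be checked and pasted into the shell.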
 Then check the status and the output once the submit command has completed:
  • biomed-shifts/practices.txt
  • Last modified: 2022/05/19 16:32
  • by sorina