======  Identification of issues  ======

Link to the Biomed ARGO page: [[https://biomed.ui.argo.grnet.gr/]]

=====  VOMS server  =====
The proxy certificate creation should work:
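For example, a typical check from a UI (assuming a valid grid certificate is installed; the exact options may vary with the client version):
<code>voms-proxy-init --voms biomed</code>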

The VOMS administration interface should be available. From a UI, run the command:
<code>voms-admin --vo=biomed --host voms-biomed.in2p3.fr --port 8443 list-cas</code>
  
=====  Monitoring SEs  =====

From the biomed-ui.fedcloud.fr VM, where gfal2 is already installed:
  
1. Build the Storage URL following the model <code>srm://marsedpm.in2p3.fr:8446/dpm/in2p3.fr/home/biomed</code>

NOTE 1: the model works for DPM SEs, but is not confirmed for StoRM or dCache (a StoRM example is srm://storm-01.roma3.infn.it:8444/srm/managerv2?SFN=/biomed)
NOTE 2: it would be interesting to use the probe to build this URL (see the BDII lookup sketch after these steps)
  
2. Use gfal-ls to check that we can list the folder:
<code>gfal-ls srm://marsedpm.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/user/s/scamarasu</code>

3. Use gfal-copy to copy a file (in this case, job.jdl) to the above URL:
<code>gfal-copy job.jdl srm://marsedpm.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/user/s/scamarasu/
Copying file:///home/spop/dirac/job.jdl   [DONE]  after 17s</code>

4. Check that the file was copied and is now listed:
<code>gfal-ls srm://marsedpm.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/user/s/scamarasu
job.jdl</code>
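As NOTE 2 suggests, the SURL could also be assembled from the information system instead of guessed from a model. A minimal sketch with ldapsearch, assuming a reachable top-level BDII (the lcg-bdii.egi.eu endpoint and the exact GLUE 1.3 filter values below are assumptions; adapt them to your infrastructure):
<code># SRM endpoint (host and port) published for the SE
ldapsearch -x -LLL -H ldap://lcg-bdii.egi.eu:2170 -b o=grid \
  '(&(objectClass=GlueService)(GlueServiceType=SRM)(GlueServiceEndpoint=*marsedpm.in2p3.fr*))' \
  GlueServiceEndpoint

# VO-specific storage path published for biomed on the SE
ldapsearch -x -LLL -H ldap://lcg-bdii.egi.eu:2170 -b o=grid \
  '(&(objectClass=GlueVOInfo)(GlueChunkKey=GlueSEUniqueID=marsedpm.in2p3.fr)(GlueVOInfoAccessControlBaseRule=*biomed*))' \
  GlueVOInfoPath</code>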
  
Note that in some cases gfal-ls (as well as gfal-mkdir) may work, while gfal-copy does not:
<code>gfal-mkdir srm://clrlcgse01.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/scamarasu
gfal-ls srm://clrlcgse01.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/scamarasu
gfal-copy dirac/job.jdl srm://clrlcgse01.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/scamarasu/
gfal-copy error: 70 (Communication error on send) - Could not open destination: globus_xio: Unable to connect to clrlcgse01.in2p3.fr:2811 globus_xio: System error in connect: Connection refused globus_xio: A system call failed: Connection refused</code>
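In this example the SRM operations on port 8446 succeed, but the transfer fails because nothing answers on the GridFTP port 2811. A quick way to confirm this from the UI (assuming nc is installed):
<code>nc -zv clrlcgse01.in2p3.fr 2811</code>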
  
When a SE is planned for decommissioning, launch the specific [[Biomed-Shifts:decommisioning|SE decommissioning procedure]].

The older decommissioning page is available here: [[Biomed-Shifts:old:old-decommisioning|Old SE decommissioning procedure]].
=====  Monitoring CEs  =====
  
====  Identify the problems  ====
The ARGO box is the best way to identify faulty resources.

====  Reproduce the problem  ====
  
1. Manual ARC CE submission

- see https://www.nordugrid.org/arc/arc6/users/submit_job.html for more details and a job description example
- submit with ''arcsub job.xrsl -c CENAME'' (see the sketch below)
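For instance, a minimal job description (a sketch; the job name and output file names are arbitrary placeholders):
<code>&(executable="/bin/hostname")
(jobName="biomed-ce-test")
(stdout="stdout.txt")
(stderr="stderr.txt")</code>
Submit it and follow up with the standard ARC client commands:
<code>arcsub job.xrsl -c CENAME
arcstat <jobID>
arcget <jobID></code>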

Further ARC CE documentation is available in French: https://grand-est.fr/support-utilisateurs/documentation-en-ligne/guide-dutilisation-de-arc-ce/
and for DIRAC: https://grand-est.fr/support-utilisateurs/documentation-en-ligne/guide-dutilisation-de-dirac/

2. Manual HTCondorCE submission
  
TO BE DONE
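A possible starting point, assuming the htcondor-ce-client tools are installed on the UI (the CE hostname is a placeholder): ''condor_ce_trace'' submits a trivial test job and reports each step of the interaction.
<code>condor_ce_trace <CE hostname></code>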
==== Ignored alarms ====
Shifters shall focus primarily on failed job submissions: probes ''emi.cream.CREAMCE-AllowedSubmission''.