======  Identification of issues  ======
  
Link to the Biomed ARGO page: [[https://biomed.ui.argo.grnet.gr/]]
=====  VOMS server  =====
The proxy certificate creation should work:
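A minimal check from a UI (a sketch assuming the standard VOMS client commands are installed; the exact command used on shift is not shown in this section):
<code>voms-proxy-init --voms biomed
voms-proxy-info --all</code>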
The VOMS administration interface should be available. From a UI, run the command:
<code>voms-admin --vo=biomed --host voms-biomed.in2p3.fr --port 8443 list-cas</code>
  
=====  Monitoring SEs  =====
  
SRM probes used by the ARGO box:

  - https://github.com/EGI-Foundation/nagios-plugins-srm
  - based on the gfal2 library for the storage operations (gfal-copy, etc.)
  - queries the BDII service in order to build the Storage URL to test, given the host name and the VO name (see the example below)
  - a valid X509 proxy certificate is needed to execute the probe (configured via the X509_USER_PROXY variable).
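For instance, to see which SEs the BDII publishes for biomed, one can run the following from the UI (a minimal sketch, assuming the lcg-infosites client is installed):
<code>lcg-infosites --vo biomed se</code>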
  
  
From the biomed-ui.fedcloud.fr VM, where gfal2 is already installed:
  
1. Build the Storage URL following the model <code>srm://marsedpm.in2p3.fr:8446/dpm/in2p3.fr/home/biomed</code>

NOTE 1: the model works for DPM SEs; not sure about StoRM or dCache (a StoRM example is srm://storm-01.roma3.infn.it:8444/srm/managerv2?SFN=/biomed).
  
NOTE 2: it would be interesting to use the probe itself for building this URL (a manual BDII query is sketched below).
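The BDII lookup can be approximated by hand with ldapsearch against the GLUE 1.3 schema (a sketch only; replace <top-bdii> with a working top-level BDII host, e.g. the one configured in LCG_GFAL_INFOSYS):
<code>ldapsearch -x -LLL -H ldap://<top-bdii>:2170 -b o=grid \
  '(&(objectClass=GlueVOInfo)(GlueVOInfoAccessControlBaseRule=VO:biomed)(GlueChunkKey=GlueSEUniqueID=marsedpm.in2p3.fr))' \
  GlueVOInfoPath</code>
The returned GlueVOInfoPath (e.g. /dpm/in2p3.fr/home/biomed), combined with the host name and SRM port, gives the Storage URL of the model above.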
  
2. Use gfal-ls to check that we can list the folder:
<code>gfal-ls srm://marsedpm.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/user/s/scamarasu</code>
3. Use gfal-copy to copy a file (in this case, job.jdl) to the above URL:
<code>gfal-copy job.jdl srm://marsedpm.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/user/s/scamarasu/
Copying file:///home/spop/dirac/job.jdl   [DONE]  after 17s</code>
4. Check the file was copied and is now listed:
<code>gfal-ls srm://marsedpm.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/user/s/scamarasu
job.jdl</code>
  
Note that in some cases, gfal-ls may work (as well as gfal-mkdir), but not gfal-copy:
<code>gfal-mkdir srm://clrlcgse01.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/scamarasu
gfal-ls srm://clrlcgse01.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/scamarasu
gfal-copy dirac/job.jdl srm://clrlcgse01.in2p3.fr:8446/dpm/in2p3.fr/home/biomed/scamarasu/
gfal-copy error: 70 (Communication error on send) - Could not open destination: globus_xio: Unable to connect to clrlcgse01.in2p3.fr:2811 globus_xio: System error in connect: Connection refused globus_xio: A system call failed: Connection refused</code>
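In such a case, a quick way to confirm that the GridFTP port itself is unreachable before going further (assuming nc is available on the UI):
<code>nc -zv clrlcgse01.in2p3.fr 2811</code>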
  
When a SE is planned for decommissioning, launch the specific [[Biomed-Shifts:decommisioning|SE decommissioning procedure]].
  
The older decommissioning page is available here: [[Biomed-Shifts:old:old-decommisioning|Old SE decommissioning procedure]].
=====  Monitoring CEs  =====
  
====  Identify the problems  ====
The ARGO box is the best way to identify faulty resources.

====  Reproduce the problem  ====
  
1. Manual ARC CE submission
  
  - see https://www.nordugrid.org/arc/arc6/users/submit_job.html for more details and a job description example (a minimal sketch is given below)
  - submit with ''arcsub job.xrsl -c CENAME''
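A minimal job description of the kind linked above might look like this (a sketch only; CENAME stands for the CE host name):
<code>cat > job.xrsl <<'EOF'
&(executable="/bin/hostname")
 (jobname="biomed-ce-test")
 (stdout="std.out")
 (stderr="std.err")
EOF
arcsub job.xrsl -c CENAME</code>
The job can then be followed with ''arcstat <jobID>'' and its output retrieved with ''arcget <jobID>''.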
Further ARC CE documentation is available in French: https://grand-est.fr/support-utilisateurs/documentation-en-ligne/guide-dutilisation-de-arc-ce/
  
and for DIRAC: https://grand-est.fr/support-utilisateurs/documentation-en-ligne/guide-dutilisation-de-dirac/
  
2. Manual HTCondor-CE submission
  
TO BE DONE
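Until this section is written up, a possible starting point is direct submission with the HTCondor client tools (a sketch only, assuming they are installed on the UI; the CE host is a placeholder and 9619 is the usual HTCondor-CE port):
<code># sketch only: grid-universe test job aimed at an HTCondor-CE
cat > test.sub <<'EOF'
universe          = grid
grid_resource     = condor <CE hostname> <CE hostname>:9619
use_x509userproxy = true
executable        = /bin/hostname
output            = test.out
error             = test.err
log               = test.log
queue
EOF
condor_submit test.sub
condor_q</code>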
==== Ignored alarms ====
Shifters shall focus primarily on failed job submissions: probes ''emi.cream.CREAMCE-AllowedSubmission''.