biomed-shifts:start

biomed-shifts:start [2016/02/01 13:31]
fmichel [Biomed Support Team]
<jumbotron>
====== Biomed Support Team ======
{{ :logo.png?200|}}
Welcome to the Biomed support team wiki pages. These pages are dedicated to the Biomed technical support team. They provide organisational information about duty shifts, and technical information about common support tasks as well as best practices.
</jumbotron>

<text background="warning">Shortcut for **team shifters**: **[[Biomed-Shifts:Practices|daily tasks and best practices]]**</text>.


===== News =====
  *  <text background="warning">Mar. 2014: An upgraded version of Nagios is accessible at https:<nowiki>//</nowiki>grid16.lal.in2p3.fr/nagios/. It must now be used instead of grid04.</text>

**Former news**
  *  Apr. 2013: The new version of Nagios was moved to a bigger machine: https:<nowiki>//</nowiki>grid04.lal.in2p3.fr/nagios/. It must now be used instead of grid02.
  *  Apr. 2013: A [[Biomed-Shifts:Nagios|new wiki page]] sums up all the information about Nagios, accessible from the [[#Nagios_box|Nagios box]] section in this page.
  *  Jan. 2013: A new version of Nagios has been installed at https:<nowiki>//</nowiki>grid02.lal.in2p3.fr/nagios/. It is usable from now on, but will be moved to another server in the coming months.
  *  Oct. 2012: probe org.sam.WN-SoftVer-biomed fails for more and more CREAM CEs. This is due to an old version of the probe that is not able to parse the EMI2 version => NO TICKET SHOULD BE SUBMITTED FOR THIS ISSUE.
  *  Nov. 5th, 2012: This [[Biomed-Shifts:Practices#Check_CEs_publishing_bad_number_of_running/waiting_jobs|list of CEs]] reports the CEs and CREAM CEs that publish wrong values for running or waiting jobs.
  *  Oct. 2012: It is now possible to add comments in the Nagios interface. On the page of any host or probe, click "Add a new comment" at the bottom of the page, and in the form check the Persistent checkbox. This helps leave messages intended for biomed shifters, such as "this service has been decommissioned, do not care about it". In the usual list of alarms, a comment is indicated by a little white cloud icon.
  *  Jul. 25th, 2012: The [[#STATUS_OF_BIOMED_RESOURCES|list of unavailable services]] now reports status from both the GOCDB (downtime, not monitored, not in production) and the BDII (draining, closed).
  *  Apr. 5th, 2012: Check the [[#STATUS_OF_BIOMED_RESOURCES|list of unavailable services]] as reported in the GOCDB (downtime, not monitored, not in production).
  *  Feb. 14th, 2012: The [[#Status_of_biomed_Online_Storage_Space|free space report]] of SEs now provides the SE status from the GOCDB (downtime, not monitored...).
  *  Jan. 25th, 2012: News in the [[Biomed-Shifts:Practices|best practices page]]: it now includes lists of VO Support tickets, Team tickets, and Team solved tickets, as well as a list of decommissioned resources.
  *  Dec. 8th, 2011: Tools moved to the new page [[Biomed-Shifts:Biomed-support-tools|Biomed Support Tools]], including the new tools <tt>show-se-space</tt> and <tt>monitor-se-space</tt>.
  *  Nov. 14th, 2011: Section [[Biomed-Shifts:Index#Status_of_biomed_Online_Storage_Space|Biomed Storage Space]] reports available space on all SEs supporting biomed. It is updated every 10 minutes.
  *  [[Biomed-Shifts:Biomed-support-tools#se-web-report-generator|Report generator for biomed SEs]] available: in particular, the first report form opens a GGUS search page, a GOCDB search page, and trends and alerts histograms from Nagios for a given SE hostname. This should be useful to shifters when a new SE is in alarm in Nagios.
  *  The Nagios box will be down for the first two weeks of November (due to a cooling system failure). In the meantime, manual scripts should be used to monitor resources.
  *  August 2011: update of the [[#Files_and_Space_Management_Tools|file management tool]]; it now handles file replication.
  *  July 2011: another script has been put online; it simply automates the lcg-cr and lcg-del commands on an SE: [[https://dav.healthgrid.org//biomed-shifts/lcg-cr-1.0.sh|lcg-cr-1.0.sh]]
  *  July 2011: [[#Files_and_Space_Management_Tools|file management tools]] can help shifters handle the decommissioning and full SE procedures.
  *  June 2011: The [[https://vodashboard.lip.pt|VO Admin Dashboard]] provides an integrated view over several other portals based on the selected VO. Integrated portals: Ops Portal, CIC, GGUS, GStat, GOCDB, VOMS Admin, Apps DB, RT. It is still under development; Nagios should be included in the view later on. We can use it during the shifts to check on it, assess its interest for us, and raise questions or requests about it.
  *  May 2011: the shift schedule was extended to the end of July (see below).
  *  20/04/2011: the host name resolution tool URL has changed: please <u>now use https:<nowiki>//</nowiki>gus.fzk.de/pages/help/help_hostinfo.php</u>. The previous URL (https:<nowiki>//</nowiki>iwrgustrain.fzk.de/pages/help/help_hostinfo.php) is that of a GGUS test instance that is not always synchronized with the GOCDB.
  *  18/01/2011: the GGUS team provided a link to get the site name from the host name ([[https://iwrgustrain.fzk.de/pages/help/help_hostinfo.php|check here]]). This is useful to submit team tickets!
  *  Dec 2010: the host certificate of the VOMS server cclcgvomsli01.in2p3.fr expired on 1st December 2010 and generated a lot of errors in the VO services (in particular SEs). [[Biomed-Shifts:VOMS-certificate | Here ]] is some information to send to site admins to update the VOMS certificate.
  *  Nov 2010: thanks to the French NGI, biomed now has its own Nagios box. Check it [[https://grid04.lal.in2p3.fr/nagios/|here]]. It runs with a biomed certificate and is used by the technical shifts.
  *  September 2010: thanks to CC-IN2P3, there are now two LFC machines behind lfc-biomed.in2p3.fr. This should allow up to 198 concurrent connection threads.

===== Participants =====

Mailing list: biomed-technical-support [no spam-AT] googlegroups [no spam-DOT] com

The participants are listed in the order of the shifts:
  *  CNRS-I3S, FR (Franck Michel)
  *  CNRS-IPHC, FR (Patrick Guterl)
  *  CNRS-Creatis, FR (Tristan Glatard, backup: Sorina Camarasu-Pop)
  *  BME-IIT, HU (Ákos Szlavecz, backup: Gábor Hesz)
  *  UPV, ES (Abel Antonio Carrión Collado)

Backup (teams no longer heavy users, but wishing to remain as backup):
  *  CNRS-LPC, FR (Paul de Vlieger)
  *  CNRS-ISC-PIF, FR (Romain Reuillon)
  *  INFN-BA, Libi, Bari, IT (Giacinto Donvito)

Past (teams that contributed in the past, but had to leave us):
  *  IsraGrid (Arad Alper)
  *  IFI - Institut de la Francophonie pour l'Informatique, VN (Bui The Quang)

===== Schedule =====

<well>
^ Start date ^ End date ^ Team on duty ^
| 28-12-2015 | 01-01-2016 | I3S |
| 04-01-2016 | 08-01-2016 | CREATIS |
| 11-01-2016 | 15-01-2016 | IPHC |
| 18-01-2016 | 22-01-2016 | UPV |
| 25-01-2016 | 29-01-2016 | BME-IIT |
| 01-02-2016 <label type="success">On duty</label> | 05-02-2016 | I3S |
| 08-02-2016 | 12-02-2016 | CREATIS |
| 15-02-2016 | 19-02-2016 | IPHC |
| 22-02-2016 | 26-02-2016 | UPV |
| 29-02-2016 | 04-03-2016 | BME-IIT |
| 07-03-2016 | 11-03-2016 | I3S |
| 14-03-2016 | 18-03-2016 | CREATIS |
| 21-03-2016 | 25-03-2016 | IPHC |
| 28-03-2016 | 01-04-2016 | UPV |
| 04-04-2016 | 08-04-2016 | BME-IIT |
| 11-04-2016 | 15-04-2016 | I3S |
| 18-04-2016 | 22-04-2016 | CREATIS |
| 25-04-2016 | 29-04-2016 | IPHC |
| 02-05-2016 | 06-05-2016 | UPV |
| 09-05-2016 | 13-05-2016 | BME-IIT |
| 16-05-2016 | 20-05-2016 | I3S |
| 23-05-2016 | 27-05-2016 | CREATIS |
</well>
CNRS-I3S, CNRS-IPHC, CNRS-Creatis, BME-IIT, UPV
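
Because every shift runs from Monday to Friday and the teams rotate in a fixed order, the rows of the schedule table can be generated instead of typed by hand, which makes date typos easier to avoid. The following is a minimal sketch, not the tool used to maintain this page; it assumes GNU ''date'', a start date that falls on a Monday, and teams passed in rotation order (the function name ''shift_rows'' is hypothetical):
<code bash>
#!/bin/bash
# Sketch: print DokuWiki table rows for a weekly Monday-to-Friday shift rotation.
# Assumptions: GNU date (for -d and @seconds), START is a Monday given as YYYY-MM-DD,
# and the teams are passed in rotation order.
# Usage: shift_rows START WEEKS TEAM...
shift_rows() {
  local start=$1 weeks=$2
  shift 2
  [ "$#" -gt 0 ] || return 1
  local -a teams=("$@")
  local base ts i
  # Work in UTC seconds so that daylight-saving changes cannot shift the dates
  base=$(date -u -d "$start" +%s) || return 1
  for ((i = 0; i < weeks; i++)); do
    ts=$((base + i * 7 * 86400))
    # Monday of week i, Friday of week i, then the next team in the rotation
    printf '| %s | %s | %s |\n' \
      "$(date -u -d "@$ts" +%d-%m-%Y)" \
      "$(date -u -d "@$((ts + 4 * 86400))" +%d-%m-%Y)" \
      "${teams[i % ${#teams[@]}]}"
  done
}
</code>
For example, ''shift_rows 2016-02-01 6 I3S CREATIS IPHC UPV BME-IIT'' prints six rows, from 01-02-2016 to 11-03-2016; the fifth one covers 29-02-2016 to 04-03-2016 for BME-IIT.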

[[http://biomed.grid.creatis.insa-lyon.fr/en/Biomed-Shifts:history|Past schedule]].

[[http://biomed.grid.creatis.insa-lyon.fr/en/Biomed-Shifts:history|Minutes]] of the shift take-over conferences.

===== Daily Tasks and Best Practices =====

See this wiki **[[Biomed-Shifts:Practices|page]]**.

===== Status of biomed Resources =====
<well>
==== Resources Supporting the VO ====
List and details of all resources that currently support the biomed VO on VAPOR: [[https://operations-portal.egi.eu/vapor/vapor-voSupportingResources?vo=biomed|supporting resources]]
</well>

<well>
==== Unavailable & Faulty Resources ====
Resources that are currently unavailable on VAPOR: [[https://operations-portal.egi.eu/vapor/vapor-voResourcesNotInProduction?vo=biomed|faulty resources]].

This list is consolidated from two sources:
  *  the GOCDB provides the statuses **downtime, not in production and not monitored**, as well as **site uncertified**,
  *  the BDII provides the statuses **draining, closed, unknown...**.
</well>
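
To illustrate what such a consolidation amounts to, here is a minimal sketch that merges two per-host status lists into one table. It is not how VAPOR works internally: the input files, their format (host name, a tab, then the status) and the function name ''merge_status'' are all hypothetical. A host that appears in only one source gets ''n/a'' in the other column:
<code bash>
#!/bin/bash
# Sketch: merge a GOCDB export and a BDII export into one status table.
# Hypothetical input format: one host per line, "hostname<TAB>status".
# Output: "hostname<TAB>GOCDB status<TAB>BDII status", with n/a where a source is silent.
merge_status() {
  local tab=$'\t' g b
  g=$(mktemp) && b=$(mktemp) || return 1
  # join requires both inputs sorted on the join field, with the same collation
  LC_ALL=C sort -t "$tab" -k1,1 "$1" > "$g"
  LC_ALL=C sort -t "$tab" -k1,1 "$2" > "$b"
  # -a 1 -a 2: also print hosts found in only one file; -e fills the missing column
  LC_ALL=C join -t "$tab" -a 1 -a 2 -e n/a -o 0,1.2,2.2 "$g" "$b"
  rm -f "$g" "$b"
}
</code>
A CE reported in downtime by the GOCDB and as draining by the BDII thus comes out as a single line carrying both statuses.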

<well>
==== Decommissioned resources & decommissioning procedure ====
When an SE is planned for decommissioning, launch the specific [[Biomed-Shifts:decommisioning|SE decommissioning procedure]].
</well>

===== Monitoring Tools =====
The following monitoring tools are currently under development or being tested. They can give information on the VO status, but they cannot yet be assumed to be 100% reliable. In any case, problems should be reproduced manually before a GGUS ticket is submitted.

  * [[Biomed-Shifts:Nagios|biomed VO Nagios]]
  * [[Biomed-Shifts:Biomed-support-tools|VO Support Tools]]: a collection of CLI tools, some of which are integrated into VAPOR.
  * [[https://operations-portal.egi.eu/vapor/?vo=biomed|VAPOR]] provides several operations features related to the monitoring of computing and storage resources. It is complementary to Nagios.
  * [[https://vodashboard.lip.pt|VO Admin Dashboard]]: this portal provides an integrated view over several other portals based on the selected VO. Integrated portals: Ops Portal, CIC, GGUS, GStat, GOCDB, VOMS Admin, Apps DB, RT. Still under development; Nagios should be included in that view later on.
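
As an example of such a manual check for a storage element, the sketch below writes a small file to one SE with ''lcg-cr'' and deletes it again with ''lcg-del'', in the spirit of the lcg-cr-1.0.sh script mentioned in the news (it is not that script). It assumes a UI with lcg_util installed and a valid biomed VOMS proxy; the LFN directory and the function name ''se_roundtrip'' are hypothetical, and the options should be checked against the ''lcg-cr'' and ''lcg-del'' man pages of your UI:
<code bash>
#!/bin/bash
# Sketch: reproduce a suspected SE problem by hand before opening a GGUS ticket:
# register a small file on the SE with lcg-cr, then remove it with lcg-del.
# Assumptions: lcg_util is installed, a valid biomed VOMS proxy exists, and the
# LFN directory below is a hypothetical placeholder to adapt.
se_roundtrip() {
  local se=$1 src
  local lfn="lfn:/grid/biomed/hypothetical-test-dir/shift-test-$(date +%s)-$$"
  src=$(mktemp) || return 1
  echo "biomed shift test" > "$src"
  # -d: destination SE, -l: logical file name to register in the LFC
  if ! lcg-cr --vo biomed -d "$se" -l "$lfn" "file:$src"; then
    echo "FAILED: copy to $se"
    rm -f "$src"
    return 1
  fi
  # -a: delete all replicas of the file
  if ! lcg-del --vo biomed -a "$lfn"; then
    echo "FAILED: delete on $se"
    rm -f "$src"
    return 1
  fi
  rm -f "$src"
  echo "OK: $se"
}
</code>
Run it as ''se_roundtrip <SE hostname>''; a GGUS ticket is only worth opening if the failure seen in the monitoring tools reproduces here.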

<well>
**Top BDII:**\\
Make sure that the ''lcg-infosites'' tool used to query the BDII has **version > 2.6.9**, which comes with gLite 3.2. \\ Check it with the command: ''rpm -qa | grep infosites''
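
Version strings do not compare correctly as plain text (alphabetically, 2.6.10 sorts before 2.6.9), so the check is safer with a version-aware sort. A minimal sketch, assuming GNU ''sort -V'' is available; the function name ''version_gt'' is hypothetical:
<code bash>
#!/bin/bash
# Sketch: succeed when version $1 is strictly newer than version $2.
# Relies on GNU "sort -V", which orders dotted version strings numerically
# (so 2.6.10 sorts after 2.6.9).
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example use on a UI (assumes the RPM is named lcg-infosites; check the name
# reported by "rpm -qa | grep infosites" first):
#   v=$(rpm -q --qf '%{VERSION}' lcg-infosites)
#   version_gt "$v" 2.6.9 && echo "lcg-infosites $v is recent enough"
</code>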

For biomed, it is advised to refer to the **top BDII at IN2P3**: \\ ''export LCG_GFAL_INFOSYS=cclcgtopbdii02.in2p3.fr:2170'' \\ The script below may help you detect inconsistencies between top BDIIs: <code bash>
#!/bin/bash
# Compare the lists of biomed SEs returned by two top BDIIs: CERN and IN2P3
lcg-infosites --vo biomed --is cclcgtopbdii02.in2p3.fr se | cut -f2 | sort > /tmp/list_se_bdi_in2p3
lcg-infosites --vo biomed --is lcg-bdii.cern.ch se | cut -f2 | sort > /tmp/list_se_bdi_cern
# Lines marked "<" are only published by IN2P3, lines marked ">" only by CERN
diff /tmp/list_se_bdi_in2p3 /tmp/list_se_bdi_cern
</code>
</well>
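
The ''diff'' output of the script above mixes both directions of the comparison. The following sketch reports explicitly which BDII is missing which SE; it assumes each list has already been saved to a file with one SE hostname per line (for example with the ''lcg-infosites'' and ''cut -f2'' pipeline of that script), and the function name ''compare_se_lists'' is hypothetical:
<code bash>
#!/bin/bash
# Sketch: list the SEs published by only one of two top BDIIs.
# Arguments: FILE1 FILE2 [LABEL1] [LABEL2], each file holding one SE hostname per line.
compare_se_lists() {
  local first=$1 second=$2 la=${3:-first} lb=${4:-second}
  local a b se
  a=$(mktemp) && b=$(mktemp) || return 1
  # comm needs both inputs sorted with the same collation
  LC_ALL=C sort -u "$first" > "$a"
  LC_ALL=C sort -u "$second" > "$b"
  # comm -23: lines only in the first file; comm -13: lines only in the second
  LC_ALL=C comm -23 "$a" "$b" | while IFS= read -r se; do
    printf 'only in %s: %s\n' "$la" "$se"
  done
  LC_ALL=C comm -13 "$a" "$b" | while IFS= read -r se; do
    printf 'only in %s: %s\n' "$lb" "$se"
  done
  rm -f "$a" "$b"
}
</code>
For example, ''compare_se_lists /tmp/list_se_bdi_in2p3 /tmp/list_se_bdi_cern IN2P3 CERN'' prints one ''only in IN2P3: ...'' or ''only in CERN: ...'' line per mismatching SE; an empty output means both BDIIs agree.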

===== Team Coordination and VO Administration =====
This section is intended to guide VO administrators in the usual team coordination and administration tasks:
  *  [[Biomed-Shifts:Coordination|Team coordination usual tasks]]: this page describes the coordination of the support team and the VO management tasks.
  *  [[Biomed-Shifts:Nagios|Nagios box]]: this page briefly describes paths and configuration parameters on the biomed Nagios box, as well as the procedure to renew the Nagios proxy certificate.
  