Biomed shift takeover phone conference
Date: February 24th 2014, by mail
Attendees:
- Doan Trung Tung (IFI)
- Franck Michel (CNRS I3S)
- Jerome Pansanel (CNRS IPHC)
Next conference: March 3rd 2014, 10h00.
General information
Reminder about best practices:
- Start by following up on open tickets and verifying solved tickets, before submitting new ones.
- Before submitting a ticket, verify that:
  - Another ticket does not already exist for the same problem
  - The resource is not in downtime and is in production status ⇒ you can use the brand new VAPOR portal to do so
  - The alarm can be reproduced manually
- For CEs: ignore the alarms in the cases described here.
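The checks above can be sketched as a small helper. This is only an illustration: the Resource record, its fields and the function name are hypothetical, and the values would have to be filled in by hand or from whatever source the shifter uses (for instance the VAPOR portal). Matching existing tickets by hostname is a simplification of "same problem".

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical record describing a CE or SE; not taken from any real API.
    hostname: str
    in_downtime: bool
    in_production: bool


def reasons_not_to_submit(resource, open_ticket_hosts, alarm_reproduced,
                          known_ignorable=False):
    """Return the checklist items that argue against opening a new ticket."""
    reasons = []
    # Simplification: an open ticket on the same host counts as a duplicate.
    if resource.hostname in open_ticket_hosts:
        reasons.append("a ticket already exists for this resource")
    if resource.in_downtime:
        reasons.append("resource is in downtime")
    if not resource.in_production:
        reasons.append("resource is not in production status")
    if not alarm_reproduced:
        reasons.append("alarm could not be reproduced manually")
    # For CEs: alarms that fall under the documented cases to ignore.
    if known_ignorable:
        reasons.append("alarm is a known case to ignore")
    return reasons


# Example with a made-up hostname: a CE that is currently in downtime.
ce = Resource("ce.example.org", in_downtime=True, in_production=True)
blockers = reasons_not_to_submit(ce, open_ticket_hosts=set(),
                                 alarm_reproduced=True)
print(blockers or "OK to submit")
```

An empty list means none of the checks argues against submitting the ticket.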
Takeover report
Franck closed about 10 tickets, 3 of which should not have been submitted:
- https://ggus.eu/index.php?mode=ticket_info&ticket_id=101107 : for the known SoftVer error that we ignore
- https://ggus.eu/index.php?mode=ticket_info&ticket_id=101118 : for a site in downtime.
- https://ggus.eu/index.php?mode=ticket_info&ticket_id=101105 : queue disabled by the admin.
- For 3 other tickets, the admin did not respond, but the alarm was no longer raised.
- A few tickets have been submitted for SEs and CEs reporting 444444 jobs. Most are fixed and verified.
- Ticket on decommissioning of vm134.grnet.stratuslab.eu has been closed (it is now managed).
- https://ggus.eu/index.php?mode=ticket_info&ticket_id=100950 : New ticket about an excessive number of jobs, probably submitted with OpenMole. Franck copied the user as well as Romain Reuillon.
Otherwise, Franck did not check on CE alarms, as he spent quite some time dealing with CE problems detected by VAPOR that are not always detected by Nagios. In particular, Tung does not need to worry about ticket 101040: it is being handled with the JSAGA developer.