05NOV2009 -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 5 Nov 09: Update: Stratus Maintenance Outage and Data Migration Date: Thu, 05 Nov 2009 07:46:38 -0500 From: root@ccws.ncep.noaa.gov To: NCEP.List.SP-Announce@noaa.gov The Stratus Quarterly update completed successfully and the data migration between /gpfs/s and /gpfs/s4 is progressing. Once all of the targeted data has been migrated, IBM will begin the verification process. At this time we estimate that user access to Stratus will be enabled by 14:00 local. We will send out an update if more time is required. Once access has been restored IBM will send out an announcement via sp-announce. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce --------- 04NOV2009 There could be no data flow to nomad[1,3,5] today. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 04 Nov 09: Stratus Maintenance Outage on Wednesday 4 Nov. 2009 Date: Wed, 04 Nov 2009 06:33:40 -0500 From: Madhuveer Konidena (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 06:30 local, all LoadLeveler jobs will be drained. At 07:30 all remaining jobs will be cancelled, users logged out, and LoadLeveler will be stopped. Upon completion of scheduled maintenance and following successful verification, the IBM team will begin data migration from /gpfs/s to /gpfs/s4. This work is expected to take a minimum of 24 hours to complete. User access will not be restored until all data has been migrated and verified. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce --------- 28OCT2009 [Things seem a little confusing today. I have placed the announcements I received below in reverse order.
(*j*)] -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Correction to previous message about Stratus Maintenance Date: Wed, 28 Oct 2009 09:07:00 -0400 From: catherine.schaefer@noaa.gov To: NCEP.List.SP-Announce@noaa.gov Stratus maintenance is NOT scheduled for today, 10/28. Stratus will be returned to developers today when Production Management Branch is satisfied that the production switch has been successful. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Stratus Maintenance Today Date: Wed, 28 Oct 2009 07:46:51 -0400 From: catherine.schaefer@noaa.gov To: NCEP.List.SP-Announce@noaa.gov Correction to prior message: Stratus will be undergoing maintenance, not returned to developers at 08:30 AM local. Sorry for any inconvenience. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Production Switch to Cirrus at 08:30 AM local Date: Wed, 28 Oct 2009 07:41:43 -0400 From: catherine.schaefer@noaa.gov To: NCEP.List.SP-Announce@noaa.gov Today, October 28th at 08:30 local, production will be switched to Cirrus. Momentarily, the development queues will be drained on Cirrus in preparation for the switch. At 08:30, any remaining development jobs will be killed and developers will be logged off Cirrus. Production will resume on Cirrus, and Stratus will be returned to developers at approximately 08:30 local. 
_______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce --------- 27OCT2009 The following means that development nomads nomad[1,3,5] will not receive any data flow from operations beginning 12Z (8AM EDT) 10/29. Use http://nomads.ncep.noaa.gov, the high availability 24/7 server. It is rumored that the data flow will be interrupted for about 3 hours. (*j*) -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 29 Oct 09: 3 Hour Stratus Outage Thursday 29 October 09:00 - 12:00 Local Date: Tue, 27 Oct 2009 15:38:54 -0400 From: root@ccws.ncep.noaa.gov To: NCEP.List.SP-Announce@noaa.gov IBM will be conducting a "Black Start" test of the NCEP Phase IV electrical and cooling systems. During this test Stratus will be unavailable to users. Beginning at 08:00 (Local) on Thursday 10/29 all loadleveler queues will be drained. At 09:00 all remaining jobs will be cancelled and users will be logged out. Upon completion of the test, access will be restored and users will be notified via sp-announce. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce --------- 16OCT2009 -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 15 Oct 09: Update to Cirrus Outage Date: Thu, 15 Oct 2009 18:05:52 -0400 From: root@ccws.ncep.noaa.gov To: NCEP.List.SP-Announce@noaa.gov The Cirrus outage will continue until approximately 19:30 local time. Once work is complete and access has been restored notification will be sent out via sp-announce.
_______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce --------- 15OCT2009 -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 15 Oct 09: Update to Cirrus Outage on 10/15/2009 Date: Thu, 15 Oct 2009 12:58:35 -0400 From: root@ccws.ncep.noaa.gov To: NCEP.List.SP-Announce@noaa.gov The current Cirrus outage will be extended until approximately 17:30 local. Once the work is complete user access will be restored. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce --------- 14OCT2009 Beginning 09:30 local on Thursday 10/15 Cirrus will be unavailable to non-production users. --------- 13OCT2009 (Anytime there is production switch and dev is taken by production there will be no data flow to nomad[1,3,5]. One should use the http://nomad.ncep.noaa.gov high availability server. Thus, "Vapor" changes do not affect nomad servers. If Cirrus is unavailable then there is no data flow to nomad servers. (*j*) ) - Please note the following scheduled maintenance activities and plan accordingly: Vapor update 10/14/09: Beginning 14:00z Vapor will be unavailable for 24 hours. Upon completion of scheduled maintenance and system verification user access will be restored. Cirrus update 10/21/09: Beginning 11:30z Cirrus will be unavailable for 24 hours. Upon completion of scheduled maintenance and system verification user access will be restored. Production Switch to Cirrus 10/28/09: A production switch from Stratus to Cirrus is scheduled to begin 11:30z on 10/28/09. Beginning at 10:30z all development queues on Cirrus will be drained. At 11:30z all remaining development jobs will be cancelled. Users that do not have production access will be logged out and their crontabs will be moved to their home directories. 
Non-production users will be granted access to Stratus once the failover has successfully completed. Stratus update 11/4/09: Beginning 11:30z Stratus will be unavailable for 24 hours. Upon completion of scheduled maintenance and system verification user access will be restored. Production switch back to Stratus 11/12/09: A Production switch from Cirrus to Stratus is scheduled to begin 11:30z on 11/12/09. Beginning at 10:30z all development queues on Stratus will be drained. At 11:30z all remaining development jobs will be cancelled. Users that do not have production access will be logged out and their crontabs will be moved to their home directories. Non-production users will be granted access to Cirrus once the failover has successfully completed. --------- 02OCT2009 High Availability Server: http://nomads.ncep.noaa.gov Users of NOMADS are reminded that they should use the URL http://nomads.ncep.noaa.gov/ to access the system and they will always be placed on the current active server. Starting on Tuesday October 7, 2009 at approximately 1400 UTC, users that have been using direct IP addresses to access NOMADS systems may no longer be able to access the system. --------- 30SEP2009 The announcement below means that there will be no data flow to development NOMADS servers nomad[1,3,5].ncep.noaa.gov until 10/5/2009. The high availability server http://nomads.ncep.noaa.gov will continue to have the data on time. All, Production has been switched to Cirrus. Stratus is down due to scheduled new disk drive installation by IBM. Production is scheduled to be switched back to Stratus on Monday 5 October 2009. Dew will be shut down at 1600z today and will remain down until further notice.
--------- 26SEP2009 Another nomad[1,3,5] data flow interruption is scheduled, as indicated below in the sp-announce list_server (it means no data flow will be available to the development NOMADS servers, but the high availability server should remain up to date): Beginning 07:30 local on Wednesday 30 September, production will be switched from Stratus to Cirrus. Upon successful completion of the production switch, Stratus will be shut down in order to facilitate the new disk installation. The disk installation process is expected to take approximately 36 hours. Development access to Stratus will be enabled as soon as the new disk has been installed and validated. Production will remain on Cirrus through the weekend. We will switch production back to Stratus on Monday 5 October at 07:30. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce --------- 25SEP2009 NOMADS development servers have returned to service including DODS/OPENDAP. The super computer development side returned around noon on Thursday but file corruption kept DODS/OPENDAP from running. Also the servers came up with a secure shell problem so no data was written to development servers. These problems are fixed (1500Z) now and data is flowing. (*j*) --------- 23SEP2009 The message below, received today, indicates that there will be no data flow to the development systems, particularly nomad[1,3,5], until further notice. The high availability server, http://nomads.ncep.noaa.gov, will have all the data. These warnings are also available from the list_server: NCEP.List.SP-Announce@noaa.gov (*j*) ------------------------- Original Message -------- Subject: [NCEP.List.SP-Announce] Cirrus/Dew offline Date: Wed, 23 Sep 2009 13:11:18 -0400 From: SDM To: _NCEP.List SP-Announce All, Dew and Cirrus are being taken down due to cooling problems in Fairmont.
SDM - Joe Carr --------- 18SEP2009 The message below means the development super computer will be unavailable from Monday near noon and will not return until Thursday, after the data mirror is replenished; therefore no real time data will get to development nomad[1,3,5]. (*j*) The high availability server will have the data. Beginning 13:00 EDT Monday 9/21/2009 Cirrus will be unavailable. The IBM team will be installing additional disk on Cirrus during this outage. All development queues will be drained at 12:00 EDT. At 13:00 all remaining jobs will be cancelled and any remaining users will be logged out. The cron daemons will be stopped on all cirrus interactive nodes. Therefore, when access to cirrus is restored existing crontabs will be resumed. Work is scheduled to complete by 01:30 9/24/2009. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce --------- 09SEP2009 This means that there will be no data flow for the development servers 1,3,5, but the high availability 24/7 server http://nomads.ncep.noaa.gov will have all the data. (*j*) -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Scheduled Fairmont Power Outage...Shutdown of Cirrus and Dew Date: Wed, 09 Sep 2009 09:45:20 -0400 From: SDM To: _NCEP.List SP-Announce On Friday, September 11, facilities work at Fairmont will result in the shutdown of both Dew and Cirrus. They will be unavailable for up to 18 hours. The schedule that we are working to is as follows: Note: EDT has been utilized for all times.
September 11th
12:00PM-1:00PM Configure Cirrus nodes 1-8 to run production (DEV jobs will be suspended).
12:00PM-1:00PM Take down Cirrus frames 9-14.
2:00PM-2:45PM Relocate Cirrus networking to Cisco 6509.
2:45PM-4:15PM Cirrus and Dew shutdown
4:15PM-4:30PM Shutdown lnxfmt1, lnxfmt2, smsfmt1, smsfmt2, and svn-fmt
4:15PM-4:30PM Shutdown sdmfmta and sdmfmtb
4:30PM-4:45PM Shutdown Dew Force 10, Cirrus Force 10, and Cisco 6509
5:00PM- Fairmont power shutdown
September 12th
1:00AM - Power is restored
1:15AM – 1:45AM Power up 6509, Cirrus Force10, and Dew Force10.
1:45AM – 2:15AM Power on lnxfmt1, lnxfmt2, smsfmt1, smsfmt2, svn-fmt, sdmfmta sdmfmtb, and CWS.
2:15AM – 2:30AM Verify connectivity, routing, power supplies and redundancy
2:15AM – 2:30AM Bring up disk for Cirrus and Dew.
2:30AM – 7:00AM Power up Cirrus, and Dew and test.
7:00AM - Release Cirrus and Dew
--------- 29JUL2009 There have been a number of firewall problems as we move onto new super computing systems. When development servers are down one can use the 24/7 high availability server http://nomads.ncep.noaa.gov In addition, the operational and development systems will continue to be switched. When this happens, the development system often has to be shut down to enable operations, so there is no dataflow to the development servers, although dataflow continues to the high availability server. The schedule is complex but I include the following summary and information from John Ward: Hopefully you have all received the e-mail outlining the final round of tests that NCO will be performing over the next two weeks. I won't repeat the schedule here, but basically, as of Friday afternoon Cirrus & Stratus will be configured as the Development & Production, followed by a week or more of nearly daily switches of Dev & Prod. During the weeks of August 3 & 10, production will be switched about 8 times and the systems will be rebooted at least twice. Dev users will only have access to the Development machine throughout these switches.
Since it isn't practical to mirror all your data between Cirrus & Stratus, Dev users should attempt to copy the bare minimum they will need to continue some level of work during these two weeks. All Classes & Groups should be configured the same as on Mist & Dew. I would recommend that all DevOnProd & Class1onProd users verify their access to both Cirrus & Stratus on Friday afternoon, after the systems have been configured as Prod & Dev. If you have any problems with access or running jobs, you should immediately notify IBM support, since the system will remain in that configuration for the weekend. --------- 10APR2009 The problems from bandwidth reductions noticed in early March should be mitigated by the decision below. nomad[1,3,5] will be back to normal. All should note that by the end of this year NCEP plans to switch completely to GRIB2. When all operations stop producing GRIB1 files, perhaps by this Fall, there will be no choice but to only have grib2 files. This should be transparent to most NOMADS users. -------- Original Message -------- Subject: Bandwidth Increase to NOMADS RFC Approved Date: Thu, 09 Apr 2009 13:59:06 -0400 From: Bill Lapenta Organization: NCEP/EMC To: Jordan Alpert CC: daniel Starosta , shrinivas Moorthi RFC approved and expected implementation date is 14 April. Thanks for the coordination. --------- 24MAR2009 -------- Original Message -------- Subject: [EMC #10789]: Slow network speed on NOMADS data Date: Tue, 24 Mar 2009 17:08:07 -0400 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov References: Jordan, I have submitted an RFC today to have the port speed increased from 10 Mbps to 100 Mbps on both nomad3 and nomad5. -Kyle ----- I am hoping this will fix the slow data transfer problems that we have been having on the development servers, nomad[1,3,5]: --- System admin wrote... Your request #128762 was updated by reginald.pace: Kyle, I got the green light to proceed with port speed increase from 10-100Mbps.
Can you submit the RFC today and schedule for first thing next week? -Reggie -------- 20MAR2009 We have noticed a degradation in the transfer speeds for the last week, from our development servers, nomad[1,3,5], and the system admins are working on the problem. 13MAR2009 NCEP production/development IBM-SP super computers are changing. Development NOMADS will move today from Dew to Cirrus as Dew will not be available to development accounts (NOMADS) at COB today. Users should find this switch transparent. NOMADS high availability 24/7 server at the Web Operations Center http://nomads.ncep.noaa.gov will be unaffected. --------- 09FEB2009 nomad1 is in a state where all the data areas are showing read_only. I have sent a message to the helpdesk. -------- Original Message -------- Subject: [EMC #10515]: nomad1 problem Date: Mon, 09 Feb 2009 11:54:32 -0500 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov References: <200902091453.n19Er0wk030565@mailrt1.ncep.noaa.gov> Jordan, The disk arrays ran into issues because of an overflow of I/O to the RAIDs which caused the controller to shut them off. I have rebooted the system and they are now back online. At this time it is functioning normally; however, if this issue occurs again, it will require firmware updates which we will coordinate at that time. -Kyle Jordan --------- 18DEC2008 The testing for the change (see 5DEC 2008) had system/ops taking the development super computer for their work on Dec 17. As the message below states, the system has returned, and we have restarted cron. "Transfer" jobs refers to the development mirror from which the experimental NOMADS servers nomad[1,3,5] get their data. NOMADS jobs need to be started manually, which Jun and I have already done. Even though the system came back early this morning, it takes a day to get the mirror replenished, so all should use the new high availability server, 24/7, at http://nomads.ncep.noaa.gov Below is the message from super computer system.
-------- Original Message -------- Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Planned Dew maintenance work Date: Thu, 18 Dec 2008 05:50:59 -0500 From: SDM To: _NCEP.List SP-Announce All, MIST is now available for developers. All upgrading and testing has been done, all transfer jobs have been restarted. SDM Mike Wooldridge Correction: Mist is expected to be returned to development approximately 07:30 18 Dec 08. -Don -----Original Message----- From: ncep.list.sp-announce-bounces@lstsrv.ncep.noaa.gov [mailto:ncep.list.sp-announce-bounces@lstsrv.ncep.noaa.gov] On Behalf Of Don Avart (sysadmin) Sent: Wednesday, December 17, 2008 5:54 AM To: ncep.list.sp-announce@noaa.gov Subject: [NCEP.List.SP-Announce] 17 Dec 08: Mist Maintenance Wed. Dec 17 2008: 24 hours Beginning 07:30 on Wednesday 17 Dec 08 Mist will be unavailable for system maintenance. At 06:30 all development LoadLeveler queues will be drained. At 07:30 any remaining jobs will be cancelled and all users will be logged off. Once system maintenance has been completed Mist will be turned over to production for parallel operations and testing. Mist is expected to be returned to development approximately 07:30 17 Dec 08. _______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@lstsrv.ncep.noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce ---------- 05DEC 2008 {This means that data flow may begin on nomad[1,3,5]. Note also that there will be another data flow interruption when Dew and Mist are interchanged later this month -- not yet announced.} Subject: [NCEP.List.SP-Announce] 5 Dec 08: Dew testing complete. Cron Available Date: Fri, 05 Dec 2008 05:11:15 -0500 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Production testing of Dew is complete. Users are now free to restore crontabs.
_______________________________________________ NCEP.List.SP-Announce mailing list NCEP.List.SP-Announce@lstsrv.ncep.noaa.gov https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce ---------- 01DEC 2008 Beginning 07:30 on Wednesday 3 Dec 08 Dew will be unavailable for system maintenance. At 06:30 all development LoadLeveler queues will be drained. At 07:30 any remaining jobs will be cancelled and all users will be logged off. Once system maintenance has been completed Dew will be turned over to production for parallel operations and testing. Dew is expected to be returned to development by 07:30 4 Dec 08. {This means that the data flow to nomad[1,3,5] may be interrupted or late but the flow to http://nomads.ncep.noaa.gov should be OK.} ---------- 25NOV 2008 I hope everybody is aware of the switch of Operations to Dew on 3 December as part of the quarterly OS upgrade. Production will be on Dew for 2 weeks, so please check to be sure you are ready for the switch. {This means that the data flow to nomad[1,3,5] may be interrupted or late but the flow to http://nomads.ncep.noaa.gov should be OK.} ---------- 21NOV 2008 nomads6.ncdc.noaa.gov's tenure as a backup server will end on December 1! I have been informed by the NCDC group that, due to security limits, the nomads6 server, the backup server, will be turned off shortly. It will be unavailable for a time (most of December) but will return with http and ftp service only -- at least at first. Some NOMADS applications and GDS might be returned at some point but it will no longer be the backup server. By the end of this year, NOMADS real time model files will be on the high availability server at the WOC so we should not need such a backup. I encourage all to use that server, http://nomads.ncep.noaa.gov; the development servers are also in operation.
Other applications on the backup server in the last week have made it impossible to update the GDS server as it is too busy so GDS is unlikely to return at all. A lot of the problems you have encountered this week have been due to competition (high load average) of other applications on the server in anticipation of the renovation. In the future, the server will continue to run ftp, http services and pdisp, ftp2u, http and GDS are operating but updating new files will be erratic, and will continue that way through the rest of this month. A copy of the reanalysis and other data sets should remain when the system is returned to operation in 2009 and still be available. ---------- 12NOV 2008 Update #3 All, Dew has returned to service and developers can use DEW. Transfer jobs are currently running and may take some time to catch up (overnight) with all model products. SDM Grant Newby ---------- 12NOV 2008 This just in from action director GWCB: 1600Z: Hi Folks, Power was lost to Dew this morning. IBM will have to fsck the file system once power is restored. I would not expect Dew to become available until very late today or tomorrow morning. DevonProd will also be unavailable until /com is synced between Mist & Dew. John Another outage for the Super-computer.... -------- Original Message -------- Subject: [NCEP.List.SP-Announce] DEW down due to power problems Date: Wed, 12 Nov 2008 09:28:21 -0500 From: SDM To: _NCEP.List SP-Announce All, The Dew supercomputer is currently down due to power problems at the Fairmont Facility. It is currently unknown how long Dew will be down. Updates will be provided as more info becomes available. SDM - Joey 08NOV 2008 ---------- Sorry I did not get this out on time ..... (*j*) -------- Original Message -------- Subject: [NCEP.List.SP-Announce] DEW is now available to developers Date: Thu, 06 Nov 2008 17:31:33 -0500 From: SDM To: _NCEP.List SP-Announce All, Dew is now available to developers.
Please note that over 24 hours of production data needs to be mirrored over from Mist to Dew...this will take a while. Therefore a full current set of production data in /com on Dew will not be available until tomorrow sometime. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Dew GPFS problems - update #5 Date: Wed, 27 Aug 2008 14:13:28 -0400 From: SDM To: _NCEP.List SP-Announce All, Dew continues to be unavailable to development..production baseline testing will begin shortly. We expect testing and data syncing will take most of the rest of the day to accomplish, so we expect DEW will be available to developers no earlier than 12z tomorrow morning. Sorry for any inconvenience this may cause. SDM - Mark Shirey/Grant Newby IBM has no estimate of when Dew will be back in service. It is highly likely that this could possibly be an extended outage. Also, because NCEP is in a critical weather day, all developers will be taken off of Mist in order to minimize risk to Mist. Sorry for any inconvenience this may cause. 02OCT 2008 We will be shutting down Nomad1 on Monday, 15Z, October 6th to setup additional storage. It should be down for a few hours. The GFS and NAM will be on the http://nomads6.ncdc.noaa.gov/ncep_data backup. All other data sets should be available during this period. ---------- 08AUG 2008 This following means no data flow for 8/26-27 for nomad[1,3,5] -- backups at nomads6.ncdc.noaa.gov/ncep_data or nomads.ncdc.noaa.gov and ftpprd. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Dew GPFS problems - update #4 Date: Wed, 27 Aug 2008 09:38:08 -0400 From: SDM To: _NCEP.List SP-Announce All, GPFS is still not available on Dew...the file system check continues to run on Dew...it is believed that the fsck is running successfully...once the fsck is done and analyzed a more firm time will be able to be provided as to when Dew will be available again. Sorry for any inconvenience this may cause. 
SDM - Mark Shirey -------- Original Message -------- Subject: [NCEP.List.SP-Announce] Dew GPFS problems - update #3 Date: Tue, 26 Aug 2008 21:56:28 -0400 From: SDM To: _NCEP.List SP-Announce All, IBM has no estimate of when Dew will be back in service. It is highly likely that this could possibly be an extended outage. Also, because NCEP is in a critical weather day, all developers will be taken off of Mist in order to minimize risk to Mist. Sorry for any inconvenience this may cause. SDM - Joe Carr ----------- 18 AUG 2008 Attention NCDC-NOMADS users, The NCDC NOMADS servers will soon undergo a reconfiguration that will change the way users access data. These changes will simplify and stabilize the Uniform Resource Locators (URLs) used across the NOMADS systems; and most importantly will remove the need for specific port numbers to access data. This way future NOMADS systems changes will be transparent to users. Users will need to modify any stored URLs they have for accessing the NCDC NOMADS suite of servers which contain specific references to port numbers. (Note: these changes will have no impact on the NCEP suite of NOMADS servers.) A transition period will be used to allow users to modify their access scripts. From the period Tuesday, August 19th, 2008 to September 01, 2008, the existing access points will remain in parallel with the new configuration, which is currently in place. On September 02, 2008 all URLs that contain port numbers will be discontinued. 
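{A note for users updating access scripts: the change described in this notice amounts to dropping the explicit port from any URL on nomads.ncdc.noaa.gov. The following is only a minimal Python sketch of that rewrite, not an official NCDC tool. The function name strip_ncdc_port is made up for illustration, and the sketch assumes the path after the host stays the same, as it does in the URL pairs listed in this notice.}

```python
from urllib.parse import urlsplit, urlunsplit

# Host named in the NCDC notice; URLs on any other host are left untouched.
NCDC_HOST = "nomads.ncdc.noaa.gov"


def strip_ncdc_port(url: str) -> str:
    """Return url without its explicit port if it points at the NCDC NOMADS host.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.hostname != NCDC_HOST or parts.port is None:
        return url  # not an NCDC NOMADS URL, or already port-free
    # Rebuild the URL with the bare host name as the network location.
    return urlunsplit(
        (parts.scheme, parts.hostname, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    old_urls = [
        "http://nomads.ncdc.noaa.gov:9091/EnsProb/",
        "http://nomads.ncdc.noaa.gov:9090/dods/",
        "http://nomads.ncdc.noaa.gov:8085/thredds/",
    ]
    for old in old_urls:
        print(old, "->", strip_ncdc_port(old))
```

{Running the rewrite over a script or bookmark file once during the transition period should be enough, since both old GDS ports end up at the same port-free address.}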
We urge users now to change their bookmarks, OPeNDAP applications, URL references in upcoming publications, or access scripts of any kind to remove all port numbers from their links and substitute the following:

Service                     Current URL                                             New URL
Ensemble Probability Tool   http://nomads.ncdc.noaa.gov:9091/EnsProb/               http://nomads.ncdc.noaa.gov/EnsProb/
GrADS Data Server (GDS)     http://nomads.ncdc.noaa.gov:9090/dods/                  http://nomads.ncdc.noaa.gov/dods/
                            http://nomads.ncdc.noaa.gov:9091/dods/
Live Access Server (LAS)    http://nomads.ncdc.noaa.gov:8085/las/servlets/dataset   http://nomads.ncdc.noaa.gov/las/servlets/dataset
SRRS / NCEP Charts          http://nomads.ncdc.noaa.gov:9091/ncep/NCEP              http://nomads.ncdc.noaa.gov/ncep/NCEP
Thredds Data Server (TDS)   http://nomads.ncdc.noaa.gov:8085/thredds/               http://nomads.ncdc.noaa.gov/thredds/

----------- 07 JUL 2008 The message below means there could be data delays on Wednesday, 7/9 for nomad[1,3,5], and a week later when the production switch is reversed. The backup http://nomads6.ncdc.noaa.gov/ncep_data and http://nomads.ncdc.noaa.gov are on a separate data flow and should not be affected (*j*) -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 9 Jul 08: Production Switch to Dew Date: Mon, 07 Jul 2008 08:40:44 -0400 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Production will be switched from Mist to Dew beginning 07:30 local Wednesday 9 July. All non-production classes on Mist will be drained at 06:30. At 07:30 all development users will be logged off of Dew, their LoadLeveler jobs will be cancelled, and their crontabs will be moved out of /var/spool/cron/crontabs and placed into their home directories. Once production has switched to Dew, all non-production classes will be resumed.
----------- 01 JUL 2008 This means that the data flow on nomad[1,3,5] could be delayed. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 2 Jul 08: Dew Maintenance 2 Jul 08: 24 hours Date: Tue, 01 Jul 2008 08:56:41 -0400 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 07:30 on Wednesday 2 Jul 08 Dew will be unavailable for system maintenance. At 06:30 all development LoadLeveler queues will be drained. At 07:30 any remaining jobs will be cancelled and all users will be logged off. Once system maintenance has been completed Dew will be turned over to production for parallel operations and testing. Dew is expected to be returned to development by 07:30 3 Jul 08. ----------- 10 JUN 2008 It appears that the cache on Nomad1 was overloaded and as a result cut off communications to the storage array. I have cleared the cache, updated the kernel in order to prevent a similar situation from occurring, performed disk checks and rebooted the system and it appears to be in proper working order now. The system had been online for 145 days which may have contributed to the issue occurring. -Kyle -------- Original Message -------- Subject: [EMC #8905]: nomad1 problem? Date: Tue, 10 Jun 2008 09:26:52 -0400 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov References: <200806101217.m5ACH4kx028481@mailrt1.ncep.noaa.gov> Jordan, For some reason, Nomad1 is not recognizing the storage array attached to nomad1. We have checked the connections and the hardware indicates everything is fine.
I am going to unmount the drives in a short while and run some disk checks. This will require a reboot of the system. I will perform the unmounts at 9:45 a.m. this morning and the reboot shortly thereafter. -Kyle ----------- 23 MAY 2008 This means that NOMADS may not have data flow for all or part of this weekend... All, What: Production will switch from Mist to Dew. When: 2345Z (7:45 PM EDT) Fri May 23. Why: Due to planned power maintenance on the IBM campus in Gaithersburg, Mist will be placed on back up generator at 10 PM Fri May 23 and remain on generator through Sat May 24 at midday. It is anticipated that Mist will remain up and viable through the period. A Critical Weather Day remains in place through 12Z (8 AM EDT) Sat morning. Due to the above factors, production will be switched to Dew. Developer Impact: Developers will be switched from Dew to Mist beginning at 7:45 PM Fri May 23 and remain on Mist through 7:45 AM Tue May 27. It is anticipated that production will switch back to Mist Tue morning at 7:45 AM. Duration: 84 hours SDM - Joe Carr Senior Duty Meteorologist NCEP Central Operations Production Management Branch -------- Original Message -------- Subject: [Fwd: warning: Production may switchover to dew this weekend.] Date: Fri, 23 May 2008 14:48:01 -0400 From: Tammy Braun Organization: NOAA To: _NCEP All EMC FROM GEOFF DIMEGO: It looks like there may be a switchover between mist and dew this weekend. Eric saw a message while logging in and he confirmed it with Doris Pan. IBM is doing work on the power system in Gaithersburg. We knew this because they are taking haze & hpss down. Apparently, they are worried the power work will affect mist and want to (AT THE LAST MINUTE) move production to dew. I've complained to Don Avart ... Since we are in Critical Weather Day, they won't be able to do the change until it is lifted - maybe Saturday! I can't change this. I am powerless.
If you have critical jobs or crons that have to be switched by hand when there is a switchover, you might want to look in on the machine situation this weekend. ---------- 08 MAY 2008 -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 8 May 08: Production Bufr Lib Test on Dew 18:00 - 03:00 Date: Wed, 07 May 2008 21:20:30 -0400 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 18:00 local 8 May 2008 until 03:00 9 May production will be conducting a parallel bufr lib test. During this time Dew will be inaccessible to development users. Beginning at 17:00 local all LoadLeveler classes will be drained. At 18:00 any remaining jobs will be cancelled. Once maintenance has been completed and all systems testing and validation have completed all LoadLeveler classes will be resumed. ----------- 07 MAY 2008 The following means that there may be some disruption in the data flow for nomad1, 3, 5 on 9MAY2008: -------- Original Message -------- Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Production switch to Dew from Mist Date: Wed, 07 May 2008 06:49:23 -0400 From: SDM To: _NCEP.List SP-Announce All, What: Production will switch from Mist to Dew. When: 1045Z (645 AM) Wed May 7 through 13Z (9 AM) Fri May 9. Why: NOAA COOP Exercise. Developer Impact: Developers will switch from Dew to Mist for the duration of the period. Duration: ~50 hours NOTE: More information is forthcoming on the scheduled bufr library test on Dew which was scheduled from 6 PM Wed May 7 through 3 AM Thu May 8.
SDM - Joe Carr ----------- 28 APR 2008 A (premature) switch back to dew development after emergency: -------- Original Message -------- Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Availability of Dew Date: Mon, 28 Apr 2008 14:23:17 -0400 From: SDM To: _NCEP.List SP-Announce All, What: The mirroring of production data from Mist to Dew continues. When: The mirroring process is expected to last until about 29/0000Z. Why: The process is required as a result of an emergency switch to Mist Sunday morning April 27, and the power down of Dew at that time. Developer Impact: Developers are not expected to have complete access to all data until the mirroring process is complete. SDM - Bill Kneas ----------- 27 APR 2008 The following from Central Operations indicates that there will be no data flow for nomad1, 3, 5. Use nomads6.ncdc.noaa.gov and nomads.ncdc.noaa.gov. From: SDM Sent: Sunday, April 27, 2008 12:29 pm To: "_NCEP.List SP-Announce" Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Dew Out of Service All, What: Due to a power problem at the Fairmont Site the Dew computer has been powered down. When: Power loss was approximately 1130Z (7:30 A.M) Sunday April 27, 2008 Why: Dew was shut down. The power interruption caused a loss of cooling to the facility. Developer Impact: Developers will not have access to Dew until further notice. Duration: Unknown SDM - Bill Kneas ----------- 09 APR 2008 The message below implies that on 20080409 nomad1, 3, 5 will not receive data. We switched development and production machines last week (sorry I did not announce this), and switching back will cause no data access for a day. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] SP-ANNOUNCE LIST NOTIFICATION - Planned Mist maintenance work today Date: Wed, 09 Apr 2008 07:19:45 -0400 From: SDM To: _NCEP.List SP-Announce All, What: Mist maintenance work has begun.
IBM began draining the Mist nodes at 06:30 AM and will take the entire system by 08:00 AM. When: 06:30 AM Wed Apr 9 through approximately 08:00 AM Thu Apr 10. Why: Quarterly maintenance on Mist Developer Impact: Developers will not have access to Mist during the maintenance window. Duration: ~24 hours ---------- 25 MAR 2008 nomad3 is returned to service with a grib2 feature for ftp2u called g2sub which we are testing on GFS output. The grib1 holdings are still present as before. Also: -------- Original Message -------- Subject: [EMC #8090]: stale nfs handle Date: Tue, 25 Mar 2008 12:02:54 -0400 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov References: Jordan, Nomad5 has been updated and rebooted. -Kyle ----------- 21 MAR 2008 nomad5 will be rebooted (after over 285 days of running) on Tuesday, March 25, 2008 at 1300Z. We did not boot it last Dec because nomad3 was rebooted and did not come back and we had to deal with it. (*j*) ----------- 18 MAR 2008 All, (The following means that nomad1,3,5 data sets will be delayed/missing 3/20-21. Use nomads6.ncdc.noaa.gov/ncep_data or nomads.ncdc.noaa.gov.) Dew maintenance work scheduled for Wed Mar 19 is delayed by one day. Dew will not be available to developers during the maintenance window. When: 10Z (6 AM) Thu Mar 20 through 13Z (9 AM) Fri Mar 21. Why: Critical Weather Day was declared through 12Z (8 AM) Thu Mar 20. Developer Impact: Developers will not have access to Dew from 10Z (6 AM) Thu Mar 20 through 13Z (9 AM) Fri Mar 21. Duration: 27 hours SDM - Joe Carr Senior Duty Meteorologist NCEP Central Operations Production Management Branch ----------- 29 JAN 2008 After security/firewall problems are worked out (any day now) a new nomads server is coming up at address: http://nomad1.ncep.noaa.gov nomad1 contains its own independent copy of the NCEP reanalysis (unlike nomad5 which pointed to nomad3).
0.5 degree GFS and SREF are already present and operating with more data sets to follow. Tests have been completed with these datasets, and we will work to get the rest of the datasets on nomad1 as well as resolving security/firewall problems so outside users can use the server. The server should be accessible soon as it is in the hands of sys admin security. We hope nomad3 will return to service but we do not know what is keeping it from restructuring to raid5 with new drives. nomad3 server which holds 2/3 of NOMADS real time data has not been working since Dec 24 2007. nomad3 has been "broken" since xmas when the power was found off and a subsequent restart showed a bad drive. New disks were placed in the raid5 but the raid would not restructure the disk meaning that the system was no longer a "raid(5)" and the next disk that was lost would cause all the system and data to be lost. We have saved off the code/data and the sys admins are working to report a new system. nomad5 continues to hold most of the data but reanalysis and some other data sets are not present. It has been running as a lone server since Dec 24. NCO (Last June) decreased the bandwidth of all NOMADS servers because of the possibility of NOMADS interfering with operations whenever an IBM-SP swap of prod and dev needs to be done. Even though an IBM-SP swap does not happen often, NCO felt that the increased all around usage of the network required that NOMADS bandwidth remain throttled. This may contribute to users having problems downloading data that is present. Some data like SREF is not on the nomads6 backup at NCDC since band width has been decreased. nomad3 continues to be worked on by EMC sys admin. Efforts to make NOMADS operational and move applications to the WOC/ftpprd with 24/7 service and improved reliability and band width continue and implementation is on schedule for end of this summer. 
----------- 11 JAN 2008 -------- Original Message -------- Subject: [EMC #7042]: nomad3 not responding Date: Thu, 10 Jan 2008 23:44:54 -0500 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov, Kyle.Nevins@noaa.gov References: Jordan, Just an update, after talking to Yinka up in NCO, he agreed that one of the disks should be replaced. The replacement disk that I put into Nomad3 was a used disk that was labeled as a replacement. He and I are going to wait until tomorrow to see if the disks arrive, if they do not, then we are going to recompile the driver and reinstall it. -Kyle ------------------------- 07 JAN 2008 Sorry. On Friday PM nomad3 would not answer or allow a login. A message was sent to emc.helpdesk@cerberus.ncep.noaa.gov. -------------------------- 04 JAN 2008 16Z nomad3 is operating. GDS/OPENDAP(DODS) will come up in a few hours (waiting for the data logjam to ease). -------------------------- 31 DEC 2007 -------- Original Message -------- Subject: [EMC #7042]: nomad3 not responding Date: Mon, 31 Dec 2007 11:11:56 -0500 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov, Kyle.Nevins@noaa.gov References: We are currently rebuilding the two raid5s on the system as the disks were reporting not in use and not that they were dead. We are also running another fsck on /raid2. We have shut down all network connections on the machine to ensure that no outside interference occurs. We will keep you updated upon further details. -Kyle ------------ 27 DEC 2007 nomad3 status: From EMC Helpdesk: 15:34EST: The system was rebooted again and the root filesystem and the raid1 filesystem checked out as clean but the raid2 is still running the file system check. We will let that run overnight and may need some input tomorrow. Once that has completed the machine should be back online. So our target time for Nomad3 to be back online is tomorrow afternoon. fsck is still running as of 0800 Thursday on file system #2.
Unfortunately, I cannot give you an accurate time frame for the disk repair. However, Kyle should be in by 0930, once he arrives we will make this issue our focal point today. ----------- 26 DEC 2007 -------- Original Message -------- Subject: [EMC #7042]: nomad3 not responding Date: Wed, 26 Dec 2007 09:45:58 -0500 From: EMC Helpdesk Reply-To: EMC Helpdesk To: Jordan.Alpert@noaa.gov References: <200712261259.lBQCx78F028930@mailrt1.ncep.noaa.gov> When I arrived this morning, I received warnings regarding nomad3, after inspection in the server room, I noticed that nomad3 was not powered on. Nomad3 is currently powered up, however, disk checks will delay the progress of it being reachable for now. ----------- 28 NOV 2007 28 Nov 07 Mist outage extended 6 hours Due to unforeseen circumstances, development access to Mist will be delayed an additional 6 hours. Upon completion of testing, notification will go out via ncep.list.sp-announce@noaa.gov. ----------- 27 NOV 2007 From ncep.list.sp-announce@noaa.gov .... 24 hour scheduled outage on Mist 11/27/07. Beginning 06:30 on 11/27/07 all jobs on Mist will be drained. At 07:30 all users will be logged off and all remaining LoadLeveler jobs will be cancelled. Upon completion of maintenance and testing, a parallel production test will be run. Development access to Mist will be restored approximately 07:30 11/28/07. Notification will go out via ncep.list.sp-announce@noaa.gov. (This means that on 27NOV2007, real time NCEP NOMADS servers, nomad3 & 5 may have an interruption in data flow during this time.) ----------- 06 NOV 2007 This (below) means data flow on nomad5 and 3 may be late on 11/07/2007 for a number of hours before 12z: Dew will be unavailable beginning 04:00 on 11/07/07. All non-production jobs on Dew will be drained beginning 03:00 local. Beginning 04:00 any remaining jobs will be cancelled and all users on Dew will be logged off.
Upon completion of maintenance and system testing and validation, a 6 hour parallel production test will be run for the 12Z cycle. Upon completion of the 12Z test cycle users will be allowed on Dew. Notification will go out via sp-announce@noaa.gov upon completion of maintenance and testing. ----------- 31 OCT 2007 nomad5 has been up/running 139 days and nomad3 has been up 98 days. We like to reboot servers every quarter so at 3PM today we will reboot nomad3 and then nomad5. ----------- 15 OCT 2007 Change of date/time see below and 09 OCT 2007... ********* UPDATE: System Maintenance on Dew Pushed Back 1 Week *************** A 24 hour maintenance period is scheduled for Dew on 10/23/07. Non-Production jobs on Dew will be drained beginning 07:00 local. Beginning 08:00 any remaining jobs will be cancelled and all users on Dew will be logged off. Upon completion of maintenance and system testing and validation, users will be allowed back on Dew. This work is not anticipated to take the entire 24 hour maintenance period. Notification will go out via sp-announce@noaa.gov upon completion of maintenance. ----------- 09 OCT 2007 The following means that on 16OCT2007, NCEP NOMADS servers nomad3 & 5 most likely will have an interruption in data flow during this time. http://nomads6.ncdc.noaa.gov and http://nomads.ncdc.noaa.gov should be unaffected: ------------- Subject: [NCEP.List.SP-Announce] 16 Oct 07: Dew Scheduled Maintenance 16 Oct 2007 Date: Tue, 09 Oct 2007 08:45:58 -0400 ------------- A 24 hour maintenance period is scheduled for Dew on 10/16/07. Non-Production jobs on Dew will be drained beginning 07:00 local. Beginning 08:00 any remaining jobs will be cancelled and all users on Dew will be logged off. Upon completion of maintenance and system testing and validation, users will be allowed back on Dew. This work is not anticipated to take the entire 24 hour maintenance period. Notification will go out via sp-announce@noaa.gov upon completion of maintenance.
------------------------------------------------------------------ 11 SEP 2007 There are changes to the GFS post on 9/25. See http://www.nws.noaa.gov/om/notification/tin07-59gfs_upgrade_unifiedpost.txt for an official statement. As of 9/25, the GFS 0.5 degree "master" file, which was a GRIB1 file, will not be made in the same way anymore, but there will be a new file to replace it on the IBM-SP. The file that is currently copied from the dev IBM-SP machine, known as "0.5 degree master", with 48 levels and land surface and other fields, will not be there any more.... but there will be a replacement. There will be a feed of a 0.5 degree file to NCDC through ftpprd: The file on ftpprd will be composed of the ...0p5... file (sometimes called the "military" file), which has 28 layers compared to the 48 layers of the nomad3 master file (and some land surface fields), plus the difference between these two files, all in one ftpprd file, so it will be -- should be -- the same, except the new file is in GRIB2. NCDC should get this on their ingest system, which opens the potential for, and planning for, a 0.5 degree data set archive there, the first of its kind for this data set. In addition, I hope to have a copy on the real time backup server, nomads6.ncdc.noaa.gov, in GRIB1 so that ftp2/4u and DODS work, as well as for real time backup of nomad3. These files will not be available to the public from ftpprd. On nomad3 & 5, starting 9/25, the old 0.5 degree master file will be replaced by the ...pgrb2... file (the "military" 0.5 degree file) together with a separate file, .....pgrb2b...., which holds the difference between that file and the old master and is being placed on the IBM-SP. Our plan is to get both GRIB2 files, change them to grib1, append them, and name the result so the same "master" file data set will continue on nomad3.
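{The append step in the plan above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script that runs on nomad3. It assumes both pieces have already been converted to GRIB1 by a separate tool (that step is not shown), and it relies on the fact that a GRIB file is a sequence of self-contained messages, so joining two files byte for byte gives a valid multi-message file. The function name append_grib_files and the file names in the usage lines are hypothetical.}

```python
import shutil
from pathlib import Path


def append_grib_files(parts, combined):
    """Concatenate GRIB files byte for byte into `combined`.

    Hypothetical helper: `parts` is a list of paths to files that are
    assumed to be GRIB1 already. Returns the size of the result in bytes.
    """
    with open(combined, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the copy so large model files are not read into memory.
                shutil.copyfileobj(src, out)
    return Path(combined).stat().st_size


if __name__ == "__main__":
    # Placeholder file names, not the real NCEP naming convention.
    pieces = ["part_a.grb", "part_b.grb"]
    if all(Path(p).exists() for p in pieces):
        size = append_grib_files(pieces, "master_0p5.grb")
        print("wrote", size, "bytes")
```

{Writing the combined file under the old "master" name is what keeps the change invisible to existing ftp2u and DODS users.}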
The name "master" now refers to an internal (native model vertical and horizontal grid) GFS gaussian model (hybrid) vertical coordinate grid or GFS "physics grid" file (this is not a lon/lat pressure GRIB1 file!). It is unfortunate that we also used that name for the 0.5 degree pressure lon/lat grid. Ultimately in the future, all GFS files/products will be posted/made from this master and unify the post processing code for all NCEP models. The NOMADS goal here is to make this transition transparent. We will keep our 0.5 master file name the same and the contents should also be the same. ------------------------------------------------------------------ 12 July 2007 Recalling the 03 July 2007 announcement from the SP: > 16 July 07:30 local, production will switch from Dew Back to Mist. This means on July 16, at 0730 the data flow will be interrupted and that data may be delayed or unavailable for a time on nomad3 and nomad5. ------------------------------------------------------------------- 03 July 2007 This means data may not be present on NOMADS for this period! http://nomads6.ncdc.noaa.gov/ncep_data backup server will continue to operate. -------- Original Message -------- Subject: [NCEP.List.SP-Announce] 10 Jul 07: Updated CCS Maintenance Schedule Date: Tue, 03 Jul 2007 10:38:13 -0400 From: Don Avart (sysadmin) To: NCEP.List.SP-Announce@noaa.gov Beginning 07:30 local Tuesday July 10 through 07:30 local Thursday July 12, Mist will be unavailable for scheduled system maintenance. In the event that work concludes early, the system will be returned to the users and notification will go out via ncep.list.sp-announce@noaa.gov. Upon completion of this maintenance, Mist will continue to operate as the development cluster. 16 July 07:30 local, production will switch from Dew Back to Mist. 23 July 07:30 local, LSI patch will be applied to Dew Storage. This work is concurrent and should not impact users on Dew. 
-------------------------------------------------------------------- 21 JUN 2007: Updated ftp2u to 0.8.0 beta (1) reduce incidence of premature "done" of web pages (I think the server should be updated to fix this problem.) (2) code cleanup (3) remove option to send files to user updated reanalyses and gdas ftp2u only. other updated code is in !wd23ja/cgi/ Wesley -------------------------------------------------------------------- 15 JUN 2007 Subject: Re: NOMADS Network Usage Date: Thu, 14 Jun 2007 16:29:21 -0400 From: Luis Cano Organization: DOC/NOAA/NWS/NCEP Louis, Here is our current status. We have implemented the rate-limiting between the WWB NOMADS and the NOAA NOC (Internet). We see relief with the infrastructure and this component of the infrastructure is now better configured to allow proper sharing of resources. Jordan, Please let me know if there is any feedback from customers of degraded services. Thank you, Lou Luis Cano wrote: > Louis et al.: > > We are experiencing a two-fold increase of WWB NOMADS traffic to the > Internet that started two weeks ago. This usage is placing other > requirements that share the same networks to NOAA NOC at risk. In > addition, we are also experiencing higher-than-expected latencies with the > CCS production dataflow to the TOC. > > Here is our plan: > > 1. Today at 11:00 Eastern, we will conduct a test of rate-limiting the > NOMADS (DMZ) to an acceptable rate. This will allow NOMADS to better > share common infrastructure with other requirements. This change has the > potential of increasing transfer times to NOMADS customers. The change > will become permanent assuming a valid solution. > > 2. In parallel, we are investigating the lower latency issues with the > TOC.
We will have a better understanding of this problem by this afternoon. > > I'll send a follow-up status Email by 3:00. Please call my cell if there > are questions: 202-345-7384. > > Thank you, > > Lou > -------------------------------------------------------------------- 31 MAY 2007 20070531: nomad3 and nomad5 servers are back on line, that is, access to the servers has been restored. Data was being transmitted to nomad3 and nomad5 during the outage period. Most of the model data is present, back to (and before) May 14, except for a few days missing, and these appear to be from external problems with the IBM-SP when operations had to move to the development system, or when system administration had taken the servers. In all, the problem seems to have been conflicts in the firewall settings. Some items that are still not operating: The network communications between servers nomad5 and nomad3 are not yet up, so a few data sets shared between servers, like the 0.5 degree master, are not available. Please check both nomad3 and nomad5 for data sets until this can be resolved. Join the NOMADS (NCEP) list server https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.nomads-announce to get updates about problems and changes. -------------------------------------------------------------------- 01 MAY 2007 GFS implementation day, 01MAY2007 in case you forgot. GFS native history file changes: The GFS restart or history file (also called sigma or sigma spectral) is changing due to changes in the vertical coordinate as described below. This file is considered an internal file and is not recommended for public use. These changes should not impact most NOMADS users. This implementation of the GFS goes into operations 01MAY2007 12Z.
An excerpt from "A guide to using the new GFS history file" located at http://wwwt.emc.ncep.noaa.gov/gmb/para/guidehistory/ is below: The vertical coordinate of the operational GFS forecast model will become a hybrid sigma-pressure coordinate in 2007. This will affect the file structure of the native GFS history files used by many other applications. In addition, the GRIB surface flux files will have several more fields. The implementation will not affect the GFS surface files or the posted pressure files {pressure-GRIB files}. The GFS restart files will be in an even newer format to accommodate coming anticipated changes to the GFS in succeeding implementations. No application outside of the global system needs to read GFS restart files. In the near future, the GFS will output Gaussian grid files as the history files. Unfortunately, we are not ready yet to make them operational, so yet another conversion will be necessary when these files are implemented. -------------------------------------------------------------------- 23 APR 2007 There was an unannounced (power) outage in our central computer, and all development and data flow was down most of the day. The following day some model output files were also missing. It caused a gap in some data on NOMADS. We mention it here, a week later, for completeness. -------------------------------------------------------------------- 21 FEB 2007 All on the list; The NCEP Operations switched to the development system and is having a problem resetting the firewall access for NOMADS data flow. NCO is working on the problem. I have shifted into a backup mode (ftpprd) and will try to get the 0.5 and nam fields operating tomorrow (2/22). The 1x1 should be OK on nomad3 and 5. Jordan -------------------------------------------------------------------- 25 JAN 2007 Large scale super computer changes are taking place at NCEP.
As you can see from the message below (date stamp included), it is out with the old supercomputer system and in with the new. The new Dew supercomputer, as it is called, did not have access to NOMADS servers until Jan 24, so we are working to get the data flow moving again. It would have been better to have the data flow running on the old Blue supercomputer for a few days of overlap with the new system so we could make the move transparent, but as NOMADS is an experimental prototype this was not to be.

I can report that there has been progress on making NCEP Real Time NOMADS servers have operational data flow and operational user client applications. This may happen by 2008. NOMADS has tried to keep the most used data sets like GFS (1x1) (0.5) and NAM up to date first, but some of the less used data sets, like the MRF (legacy) 2.5 degree data set, will not get updated for a while.

-------- Original Message --------
Subject: nomads
Date: Wed, 24 Jan 2007 14:43:57 -0500
From: Joe Carr
To: Jordan Alpert
CC: Brent Gordon, John Ward

Jordan,

NOMADS has been turned off on both Blue and Mist. It is allowed on Dew. If you have any problems, please contact Matt Springer or Cameron Shelton.

Thanks,
Joey
--------------------------------------------------------------------
06 Dec 06

NOMADS issues.

> 1. NCO will switch to Blue for operations tomorrow [06 Dec see below] and when that
> happens we will not have enough bandwidth to support both operations
> and NOMADS traffic. NOMADS will be out of service until Friday. Even
> when NOMADS is back on line, only a little more than 50% of the NOMADS
> data is available [on alternate official servers].
>
> 2. This will be an ongoing problem until the "new" TOC is up and
> running. They are currently using their old system. Once the TOC is
> up and running the NOMADS data will be stored at the TOC's Web
> Operations Center and NCDC can pull nearly 100% of the NOMADS data
> from the Web Operations Center.
>
> 3.
The CIO is having major problems getting the new system fully on
> line. If I recall correctly, the new system was supposed to be fully
> operational in Jan. 06. Ben (NCO) has agreed to send people to the
> TOC to help them resolve their problems.
>
-----------------------------------------------------------------------
05 Dec 2006 -- NOMADS DATA FLOW OUTAGE

-------- Original Message --------
Subject: [NCEP.List.SP-Announce] Blue/White: Production Switch to Blue
Date: Tue, 05 Dec 2006 15:22:18 -0500
From: Don Avart
To: NCEP.List.SP-Announce@noaa.gov

Due to a required network outage in Fairmont between midnight and 6 am on December 8, production will be switched from white to blue beginning 7 am local on 6 Dec until 7 am 8 Dec.

Beginning at 5 am 6 Dec, LoadLeveler queues will be drained; blue will be rebooted at 6 am. Only NCEP Production and operational accounts for the NCEP Service Centers will be permitted on the system. All user accounts, cron, and interactive access will be denied. NFS will only be mounted on interactive nodes.

White will remain available for user access except during the network outage.
----------------------------------------------------------------------
11 Nov 06

IBM-SP (Blue) data flow returns to nomad3 and nomad5

Many of the data sets you need are now available on nomad3 and nomad5. Data flow ramps up to almost normal for nomad3 and nomad5 6NOV. All parties have agreed to a long term plan for making NOMADS Operational.

Some data sets are not yet transmitted, such as olr, sst, rtofs, sref, etc., and we are working to get these back to normal, perhaps in a week. Check data on nomad3 or nomad5 before giving up. I cannot promise that missed data will be replenished in all cases, but we will see what we can do.

In the short term, the data flow to the backup server at the National Climatic Data Center (NCDC) will not resume from the "dev" machine, but can be pulled from the ftpprd service. nomads6.ncdc.noaa.gov will still operate for archived data.
(We hope that) ftpprd holdings will be improved to have more complete data sets, with the goal of duplicating the content of variables, levels, and times that NOMADS presented before the outage. We will write programs and attempt to populate nomads6.ncdc... from ftpprd, but it will take a little more time. Having data at the backup server, nomads6.ncdc..., in real time, as well as at the NCEP servers, nomad3 and 5, kept these systems from becoming overextended.

Thank you all for your support and patience. The message I want to send is that NOAA management recognizes the importance of getting data out to users of all categories and is committed to making NOMADS Operational, 24/7/365. It is your requirements that are driving this process.
----------------------------------------------------------------------
26 Oct 06

Blue/White: Production Switch to Blue

A production switch from White to Blue will occur beginning 6 am Thursday October 26 and ending 2 pm Thursday October 26 (8 hours). During this time period, only NCEP production and operational accounts for the NCEP Service Centers will be permitted on the system. All nfs mounted filesystems will be dismounted from compute nodes, including /u (user home directories). NFS will be available on Interactive and Class 1 nodes.

White will not be available for development use during this time period.
_______________________________________________
NCEP.List.SP-Announce mailing list
NCEP.List.SP-Announce@lstsrv.ncep.noaa.gov
https://lstsrv.ncep.noaa.gov/mailman/listinfo/ncep.list.sp-announce
--------------------------------------------------------------------
announce.txt : 20061016

-------- Original Message --------
Subject: Production Switch to White
Date: Sun, 15 Oct 2006 17:46:02 -0400
From: Susan.Fenwick@noaa.gov
To: ncep.all.hands@noaa.gov
CC: John.Ward@noaa.gov

> Corruption of the GPFS file system on White prevented the scheduled
> switch of Production to White on Saturday.
GPFS has been
> restored, but
> the entire Production file system was lost. The file system is
> currently being mirrored from Blue.
>
> Production is expected to be switched to White by 12Z on Tuesday, 17
> October.
>
--------------------------------------------------------------------
announce.txt : 20061012

FYI
SJL

-------- Original Message --------
Subject: Access To Blue
Date: Thu, 12 Oct 2006 06:57:02 -0400
From: John Ward
Organization: NCEP/NCO/Production Management Branch
To: Stephen Lord, Jim Laver

Steve & Jim,

We were not able to turn on the limited list of users on Blue yesterday. We have been pushing the limit on the system this week, with on-time delivery at only 94%. In addition, we have had unexpected network contention with Mist, which caused lengthy delays in delivering products. We feel that adding any additional load to Blue will cause additional delays in production and on the network.

The good news is that work is ahead of schedule in Fairmont. There is a chance we will have White back on line 24 hours earlier than expected. We'll have a better estimate later this morning.

John
-------------------------------------------------------
announce.txt : 20061009

All:

Dave Michaud has informed me that the earliest date when EMC jobs will be turned on is Tuesday 10 October. Earlier dates proposed by Dave were rejected by the NCO Configuration Board. Dave will contact you individually regarding turning on your jobs. If he doesn't contact you, your stuff will not run.

I'm sorry for this situation. It is out of my control. Please pass the word if I have left someone off this email list.

SJL