Sunday, August 27. 2006
Wildland Discovery Hike: Dishpan Gap
Update: I made a Topo map of the hike with elevation profile. It's an 11" x 17" format PDF created with National Geographic's Topo set for Washington state. You can download the PDF by [clicking here] (7.5 MB).

Friday, August 25. 2006
Confab - August 24, 2006.
My "online dating" update was lackluster. No news to report due to focus on other projects. I better get cracking before I have to join one of those senior citizen sites.

Guest List: Brian Gaither, David Goldstein, Schelley Olhava, Gavin Shearer, and Keith Vaitkus.

Listen to the show at: www.confabshow.com.

Sunday, August 20. 2006
Data Center Update: Cooling, Power, Environmental Monitoring.
Over the past week I've finished quite a few projects at the data center. These were the sorts of projects that took months to coordinate (as evidenced by my March 5th, 2006 post about being "halfway there"). I wish I could claim that all of these projects were proactive; that they were born out of my impeccable engineering competence. Sadly, I have to admit that some of them were provoked by necessity.
Nathan Rolander sheds some light on the fundamental issue in his December 2005 master's thesis, "An Approach For The Robust Design of Air Cooled Data Center Server Cabinets", when he states:

"A lifecycle mismatch is present in data center operation. This is because data centers receive new high powered [sic] servers every 2 to 3 years, whereas the center infrastructure is only upgraded on the order of every 25 years. This means that the center must be reconfigured to handle the increased heat load quite frequently, and after a few iterations of the process the center is required to dissipate far greater loads than initially intended."

Since heat production in a data center is intimately tied to power consumption (every Watt of power used produces approximately 3.4 BTUs of heat per hour), the lifecycle mismatch should be understood to encompass both cooling and power. This is definitely true in our case, and is a problem exacerbated by what amounts to "absentee landlords" at most data center facilities. I don't wish to place all the blame on data center operators, but they really should be taking a position of leadership in this situation. The power outages on July 24th and July 28th at the Garland Building in Los Angeles, California (knocking out DreamHost and MySpace), and the power outage on July 30th at the Fisher Plaza building in Seattle, Washington (knocking out LiveJournal and us [Geckowerx]), are - at a minimum - ominous.

As background, each cabinet (28" W x 36" D x 84" H) at our data center is equipped with one 20 Amp primary and one 20 Amp secondary AC power circuit. The total power available to a cabinet should be understood as 20 Amps - not 40 Amps. (The secondary circuit is intended for redundancy purposes.) The cabinets are cooled using the vertical flow design, where chilled air enters the cabinet through an opening in the base and exits through a fan (rated at 500 CFM) in the top of the cabinet.
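To put rough numbers on the Watt-to-BTU relationship and the single-circuit budget described above, here is a minimal sketch. The 120 V supply voltage and the 80% continuous-load margin are assumptions made purely for illustration (neither figure appears in this post), and the function names are hypothetical.

```python
# Rough heat-load arithmetic for one cabinet.
# ASSUMPTIONS (not from the post): 120 V circuits and an 80% continuous-load margin.

BTU_PER_HOUR_PER_WATT = 3.412  # 1 Watt dissipated continuously is about 3.412 BTU/hr


def watts_to_btu_per_hour(watts: float) -> float:
    """Convert an electrical load in Watts to heat output in BTU per hour."""
    return watts * BTU_PER_HOUR_PER_WATT


def usable_watts(breaker_amps: float, volts: float = 120.0, margin: float = 0.8) -> float:
    """Power available from one circuit after applying the assumed safety margin."""
    return breaker_amps * volts * margin


if __name__ == "__main__":
    # Only the primary circuit counts; the secondary exists for redundancy.
    watts = usable_watts(20)
    print(f"20 A primary circuit: {watts:.0f} W usable, "
          f"about {watts_to_btu_per_hour(watts):.0f} BTU/hr of heat to remove")
```

Under those assumptions a 20 Amp circuit works out to 1,920 W, or roughly 6,550 BTU/hr. With a different voltage or margin the figures change accordingly.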
The doors and sides of the cabinets are not perforated, whereas they would be perforated in a horizontal flow design.

It's important to note that - unlike power - cooling is an issue of degrees. (Yes, I'm sorry. That's a pun.) Data center equipment can operate within a fairly permissive temperature range (e.g., 41F to 104F, in the case of our servers); however, power is usually present or not present - there's very little leniency for "partial power". For this reason (and because heat production and power consumption are so closely correlated), adequate cooling tends to receive attention only after power shortages occur.

The attention given to our cooling and power adequacy happened, coincidentally, in parallel. Following an upgrade to several servers, I became concerned with what appeared (anecdotally) to be a pooling of very hot air in the top of a cabinet. Not wanting to rely on anecdote, I ordered and installed AVTECH's TemPageR 4E with one sensor at the base of the cabinet and one sensor at the top (both at the rear). The empirical evidence confirmed the anecdote. The cabinet was experiencing a differential temperature of more than 20F (which is considered the maximum differential you would want in an enclosure of this sort).

After some trial and error, I settled on Delphi's Enclosure Blower (rated at 250 CFM) and sealed any empty rack spaces with blanks. Although this did not exactly resolve the pooling of hot air in the top of the cabinet, it did ensure that each device - despite its location in the cabinet - received a supply of chilled air. (Note: Since most of the equipment has internal temperature sensors that can be queried, I know whether or not each device is operating within its specified temperature range.) Adding two more temperature sensors, this time to the front of the cabinet, confirmed that the Delphi Enclosure Blower was effective. In fact, the temperature at the top front of the cabinet is about 5F lower than at the bottom front.
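The two-sensor check described above boils down to a simple comparison, and a common rule of thumb for air (BTU/hr is roughly 1.08 x CFM x temperature rise in degrees F, for air near sea-level density) gives a feel for how much heat a given airflow can carry. The sketch below is only an illustration: the sample readings are invented, the function names are hypothetical, and a fan's rated CFM is typically a free-air figure, so the airflow actually delivered inside a loaded cabinet is likely lower.

```python
# Compare a cabinet's bottom-to-top temperature rise against a limit, and
# estimate how much heat a given airflow can carry away.
# ASSUMPTIONS: sample readings are made up; 1.08 is the usual sensible-heat
# rule-of-thumb factor for air near sea-level density.

MAX_DIFFERENTIAL_F = 20.0     # maximum rise cited in the post
SENSIBLE_HEAT_FACTOR = 1.08   # BTU/hr per (CFM * degF)


def differential_f(bottom_f: float, top_f: float) -> float:
    """Temperature rise from the bottom sensor to the top sensor, in degF."""
    return top_f - bottom_f


def exceeds_limit(bottom_f: float, top_f: float, limit_f: float = MAX_DIFFERENTIAL_F) -> bool:
    """True when the rise across the cabinet is greater than the limit."""
    return differential_f(bottom_f, top_f) > limit_f


def heat_removed_btu_per_hour(cfm: float, delta_t_f: float) -> float:
    """Approximate heat carried off by `cfm` of air warming by `delta_t_f` degF."""
    return SENSIBLE_HEAT_FACTOR * cfm * delta_t_f


if __name__ == "__main__":
    bottom, top = 66.0, 89.0  # hypothetical rear-sensor readings in degF
    print(f"rise: {differential_f(bottom, top):.1f} F, "
          f"over limit: {exceeds_limit(bottom, top)}")
    for cfm in (500, 250):    # rated figures for the top fan and the blower
        print(f"{cfm} CFM at a 20 F rise: "
              f"{heat_removed_btu_per_hour(cfm, 20.0):.0f} BTU/hr")
```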
The lower reading at the top is clearly the result of keeping the bottom temperature sensor outside of the blower's direct airflow, whereas the top temperature sensor cannot avoid the chilled air delivered by the blower.

"What about power?" you ask. It was during some of the server upgrades responsible for the additional heat that I tripped the primary 20 Amp circuit during a boot cycle. It's not a pretty sight to watch a cabinet full of equipment lose power in an instant. Running everything with journaled filesystems and really good backups meant that I didn't need to pop the cyanide pill that all systems engineers and some administrators (the smart ones) carry with them in case of a really catastrophic failure of their own doing. Nonetheless, it was a serious slap across the face. My power requirement calculations were obviously off, even accounting for the inrush demand of booting.

Two tasks were necessitated by the revised power estimates. First, we needed to start collecting empirical power usage data (similar to what we had already started doing for the temperature). Power estimates are just that: estimates. Real-time data is the experiment that can validate or invalidate your estimates (a.k.a. your hypothesis). Second, we needed larger circuits or additional cabinets.

The planned power upgrade and installation of the smart PDUs gave me an opportunity to test another power enhancement for the cabinet. Even though much of the data center equipment contains redundant power supplies, certain items (such as high-density servers) are notorious for their lack of power redundancy. A potential solution, which I'd had no prior experience with, is the "dual input PDU". The dual input PDU accepts power from two sources, and switches between them in less than one cycle if power is lost or restored on the primary source. Although dual input PDUs are no replacement for redundant power supplies, they do eliminate a single point of failure that exists when they aren't present.
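The source-selection behavior of a dual input PDU, as described above, can be pictured with a toy model. This is only a sketch of the stated rule (use the primary whenever it is live, otherwise fall back to the secondary). It does not model transfer timing or any particular vendor's hardware, and the function names are hypothetical.

```python
# Toy model of a dual input PDU's source selection.
# ASSUMPTION: the only rule modeled is "feed the outlets from the primary
# whenever it is live, otherwise from the secondary". Transfer time and real
# hardware behavior are not modeled.

from typing import List, Optional, Tuple


def select_source(primary_live: bool, secondary_live: bool) -> Optional[str]:
    """Return which input feeds the outlets, or None if both inputs are dead."""
    if primary_live:
        return "primary"
    if secondary_live:
        return "secondary"
    return None


def simulate(samples: List[Tuple[bool, bool]]) -> List[Optional[str]]:
    """Map (primary_live, secondary_live) samples to the active source."""
    return [select_source(primary, secondary) for primary, secondary in samples]


if __name__ == "__main__":
    # Primary fails at step 1 and is restored at step 3.
    timeline = [(True, True), (False, True), (False, True), (True, True)]
    for step, source in enumerate(simulate(timeline)):
        print(step, source or "no power")
```

In this model the secondary circuit carries no load until the primary drops, and the load returns to the primary as soon as it is restored.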
Dual input PDUs also allow for complete dependence on a primary circuit, while failing all equipment over to the secondary circuit only when the primary fails. I was a little skeptical at first, but the engineers at Pulizzi were incredibly knowledgeable and informative. I purchased three of the EMF/RFI filtered TPC2234s for the cabinet getting the 30 Amp circuit upgrade. They've been tested under heavy load since installation, and operate exactly as advertised.

Friday, August 11. 2006
Confab - August 10, 2006.
Confab continues its summer trend of light-hitting topics with this week's discussion about high-maintenance girls, appropriate dress for theatre, who pays for dinner on a date, and dating up or down (socio-economically speaking). All of this talk appears to be Gavin's way of prying the latest news out of me about my "online dating" exercise. The consensus is that I will need much editorial help with my "profile".

Guests: Rachele Cawaring, Elaine Chu, Nick Onken, Gavin Shearer, and Keith Vaitkus.

Partake in the madness at: www.confabshow.com.