UPS #FAIL
Last fall I bought my first UPSes for the sequencing centre.
Two rack-mountable 3000VA 2U units for the servers, two 3000VA towers for the sequencing machine and one 2200VA tower for the cluster station. All from APC.
The three towers were required by the vendor of the sequencing machines. The ones for the servers were bought because we are in a lousy facility for the time being, and we might as well protect ourselves against power glitches. All the UPSes are much too small to run through a longer power failure.
On the servers I set up apcupsd to monitor the two rack-based UPSes.
Earlier this week apcupsd sent me an email telling me to "Change battery NOW!" for one of them. The web status monitor said the same, and a red lamp is glowing on the UPS.
Since these are my first UPSes, I took what was probably a very naive approach to this. I thought that since this equipment is still under warranty, our vendor would send us a replacement for the faulty battery. This turned out not to be the case.
Our vendor sent us to the manufacturer. When I said it would have been a lot more convenient for us if they could handle it themselves, they gave us to understand that APC has chosen this service model and that the vendor is not allowed to do it. While I am not in a position to verify the truth of this statement, I have no reason to doubt it either.
I was able to open a support case through APC's web page. However, it did not get me a new battery.
Supposedly, to get a new battery, I have to run a test to find out whether the fault is in the battery or the UPS. Sounds reasonable, except that the only way to find out is to run a so-called manual calibration, which means running the UPS on battery with a load of at least 30% until the battery is empty. And then the load crashes.
I can't really do that to my servers or storage. So, I'll have to magic some alternative load into being. It is further complicated by kit and science IT having attached their network equipment to my UPS, so I'll also have to find somewhere to plug in 4 or 5 switches in the meantime.
My first thought was electric kettles. But they switch off after a few minutes. Smart people on my IRC channel suggested I leave the lid open and just refill them if they run dry. But I don't like boiling water in my server room... The APC support guy whose week I've ruined later suggested a heater or halogen lights. I don't have either (I am not going to bring my private home appliances in on the commuter train from Slagelse in order to be able to do my job), and I would need to cut the power cord and attach a plug that would fit the UPS. I don't even know if that is legal, and I cannot understand why such a weird Mickey Mouse procedure should be required of every single UPS owner.
APC claims that I am supposed to do this anyway as maintenance every 6 months. Otherwise the UPS cannot know exactly how much runtime is left on the battery.
So, I thought that others must have run into this problem before, and asked around a bit. The IT guys I talked to certainly do not bother with this kind of test. They just buy new batteries when they spot a UPS with a red lamp. They don't care whether the UPS knows how much time is left on the battery, and neither do I. And we really don't have the manpower to fuss over every single UPS twice a year.
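Before attempting a manual calibration, it helps to know how far the current load is from the 30% APC asks for. The sketch below assumes the `KEY : value` line format printed by apcupsd's `apcaccess status` and assumes `LOADPCT` is a percentage of the UPS's nominal power rating; the sample status text and the 2700 W rating are hypothetical values, not readings from my UPSes.

```python
"""Rough check of whether a UPS load meets the calibration threshold.

Assumes `apcaccess status` style output ("KEY : value" lines).
The sample text and the nominal wattage below are hypothetical.
"""
from typing import Dict

# The minimum load APC support asked for during a manual calibration.
MIN_CALIBRATION_LOAD_PCT = 30.0

# Hypothetical excerpt of `apcaccess status` output, not from a real UPS.
SAMPLE_STATUS = """\
STATUS   : ONLINE
LOADPCT  :  23.0 Percent
BCHARGE  : 100.0 Percent
TIMELEFT :  45.0 Minutes
"""


def parse_apcaccess(text: str) -> Dict[str, str]:
    """Split each 'KEY : value' line on its first colon into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def load_percent(fields: Dict[str, str]) -> float:
    """Return LOADPCT as a number, e.g. '23.0 Percent' -> 23.0."""
    return float(fields["LOADPCT"].split()[0])


def enough_load_for_calibration(fields: Dict[str, str],
                                minimum: float = MIN_CALIBRATION_LOAD_PCT) -> bool:
    """True if the reported load is at or above the calibration minimum."""
    return load_percent(fields) >= minimum


def extra_watts_needed(fields: Dict[str, str], nominal_watts: float,
                       minimum: float = MIN_CALIBRATION_LOAD_PCT) -> float:
    """Extra load (W) needed to reach `minimum` percent of `nominal_watts`."""
    shortfall_pct = max(0.0, minimum - load_percent(fields))
    return shortfall_pct * nominal_watts / 100.0


if __name__ == "__main__":
    status = parse_apcaccess(SAMPLE_STATUS)
    # 2700 W is a hypothetical rating; use the real figure for your model.
    print("Load:", load_percent(status), "%")
    print("Enough for calibration:", enough_load_for_calibration(status))
    print("Extra watts needed:", extra_watts_needed(status, 2700))
```

On a real server you would feed it the captured output of `apcaccess status` instead of the sample string. With the hypothetical numbers above, a 23% load on a 2700 W unit is 189 W short of the threshold, which is roughly the size of "alternative load" you would have to conjure up.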
I also talked to some nice people from APC in Denmark who were much more understanding and very service minded, but the support guy told me afterwards that their suggestions would only work for larger systems, not our small UPS.
APC will not send me a new battery because they do not know whether it is the battery or the UPS that is faulty. I can respect that, and I would have done their troubleshooting if there had been any clean, well-described standard procedure I could follow (non-clown, non-lamp/microwave/heater, and preferably without downtime). But I cannot respect that they send me out on a quest like that.
In my view it is their responsibility to design their UPSes so that they can give a clear error message in the form of a blinking lamp, an alarm sound, some proprietary monitoring software, whatever, but something clean and well defined. It is not my responsibility as a UPS owner to keep a standby non-critical load I can crash for them.
So, the "solution" is that we buy a new battery despite the fact that the product is under warranty. That is probably what they wanted anyway, to get out of their warranty obligation. We will just have to budget for that in the future and not bother with APC support (or find a different UPS manufacturer). To them, we are probably what PC owners are to the large hardware manufacturers: small and insignificant.
But I was wondering about other people's solutions to this.
Does anyone really do a UPS battery calibration every 6 months? Do people buy a couple of extra UPSes to use in maintenance windows? Is it good for a UPS to stand by with little or no load most of the time, and will it then work when you need it?
We normally don't do downtime at all if we can avoid it. We have people working at all hours and computers working 24/7, and we're not filthy rich. That combination seems to surprise the hardware vendors every time.
2 comments
At this point in my understanding of UPSes, I do not see any point in buying more UPSes than I can load. You are welcome to explain to me why you think that would be a solution.
After the fact, the rather rude APC support guy did, somewhat accidentally, give me the hint I had been looking for. With the redundant power supplies in the servers, it should be possible to put one on mains power and the other on the UPS, and let the servers fail over when the UPS runs dry. That way the servers themselves can be used as the load.
When I wrote this post, I was hoping someone would tell me what other people do. My conclusion now is that most "small UPS owners" never do the test and just buy a new battery when they have to. And that if you cannot afford one of the big solutions, you are basically screwed, backups or not.