Q & A on the FeedLounge DNS Saga : alexking.org

From some of the comments on my previous post, I think I haven’t done a very good job explaining what happened and the nature of how this has affected FeedLounge. I’ll try to do this here, in the form of an O’Grady-style Q & A.

Are you telling me your server literally burned up?

Yes, the fan that was supposed to keep the CPU cool stopped working. The CPU overheated and burned a little CPU-sized crater in the motherboard.

So FeedLounge was running on this server?

No, neither FeedLounge or the feedlounge.com web site were running on the server that burned up.

Umm, then why can’t I get to the ‘Lounge or the feedlounge.com web site?

The dead server was the primary DNS server for feedlounge.com – it was the magic beans that told the rest of the internet where to go when you typed ‘feedlounge.com’ into your browser.

Wow, this box was responsible for making feedlounge.com accessible to the rest of the internet and you only had one of these?

Actually, we had three of them. We had the box that burned, and two other separate boxes hosted by Austin Web Development.

I don’t get it. If there was more than one box, then why didn’t the back up boxes kick in?

About a week ago, Austin Web Development re-configured their DNS servers. They gave us warning beforehand and we checked through all the sites on the now-crispy server to make sure we weren’t relying on the Austin DNS for any of them.

Why didn’t you also check this for feedlounge.com?

Unfortunately, the answer is really simple – we forgot. We moved the feedlounge.com web site to a dedicated server in a data center in New Jersey last summer around the same time we move the ‘Lounge onto our big servers in our rack space in San Francisco.

Since feedlounge.com was no longer on boxes at the Austin data center, we didn’t think to check the DNS records for feedlounge.com.

Do you now feel that was monumentally stupid?

Um, yeah. And then some.

Are you saying that if this had happened a week ago, it wouldn’t have caused any trouble?

Most likely, yes. We’d have replaced the fried box just like we’ve done, but the backup DNS servers would have shouldered the load while we did so.

Sounds like you guys should have paid more attention to this.

Agreed. Lesson learned – the hard way.

So why is it taking so long for FeedLounge to become available again?

DNS is a bit complicated. Scott has written up a great post on this on the FeedLounge blog. The root of the problem is that a DNS change can take 24-48 hours to propagate throughout the internet.

That’s ridiculous, can’t you speed that up?

Unfortunately, no. I really wish we could.

Why can’t you put up a message saying that the server is down – at least an explanation for people?

That’s the real problem. The DNS information is what allows your ‘feedlounge.com’ request to go to the proper server. If we could get you routed to a server to see the message, we could just serve up the ‘Lounge and the feedlounge.com web site to you. The problem is getting you to the right place, not that FeedLounge is down.

How can you say that FeedLounge isn’t down? If I can’t get to it, it is down!

I can certainly understand how you could feel that way, but there is a little bit of a difference. The way the DNS…

Stop blaming this on DNS! I don’t care about the technical details – I care that I can’t get to FeedLounge.

Ok, let’s try an analogy.

Let’s say FeedLounge is a car (maybe a BMW). You drive to a nice restaurant and give the car and the keys to the valet – not thinking much of it. After dinner when you go get your keys, the valet tells you that there was an accident and unfortunately your key was broken.

“No problem” you think, “I gave my buddy my spare set of keys when I bought the car – I can just get the spare set.” Unfortunately, when you call him you find that your buddy left last week for vacation on a remote island – he’s not going to be able to help you.

Now which is more accurate in this situation?

My car is broken.
I can’t get into my car.

This is where we are with FeedLounge. The service isn’t down or broken, the problem is getting to it.

So are you saying this isn’t your fault?

Of course not – it’s definitely our fault. We weren’t diligent enough with our backup DNS servers and got burned¹ because of it.

It’s the same way you’d be responsible for not having the forethought to make sure your spare set of keys was available “just in case” when you valet’ed your car.

So if the service isn’t down, there has to be some way for me to get to it.

If you’re having trouble accessing the server, you can add a DNS server listing that has the proper information for FeedLounge to your computer’s list of DNS servers. That DNS server is “65.90.218.228”, however we’ve gotten lots of reports from folks who are already back in the ‘Lounge already so hopefully it won’t be long now for anyone still affected.

I think it’s totally unreasonable to ask me to do this.

I wish there was another way, but this will give you access to FeedLounge now. The alternative is to wait for the DNS changes to reach your DNS servers. This is how DNS works.

So what do you do now?

Besides offering an apology to our users, there isn’t much we can do. We have to wait for the changes we’ve already made to take effect.

Can you at least promise this won’t happen again?

Unfortunately, no. In fact (as I’ve said before), I’m quite sure it will happen again at some point. What we can promise is that we’ll continue to take responsability and to be open and honest about what is going on – even when things go wrong and being honest makes us look stupid.

As hard as we try to make sure they don’t, web services go down or are unreachable at times. This time was our fault for not having working backup DNS entries. While I think it’s highly unlikely this particular problem will surface again, the last time folks had trouble accessing FeedLounge was due to a DDOS attack on Live Journal that affected our entire data center; unfortunately that is hardly something we can promise won’t happen again.

That is totally unacceptable for a service I’m paying for!

I’m sorry you feel that way, and it’s not an uncommon response whenever there is a problem with FeedLounge or any other paid service.

We work really hard to keep the service up and running all the time, and I feel our track record in our first 4 months of service is excellent. However, the fact of the matter is that this is a service funded by Alex and Scott, and run by Alex and Scott. There is no Yahoo! or Google or VC with deep pockets to pay for mirrored data centers, or 24 hour IT staff managing the servers.

This is how bootstrapping works and we’ve tried hard to be transparent and upfront about this since the beginning. I hope that it doesn’t come as a surprise to anyone.

I’m not satisfied or happy about this.

Trust me, we’re not either.

Ok, now what?

It was a bad day for FeedLounge. With apologies to our users, we fix the problem and move forward. As we do, we’ll continue to work hard to make FeedLounge as reliable as possible and continue building the features our users are asking for.

Pun intended. [back]

This post is part of the project: FeedLounge. View the project timeline for more context on this post.