Monday, December 23. The office is buzzing, at least with the people who didn’t take the whole week off. No one can concentrate: it’s almost time for presents and food, and family, and more presents and food, and even a holiday trip for some. Everyone is talking about where they’re going and what they’ll be doing. Yeah, well, good for you all.
I’m an IT guy. You know what my presents are? Upgrading our Hadoop infrastructure to Hadoop 2.0 (YARN) in an office that, for the week, is mostly filled with IT guys (of course the IT managers are also on vacation, leaving us plebes here). I’ll need to set up and install it on our shiny new rack, migrate all the data there, and cross my fingers that the jobs keep crunching the data so we won’t have to roll back. You know what kind of Christmas leftovers I get? Takeout Chinese and cold pizza.
Anyway, it’s time to get this ball rolling.
First, I check that the hardware rack is fully working: machines, cables, switches, etc. I just found out that the machine that was going to host the NameNode is failing to boot due to a hardware error... D’OH! It’s a bad memory card. Not the end of the world, though; I can get a new one quickly. I call our vendor’s 24/7 support, and they promise to send a brand new card by courier tomorrow morning, right before the holiday.
I’m still getting the occasional technologically challenged person coming into my office to tell me they can’t print because of a paper jam. People think I can help them with printing because I’m "good with computers". Have they tried turning it off and on again?
Anyway, I’m finally able to get to my actual work: writing a rollback procedure. Since I’m doing a side-by-side upgrade on new hardware, there are no downgrade headaches; if we can’t get it all to work in time, I’ll simply revert to the old rack running the previous version. I’d have to move any new data back manually, though; the old version won’t just pick up where the new one left off, I’m afraid. Together with my boss, I define the rollback conditions: which mission-critical processes must have run successfully on the new cluster.
Next, I prepare the migration scripts: distcp will take care of copying all the data to the new cluster, and I have a little script to copy all the Oozie jobs.
The final task for the day is preparing the configuration files: updating mapred-site.xml and setting up yarn-site.xml. I already tackled them last week when I tested the Hadoop 2.0 upgrade on our staging environment.
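Boiled down, the rollback rule is a single check. Here’s a minimal Python sketch, assuming we track the agreed critical processes by name and know which ones have completed on the new cluster; the process names and the 5 p.m. cutoff are hypothetical, not our actual policy.

```python
from datetime import datetime

# Hypothetical cutoff: the critical processes must be working on the new
# cluster by New Year's Eve afternoon, or we go back to the old rack.
ROLLBACK_DEADLINE = datetime(2013, 12, 31, 17, 0)


def should_roll_back(critical, succeeded, now, deadline=ROLLBACK_DEADLINE):
    """Decide whether to revert to the old cluster.

    critical:  names of the mission-critical processes agreed on with the boss
    succeeded: names of processes that have run successfully on the new cluster
    """
    missing = set(critical) - set(succeeded)
    # Before the deadline we keep fixing things; after it, any critical
    # process that still hasn't run means we revert.
    return now >= deadline and bool(missing)


# Usage with hypothetical process names:
# should_roll_back({"billing", "reports"}, {"billing"}, datetime(2013, 12, 31, 18, 0))
```

Before the cutoff the answer is always "keep fixing"; after it, a single missing critical process tips the decision to a rollback.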
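The distcp part of the migration script can be as simple as building one command per top-level directory. A minimal Python sketch, assuming hypothetical host names and the default ports (50070 for the old NameNode’s HTTP interface that serves HFTP, 8020 for the new NameNode’s RPC). Because HFTP is read-only, the job runs on the new (destination) cluster.

```python
import shlex
import subprocess

# Hypothetical hosts. HFTP goes through the old NameNode's HTTP port and is
# read-only, so these commands are meant to run on the new cluster.
OLD_NN_HFTP = "hftp://old-nn.example.com:50070"
NEW_NN_HDFS = "hdfs://new-nn.example.com:8020"


def distcp_command(path):
    """Build a distcp invocation copying one directory from old to new cluster."""
    return ["hadoop", "distcp", "-update",
            f"{OLD_NN_HFTP}{path}", f"{NEW_NN_HDFS}{path}"]


def migrate(paths, dry_run=True):
    """Print (and optionally run) one distcp job per directory."""
    for path in paths:
        cmd = distcp_command(path)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so a failed copy can't be missed.
            subprocess.run(cmd, check=True)


# Usage: migrate(["/data/logs", "/data/warehouse"]) prints the commands first.
```

The -update flag makes reruns after a failure cheaper, since files that already match on the destination are skipped, and it copies the contents of each source directory into the same-named target directory instead of nesting it one level deeper. Defaulting to a dry run lets you eyeball the commands before anything moves.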
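For reference, the handful of properties that switch MapReduce over to YARN can be written out with Python’s standard XML module. A minimal sketch: the property names are the standard ones as of Hadoop 2.2, the ResourceManager host name is hypothetical, and a real cluster needs plenty more (memory limits, security settings, and so on).

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(props):
    """Render a dict of properties in Hadoop's <configuration> XML format."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def write_config(path, props):
    """Write a Hadoop-style configuration file to the given path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0"?>\n')
        f.write(hadoop_config_xml(props) + "\n")


# Run MapReduce jobs on YARN instead of the old JobTracker.
MAPRED_SITE = {"mapreduce.framework.name": "yarn"}

YARN_SITE = {
    "yarn.resourcemanager.hostname": "rm.example.com",  # hypothetical host
    # The shuffle service MapReduce needs on every NodeManager.
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    "yarn.nodemanager.aux-services.mapreduce_shuffle.class":
        "org.apache.hadoop.mapred.ShuffleHandler",
}

# Usage: write_config("mapred-site.xml", MAPRED_SITE)
#        write_config("yarn-site.xml", YARN_SITE)
```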
Tuesday, December 24. Preparing the systems. I replace the bad memory card with the new one that just arrived by courier and start setting up the production Hadoop cluster. Thanks to our DevOps team and their automation skills, I get off to a good start. They have awesome scripts that update everything to the latest Linux, Java, SSH, Oozie, etc., so it’s all ready for the latest Hadoop. They’ve been working on them for ages, or so they say. Today they get lucky and don’t have to come in, but they deserve it. Enjoy, ladies and gents!
Wednesday, December 25. Not even I’m in the office. Tomorrow, yes. But for now...Merry Xmas!
Thursday, December 26. Tumbleweeds. This place is as dead as Chernobyl. Except Chernobyl has some sort of weird tourism thing going on, and this place is as deserted as the rebel base on Dantooine: just empty open space and cubicles. Don’t get me wrong, it’s not like anyone is holding a gun to my head, making me work while my wife and kids eat leftovers and play with their new toys. I’m not staying all day anyway, but I’ll be back Friday because no one will be here, so it’ll be the perfect time to keep setting up the cluster without any paper jams.
Good news: Hadoop YARN is up and running in secure mode on the new cluster! The configs just needed updating for the new network addresses and hardware resources. I check that the daemons are running with jps and take a look at the log files. Everything is A-OK. Half-day accomplished.
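The jps check is easy to script across nodes. A minimal Python sketch, assuming jps is on the PATH and is run as a user that can see the Hadoop JVMs (jps only reports JVMs it has access permissions for); the expected daemon sets per node role are assumptions to adjust for your own layout.

```python
import subprocess

# Assumed daemon layout per node role; adjust to what your nodes actually run.
EXPECTED_MASTER = {"NameNode", "ResourceManager"}
EXPECTED_WORKER = {"DataNode", "NodeManager"}


def running_daemons(jps_output):
    """Parse jps output ("<pid> <MainClass>" per line) into a set of class names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Skip blank lines and "-- process information unavailable" entries.
        if len(parts) == 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected):
    """Return the expected daemons that don't appear in the jps output."""
    return set(expected) - running_daemons(jps_output)


def check_local_node(expected):
    """Run jps on this machine and report which expected daemons are missing."""
    out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    return missing_daemons(out, expected)
```

An empty set means everything expected is up; anything else tells you which log file to open first.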
Friday, December 27. Come to think of it, it’s actually kind of nice with no one around. I can play my music as loud as I want without anyone saying, "I’m gonna need you to go ahead and turn that down". I have all day to watch the progress of distcp migrating the data from the old cluster to the upgraded one.
It goes pretty well. Since the two clusters run different Hadoop versions, I read from the old one over HFTP. The copy fails just five times, after lots of data has already been migrated. To verify the copy is complete, I generate listings of the sources and destinations and cross-check them, including file sizes and hashes. Finally, it’s done.
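The cross-check itself boils down to comparing two path-to-(size, hash) maps. A minimal Python sketch, assuming each listing has already been dumped in a hypothetical tab-separated format (path, size in bytes, content hash). The hashes have to be computed the same way on both sides; HDFS’s built-in file checksums depend on block size and can differ between clusters even when the data is identical.

```python
def parse_listing(text):
    """Parse a hypothetical listing format: one 'path<TAB>size<TAB>hash' line per file."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        path, size, digest = line.split("\t")
        entries[path] = (int(size), digest)
    return entries


def compare_listings(source, dest):
    """Return (missing, extra, mismatched) sorted path lists between two listings."""
    missing = sorted(source.keys() - dest.keys())     # never made it across
    extra = sorted(dest.keys() - source.keys())       # on the new cluster only
    mismatched = sorted(
        p for p in source.keys() & dest.keys() if source[p] != dest[p]
    )                                                 # size or hash differs
    return missing, extra, mismatched
```

Three empty lists mean the copy is complete; anything in missing or mismatched is a candidate for another distcp run.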
Monday, December 30. The last full working day of 2013. How is this even possible? It feels like the Strata conference and AWS re:Invent were just yesterday. Oh well. Today I’m going to migrate all data processing tasks to the new cluster. We use Oozie for scheduling all our workflow jobs. It takes forever to move all the processes and check that they’re OK, and many fail. If we don’t get the critical processes working by tomorrow, the rollback heat will be on me.
Tuesday, December 31. The most critical processes are working! Apparently many of the processes used hard-coded values for things we had moved into environment variables ages ago, so I had to change all of that. I connect to the ResourceManager web interface, see the jobs running, and celebrate with a fresh mug of coffee. My boss is fine with it: according to our rollback policy, this suffices. I’ll have to work overtime to get everything else working, but hey, at least I don’t have to roll back! The folks in accounting won’t get their reports tomorrow, poor them. I can go raise a glass of champagne to 2014 at the company meeting.
Mission effectively accomplished, minus some loose ends. Over the last week, I migrated our cluster to Hadoop YARN. I hit some snags thanks to Murphy’s Law, but I managed to fix hardware errors, prepare a rollback procedure, set up a new cluster, install the new Hadoop, copy all the data and processes, and make sure it’s all working: my belated Christmas gift to the people upstairs.
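Hunting down those hard-coded values is mostly text search. A minimal Python sketch, assuming we know the literal value behind each variable; the variable names, the example value, and the ${NAME} placeholder style are assumptions, so match whatever your jobs actually expect.

```python
def find_hard_coded(text, known_values):
    """Yield (line_no, var_name, value) for every hard-coded occurrence.

    known_values maps a variable name to the literal value it holds,
    e.g. {"NAME_NODE": "hdfs://old-nn.example.com:8020"} (hypothetical).
    """
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, value in known_values.items():
            if value in line:
                yield line_no, name, value


def parameterize(text, known_values):
    """Replace each literal value with a ${NAME} placeholder, longest values first."""
    for name, value in sorted(known_values.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(value, "${" + name + "}")
    return text
```

Replacing the longest values first keeps a short value from clobbering part of a longer one, but short, generic values (like a queue name) can still match unrelated text, so review the diff before committing.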
Now though, it’s my turn to relax. I smell a Star Wars marathon, Chipotle, and a six-pack of Peak IPA in my future.