I got into work at 7:30am. As soon as I made it up to the 11th floor it started pouring rain outside. It rained very hard for about fifteen minutes and then suddenly it was sunny with blue skies. It was cool how such an isolated storm blew through so fast.
The ‘live’ data feed was turned on this morning. In other words this was the first time we saw the twenty perl scripts parsing all twenty files in realtime. The result of this was twenty different database updates taking place across two tables. Fortunately the frequency of the updates wasn’t saturating the database but it certainly had an effect.
Even though we have summary tables and have summarized all of the previous day’s worth of data, the two main tables are very large (about 11 million rows). When we switched the servlets to query against the real tables (now that we have a live feed) we saw horrible performance. Just as we noticed this, we had to attend a 10am meeting.
During the meeting we explained the dismal query performance. We came up with a few ideas of things to try and planned on deleting all of the rows for the data currently summarized. Jeff Chambers had set up a computer with the 40-inch LCD display panel to demo the GUI. We dismissed the meeting early to continue work on the SQL.
I skipped lunch so we could continue working on the problem. I got a page from Alonzo asking for me to come to the 11East conference room. When I went inside there were a lot of important people there. The first one I noticed was the CEO of Delta Technology and CIO of Delta Airlines, Curtis Robb. Along with him were my senior vice president Mike Childress and another senior vice president. My vice-president Brent was driving the GUI. Fortunately it was working (but not updating due to the queries taking so long). I walked up to Alonzo and he mentioned to me that they want to get the computer out of kiosk mode in order to pull up the Brio reporting tool.
Curtis and the senior vice-presidents were very impressed with the GUI and the real-time analysis that it represented. Brent seemed to be having a good time explaining everything as well. When Brent asked how to get the computer out of kiosk mode I pulled up another instance of internet explorer for him. While he was demonstrating the reports and more importantly the GUI, I had a great sense of pride. All of our executives were very impressed with the work that we did over the past week.
When they dispersed I went back to working with Sameer. While we were waiting for the old data to be truncated from the database, I took the opportunity to set up an instance of Tomcat on our HPUX development unix server. I worked with Andy Mitchell from release engineering to get the installed version of tomcat imported into our clearcase vob as a set of clearcase elements. Sameer sent me a copy of all the files required to run on the webserver and I configured tomcat. Once I got this working and tested everything, Sameer and I started testing on this server instead of his workstation.
We noticed the queries were still taking a long time. This didn’t make much sense since the HPUX server is an 8-processor machine with 4-8GB of RAM. Unfortunately it seems that we have so much activity on the server that there aren’t enough resources to devote to java as we would like. We then decided to move back to the tomcat instance on Sameer’s NT4 desktop workstation.
It was frustrating to find out that the now-truncated table was still taking about 5-6 minutes per query. This was still entirely too slow. We made some more tweaks to the servlets and rebooted the tomcat server. I walked over to the 11East conference room to restart the client for the GUI. I knew that the slow response times would mean that it could take thirty minutes to come online.
When I was about to leave the conference room Brent walked in and informed me that Curtis and Mike were coming back up to take another look at the GUI. When he told me this, my heart sank because I knew it would be quite some time before the GUI was running. I asked Brent to stall them as much as possible to buy us time.
Unfortunately Curtis and Mike walked in shortly after that. They questioned why the GUI wasn’t running and I explained that we were still tweaking the database queries and that it takes some time to load the very first time. They decided that they would just wait. The minutes ticked by as everyone stood in front of the display’s white screen waiting for something to draw. Fortunately Brent offered to demo the Brio reports while we waited.
When Brent did this I went over to Sameer’s cube to update him on the situation. He seemed pessimistic that the GUI would come up at all. I went back into the conference room dreading what would happen if they couldn’t see the GUI again. After a few more minutes Curtis asked Brent to switch back to the GUI to see if it had loaded yet.
I held my breath as he switched to the browser. I thought for sure it would still be a white screen and there would be questions about why it wasn’t working now. Fortunately by some stroke of luck it was online and they could see everything.
With a potential disaster averted, I took a short break and saw Jegan in the hallway. We discussed a meeting scheduled for tomorrow to talk about hardware planning for the 1.2 release. Carole has been out of the office and doesn’t know about the meeting and won’t be in tomorrow. We were concerned about some of the people invited to the meeting and the perceptions some of those people might have about how the applications team is handling things. We decided to call her.
Jegan and I got on a conference call and called Carole at home. We updated her on the situation and she thanked us for the call. She suggested that we talk to Alonzo before the meeting to let him know our point of view on the situation since he’s running the meeting.
When we got off the phone with Carole I called Alonzo and proposed meeting with him before the meeting. He mentioned that he was going to reschedule the meeting for Monday so that Carole will be able to attend too.
By this time it was around 6:30pm. Sameer and I were waiting for the DBA, Andre, to run an errand and come back to the office to analyze our queries and see why they were taking so long.
While we waited we brainstormed about possible solutions to the problem. Jegan came by at one point and asked about us running the queries in parallel. I disputed the idea thinking that three selects against the same tables at the same time would certainly take more time than running them sequentially. We decided to do an experiment. To my great surprise it took about seven minutes to run the queries sequentially but only 3.5 minutes to run in parallel! This looked like the break we were looking for.
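The experiment can be sketched roughly like this. This is not the code we ran that night, just a minimal illustration under stated assumptions: the three "queries" are stand-ins that sleep instead of hitting the database, and the names `ParallelQueryDemo`, `fakeQuery`, `summary`, `detail` and `totals` are made up for the example. A real version would put a JDBC call inside each task, and the speedup would depend on the database having spare capacity to serve the selects concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelQueryDemo {

    // Hypothetical stand-in for one slow SELECT: sleeps, then returns its name.
    static Callable<String> fakeQuery(String name, long millis) {
        return () -> {
            Thread.sleep(millis);
            return name;
        };
    }

    // Run each query one after another on the calling thread.
    static List<String> runSequentially(List<Callable<String>> queries) throws Exception {
        List<String> results = new ArrayList<>();
        for (Callable<String> q : queries) {
            results.add(q.call());
        }
        return results;
    }

    // Run all queries at once, one thread per query (list must be non-empty).
    // invokeAll blocks until every task finishes and keeps the input order.
    static List<String> runInParallel(List<Callable<String>> queries) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(queries.size());
        try {
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(queries)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> queries = List.of(
            fakeQuery("summary", 300),
            fakeQuery("detail", 300),
            fakeQuery("totals", 300));

        long t0 = System.nanoTime();
        runSequentially(queries);
        long t1 = System.nanoTime();
        runInParallel(queries);
        long t2 = System.nanoTime();

        System.out.println("sequential ms: " + (t1 - t0) / 1_000_000);
        System.out.println("parallel ms:   " + (t2 - t1) / 1_000_000);
    }
}
```

With sleeping stand-ins the parallel run takes about as long as the slowest single task. Against a real database the gain is usually smaller, as with our seven minutes versus 3.5, because the queries still compete for the same disks and memory.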
Sameer and I spent much of the rest of the evening trying to get the servlets to run in parallel. No matter what we did, they would always execute in sequence. At one point we even had the crazy idea of hosting two of the servlets on my workstation to do a form of distributed computing in order to get them to run in parallel. This did not work either.
We finally found a way to make them run in parallel by re-architecting the HTML and javascript such that the servlets are each in their own hidden frame instead of all in one frame. Our HTML knowledge wasn’t so great in this area so Sameer called one of his friends for some tips on the syntax.
It was now around 10pm and Andre finally called us back. He did some analysis and told us that he could ‘borrow’ 300MB of memory from another database to give to ours since our queries are so memory-intensive. Sameer also requested the primary keys be given another index to improve the query time as well. Both of these changes would require the database to be bounced. We didn’t want to disrupt the data feed still running so we decided to do this first thing tomorrow morning.
We completed our work on the server side and finally left work at 11:30pm. I had been at work 16 hours and was pretty tired.
When I got home, I cooked up a Totino’s pepperoni pizza and fired up the VPN to get into work. Elizabeth had asked me to check in on the perl parsers after midnight to make sure that they didn’t stop working when the old files got archived at 12:10am. I checked in on this and everything seemed fine. I went to bed at 1am.