Overview
We should subtitle this case: Ready for production? We all understand business. It's a delicate balance between providing quality, delivering in a timely manner, and staying within a tightly controlled budget. However, there comes a point when, no matter how much skill a craftsman has, the wrong tools or materials will never produce quality results. Expecting an unsupervised neophyte to produce even a working product is, in a word, crazy. Two quotes have over the years become the mantra for our team: "The only people who do not make mistakes are the people who do nothing," and "If you don't have the time to do it right, when will you find the time to do it over?"

This particular case involves a printing company that spent valuable money on state-of-the-art printing equipment and then marketed its services using web-based software developed by a friend to save money. It's only software, right? We applaud giving someone a 'hand up', a first job to break into the industry. But would you ever consider going to production with such software and expecting it to keep your multi-million dollar investment alive? This story is sad but true.
Structure
Our client reports: We are having problems finding a coder who can do the 'advanced' work required for a NEW web site. It's an issue of complex math involving on-screen graphics, fonts, and placement. The system allows WYSIWYG creation of high-quality printed materials. Of course, when using a computer monitor, certain scaling and aspect ratios must be accounted for. We believe these issues are beyond our current development skills. We have 'tried' to engage a number of other coders and feel they simply don't have the experience we require. What we propose is that you 'prove' yourself by fixing some 'minor' little bugs in our current web site. Based on your results we may give you the entire new web site project.

So, for us, this was a software contract and not a problem resolution opportunity. We knew that for 'minor' issues we could locate root cause, effect change, and deliver solutions within hours. In fact, when we were told what the issues were, and that the client had been advised for years that one of them was 'just not possible', our developer created a 'proof' web page that performed exactly that impossible task. This proof was built in only 30 minutes, before we had even received their code to work with. OK, so we are patting ourselves on the back here. And that is always a mistake.
The Stated Problem
We received the existing code for the purpose of inserting our four-line 'fix' to address the first of these minor issues. It took our same developer three full days just to get their code somewhat running in house. Not that the logic of the code was the issue here; the issue was the code's complete inability to deal with relative paths. We had to create the EXACT same directory structure as used on their production web site, covering over 54 directories and over 3,000,000 files. And if we left out a SINGLE file, the software would simply choke.

Once the software was 'kind of' working in our shop, we immediately went into Software Problem Resolution mode. We started by recording only the activity of the web server, not the client browser or the database. We executed the home page, a product catalog page, and one product order page. Our development machine crashed. Dot Net code crashing the machine? No way! Let's look at that recording. Once we had the machine back up, we noticed the hard drive was FULL! It had run out of space recording all the errors and exceptions. We had generated a 40 GB error report by requesting only 3 pages. And we learned it was NOT the Dot Net code that caused the crash but our testing tool running out of disk space. This simply means we did not catch all the errors. Wow! Of course, we had suspected the worst when the performance of those three pages was so awful; awful because, we knew, the recording tool was working overtime building a large report.
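
The relative-path problem described above is avoidable. What follows is a minimal sketch, written in Python for illustration rather than in the client's Dot Net, of resolving every resource against one configurable root and reporting all missing files at once instead of choking on the first. The function names `resolve_resource` and `find_missing` and the sample file names are hypothetical; nothing here is taken from the client's code.

```python
from pathlib import Path
from typing import Iterable, List


def resolve_resource(app_root: Path, relative_name: str) -> Path:
    """Resolve a resource against a configurable root, not a hard-coded absolute path."""
    root = app_root.resolve()
    candidate = (root / relative_name).resolve()
    # Refuse names that climb out of the application root (for example "../x").
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{relative_name!r} escapes application root {root}")
    return candidate


def find_missing(app_root: Path, required: Iterable[str]) -> List[str]:
    """Return every required resource that is absent, so startup can list them all."""
    return [
        name for name in required
        if not resolve_resource(app_root, name).is_file()
    ]
```

With this shape, moving the application to a new machine means changing one root setting, and a call such as `find_missing(Path("/srv/site"), ["img/logo.png", "css/site.css"])` names the gaps up front.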
The Root Cause
This section is Root Cause, and we will get to it, but first: surely you are curious about the contents of that 40 GB error log. We were, and then it became a source of entertainment. It has now become the standard training tool for our staff, for this one test log contains every single mistake one will EVER see in code (and some we never dreamed of). Best practices simply did not exist. We have cataloged this report as having 75 'types' of errors. It typically takes our entry-level staff two weeks to find half of them. This is truly a mess not worth touching.

The application contains 12 web pages; each page contains approximately 3,000 lines of server-side code. All application initialization code is duplicated in every page. (Yes, the Global was an empty stub.) In fact, 60% of the server-side code was cut and pasted into every other page. There are over 12,000 lines of JavaScript code, most of which perform exactly the same functions as the server-side code. There are over 50 hidden fields duplicated on every page, 78 query string value pairs, AND 138 value pairs in the client-side cookie. The majority of these value pairs exist in ALL THREE places, and also in the server-side session state and in the client-side view state. There are 12 stored procedures in the database, only 4 of which are called by code; 98% of the SQL code is embedded in the web server code. We stumbled onto a client-side query string variable containing the connection string to the database. (OUCH.) Of the thousands of local variables, few had names longer than three characters. All the ensuing 'problems' from this coding approach were masked from the user (and the client) by a massive use of try/catch blocks. Most of these catch blocks were empty. With all those facts, it is not surprising that the 'minor' problems can all be described as: 'This feature works for a while and then bang... it just gets all messed up on screen.'
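
To make the empty catch block point concrete, here is a small sketch in Python (the original was Dot Net, and the function names and the `order_page` logger are hypothetical). The first function swallows the failure the way the catch blocks described above did; the second catches only the error it expects, records it, and lets the caller decide what to do.

```python
import logging

logger = logging.getLogger("order_page")  # hypothetical logger name


def parse_quantity_silently(raw: str) -> int:
    """Anti-pattern: the empty handler hides the failure and invents a default."""
    try:
        return int(raw)
    except Exception:
        pass  # the error vanishes; the caller never learns the input was bad
    return 0


def parse_quantity(raw: str) -> int:
    """Catch only the expected error, record it, and re-raise it."""
    try:
        return int(raw)
    except ValueError:
        logger.warning("invalid quantity %r", raw)
        raise
```

The silent version turns bad input into a plausible-looking `0` that surfaces much later as something 'messed up on screen'; the second version fails at the point where the bad value entered.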
'We have figured out exactly how to recreate each of these problems... watch this.' Then, with a series of no fewer than 6 steps, a problem could be reproduced. This simply implied that the value pair maintenance was not being performed correctly throughout the code. At this point in our relationship with this client we didn't even have a contract yet; they insisted on seeing that 'proof' we had shown, but working on their system. When we located the right spot and inserted our four lines of code, the entire house of cards simply collapsed. We tried to explain to the client that taking this code down another business path was not advisable. IN FACT, they were strongly advised to immediately discard the existing code and start over. The real root cause here was simply trying to save money by cutting corners. Of course, the client simply responded with: Thank you for your report. And we never heard from them again.
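
When the same value pair lives in hidden fields, the query string, and the cookie, the copies can drift apart whenever one code path updates some but not all of them. The sketch below, in Python with hypothetical store names and keys, is one way such drift could be detected; it is an illustration of the failure mode, not the tool we used.

```python
from typing import Dict, Mapping


def find_conflicts(stores: Mapping[str, Mapping[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return each key held in more than one store whose copies disagree."""
    conflicts: Dict[str, Dict[str, str]] = {}
    all_keys = set().union(*(values.keys() for values in stores.values()))
    for key in sorted(all_keys):
        # Collect this key's value from every store that carries it.
        seen = {name: values[key] for name, values in stores.items() if key in values}
        if len(set(seen.values())) > 1:
            conflicts[key] = seen
    return conflicts


# Hypothetical snapshot of one request: "zoom" was updated in only one place.
snapshot = {
    "hidden": {"font": "Arial", "zoom": "100"},
    "query": {"font": "Arial", "zoom": "150"},
    "cookie": {"zoom": "100"},
}
print(find_conflicts(snapshot))
```

Keeping each value in exactly one place removes the need for such a check altogether.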
The Results
As in the Perfect Software Case, we never thought we would find impossible code, but one should never say never. Here was a piece of production code in use making sales for a company, and we have no idea HOW it executes even once, let alone produces results. There comes a time when you simply must abandon a broken item. With software there is always hope; there is always a work-around to keep it going or to extend its capabilities. HOWEVER, in this one case it was simply not cost-justifiable for them or for us to engage in hacking the current code, nor was there any interest in building a better system.

This piece of software was poorly designed, poorly developed, untested, and never code reviewed, AND still it provides us value in the software problem resolution world: as a training tool for problem resolution experts. We no longer use hypothetical sample code cooked up to 'demonstrate' certain things, for now we have a real-world piece of code that demonstrates errors we never thought possible. We have assisted clients after they have spent days, even weeks, looking at test data that contained only a single real error and still could not locate it. In the case at hand, there are so many real errors that it is difficult to decide which are root causes, which are simply symptoms, and which are 'white noise'. With the sheer volume and varied nature of these errors, the task of analysis demands a structured approach to gain even a slight comprehension.
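
One possible first pass at such a structured approach is sketched below in Python. It assumes a hypothetical log format of `<ErrorType>: <message>` per line, which is not the format of our recording tool, and it groups errors by type with a count and the line where each type first appears. Ordering by first appearance is only a heuristic we are assuming here: earlier errors are often better root cause candidates than the ones that follow them.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def triage(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (error_type, count, first_line_number), ordered by first appearance."""
    counts: Counter = Counter()
    first_seen: Dict[str, int] = {}
    for number, line in enumerate(lines, start=1):
        # Hypothetical format "<ErrorType>: <message>"; lines without a colon are skipped.
        error_type, separator, _ = line.partition(":")
        if not separator:
            continue
        error_type = error_type.strip()
        counts[error_type] += 1
        first_seen.setdefault(error_type, number)
    return sorted(
        ((name, counts[name], first_seen[name]) for name in counts),
        key=lambda row: row[2],
    )
```

Because the function consumes its input one line at a time, it can be handed an open file object and never needs to hold a very large log in memory.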
Summary
It is interesting to note that in this case we were not engaged as software problem resolution experts but rather as accomplished coders being offered the job of advancing an existing system to its next incarnation. Still, our tools and methods were able to visually solidify the heart of the problems, without finger pointing and without question even in the minds of non-developers. Once again, with such concrete facts there is no room for ambiguity. We wish to point out that the sheer size of that error log sounds impossible to deal with, but our analyst was able to obtain ALL the facts presented here in only two hours. Of course we have torn that data apart since, but the fact remains: using the right tools and methods, root cause problem identification can be minutes away. That is an invaluable tool when deciding whether a piece of code is in fact 'Ready for production'.