This was my first internship. IBM, upstate New York. A chance for a fresh start: I was tired of how cynical and jaded I had become in college. I would approach this internship with optimism, sincerity, and diligent effort. When I arrived, they didn't have a place for me, so I got to sit in the manager's manager's office. It was glass-paned, with a huge desk and a sculpture on the wall.
The language was REXX: interpreted, with built-in efficient indexes. You could step right up to a line, then change that line in the editor, and when you stepped over it it would execute what you had just typed. VERY cool. The system, CDS, was a Course Display System: it allowed you to browse a catalogue of courses. It had been built and maintained by generations of summer interns, like me. I picked a random routine, CalculateAvg(), and used that as a template for what code in the language should look like.
Two days later, in a normal cubicle, I was feeling a lot less optimistic. CalculateAvg() was doing lots of syntactic tricks I couldn't find documented. I knew how to calculate an average, but this routine had some extra loops and duplicate variables and other things I just couldn't parse at all. Finally I backed up and looked for the callers of CalculateAvg(). There weren't any. I realized, this is an interpreted language! Since this was never executed, it didn't matter if it worked, or even if it was syntactically correct. I hadn't been confused because it was using advanced techniques. I had been confused because it was uncompilable gibberish. Well, that's progress! I switched to working on a known bug, since being a known bug proved the code got executed.
That night, over dinner (Spiedies!) with the other interns, one of them was bragging about how productive he was. He'd written over 10,000 lines of code so far that summer. Whoa, I thought. I haven't changed anything so far. I'd better concentrate.
The next day I worked on my bug. A window wasn't being displayed in the right spot. Debugging the code, I actually got the system to step into the place causing the problem! Excellent. Live code. The problem, it turned out, was that this 3rd-level submenu had HARDCODED the character position it was supposed to be displayed at. What? Shouldn't it be the position of the window above it, plus n down and plus n right, for some configurable n? There was a routine for the 2nd-level submenu, and it had hardcoded positions too! In fact, there were 200 routines, one for every possible submenu, all with hardcoded position constants. They were indeed hardcoded to 2 down, 2 over, except for my bug, which was 4 down, 2 over. Well, let's see. I could rewrite the whole codebase to place menus parametrically. Or I could correct the constant and close the bug. I corrected the constant.
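For illustration only, here is a minimal sketch of what placing submenus parametrically could look like. It is written in Python rather than REXX, it is not the CDS code, and every name in it (`Position`, `submenu_position`, `nested_position`, the offset constants) is hypothetical. The only detail taken from the story is the 2-down, 2-over offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Character-cell coordinates of a window's top-left corner."""
    row: int
    col: int


# Assumed defaults, matching the "2 down, 2 over" convention in the story.
ROW_OFFSET = 2
COL_OFFSET = 2


def submenu_position(parent: Position,
                     row_offset: int = ROW_OFFSET,
                     col_offset: int = COL_OFFSET) -> Position:
    """Place a submenu relative to the window directly above it."""
    return Position(parent.row + row_offset, parent.col + col_offset)


def nested_position(top: Position, depth: int) -> Position:
    """Position of a submenu nested `depth` levels below the top window."""
    pos = top
    for _ in range(depth):
        pos = submenu_position(pos)
    return pos
```

With one routine like this, a misplaced submenu could only come from a wrong offset or a wrong parent, not from one stray constant among 200.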
Next bug was a course that crashed the system when you searched for it. Breakpoint, run, bang there's the context, voila, it was checking some bogus logic. I corrected the logic, ran it, it worked, closed the bug. Yay.
The next day I got a bug about a different course that crashed the system when you searched for it. Breakpoint, run, bang there's the context, and it's the same bogus logic I fixed yesterday. Huh? Did I fail to check in my changes? I searched for my changes, and voila, there they were. So, how come I just saw them not fixed? I searched again, and found it not fixed again. Hum. Hum hum hum. I went to the top of the file and searched for the buggy code. I found it. I kept searching. I found 15 copies of it. 16 if you include the one I'd fixed yesterday. The surrounding 2 pages were identical for each, except for the one I had fixed yesterday. I see. I checked how many lines were in this course display system. 200,000. Approximately 4000 pages.
I had discovered the copy-paste-modify technique, which looks like "if (testcase) { tweakedcode } else { oldcode }", and which after n rounds produces 2-to-the-n slightly different copies of code very quickly. Did that intern I was talking to yesterday cause this? There were no comments saying who had modified what, or when. Sherry had a chalkboard showing who was currently modifying which file, so we didn't step on each other's toes. Hum. WELL, given that I KNEW these particular 1600 lines were all virtually the same 100 poorly written lines with simple tweaks, I decided to delete them all and do it right. 16 copies meant 4 iterations of copy-paste-modify, so I found the 4 bug fixes that caused the combinatorial explosion, and accounted for them. Came out to 60 lines afterward.
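The arithmetic behind that explosion can be sketched in a few lines. This is Python rather than REXX, and only an illustration: the fix names are hypothetical placeholders, and the 100-lines-per-copy figure is the rough number from the story.

```python
from itertools import product


def copies_after(rounds: int) -> int:
    """Count copies if every round wraps each existing copy in
    'if (testcase) { tweakedcode } else { oldcode }'."""
    copies = 1
    for _ in range(rounds):
        copies *= 2  # each round duplicates every copy that already exists
    return copies


# Hypothetical names for the four bug fixes; the story does not name them.
FIXES = ("fix_a", "fix_b", "fix_c", "fix_d")


def all_variants(fixes=FIXES):
    """Every on/off combination of the fixes, one per pasted copy."""
    return list(product((False, True), repeat=len(fixes)))


LINES_PER_COPY = 100  # approximate figure from the story

print(copies_after(4))                       # 16
print(len(all_variants()) * LINES_PER_COPY)  # 1600
```

A single routine that handles the four cases directly grows with the number of fixes, not with the number of combinations, which is why the rewrite could be so much shorter.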
The next day we had a code review meeting. Code reviews were always done by printing out code and handing copies to 24 senior employees sitting around a table. I only printed out the new code. I explained I was fixing a crash. Several crashes actually. And I had reduced the code from 1600 lines to 60. I was rather proud of myself. They asked what testing I had done. I explained the two bug testcases, plus four more I'd tried to account for the four other issues. Was I certain I'd caught everything? Well, no. It could still have crashes I didn't know about. So the committee rejected the change, keeping the original code.
I went to my boss. "I'm trying to be optimistic about this," I said. "This bad code was entertaining at first. But it's getting to the point where it's just not fun."
"Don't take it too seriously, Adam," he said. "All the people working on this are new. A lot of them are going to figure out programming isn't for them. We purposely have you working on a low-stakes project. Once you're working with more experienced coworkers, things are better."
The next week I presented the same thing. I lied. I said it was a simple logic correction and totally safe. They allowed it in.
By the end of the summer I had written a net of minus 5000 lines of code, and that counts the 3000 lines I'd added myself. One of the bigger changes had replaced a 4000-line REXX program with a 300-line one that hijacked the local text editor's scripting language to do search-replace on the course catalog, producing a table of contents instead. Lots of regular expressions. Another did transitive closures of courses, did several experimental layouts, and displayed courses of study in a flowchart similar to ones drawn by hand. I commented that one rigorously, and explained it all in detail to the next intern. As I was leaving, I asked him what he thought of it. "Well, one thing I know, I'm not gonna TOUCH that code. I have no friggin idea how it works," he said. Another lesson learned: it's not enough to finish code and check it in. You have to get customers to actually use and depend on it. Otherwise it'll get forgotten and maybe deleted. One intern's tight, inspired code is the next intern's uninterpretable spaghetti code.
The code strongly reminded me of DNA. Junk code. Duplicate code, with small tweaks. Tons and tons of code. Patches of logic copied to strange places. Big uninterpretable data segments, some of which I could prove weren't referenced. Some junk code strangely spliced together. All seemingly at random, with no recorded history. Could life have evolved truly at random? Well, CDS did successfully run, most of the time. From what I have learned so far, the DNA code is actually significantly worse.
This was in response to a prompt on reddit.com r/WritingPrompts, "The code had been maintained solely by generations of summer interns. It was the worst mess he had ever seen."