Change Management, the watchdog of Complex Systems?

I once read an article on a new implementation of an autopilot system. My colleagues and I found it quite amusing that it used a dog-bites-man strategy, in the sense that the expert system would guide the pilot and warn him if he was doing something stupid or hazardous, acting as a watchdog over human actions and interventions.

The above story is somewhat relevant to a pattern which I have seen at my workplace. To begin with, we have increased the number of techies, which also means more developers, and I am one of the newest kids on the block (or on the mill, which is perhaps a better term). Since I started, and much driven by me, we have addressed our processes quite intensely.

We have relaunched the concept of version control, since the existing system was not really put to use – it was used by just a single developer, and not for everything. So we set up a new server, updated the version control system (Subversion) and migrated the existing projects from the old server.

In addition, we went through the same exercise with our ticketing system, where we had an old Request Tracker installation. We set up a new one in parallel and started creating new issues (RTs) there. The old system had not been maintained and the number of tickets was quite low, so an update or migration seemed like a lot of work for almost nothing. The old version has been phased out now – and we don’t miss it.

So with the two cornerstones in place, we have seen a lot of good practices being put to use. Our RT processing has been integrated with our Change Management process (based on an old version of TWiki – which is next in line to be updated IMHO). Our Change Management “system” is based on a list of Change Requests (CRs). We have set some guidelines to act as watchdog perimeters for changes to production environment components.

In addition to a high number of RTs (and a lower number of CRs), our change process and workflow are both slowly falling into place. Control over what we do and what we need to do is also improving.

I have printed out the poster from ‘Ship It’ and put it up next to my desk. It is not the complete truth, but it is at such a high level that it is difficult not to agree with its take on the good practices it lists.

So on the infrastructure side we now have the following established:

- Version control (Subversion)
- Feature tracking (RT)
- Issue tracking (RT)

We are a bit behind on the use of Subversion, but I guess we will get there eventually. For the techniques, we have the following:

- Technical lead(s)
- (working from) The list
- Code Reviews

The old dogs in our company are good technical leads, since they know the business by heart and have been employed for a long time.

We have had some basic code reviews, which have demonstrated the power of code review. We have basically used these as a sort of daily meeting, just bi-weekly, and the focus has been more on security and general architecture, digging more into what the development process should be – I will get back to this later. I hope our code review process will be able to support QA more in the long run, but for now this is working great.

We have become really good at structuring our tasks in RT, and communication is happening in the RT system instead of in miscellaneous media like email and Jabber (well, it still happens there too, but the essentials are propagated into RT). So we are all basically working from the list, which is perhaps one of the best practices. It gives us an idea of who is working on what and how much we are actually doing. We are not into micromanagement, but it is a good idea to visualize the pressure your technical department is under, workload-wise.

I myself am a heavy user of the version control system, working primarily as a developer and with source code. Our homepage (a Typo3/PHP setup) has also been put under version control. For that part we have seen a number of practices evolve out of thin air.

- Release management
- Scheduled (monthly) releases

This is great. We are much more in control of which features are released, and they are actually put through an acceptance test cycle before final deployment, which has also proven quite good – slow, but good.

This all sounds incredible! – but we have problems. I have mentioned some of these already, but here is an accumulated list, with some new ones added.

  1. Not all systems are under version control
  2. Not all people are using version control, so we have differences between components in production and components in the central repository
  3. Our project model is practically non-existent, so we do not work as a team
  4. Ad hoc changes are made to systems in production, see also 2.
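
Item 2 is the kind of drift that can at least be made visible. Below is a minimal sketch in Python, assuming a clean checkout and a copy of the production files can both be read from the same machine. The function names are hypothetical, and this illustrates the idea rather than describing an existing tool.

```python
import hashlib
from pathlib import Path


def file_digests(root, ignore_dirs=(".svn",)):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip version control metadata such as Subversion's .svn directories.
        if any(part in ignore_dirs for part in rel.parts):
            continue
        digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def find_drift(checkout_dir, production_dir):
    """Compare a clean checkout against a deployed copy and report differences."""
    repo = file_digests(checkout_dir)
    prod = file_digests(production_dir)
    common = set(repo) & set(prod)
    return {
        "only_in_repository": sorted(set(repo) - set(prod)),
        "only_in_production": sorted(set(prod) - set(repo)),
        "modified": sorted(p for p in common if repo[p] != prod[p]),
    }
```

Calling `find_drift("/path/to/checkout", "/path/to/production")` (both paths are placeholders) returns three sorted lists; anything under `modified` or `only_in_production` is a candidate ad hoc change that never made it into the repository.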

Which leads me back to the story I began this blog post with.

It is of utmost importance that we are very strict on our change management. I am all for ad hoc changes to the code as long as you do not break anything or introduce bugs, so regression testing is quite important, and you must test the changes before releasing them to production.

We have on one occasion seen changes to a component that impacted an imminent release. I was working on a new release of a component in which some logging would be enabled, a minor thing. I was then notified of a change in production. I requested a diff, evaluated it, and it all looked good, so I applied the changes – missing a single aspect of the diff. It broke backwards compatibility.

So when I released my non-intrusive change, it did not work. It took some time for me to fix the bug and get it right. This is of course my own fault, and that is what happens if you apply changes at the eleventh hour. So I got my release up and running in another environment with both the change from production and my own change.

A few days later I was talking to another user of the component on another system, the system where the change had been made directly in production. He had not been notified of the change either, so his use of the component was also broken. This had nothing to do with my release, since it had not been deployed to the system he was working on, but the issue fell back on me because he knew what I was working on. All I could do was enlighten him on the changes I had applied to make it work.

Apart from my little stray release, our main problem here is changes made for no apparent reason other than being a good idea. The change was not on our list of changes to make, so nobody could prepare for it. Two people spent a lot of time correcting things because somebody had a good idea about a change to an API.

So in retrospect I have gathered that we need to move on with some of the best practices we have yet to apply, and a few other things. Here is the short-term list:

  1. Tighten up our policies about changes to production
  2. Code change notifier (RSS feed on our Subversion server)
  3. Script builds
  4. Write and run tests

I had actually scripted the build and written tests, but making the change directly in production does not necessarily require these two practices to be executed, and my test suite should have caught the API change – bummer.
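
A test that pins down the public interface is one way to catch this kind of API change before it reaches other systems. Here is a minimal sketch in Python, chosen only for illustration; `lookup`, `EXPECTED_API` and `api_violations` are hypothetical names and do not describe the actual component.

```python
import inspect


def lookup(key, default=None):
    """Stand-in for a function exposed by the shared component."""
    return default


# Parameter names that callers on other systems are assumed to rely on.
EXPECTED_API = {"lookup": ["key", "default"]}


def api_violations(namespace, expected=EXPECTED_API):
    """List the differences between the expected and the actual signatures."""
    problems = []
    for name, params in expected.items():
        func = namespace.get(name)
        if func is None:
            problems.append(f"{name} is missing")
            continue
        actual = list(inspect.signature(func).parameters)
        if actual != params:
            problems.append(f"{name}: expected {params}, found {actual}")
    return problems


if __name__ == "__main__":
    # Fails loudly when a signature no longer matches what callers expect.
    assert api_violations(globals()) == [], api_violations(globals())
```

The point of the design is that the expected interface lives next to the tests, so a well-meant change to a signature has to be made in two places, and the second one is a deliberate decision rather than an accident.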

So it is back in the saddle…
