Legacy Code: How To Reverse-Engineer

1. Introduction

In a previous post I came to the conclusion that reverse-engineering might be needed the most by new team members working in an already established project.

This covers the scenario in which knowledge of the system is not widely shared and there is little time to get “the full picture” of what’s going on before being asked to become a contributing member of the team.

The only possibility is to go through the code and try to understand what the team has come up with over the lifetime of the project. Of course, experienced and/or talented programmers may be able to read and understand the code quickly, but they would still need to reconstruct all the discussions with the customers if they want to understand what the system is supposed to do. For the latter, some diagrams can definitely help.

In principle, reverse-engineering should be used to bring source code into a Model Driven Development process and, for this reason, it should be necessary only once. However, I have seen the process applied either just once, to create some diagrams and produce some kind of documentation, or by having someone do it regularly from the source code, just to keep the diagrams in sync with the code. I normally call the output of the latter case UML “pictures” or “paintings” and not “diagrams”, because they are not used to communicate among stakeholders before or during development, but probably only to satisfy some contractual clause.

Instead, I’ll identify some clear objectives that, depending on the size of the project, can be achieved relatively quickly. We are going to use our tools:

  1. to identify all the entities defined in the source code to implement the system
  2. to model the most important relations between the defined entities.

2. First Steps

  1. Reverse-engineer the code by using the tool at our disposal. This is normally the simplest step, but it is also the one that determines the amount of work needed afterwards. For C++, a tool like Enterprise Architect would probably ask us to point the project to the place in our filesystem that contains the definitions of the “user-defined types”, normally .h or .hpp files.
  2. After selecting the files, we can start the action, which might take some time, depending on the size of the code base.
  3. The result depends on the code we are starting from and on the options selected. If we don’t like it, we can always start again with different options.
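To make the idea of “user-defined types in header files” more concrete, here is a deliberately naive sketch of the kind of scan such an import performs. It is not how Enterprise Architect or any other tool actually works (real tools use a proper C++ parser); the function name `find_type_names` and the regular expression are assumptions made for illustration only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the names of the class/struct definitions
// found in the text of a header file. Forward declarations ("class Foo;")
// and template parameters ("template <class T>") are skipped because they
// are not followed by ':' or '{'. Scoped enums ("enum class Color {") are
// reported too, since they are also user-defined types.
std::vector<std::string> find_type_names(const std::string& header_text) {
    static const std::regex definition(
        R"(\b(?:class|struct)\s+([A-Za-z_]\w*)\s*(?:final\s*)?[:{])");

    std::vector<std::string> names;
    for (std::sregex_iterator it(header_text.begin(), header_text.end(), definition), end;
         it != end; ++it) {
        names.push_back((*it)[1].str());  // capture group 1 is the type name
    }
    return names;
}
```

A regular expression cannot cope with macros, comments, nested or out-of-line definitions, which is one reason why the quality of the imported model depends so much on the tool and on the options selected.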

3. Create Relationships

The steps in the previous section will provide all the user-defined types in the software that we are reverse-engineering. This is the first objective described in the introduction.

As far as objective 2 is concerned, the functionality provided by the tools may vary, and some manual work is most likely needed. Normally, a class diagram is created and an element of the project is dragged into it. Then some kind of “Show related elements” feature is selected (in Enterprise Architect it is Context Menu on the element in the diagram -> Insert Related Element -> choose what to show -> OK).

This will probably add more elements to the diagram, all connected by some relationship to the one we dragged in.

If we repeat this for all the classes or, better, for the transitive closure of the set of classes in the model, we obtain a model that corresponds to the code. It may not be elegant, and it may not help at all in understanding the code itself, but it is a graphic representation of the current version of the code and it is worth keeping a copy.
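Repeating “insert related elements” until nothing new appears amounts to computing the set of classes reachable from the starting one. A minimal sketch of that idea, assuming the relationships have already been extracted into a simple adjacency map (the `RelationGraph` alias and the `related_closure` function are hypothetical names, not part of any tool):

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each class name maps to the classes it is related to.
using RelationGraph = std::map<std::string, std::vector<std::string>>;

// Returns the starting class plus every class reachable from it by
// following relationships, using a breadth-first traversal.
std::set<std::string> related_closure(const RelationGraph& graph,
                                      const std::string& start) {
    std::set<std::string> visited{start};
    std::queue<std::string> pending;
    pending.push(start);

    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();

        const auto entry = graph.find(current);
        if (entry == graph.end()) continue;  // no outgoing relationships

        for (const std::string& neighbour : entry->second) {
            // insert() reports whether the class was seen for the first time
            if (visited.insert(neighbour).second) pending.push(neighbour);
        }
    }
    return visited;
}
```

Classes that only point towards the starting element, without being reachable from it, stay out of the result, so in practice the exercise has to be repeated from several starting points.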

Now we need to decide which level of detail we want to achieve in the diagram. Because in our scenario we are not going to use it for generating source code, that level should be the one at which the relations between entities can be understood well enough by a newcomer to the project.

A reasonable choice could be to show the most used relationships:

  1. Generalization (normally quite well automatically identified)
  2. Composition (manual work probably needed)
  3. Aggregation (manual work probably needed)
  4. Usage (normally quite well automatically identified, though may hide some of 2. or 3.)
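As a reminder of what these four relationships usually look like in C++ source, here is a small sketch. All class names are invented for illustration, and the mapping in the comments is only one common convention: the same member declaration can legitimately be modeled in different ways, which is why manual work is probably needed.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Logger {
public:
    void log(const std::string& message) { lines.push_back(message); }
    std::vector<std::string> lines;
};

class Engine {
public:
    explicit Engine(int power_kw) : power_kw(power_kw) {}
    int power_kw;
};

class Driver {
public:
    explicit Driver(std::string name) : name(std::move(name)) {}
    std::string name;
};

class Vehicle {
public:
    explicit Vehicle(int power_kw) : engine(power_kw) {}
    virtual ~Vehicle() = default;

    // Aggregation: the vehicle refers to a driver it does not own.
    void assign(Driver* new_driver) { driver = new_driver; }

    // Usage: Logger appears only as a parameter, not as a member.
    void describe(Logger& logger) const {
        logger.log("vehicle with " + std::to_string(engine.power_kw) + " kW");
    }

    Engine engine;             // Composition: by-value member, same lifetime
    Driver* driver = nullptr;  // Aggregation: non-owning pointer
};

// Generalization: Truck is a kind of Vehicle.
class Truck : public Vehicle {
public:
    using Vehicle::Vehicle;  // inherit the constructor
};
```

A tool that sees only `Driver* driver` cannot know whether the pointer is owning or not, so it may well draw a plain association or usage link here. Deciding between composition and aggregation is the kind of judgment that has to be added by hand.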

Of course, after this refactoring of the model, the diagram will have almost certainly lost its correspondence to the source code.

As reported previously, in other software development methodologies widely adopted nowadays the creation of the model starts from the source code, and the “reverse-engineering” step normally ends at the automatic recognition mentioned above. I am almost sure that I am not the only unlucky one who has been involved in projects where Architectural and Design documents magically appear (or “are updated to the latest version of the software”) overnight after the code has been developed, even in very, very “serious” projects. In this case I am going to strictly apply a convenient expression from my hometown: “Name the Sin, but not the Sinner”.

Happy Reverse-Engineering!

Notice on copyright holders
Visual Studio is a registered mark of Microsoft Corp., USA. Enterprise Architect is a registered mark of SparxSystems Ltd., Australia. Embedded Engineer is a registered mark of LieberLieber Software GmbH, Austria.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
