Computer language code can be migrated in two essential ways: by rewriting it manually or by translating it automatically.
In most projects a combination of both is used.
Writing translators, that is, automating the migration of existing code, is an exciting and challenging exercise in modern information technology. It offers a way out for:
- replacement of out-of-service-level software
- replacement of out-of-service-level platforms
- replacement of no longer used computer language(s)
- replacement of computer languages for which programming skills are difficult to find
Translating a program from one computer language to another has the following advantages:
- It creates a replica: the translated code is a functionally exact copy of the existing code.
- Automated translation is less error-prone than manual translation or rewriting the code.
- Automated translation needs less testing time.
It also has disadvantages:
- The code is not improved. What was wrong stays wrong.
- The generation of the computer language (2GL, 3GL or 4GL) is very difficult to change.
- The programming paradigm (e.g. object orientation) is preserved. Code written in a non-OO language translates into very poor OO code.
To write a translator for a computer program, it is essential that the syntax of the program can be understood. A parser analyzes a computer program syntactically: it reads a sequence of tokens and determines their structure with respect to a formal grammar. In doing so, the parser understands the syntax of the program.
Such a parsing mechanism is sufficient for counting the reserved words of a computer language in an application, but not for translation. Statements often depend on other statements and are meaningless on their own. Understanding the relations between lines is part of the semantics of a computer program.
Another part of the semantics is the program flow, the order in which the statements are executed. The execution order in the source language may differ from that in the destination language. All these characteristics have to be stored for later processing.
The parsing mechanism handles these features by storing the program statements in an Abstract Syntax Tree (AST), which also provides room for additional program information. How much additional information is needed depends not only on the source code, but also on the destination code and on the mix of transformation methods used in the project.
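As an illustration of how a tree node can hold both syntax and additional program information, here is a minimal sketch. The class `AstNode`, its fields and the annotation keys (`sourceLine`, `executionOrder`) are hypothetical names chosen for this example. They are not the node classes that a parser generator would produce, and the source statement is invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical AST node: syntax (kind, text, children) plus free-form
// annotations that later translation steps can read.
class AstNode {
    final String kind;                 // e.g. "If", "Assign"
    final String text;                 // source text of the statement
    final List<AstNode> children = new ArrayList<>();
    final Map<String, String> annotations = new LinkedHashMap<>();
    AstNode parent;                    // relation to the enclosing statement

    AstNode(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    // Attaches a child and records the parent relation.
    AstNode add(AstNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // Stores extra information needed for later processing.
    AstNode annotate(String key, String value) {
        annotations.put(key, value);
        return this;
    }
}

class AstNodeDemo {
    // Builds the tree for the invented statement: IF X > 0 THEN Y = 1
    static AstNode sample() {
        AstNode assign = new AstNode("Assign", "Y = 1")
                .annotate("sourceLine", "12")
                .annotate("executionOrder", "2");
        return new AstNode("If", "X > 0")
                .annotate("sourceLine", "12")
                .annotate("executionOrder", "1")
                .add(assign);
    }

    public static void main(String[] args) {
        AstNode root = sample();
        AstNode body = root.children.get(0);
        System.out.println(body.kind + " belongs to " + body.parent.kind
                + ", order " + body.annotations.get("executionOrder"));
    }
}
```

The parent link records which statement another statement belongs to, and the annotations hold the execution order, so both kinds of semantic information mentioned above stay available after parsing.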
In theory every line of code could be transformed by the mechanism described above. The resulting translation program would be very complex, because it has to embed a solution for every language-specific feature, such as the handling of:
- User Interface (Screen)
- File I/O
- Specific elements of the source and destination languages.
The code of these kinds of program parts is often very platform specific. It is more effective to recognize the occurrences in the source code and replace them with predefined, manually written generic code. Examples are:
- CRUD engine for a database.
- Tools library for screen handling
- Replacing interfaces by (existing) (web) services
A common solution does not exist; each project has to find its own.
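The idea of recognizing platform-specific statements and replacing them with calls into hand-written code can be sketched as a simple lookup table. Everything here is an assumption made for illustration: the source keywords (`DISPLAY`, `READ`, `WRITE`) belong to an invented source language, and `ScreenTools` and `CrudEngine` stand for a manually written destination library that a project would have to supply.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical replacement table: platform-specific source statements are not
// translated token by token but mapped onto calls into a hand-written library.
class LibraryReplacer {
    private final Map<String, String> templates = new LinkedHashMap<>();

    LibraryReplacer() {
        // Left: keyword of an invented source language.
        // Right: call into an assumed, manually written destination library.
        templates.put("DISPLAY", "ScreenTools.show(%s);");
        templates.put("READ", "CrudEngine.read(%s);");
        templates.put("WRITE", "CrudEngine.create(%s);");
    }

    // Returns the library call for a recognized statement, or null when the
    // statement should go through the normal translation path instead.
    String replace(String statement) {
        String trimmed = statement.trim();
        int space = trimmed.indexOf(' ');
        String keyword = space < 0 ? trimmed : trimmed.substring(0, space);
        String argument = space < 0 ? "" : trimmed.substring(space + 1).trim();
        String template = templates.get(keyword.toUpperCase(Locale.ROOT));
        return template == null ? null : String.format(template, argument);
    }
}
```

For example, `new LibraryReplacer().replace("READ customer")` yields `CrudEngine.read(customer);`, while an ordinary statement such as `X = 1` yields `null` and is left to the regular translation.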
At this point the result is a tree of nodes and edges that represents the parsed program code, enriched with the context needed to build the destination application. To migrate the source code to the destination code, the Visitor design pattern is used. Using this pattern:
- Allows new operations to be added to existing objects in the tree.
- Makes it easier for several programmers to work simultaneously on the same translation project.
- Separates the parsing part of the migration process from the translation part.
- Supports the object-oriented approach.
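The points above can be illustrated with a minimal sketch of the Visitor pattern on a tiny expression tree. The node types (`NumberNode`, `AddNode`), the visitor names and the emitted `add(...)` notation are hypothetical and only serve the example. A real translator would have one node type per grammar construct.

```java
// Node types of a tiny, hypothetical AST.
interface ExprNode {
    <R> R accept(ExprVisitor<R> visitor);
}

// One visit method per node type; R is the result type of the operation.
interface ExprVisitor<R> {
    R visitNumber(NumberNode node);
    R visitAdd(AddNode node);
}

class NumberNode implements ExprNode {
    final int value;
    NumberNode(int value) { this.value = value; }
    public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitNumber(this); }
}

class AddNode implements ExprNode {
    final ExprNode left;
    final ExprNode right;
    AddNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
    public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitAdd(this); }
}

// One operation: emit destination code (here an invented call notation).
class EmitVisitor implements ExprVisitor<String> {
    public String visitNumber(NumberNode node) { return Integer.toString(node.value); }
    public String visitAdd(AddNode node) {
        return "add(" + node.left.accept(this) + ", " + node.right.accept(this) + ")";
    }
}

// A second operation, added later without touching the node classes.
class CountVisitor implements ExprVisitor<Integer> {
    public Integer visitNumber(NumberNode node) { return 1; }
    public Integer visitAdd(AddNode node) {
        return 1 + node.left.accept(this) + node.right.accept(this);
    }
}
```

Because each operation lives in its own visitor class, one programmer can work on the code emitter while another adds an analysis, and neither needs to change the parser or the node classes.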
The following example illustrates a code migration process based on translation:
- The grammar for the parser is written as a context-free grammar in Extended Backus-Naur Form (EBNF). Many grammars are available on the internet as a starting point.
- The code is parsed with a parser built from the grammar by JavaCC, an open-source parser generator for the Java language. The result is an Abstract Syntax Tree (AST) that can be processed in Java.
- Visitors written in Java transform the tree into program code in the destination language.
- Special operations (screens, files, databases, interfaces) can easily be taken out. The library or manually rewritten code is built separately and later combined with the translated code produced by the JavaCC parser and the Java visitors.
- The code snippets generated by the Java visitors are tested before use, and the resulting code is part of the JUnit tests of the visitors.
Given the breakdown described above, the following roles and responsibilities can be distinguished:
- Overview of all parts
- Integration of the built components/blocks
- Parser programming
  - EBNF format
  - Abstract Syntax Tree
- Visitor programming
  - Destination code snippets
  - Unit tests
  - Generation of the code snippets
- Library programming
  - Special code of the destination platform
  - Library build
  - Destination language code snippets that call the library
A project to write a translator requires the following skills and incurs the following costs:
Parser & Grammar writing
With the help of grammars found on the internet, a grammar can be built in roughly:
- 80 hours for a very simple computer language
- 240 hours for a very large computer language
Keep in mind that the set of reserved keywords of any computer language is always very limited, whereas the set of built-in library functions can be immense. The effort of writing the libraries depends largely on the migrated application. A full application that uses screen I/O, file access, database access and a lot of platform-specific behavior is far more time consuming. Rules of thumb are:
- 160 hours for a simple code migration from 3GL to 3GL.
- Add 160-320 hours for 3GL to 4GL.
- 160 hours for simple screens with the same behavior.
- Add 160-320 hours for different screen behavior (block mode vs. event driven).
- 80 hours for simple database access (e.g. SQL to SQL).
- 200-300 hours for different access (e.g. file access vs. SQL).
- 10 hours for each comparable interface.
- Add 20 hours for each interface that works in a different way.
Please keep in mind that these are very rough estimates.
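To show how the rules of thumb add up, the following sketch sums the figures for one invented project: a very simple grammar, a 3GL to 4GL migration, screens with the same behavior, file access replaced by SQL, and three comparable interfaces. The class `MigrationEstimate` and the project profile are hypothetical. The code only performs the addition and adds no estimating knowledge of its own.

```java
// Hypothetical estimator that only adds up the rule-of-thumb figures.
class MigrationEstimate {
    int lowHours;
    int highHours;

    // Adds a fixed figure to both bounds.
    MigrationEstimate add(int hours) {
        return add(hours, hours);
    }

    // Adds a range to the lower and upper bound.
    MigrationEstimate add(int low, int high) {
        lowHours += low;
        highHours += high;
        return this;
    }

    // Invented example project, see the description in the text.
    static MigrationEstimate example() {
        return new MigrationEstimate()
                .add(80)          // grammar for a very simple language
                .add(160)         // simple code migration
                .add(160, 320)    // extra for 3GL to 4GL
                .add(160)         // simple screens, same behavior
                .add(200, 300)    // file access vs. SQL
                .add(3 * 10);     // three comparable interfaces
    }

    public static void main(String[] args) {
        MigrationEstimate e = example();
        System.out.println(e.lowHours + " to " + e.highHours + " hours");
    }
}
```

For this profile the sum is 790 to 1050 hours, which is subject to the same caveat as the individual figures.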
A very brief description of parsers gives some idea of the parsing process. Parsers are built to analyze computer programs. The parsing process consists of:
- Token generation or lexical analysis: the input is split into meaningful symbols defined by a grammar.
- Syntactic analysis: checking whether the tokens form an allowable expression with respect to a grammar.
- Semantic analysis: working out the implications of an expression.
The task of a parser is to determine if and how the input can be derived from the start symbol of the grammar, which can essentially be done in two ways:
- Top-down: starting from the start symbol of the grammar and working towards the input.
- Bottom-up: starting from the input and working back towards the start symbol.
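The steps above can be sketched with a small hand-written top-down (recursive descent) parser. The grammar is invented for this example and far smaller than that of any real language. The class `TinyParser` is hypothetical and is not output of JavaCC. Here the computed value plays the role of the semantic step.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch for the invented grammar:
//   expr := term (('+' | '-') term)*
//   term := NUMBER | '(' expr ')'
class TinyParser {
    private final List<String> tokens;
    private int pos = 0;

    TinyParser(String input) { this.tokens = tokenize(input); }

    // Lexical analysis: split the input into numbers and one-character symbols.
    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                result.add(input.substring(start, i));
            } else if ("+-()".indexOf(c) >= 0) {
                result.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return result;
    }

    // Syntactic analysis, top-down: begin at the start symbol 'expr'.
    // The returned value is the semantic result of the expression.
    int parse() {
        int value = expr();
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(pos));
        }
        return value;
    }

    private int expr() {
        int value = term();
        while (pos < tokens.size()
                && (tokens.get(pos).equals("+") || tokens.get(pos).equals("-"))) {
            String op = tokens.get(pos++);
            int right = term();
            value = op.equals("+") ? value + right : value - right;
        }
        return value;
    }

    private int term() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            int value = expr();
            if (pos >= tokens.size() || !tokens.get(pos++).equals(")")) {
                throw new IllegalArgumentException("Expected ')'");
            }
            return value;
        }
        if (Character.isDigit(token.charAt(0))) return Integer.parseInt(token);
        throw new IllegalArgumentException("Unexpected token: " + token);
    }
}
```

Calling `new TinyParser("7 - (2 + 3)").parse()` returns 2, while an input that does not fit the grammar, such as `7 +`, is rejected with an exception.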
- Grammars, a more detailed description of the study of the structure of a language, including EBNF
- JavaCC, a parser generator written in Java
- Software migration, a description of Business Application Modernization (BAM) software migration
- http://en.wikipedia.org/wiki/Abstract_syntax_tree, Abstract Syntax Tree (AST).
- http://en.wikipedia.org/wiki/Binary_tree, Binary tree
- http://en.wikipedia.org/wiki/Design_pattern_(computer_science), Computer design pattern
- http://en.wikipedia.org/wiki/Category:Software_design_patterns, Computer software design patterns category
- http://en.wikipedia.org/wiki/Grammar, Grammar
- http://en.wikipedia.org/wiki/JavaCC, JavaCC parser generator
- http://en.wikipedia.org/wiki/Parser, Parser
- http://en.wikipedia.org/wiki/Visitor_pattern, Visitor pattern