当前位置:首页 >> >>

Toward practical automated program understanding. Working Notes of the Third


Toward Practical Automated Program Understanding
Alex Quilici University of Hawaii at Manoa Department of Electrical Engineering 2540 Dole St, Holmes 483 Honolulu, HI 96822

Extended Abstract
It is an open question whether automated program understanding can become a practical, useable tool in the reverse engineering or maintaining of existing, real-world legacy systems. However, there are clearly several traits that any deployable automated program understanding tool must possess: 1. It must be based on an understanding algorithm that scales in practice to large programs. 2. It must produce an understanding targeted to the speci c reverse engineering or maintenance tasks it is being used to support. 3. It must provide mechanisms that allow the programmers who perform reverse engineering or maintenance tasks to update its knowledge base. 4. It must integrate with other, existing tools that support maintenance and reverse engineering. 5. Finally, it must help the end-user achieve tasks more simply and more cheaply than alternative approaches. This abstract provides an overview of our approach to constructing a program understanding tool that possesses these traits. In particular, it discusses the real-world problems motivating our work on program understanding, the evolution and current performance of our approach to program understanding, our attempts to simplify the task of providing the knowledge the program understanding algorithm requires, and our initial attempts to create a program understanding environment on top of our automated program understanding tools.

1 Introduction

2 A Real-World Problem Requiring Program Understanding
Our particular research focus is the use of program understanding tools to support answering common conceptual queries asked by maintenance programmers. For example, consider the task of a maintenance programmer charged with updating an existing legacy COBOL program so that all error messages that arise from input validation include the time. This task immediately gives rise to the question: \Where is all the code dealing with displaying error messages?". 1

Existing tools are inadequate to answer this query directly. For example, simply using standard string searching to look for all locations of the text string \Error" is only useful if the error messages or the elds used in constructing them are explicitly identi ed in the code. As a result, maintenance programmers are forced to improvise to try to answer this query. One common approach, for example, is to start by locating all the WRITE statements. But this is problematic for large programs, as they may contain a large number of WRITEs, where only a small percentage of those WRITEs actually display error messages. Determining exactly which WRITEs are relevant then involves examining how each record being written was constructed to determine if it is actually writing an error message. Furthermore, the code that constructs the error messages may be delocalized from the code that displays them, making it time-consuming to verify that a particular WRITE is actually displaying an error message. Of course, there are other, related techniques programmers might use that work better for this particular query (such as rst reading the code to try to determine the le onto which error messages are written and then locating all WRITEs on that particular le, and so on), but none of these are completely satisfactory. The result is a clear need for tools that help programmers nd the answers to their conceptual queries about existing source code. Our approach is to construct tools that allow programmers to describe characteristics of the code they are searching for|in this case, general characteristics of the code that forms and displays error messages. Speci cally, our approach is to allow the programmer to describe these characteristics at an abstract (planning) level and to have an automated program understander then locate other, similar code. In fact, we want programmers to be able to interactively provide these descriptions in the same way that UNIX users form and provide complex regular expressions to a tool like \grep". Saving these programmer-provided plans will eventually form a plan library, and the program understander can start processing each new program by trying to nd the library entries that appear within it. In constructing these tools, we are being forced to address two key problems. The rst is the scaling problem in automated program understanding. The second is nding a mechanism that allows users to describe abstract code characteristics in a simple way.

3 Toward A Program Understanding Algorithm That Scales
We have based much of our program understanding work on the pre-existing framework for representing programming plans originally used in the Concept Recognizer 1, 2]. In this framework, plans consist of components and constraints. Below is a description of one COBOL plan for producing error messages:
Display-Labelled-Message(Message: ?label-string, Record: ?label-record) Components Clear-Record: MOVE(source: SPACES, dest: ?print-record) Provide-Label: MOVE(source: ?label-string, dest: ?print-label-field) Provide-Record: MOVE(source: ?label-record, dest: ?print-label-record) Dump-Line: WRITE(source: ?print-record) Constraints literal(?label-string) field(?print-label-field, ?print-record) field(?print-label-record, ?print-record) data-dep(provide-label, dump-line, ?print-label-field) data-dep(provide-record, dump-line, ?print-label-field) data-dep(clear-record, provide-label, ?print-record) data-dep(clear-record, provide-record, ?print-record)

Essentially, this plan consists of moving spaces into the print-record to clear it, moving in the text of the error message (as a literal string), moving in the erroneous input record, and nally writing the print-record. (While in this case the components are all basic COBOL statements, the components can themselves be plans.) 2

Given this representation for plans, the Concept Recognizer recognized plan instances by running through each of the plans in the plan library, recursively locating their components, forming all possible combinations of these components, and evaluating the constraints. However, it is far too expensive, even for small programs, to simply generate all combinations of plan components and then apply constraints to each combination. As a result, the concept recognizer interleaved forming partial combinations of plan components with constraint evaluation (e.g, nd the MOVEs that correspond to the Clear-Record component, then the MOVEs that correspond to the Provide-Label component, apply the constraints evaluable on this combination of MOVEs, lter out the combinations that fail to achieve these constraints, and so on). Despite these optimizations, however, the Concept Recognizer's approach doesn't scale well in practice 1]. In our own implementation of this approach, we have observed two reasons for this poor performance. One reason is that its top-down recognition process is library driven rather than code driven; that is, it considers all plans in the plan library, whether or not they are likely to be in the code. The other reason is that there is no mechanism to specify exactly how to interleave constraints and combine component instances when recognizing plans|yet it is exactly those steps where most of the program understanding algorithm's work lies and where di erent orderings and approaches can have a major impact on the overall e ciency of the understanding algorithm. In earlier work, we have attempted to address these problems 4]. In particular, we have addressed the problem of minimizing the number of plans tried by constructing a codedriven version of the Concept Recognizer that used indexing to determine exactly when a plan should be considered. This indexing information was a combination of components and constraints that when they occurred in the code suggested that the remainder of the plan was worth checking for. For example, for DISPLAY-LABELLED-RECORD is indexed under a WRITE with a data dependency to a MOVE of a literal. That is, the understander only checks for an instance of a DISPLAY-LABELLED-RECORD when it sees a WRITE and that WRITE happens to have a data dependency to a MOVE of a literal. This indexing also somewhat addresses the problem of determining an order for combining components and constraints in that the indexing components and constraints are guaranteed to be evaluated rst and their failure prevents the remaining combinations or constraints from being evaluated. We compared our indexing approach to our implementation of the Concept Recognizer on a basket of textbook COBOL and C programs. The indexing approach showed approximately an order of magnitude improvement in performance (10-15 times) over the Concept Recognizer's approach. However, in addition to this pleasing result, we also observed several troublesome problems with our indexing approach. One is that it uses a tremendous amount of memory. The reason for this turns out to be that not all components of a plan may have been recognized when the plan is indexed, which forces the plan recognizer to save partially matched plans and gradually match their remaining components to the program as instances of those components are recognized in the code. Another problem is that the indexing only provides a partial order on component matching and constraint evaluation. Unfortunately, this leaves the understander little guidance on what order to try to locate and evaluate constraints on the remaining plan components.

3.1 An Index-Based Program Understander

3.2 An Improved Index-Based Program Understander

This study has led us to a modi ed version of our indexing approach that addresses both of these problems. To eliminate the memory intensive \waiting" that the indexing algorithm goes through, we have divided the plan library into layers, where each layer is a set of plans dependent only on the plans at the previous layer being recognized. The bottom 3

layer is those plans only dependent on items in the program's abstract syntax tree, the next layer is a set of plans only dependent on plans at the bottom layer, and o on. The plan recognizer then rst runs through the program looking for indexes on the plan elements in the bottom layer, then the next layer, and so on. This guarantees that any indexed plan will have all of the expected components present, eliminating the need for any \waiting". The second improvement is in specifying a default heuristic ordering on constraint evaluation and matching based on di erent classes of constraints, the number of available instances of each component, the cost of evaluating constraints, and so on. This has resulted in allowing the understander to better choose the order in which it performs component matching and constraint evaluation. We compared our revised indexing algorithm against our original and have found better than another order of magnitude performance improvement. On textbook COBOL programs on the order of one thousand lines or so of code, with our plan library containing on the order of one hundred or so plans, this is fast enough to have reduced our program understanding algorithm's performance from several hours (for the original, non-indexed version) to ve to ten minutes. While this is certainly no guarantee that the algorithm has solved the scaling problem in program understanding, it suggests that the algorithm in its current form can be used to automatically examine and understand modules of large programs. Given recent research into tools for automatically modularizing large COBOL programs 3], combining our automated understander with these tools appears to be a promising approach to understanding large programs.

4 Tools for Providing Programming Plans
Ideally, we want programmers to be able to provide the plans our program understander attempts to recognize. This gives programmers a mechanism for describing items the programmer expects to nd in the source code. While we allow programmers to provide plans by explicitly specifying components and constraints, programmers do not nd this approach especially convenient. As a result, we have been examining an alternative: providing programming plans by example. The idea is that the programmer locates one instance of a particular plan within the code and then highlights its components (by highlighting source code lines). The programmer, for example, locates and highlights the three MOVEs and a WRITE for creating and displaying one of the many error messages in a large program. The highlighted lines can be viewed as a maximally-constrained programming plan that will only match those particular lines of code. The system displays all the components and all the constraints that apply to them (e.g., that a particular value must be SPACES or must be a string literal; that one statement has a control ow or data dependency on another statement; and so on). The programmer can then examine each constraint and determine whether it is crucial to the plan or not, marking those that are likely to be unimportant. In addition, the programmer can specify which components and constraints the plan should be indexed under. Once the programmer is nished, the system tries to recognize other instances of the resulting plan in the code. Based on the resulting retrieved occurrences, the programmer can then attempt to generalize or constrain the plan further. This initial approach works surprisingly well|so long as programmers select small groups of statements as the example instance. But larger plans result in too many constraints to make it easy for the programmer to determine which ones are relevant, and plans with many components tend to signi cantly slow the program understander. As a result, we are now exploring mechanisms by which the system can substitute existing plans recognized within the user-highlighted lines for the underlying components those lines contain. Despite the need for more work, providing programming plans by example does suggest one possible approach to allowing the users of program understanding tools to provide the patterns they use to understand the program. 4

5 Cooperative Program Understanding

One limitation of pattern-based program understanding tools is that they are unlikely to be able to completely understand existing legacy systems. As a result, to form databases that completely describe a program's behavior at a level suitable for answering conceptual queries, it is necessary to provide mechanisms by which programmers can extend the system's automatically extracted understanding. We have been working on a cooperative program understanding environment that does just that 5]. Our automated program understanding tool initially extracts what it can automatically, and then allows programmers to add to this understanding graphically. In particular, programmers create object-oriented design elements and then link them to highlighted sections of code or to system recognized plans. The result of using the system is an object-oriented design that corresponds to the source code, along with detailed links to where various objects and operations are actually implemented. All of this information is stored in a single internal representation, so that programmers can query the resulting system/programmer-extracted program understanding without regard to what was extracted automatically versus what was extracted by hand.

6 Conclusions
We have been working on constructing a program understanding tool that can have an impact on maintaining real-world software systems. This has led us to focus on three issues. First, we have been working on a program understanding algorithm that scales. In particular, we have focused on an indexed-plan library, with an algorithm that can take advantage of these indexes. Second, we have been working on mechanisms by which programmers can easily specify programming plans by providing examples and editing the constraints the examples imply. Third, we have been combining automated understanding with a case-like tool that allows programmers to link design elements to source code. Our belief is that all of these issues must be addressed in order to successfully apply automated program understanding to real-world legacy systems.

References
1] Kozaczynski, Voytek; and Ning, Jim Q. Automated Program Understanding By Concept Recognition, Automated Software Engineering 1, 1 (March 1994), 61{78. 2] Kozaczynski, Voytek; Ning, Jim Q.; and Engberts, Andre. 1992. Program concept recognition and transformation. Transactions on Software Engineering 18, 12 (December 1992), 1065{1075. 3] P. Newcomb and L. Markosian, Automating the Modularization of Large COBOL Programs: Application of an Enabling Technology for Reengineering. In Proceedings of the Working Conference on Reverse Engineering, Baltimore, MD, (May 1993), 222-230. 4] Quilici, Alex. 1994. A memory-based approach to recognizing programming plans. Communications of the ACM 37, 5 (May 1994), 84{93. 5] Quilici, Alex; and Chin, David N. 1994. A cooperative program understanding environment. In Proceedings of the Ninth Knowledge-Based Software Engineering Conference. Monterey CA, pp. 125-132.

5


相关文章:
Toward practical automated program understanding. Working ....pdf
Working Notes of the Third_专业资料。It is an open question whether automated program understanding can become a practical, useable tool in the reverse ...
Practical, AutomatedLarge ScaleSoftware Reengineering_图文_....ppt
Practical, AutomatedLarge ScaleSoftware Reengineering_...2011-3-26 12 DMS transforms work on ASTs, not...aid code understanding ? enable application ...
Improving the Efficacy of Automated Sign Language Practice ....pdf
Springer Lecture Notes in Artificial Intelligence. About the author Helene ...She is working on her dissertation "Improving the Efficacy of Automated Sign...
Practical Techniques for Automated Specification-Based ....pdf
Practical Techniques for Automated Specification-Based Testing_专业资料。Formal ...In Proceedings of the 29th Annual IEEE/NASA Software Engineering Workshop, ...
Unit 3 Communication by phone TP.doc
understanding of the texts and to let the ...4. Teach the students some practical writing ...is the abbreviation for automated teller machine. ...
...the Gap between Proof Theory and Practical Automated Proof....pdf
Theory and Practical Automated Proof Syst_专业资料...The second stage of the work will be concerned ...Understanding Z: A Speci cation Language and its...
DUNEDIN NEW ZEALAND Automated Scoring of Practical Tests in ....pdf
DUNEDIN NEW ZEALAND Automated Scoring of Practical ...(1) have acquired an understanding of the ...[1] who have used macros to assess work ...
Future Work and Practical Applications of Genetic Programming....pdf
Future Work and Practical Applications of Genetic ... in an automated way, a computer program that ...(1996) in understanding images represented by ...
Acquisition of Swarm Intellegence (SI) Principle in Practice ....pdf
and also certain computer programs that are ...practical teaching mode of ‘group synchronous’ ...Acknowledgements This work is supported by the ...
Toward Practical Applications of Software Synthesis.pdf
Toward Practical Applications of Software Synthesis_...1 Introduction We have been exploring automated ...In Proceedings of the Workshop on Innovative ...
自动化实用模型.doc
The Practical Organization of Automated Software ...working in a variety of software development ...understanding of where automation can take a ...
Understanding and Performing MIPI_图文.pdf
Understanding and Performing MIPI_电子/电路_工程...the HS entry under these practical circumstances. ...The automated functionality provided by TekExpress ...
A Test for Evaluating the Practical Usefulness of Deduction ....pdf
A Test for Evaluating the Practical Usefulness of...of the working language and the speed of proof ...Even for fully automated systems, a ranking by ...
Approaches to Automated Morphological Classification of ....pdf
Approaches to Automated Morphological Classification ...From a practical point of view, too many input...Understanding galaxies, however, is not just about...
towards understanding an....pdf
Towards Understanding and Modeling Individual Behavior... automated support system for video surveillance, ...Video cameras have been chosen for practical ...
Current practice in the development and evaluation of spoken ....pdf
practical working procedures in both development and...[2], the Vocalis Operetta automated call routing...In Proceedings of Natural Language Understanding and...
understanding counterexa....pdf
Understanding counterexamples with explain_专业资料。...practical applicability, the need for automated ...In SPIN Workshop on Model Checking of Software,...
practical machine tran....pdf
Practical machine translation system allowing complex... and implementing an automated scheme of feature ...understanding of the internals of the system was ...
Future directions of automated deduction Distributed ....pdf
Future directions of automated deduction Distributed ... both theoretical and practical, arising from ...In terms of understanding of search problems and ...
towards a standard upper....pdf
Automated natural language understanding, both spoken...pose any serious practical or theoretical problems....” Working Notes of the IJCAI-2001 Workshop on...
更多相关标签: