当前位置:首页 >> >>

Toward practical automated program understanding. Working Notes of the Third

Toward Practical Automated Program Understanding
Alex Quilici University of Hawaii at Manoa Department of Electrical Engineering 2540 Dole St, Holmes 483 Honolulu, HI 96822

Extended Abstract
It is an open question whether automated program understanding can become a practical, useable tool in the reverse engineering or maintaining of existing, real-world legacy systems. However, there are clearly several traits that any deployable automated program understanding tool must possess: 1. It must be based on an understanding algorithm that scales in practice to large programs. 2. It must produce an understanding targeted to the speci c reverse engineering or maintenance tasks it is being used to support. 3. It must provide mechanisms that allow the programmers who perform reverse engineering or maintenance tasks to update its knowledge base. 4. It must integrate with other, existing tools that support maintenance and reverse engineering. 5. Finally, it must help the end-user achieve tasks more simply and more cheaply than alternative approaches. This abstract provides an overview of our approach to constructing a program understanding tool that possesses these traits. In particular, it discusses the real-world problems motivating our work on program understanding, the evolution and current performance of our approach to program understanding, our attempts to simplify the task of providing the knowledge the program understanding algorithm requires, and our initial attempts to create a program understanding environment on top of our automated program understanding tools.

1 Introduction

2 A Real-World Problem Requiring Program Understanding
Our particular research focus is the use of program understanding tools to support answering common conceptual queries asked by maintenance programmers. For example, consider the task of a maintenance programmer charged with updating an existing legacy COBOL program so that all error messages that arise from input validation include the time. This task immediately gives rise to the question: \Where is all the code dealing with displaying error messages?". 1

Existing tools are inadequate to answer this query directly. For example, simply using standard string searching to look for all locations of the text string \Error" is only useful if the error messages or the elds used in constructing them are explicitly identi ed in the code. As a result, maintenance programmers are forced to improvise to try to answer this query. One common approach, for example, is to start by locating all the WRITE statements. But this is problematic for large programs, as they may contain a large number of WRITEs, where only a small percentage of those WRITEs actually display error messages. Determining exactly which WRITEs are relevant then involves examining how each record being written was constructed to determine if it is actually writing an error message. Furthermore, the code that constructs the error messages may be delocalized from the code that displays them, making it time-consuming to verify that a particular WRITE is actually displaying an error message. Of course, there are other, related techniques programmers might use that work better for this particular query (such as rst reading the code to try to determine the le onto which error messages are written and then locating all WRITEs on that particular le, and so on), but none of these are completely satisfactory. The result is a clear need for tools that help programmers nd the answers to their conceptual queries about existing source code. Our approach is to construct tools that allow programmers to describe characteristics of the code they are searching for|in this case, general characteristics of the code that forms and displays error messages. Speci cally, our approach is to allow the programmer to describe these characteristics at an abstract (planning) level and to have an automated program understander then locate other, similar code. In fact, we want programmers to be able to interactively provide these descriptions in the same way that UNIX users form and provide complex regular expressions to a tool like \grep". Saving these programmer-provided plans will eventually form a plan library, and the program understander can start processing each new program by trying to nd the library entries that appear within it. In constructing these tools, we are being forced to address two key problems. The rst is the scaling problem in automated program understanding. The second is nding a mechanism that allows users to describe abstract code characteristics in a simple way.

3 Toward A Program Understanding Algorithm That Scales
We have based much of our program understanding work on the pre-existing framework for representing programming plans originally used in the Concept Recognizer 1, 2]. In this framework, plans consist of components and constraints. Below is a description of one COBOL plan for producing error messages:
Display-Labelled-Message(Message: ?label-string, Record: ?label-record) Components Clear-Record: MOVE(source: SPACES, dest: ?print-record) Provide-Label: MOVE(source: ?label-string, dest: ?print-label-field) Provide-Record: MOVE(source: ?label-record, dest: ?print-label-record) Dump-Line: WRITE(source: ?print-record) Constraints literal(?label-string) field(?print-label-field, ?print-record) field(?print-label-record, ?print-record) data-dep(provide-label, dump-line, ?print-label-field) data-dep(provide-record, dump-line, ?print-label-field) data-dep(clear-record, provide-label, ?print-record) data-dep(clear-record, provide-record, ?print-record)

Essentially, this plan consists of moving spaces into the print-record to clear it, moving in the text of the error message (as a literal string), moving in the erroneous input record, and nally writing the print-record. (While in this case the components are all basic COBOL statements, the components can themselves be plans.) 2

Given this representation for plans, the Concept Recognizer recognized plan instances by running through each of the plans in the plan library, recursively locating their components, forming all possible combinations of these components, and evaluating the constraints. However, it is far too expensive, even for small programs, to simply generate all combinations of plan components and then apply constraints to each combination. As a result, the concept recognizer interleaved forming partial combinations of plan components with constraint evaluation (e.g, nd the MOVEs that correspond to the Clear-Record component, then the MOVEs that correspond to the Provide-Label component, apply the constraints evaluable on this combination of MOVEs, lter out the combinations that fail to achieve these constraints, and so on). Despite these optimizations, however, the Concept Recognizer's approach doesn't scale well in practice 1]. In our own implementation of this approach, we have observed two reasons for this poor performance. One reason is that its top-down recognition process is library driven rather than code driven; that is, it considers all plans in the plan library, whether or not they are likely to be in the code. The other reason is that there is no mechanism to specify exactly how to interleave constraints and combine component instances when recognizing plans|yet it is exactly those steps where most of the program understanding algorithm's work lies and where di erent orderings and approaches can have a major impact on the overall e ciency of the understanding algorithm. In earlier work, we have attempted to address these problems 4]. In particular, we have addressed the problem of minimizing the number of plans tried by constructing a codedriven version of the Concept Recognizer that used indexing to determine exactly when a plan should be considered. This indexing information was a combination of components and constraints that when they occurred in the code suggested that the remainder of the plan was worth checking for. For example, for DISPLAY-LABELLED-RECORD is indexed under a WRITE with a data dependency to a MOVE of a literal. That is, the understander only checks for an instance of a DISPLAY-LABELLED-RECORD when it sees a WRITE and that WRITE happens to have a data dependency to a MOVE of a literal. This indexing also somewhat addresses the problem of determining an order for combining components and constraints in that the indexing components and constraints are guaranteed to be evaluated rst and their failure prevents the remaining combinations or constraints from being evaluated. We compared our indexing approach to our implementation of the Concept Recognizer on a basket of textbook COBOL and C programs. The indexing approach showed approximately an order of magnitude improvement in performance (10-15 times) over the Concept Recognizer's approach. However, in addition to this pleasing result, we also observed several troublesome problems with our indexing approach. One is that it uses a tremendous amount of memory. The reason for this turns out to be that not all components of a plan may have been recognized when the plan is indexed, which forces the plan recognizer to save partially matched plans and gradually match their remaining components to the program as instances of those components are recognized in the code. Another problem is that the indexing only provides a partial order on component matching and constraint evaluation. Unfortunately, this leaves the understander little guidance on what order to try to locate and evaluate constraints on the remaining plan components.

3.1 An Index-Based Program Understander

3.2 An Improved Index-Based Program Understander

This study has led us to a modi ed version of our indexing approach that addresses both of these problems. To eliminate the memory intensive \waiting" that the indexing algorithm goes through, we have divided the plan library into layers, where each layer is a set of plans dependent only on the plans at the previous layer being recognized. The bottom 3

layer is those plans only dependent on items in the program's abstract syntax tree, the next layer is a set of plans only dependent on plans at the bottom layer, and o on. The plan recognizer then rst runs through the program looking for indexes on the plan elements in the bottom layer, then the next layer, and so on. This guarantees that any indexed plan will have all of the expected components present, eliminating the need for any \waiting". The second improvement is in specifying a default heuristic ordering on constraint evaluation and matching based on di erent classes of constraints, the number of available instances of each component, the cost of evaluating constraints, and so on. This has resulted in allowing the understander to better choose the order in which it performs component matching and constraint evaluation. We compared our revised indexing algorithm against our original and have found better than another order of magnitude performance improvement. On textbook COBOL programs on the order of one thousand lines or so of code, with our plan library containing on the order of one hundred or so plans, this is fast enough to have reduced our program understanding algorithm's performance from several hours (for the original, non-indexed version) to ve to ten minutes. While this is certainly no guarantee that the algorithm has solved the scaling problem in program understanding, it suggests that the algorithm in its current form can be used to automatically examine and understand modules of large programs. Given recent research into tools for automatically modularizing large COBOL programs 3], combining our automated understander with these tools appears to be a promising approach to understanding large programs.

4 Tools for Providing Programming Plans
Ideally, we want programmers to be able to provide the plans our program understander attempts to recognize. This gives programmers a mechanism for describing items the programmer expects to nd in the source code. While we allow programmers to provide plans by explicitly specifying components and constraints, programmers do not nd this approach especially convenient. As a result, we have been examining an alternative: providing programming plans by example. The idea is that the programmer locates one instance of a particular plan within the code and then highlights its components (by highlighting source code lines). The programmer, for example, locates and highlights the three MOVEs and a WRITE for creating and displaying one of the many error messages in a large program. The highlighted lines can be viewed as a maximally-constrained programming plan that will only match those particular lines of code. The system displays all the components and all the constraints that apply to them (e.g., that a particular value must be SPACES or must be a string literal; that one statement has a control ow or data dependency on another statement; and so on). The programmer can then examine each constraint and determine whether it is crucial to the plan or not, marking those that are likely to be unimportant. In addition, the programmer can specify which components and constraints the plan should be indexed under. Once the programmer is nished, the system tries to recognize other instances of the resulting plan in the code. Based on the resulting retrieved occurrences, the programmer can then attempt to generalize or constrain the plan further. This initial approach works surprisingly well|so long as programmers select small groups of statements as the example instance. But larger plans result in too many constraints to make it easy for the programmer to determine which ones are relevant, and plans with many components tend to signi cantly slow the program understander. As a result, we are now exploring mechanisms by which the system can substitute existing plans recognized within the user-highlighted lines for the underlying components those lines contain. Despite the need for more work, providing programming plans by example does suggest one possible approach to allowing the users of program understanding tools to provide the patterns they use to understand the program. 4

5 Cooperative Program Understanding

One limitation of pattern-based program understanding tools is that they are unlikely to be able to completely understand existing legacy systems. As a result, to form databases that completely describe a program's behavior at a level suitable for answering conceptual queries, it is necessary to provide mechanisms by which programmers can extend the system's automatically extracted understanding. We have been working on a cooperative program understanding environment that does just that 5]. Our automated program understanding tool initially extracts what it can automatically, and then allows programmers to add to this understanding graphically. In particular, programmers create object-oriented design elements and then link them to highlighted sections of code or to system recognized plans. The result of using the system is an object-oriented design that corresponds to the source code, along with detailed links to where various objects and operations are actually implemented. All of this information is stored in a single internal representation, so that programmers can query the resulting system/programmer-extracted program understanding without regard to what was extracted automatically versus what was extracted by hand.

6 Conclusions
We have been working on constructing a program understanding tool that can have an impact on maintaining real-world software systems. This has led us to focus on three issues. First, we have been working on a program understanding algorithm that scales. In particular, we have focused on an indexed-plan library, with an algorithm that can take advantage of these indexes. Second, we have been working on mechanisms by which programmers can easily specify programming plans by providing examples and editing the constraints the examples imply. Third, we have been combining automated understanding with a case-like tool that allows programmers to link design elements to source code. Our belief is that all of these issues must be addressed in order to successfully apply automated program understanding to real-world legacy systems.

References
1] Kozaczynski, Voytek; and Ning, Jim Q. Automated Program Understanding By Concept Recognition, Automated Software Engineering 1, 1 (March 1994), 61{78. 2] Kozaczynski, Voytek; Ning, Jim Q.; and Engberts, Andre. 1992. Program concept recognition and transformation. Transactions on Software Engineering 18, 12 (December 1992), 1065{1075. 3] P. Newcomb and L. Markosian, Automating the Modularization of Large COBOL Programs: Application of an Enabling Technology for Reengineering. In Proceedings of the Working Conference on Reverse Engineering, Baltimore, MD, (May 1993), 222-230. 4] Quilici, Alex. 1994. A memory-based approach to recognizing programming plans. Communications of the ACM 37, 5 (May 1994), 84{93. 5] Quilici, Alex; and Chin, David N. 1994. A cooperative program understanding environment. In Proceedings of the Ninth Knowledge-Based Software Engineering Conference. Monterey CA, pp. 125-132.

5


相关文章:
...practical automated program understanding. Worki....pdf
Toward practical automated program understanding. Working Notes of the Third_专业资料。It is an open question whether automated program understanding can become...
Toward an Understanding of the Motivation of Open S....pdf
Toward an Understanding of the Motivation of Open...We also discuss practical implications of our ...(Gnu Image Manipulation Program, http://www.gimp...
高考英语预测试题(9).doc
practical basis.Since then, computerized systems ...the third step simply consists of noting the ...(handsome) (直接地)toward me. (straight/ ...
2019届安徽省六安市高三下学期第九次月考英语试卷【含....doc
A. to change the working conditions of models B...A. practical B. reasonable C. acceptable___ D...the right track toward becoming confident people. ...
软件测试理论与实践 考试....pdf
cannot be directed toward specific segments of ...the testers have significant understanding of the ...6.Short Notes 6.1 What is a test case? ...
201403国外技术调研5组 (dslpart of the file).doc
(also called image understanding) is in be- ...consider the area of automated analysis of text....toward successful solution of imaging problems that...
OPRD 2001,5,508(重结晶条件筛选).pdf
toward the Development of Direct-Drop Processes ...G. Practical Process Research & DeVelopment; ...technique can provide understanding of the polymorphic...
2019-2020年高一英语上册 nit9 Technology(第一课时)教....doc
of how practical,yet fast computers should be ...toward a larger range of applications for cheaper...T: Right. The third sentence, if you have a ...
软件工程-chapter08_图文.ppt
notes Fault involving efficiency or correctness of ...working program Component drivers needed Stubs ...the testing is complete 8.7 Automated Testing ...
Applying the Precautionary Principle to.pdf
By its bias toward inaction, the strict version...alternatives, and use them instead if practical. ...3. Lack of understanding of the technology will ...
WinterProblems[1]_图文.pdf
This practical advice, helpful at any time of ...photographs and detailed notes on their life ...orchids and sends it up toward the main exhaust...
自动化钻井_图文.pdf
the repetitive nature of automated drilling to ...of engineers is leaning decidedly toward science. ...those involved in the section are working toward ...
The Boy and the Bank Officer.ppt
charges on loans ran as high as one third. The...The amounts of the bank notes issued depended on...The attitude of the author’s friend toward bank...
Research Experience.pdf
(program comprehension; requirements engineering; ...Toward a Synthesis", Annals of Software Engineering...Practical Introduction to Getting started in the ...
D004999.pdf
of the type of practical clinical content created...Many healthcare enterprises are working toward ...Linde, Naming notes: transitions from free text ...
Toward the semantic geospatial web.pdf
Toward the semantic geospatial web_专业资料。With ...understanding; and the processing of geospatial ...2489, Lecture Notes in Computer Science, Springer...
A Believable Agent for First-Person Shooter Games.pdf
maintain achieved subgoals while working on others...toward the current goal, it executes the ...would require three-dimensional image understanding....
Toward Human-Level Machine Intelligence.ppt
Toward Human-Level Machine Intelligence Lotfi A. ...The new world was the world of machine ...image understanding and pattern recognition ? ...
surface patches. In Eurographics '87, 1987..pdf
IFIP WG10.2 Advanced Research Working Conference,...volume 683 of Lecture notes in Computer Science,...Progress toward a system that can acquire pallets...
Online, interactive learning of gestures for human-....pdf
Much e ort is being directed toward the research...understanding of gestures it already knows in an ...practical use in modeling doubly stochastic systems...