
On object initialization in the Java bytecode


Abstract
Java is an ideal platform for implementing mobile code systems, not only because of its portability but also because it is designed with security in mind. Untrusted Java programs can be statically analyzed and validated. The program's behavior is then monitored to prevent potentially malicious operations. Static analysis of untrusted classes is carried out by a component of the Java virtual machine called the verifier. The most complex part of the verification process is the dataflow analysis, which is performed on each method in order to ensure type safety. This paper clarifies in detail one of the tricky aspects of the dataflow analysis: the verification of object initialization. We present and explain the rules that need to be enforced, and we then show how verifier implementations can enforce them. Rules for object creation require, among other things, that uninitialized objects never be used before they are initialized. Constructors must properly initialize their this argument before they are allowed to return. This paper also deals with initialization failures (indicated by exceptions): the object being initialized must be discarded, and constructors must propagate initialization failures. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Java bytecode; Object initialization; Dataflow analysis; Static analysis; Java security

1. Introduction
The Java architecture is particularly well suited for implementing mobile code systems. A mobile code architecture allows a computer to fetch a program (or parts of a program) from a network source and execute it locally. However, security is a critical aspect of mobile code architectures. The very essence of mobile code is to execute a program that originates from a remote source. This is inherently dangerous because it is not known what actions that program will take. By executing the mobile code, we allow it to perform operations on our machine and give it access to our local resources.

Java is especially well suited for implementing mobile code systems for three reasons:

* Java source is compiled into a platform-independent intermediate form called Java bytecode. Java bytecode is then interpreted by the JVM (Java virtual machine). This makes Java bytecode completely portable, which means a piece of Java code in compiled form should run on any receiving machine.
* It is dynamically linked: the JVM will load classes from different network sources as they are needed and will link them into the program while it runs.
* The Java architecture is built with security in mind: its design makes it possible to enforce sufficient security to make mobile code safe and practical.

Currently, the most popular manifestation of Java mobile code is applets. A JVM (bytecode interpreter) is incorporated in web browsers. Web pages can then include links that point to the compiled (bytecode) form of programs, which are called applets. The applet can then be loaded by the browser and executed locally with no special effort on the user's part.

The verifier is a key component of the Java security architecture. Its role is to examine compiled classes as they are loaded into the JVM in order to ensure that they are well formed and valid. It checks that the code respects the syntax of the bytecode language and that it respects the language rules. Another component of the Java security architecture, called the security manager, monitors access to system resources and services. The security manager is a security layer that sits on top of the verifier and relies on its effectiveness.

The most complex step of the verification process requires running a dataflow analysis on the body of each method. There are a few particularly tricky issues regarding the dataflow analysis. In this paper, we focus on the issues relating to the initialization of new objects:

* Issues relating to object creation: A new object is created in two steps: space is allocated for the new object, and then it is initialized.
When performing the dataflow analysis, the verifier must ensure that certain rules are respected: the constructor used to initialize an object must be appropriate, an object must not be used before it is initialized, an object must not be initialized more than once, and initialization failures (indicated by exceptions) must be handled properly.
* Issues relating to constructors: The constructor is responsible for initializing a new object. The first part of the constructor's work performs initialization from a typing point of view, which implies directly or indirectly calling a constructor from the superclass. The rest of the constructor performs application-specific initialization. The verifier must ensure that a constructor properly initializes the current object before it returns, that it does not use the current object in any way before calling the superclass constructor, and that it propagates any initialization failure occurring in the superclass constructor.

The official documentation on the verifier, provided in Ref. [1] (Sections 4.8 and 4.9) and in Ref. [2], is relatively sparse; the portions discussing object initialization are very brief and vague, and they leave out some important issues. Independent work presented in Ref. [3] has clarified many aspects. Freund and Mitchell have extended the formalization of a subset of the Java bytecode language introduced in Ref. [4]. They used a type system to describe the verifier's handling of object initialization. Our paper reviews and explains the rules related to object initialization and discusses how a verifier implementation can enforce them. We also touch on a few issues not discussed in Ref. [3]. Exceptions thrown during object initialization indicate initialization failures and must be handled properly, both inside and outside of a constructor. We also provide a comprehensive, intuitive explanation of how the rules for object creation can be enforced with minimal effort.

We assume that the reader has some knowledge of the Java bytecode language, as well as a basic understanding either of dataflow analysis in general or of the particular analysis technique used by the Java bytecode verifier. Readers unfamiliar with these topics may consult the following references. For the Java language, the reader may refer to the official specification of the language [5]. The best way to learn Java or to find a more accessible explanation of its concepts is to read Ref. [6]. For details on the Java standard library, see Ref. [7]. The workings of the JVM and the bytecode instruction set are described in the official JVM specification [1]. For a lighter approach, see Ref. [8]. To gain a good understanding of the Java bytecode language, it is necessary to experiment with it.
Two tools are essential:

* A class file disassembler, which prints out a class file (and in particular the bytecode) in a readable format. Sun's javap tool, which comes with the JDK, can be used for this, although alternatives are available.
* A bytecode assembler, which produces class files from a source with a manageable syntax. Constructing binary class files by hand would otherwise be difficult and time consuming. An excellent solution is jasmin [9].

This paper is organized as follows. Section 2 provides a brief overview of the dataflow analysis in order to show the context in which verification of object initialization occurs. Section 3 deals with the creation of new objects, while Section 4 explains the special requirements imposed on constructors. Each of these sections first presents the rules that the verifier must enforce, and then discusses how an implementation could achieve the desired result. Section 5 shows that constructors may “leak” or “save” a copy of their this reference, which means that it is possible for incompletely initialized objects to actually be used. Section 6 lists some of the related work. Concluding remarks are given in Section 7.

2. Dataflow analysis
The Java bytecode verifier ensures that the classes loaded by the JVM do not compromise the security of the system, either by violating the language rules or by compromising the integrity of the virtual machine. The verifier validates many syntactical aspects of the class file. It validates field and method declarations. It makes some checks relating to the superclass. It verifies references to other classes, methods and fields, and it enforces access restriction mechanisms (like protected, private and final). The body of each method is examined in turn: each bytecode instruction and its operands are validated.

The most complex yet most interesting part of the verification process is the dataflow analysis. It is performed independently on each method. The dataflow analysis checks that each bytecode instruction gets arguments of the proper type (from the stack or from the registers), detects and prevents overflows and underflows of the expression evaluation stack, and ensures that subroutines are used consistently. The dataflow analysis must also check that object initialization is performed correctly. This paper attempts to clarify the properties that need to be enforced on object creation and constructors. We also propose ways in which a verifier implementation can enforce those rules.

In order to perform the dataflow analysis, it is necessary to keep track of the type of each value on the stack and in the registers at each program point. We will assume that each instruction of a method constitutes a program point, although it is possible to use basic blocks of instructions as program points. The type recorded by the dataflow analysis for a given location at a given program point must be consistent, irrespective of the execution path used to reach that program point.
When there is a conflict because two or more paths would yield different types of values for the same location, we record for that location a common supertype of all the types that could actually occur. For instance, if at a given program point a certain location could contain either an instance of FileInputStream or an instance of ByteArrayInputStream, the dataflow analysis “merges” the two types and records the type InputStream instead. If there is no common supertype for the possible types in a certain location, then the type unusable is used, indicating that the value cannot be used by the following instructions. This generalization of types does imply a loss of information and precision. This is what makes the analysis conservative, in the sense that it is pessimistic.

Types used in the dataflow analysis are primitive types (single-word int or float, or double-word long or double) and reference types (the types associated with references to objects or arrays). A reference type may be a class, interface or array type (which specifies a base type and a number of dimensions). The type returnAddress is used to describe the return address of a subroutine, as created by the jsr instruction. The special type named unusable is also used to mark uninitialized registers. The special reference type null is used to represent the type of null references produced by the aconst_null instruction. Also note that implementations will generally use other special types to represent allocated but not yet initialized objects.
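The merging of class types described above can be illustrated with a minimal Java sketch. It is not how an actual verifier is implemented: the class ReferenceTypeMerge and its merge method are hypothetical names introduced here, runtime reflection stands in for the verifier's own class hierarchy information, and a Java null return value stands in for the unusable type. Interfaces and arrays are simply merged to Object.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

/** Hypothetical helper: merges two types the way the text describes, using reflection. */
final class ReferenceTypeMerge {

    /**
     * Returns the most specific class that both arguments can be assigned to,
     * or null (standing in for "unusable") when the types differ and one is primitive.
     */
    static Class<?> merge(Class<?> a, Class<?> b) {
        if (a.equals(b)) {
            return a;
        }
        if (a.isPrimitive() || b.isPrimitive()) {
            return null; // no common supertype: "unusable"
        }
        // Walk up a's superclass chain until a class that also covers b is found.
        for (Class<?> c = a; c != null; c = c.getSuperclass()) {
            if (c.isAssignableFrom(b)) {
                return c;
            }
        }
        return Object.class; // simplification for interfaces
    }

    public static void main(String[] args) {
        // prints "class java.io.InputStream"
        System.out.println(merge(FileInputStream.class, ByteArrayInputStream.class));
        // prints "null"
        System.out.println(merge(int.class, InputStream.class));
    }
}
```

Walking up only one of the two superclass chains is sufficient here because every class has a single superclass chain ending in Object.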

3. Object creation
Creating a new object is done in two steps. First, space for the object is allocated through the use of the new instruction, which returns a reference that points to the newly allocated memory space. Then, the object is initialized by invoking one of its constructors (a method named <init>). For example, the Java statement new String() is translated to the following bytecode instructions:

    ; allocate space for String and push
    ; reference to it onto the stack
    new java/lang/String
    ; duplicate top stack item (reference to
    ; newly allocated space)
    dup
    ; call String.String() constructor, uses
    ; up one of the references to newly allocated
    ; space as "this" argument.
    invokespecial java/lang/String/<init>()V
    ; This leaves a reference to the new
    ; String object on the stack.

The constructor is responsible for putting the object in a valid state. Until initialization of the new object completes, its state remains undefined and may be inconsistent. The language semantics therefore disallows using a newly allocated object before it is initialized. Enforcing this is one of the verifier's responsibilities. The verifier must keep track of which objects are initialized and which are not, ensure that proper constructors are used to initialize new objects, and make sure that uninitialized objects are not used before they are initialized. This is one of the tricky points of the dataflow analysis. Ref. [1] covers this aspect briefly. Ref. [3] presents a detailed analysis and formal specification of the language rules related to object initialization. Unfortunately, neither Ref. [3] nor Ref. [1] discusses the interaction between object initialization and exception handlers. We will first discuss the rules that the verifier should enforce, and we will then consider how a verifier implementation can enforce them.

3.1. Rules

The verifier must enforce the following properties:

* An object must not be used before it is initialized.
* An uninitialized object must be initialized by one of the constructors declared in its class. A constructor from another class cannot be used. Notice that methods named <init> are not inherited.
* An object must not be initialized more than once.
* If an exception is thrown by the call to the instance initialization method, then the new object must not be used because its initialization is incomplete.

We first discuss what it means for an uninitialized object (or rather a reference to it) to be “used”. The reference pushed onto the stack by the new instruction should be considered to have a special type, indicating that the object it points to is not initialized. The verifier must allow moving and copying the reference on the stack and into registers.
Any other use of the reference must be disallowed. To be precise:

* Copying the reference to and from registers using aload and astore is permitted.
* Moving the reference around on the stack through swap, pop and its variants is permitted. Duplicating the reference through dup and its variants is also allowed.
* Putting the reference in an object or a class field through putfield or putstatic is not allowed.
* Accessing fields of the uninitialized object itself (through getfield or putfield) is not allowed either. This means that the new reference is unacceptable as either of the two arguments of putfield.
* The reference must not be passed as a parameter to a method or used to designate the object on which a method is called. It is therefore disallowed as any of the parameters of invokevirtual, invokespecial and invokeinterface, except of course that an <init> method can be invoked on it by invokespecial.
* The reference may not be thrown as an exception by athrow.
* The current method may not return the new reference through areturn.
* The new reference may not be stored into an array through aastore.
* The reference's type may not be checked through checkcast or instanceof.
* The monitor of the new object may not be accessed through monitorenter or monitorexit.

A newly allocated object can be initialized by calling one of its constructors (instance initialization methods, named <init>). Only the invokespecial instruction may be used to invoke such methods. When the constructor returns, the object is considered to have been properly initialized. Classes may provide several constructors (methods named <init> with different signatures). There is no restriction as to which constructor should be called. In fact, the class being instantiated might not have been linked yet and the verifier might not even know which constructors are available: the existence of the constructor will be checked during resolution, in the same way as for any other method invocation.

Invoking a method named <init> is a special case for invokespecial. The verifier should validate the parameters being passed to the method as it would normally. The reference indicating on which object the method is being invoked should be a reference to an uninitialized object of the proper type: that is, a reference to an uninitialized object whose class is the class from which the <init> method is taken. Suppose a class named C is being instantiated.
An instruction of the form new C has been used to allocate space for the new object. It can be initialized by calling one of the <init> methods of class C on the reference returned by new:

    invokespecial C/<init>(…)V

The class from which the <init> method is taken must correspond to the target class of the new instruction that created the reference. Hence

    new C
    invokespecial D/<init>()V

is not acceptable, even if D happened to be C's superclass.

This is not so simple, as it is possible to have several different references to uninitialized objects on the stack at one time. The following Java statement, for example,

    new BufferedReader(new InputStreamReader(System.in));

is compiled to

    ; Allocate space for BufferedReader
    new java/io/BufferedReader
    dup
    ; Allocate space for InputStreamReader
    new java/io/InputStreamReader
    dup
    ; Get in field of System (System.in)
    getstatic java/lang/System/in Ljava/io/InputStream;
    ; Call InputStreamReader's constructor,
    ; taking System.in as parameter.
    invokespecial java/io/InputStreamReader/<init>(Ljava/io/InputStream;)V
    ; Call BufferedReader constructor, taking
    ; reference to properly initialized
    ; InputStreamReader as parameter.
    invokespecial java/io/BufferedReader/<init>(Ljava/io/Reader;)V
    ; Leaves a reference to the properly
    ; initialized BufferedReader on the stack.

There can even be references to multiple uninitialized instances of the same class on the stack at the same time. Consider for example the Java statement:

    URL u = new URL(new URL("http", "myhost", 8000, "/dir1/dir2/page.html"), "../index.html");

In this example, two distinct objects are created and both need to be initialized independently. The corresponding bytecode would be:

    ; Outer URL
    new java/net/URL
    dup
    ; Inner URL
    new java/net/URL
    dup
    ldc "http"
    ldc "myhost"
    sipush 8000
    ldc "/dir1/dir2/page.html"
    ; Initializing innermost URL
    invokespecial java/net/URL/<init>(Ljava/lang/String;Ljava/lang/String;ILjava/lang/String;)V
    ldc "../index.html"
    ; Initializing the other URL, using the
    ; innermost initialized URL as one of
    ; the parameters.
    invokespecial java/net/URL/<init>(Ljava/net/URL;Ljava/lang/String;)V
    ; Assuming that variable u is contained
    ; in register 1
    astore_1

We discuss a way to cope with this situation in Section 3.2.

Once the instance initialization method returns, the object is considered to have been initialized. The type of the references to the object should be changed to show the real type of the initialized object. This will make it possible for the references to be used normally as arguments to various bytecode instructions. The only complication here is that the reference to the new object may have been copied to many stack locations and registers. Consider the bytecode:

    new C
    dup
    astore_1
    dup
    astore_2
    dup
    dup
    invokespecial C/<init>()V

This code leaves two references to the new object in registers 1 and 2 as well as two references on the stack. This creates an aliasing problem. When the invokespecial instruction completes, the type of each of these references should be changed from the uninitialized type to type C.

When the invokespecial instruction is used to call a method named <init> (a constructor), the this argument it receives must not be a reference to an already initialized object. This ensures that an object will not be initialized more than once.

Finally, if an exception is thrown by the call to a constructor, then it can be assumed that the initialization process terminated abnormally. In that case, there is no guarantee that the object was properly initialized. Any use of an incorrectly initialized object must be disallowed. It might not even be safe to try to initialize the object again; therefore, invoking a second constructor on the object should not be permitted. The object effectively becomes worthless, as nothing can be done with it. If the exception is not caught, then the problem is avoided, as execution of the current method terminates and the reference to the incorrectly initialized object is lost. Any handler that might catch an exception thrown during the invokespecial instruction that calls a constructor must be prevented from using the incorrectly initialized object. The best way to do this is probably to change the type of all references to the object to the type indicating an unusable value.
As the stack is discarded when an exception is caught, only references in registers are affected. The JVM specification (Ref. [1], Section 4.9.4) imposes the following requirement:

    A valid instruction sequence must not have an uninitialized object on the operand stack or in a local variable during a backward branch, or in a local variable in code protected by an exception handler or a finally clause.

This restriction is not necessary. The strategy it suggests is impractical and has been abandoned.

3.2. Enforcing the rules

Devising an implementation to verify the rules for object initialization is tricky. There are subtle issues that make the analysis of proposed solutions rather complex. Fortunately, once defined, the solution is easy to implement.

New types must be used to distinguish uninitialized objects from other objects, so that uninitialized objects are rejected as arguments to most instructions. The types used to represent uninitialized objects must somehow

* include or point to the type that the object will have when it is initialized, so that the dataflow analysis can verify that a constructor invoked on the uninitialized object is appropriate;
* allow distinction between multiple uninitialized instances of the same class, so that when an object is initialized, only the references pointing to that object have their type changed.

Refs. [1] and [3] agree on roughly the same solution. The uninitialized type should include the instruction number or the offset of the new instruction that created the uninitialized object. Thus, the type that the object should have when it is initialized can be obtained from the uninitialized type by consulting the operand of the associated new instruction. Alternatively, both the instruction number of the new instruction and the ultimate type of the object can be included in the uninitialized type.

We observed in Section 3.1 that it is possible to have several distinct uninitialized instances of the same class in existence on the stack and in registers at the same time. However, the dataflow analysis will ensure that each of these instances has been created by a different new instruction.
Thus, associating the instruction number of the new instruction with the uninitialized type it creates suffices to make the types of two uninitialized objects distinct, even if both objects are instances of the same class.

The verifier can easily decide whether the constructor being called to initialize an object is appropriate. If an <init> method is called on an uninitialized object of type uninit(i), then the class from which the <init> method is taken must be the same as the class specified by the new instruction at instruction number i. In the typing state resulting from the initialization, all occurrences of the uninit(i) type on the stack or in the registers are changed to the appropriate initialized type.

If an exception occurs during initialization, then the object that was being initialized must be discarded. Therefore, a slightly different resulting state is merged into any exception handler successors of the invokespecial instruction that called the constructor: any occurrence of uninit(i) in the registers is changed to unusable.

Let us go back to the URL example given in Section 3.1. Table 1 reports the evolution of the stack state during the analysis of that code. The constructor invocation at instruction 9 is appropriate because the class from which the constructor is taken is the same as that specified by the operand of the new instruction at instruction 3. In the state resulting from instruction 9 (associated with instruction 10), the type of any remaining occurrence of uninit(3) is changed and the occurrences of uninit(1) are left alone.

Merging two uninitialized types produces the type unusable, unless both are exactly the same type. If both uninitialized types are the same (they refer to the same new instruction), then the merge produces that same uninitialized type. Merging an uninitialized type with an initialized type or with a primitive type produces unusable. Merging the null type with an uninitialized type could produce that same uninitialized type: attempting to initialize a null reference will trigger a run-time exception. However, there is no situation in which it would be useful to merge an uninitialized type with the null type. This possibility is never encountered in code produced by a Java compiler. As we will explain shortly, it might be simpler for the merge of an uninitialized type and null to produce the unusable type.
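The state transformations and merge rules just described can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: verifier types are modelled as plain strings, the class UninitTypeRules and all of its method names are hypothetical, and the merge of an uninitialized type with null is defined to produce unusable (the simpler of the two options discussed above).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model: verifier types are plain strings such as "uninit(3)" or "java/net/URL". */
final class UninitTypeRules {
    static final String UNUSABLE = "unusable";

    /** Type of the reference pushed by the new instruction at instruction number i. */
    static String uninit(int i) {
        return "uninit(" + i + ")";
    }

    static boolean isUninit(String type) {
        return type.startsWith("uninit(");
    }

    /** Returns a copy of slots in which every occurrence of from is replaced by to. */
    static List<String> replaceAll(List<String> slots, String from, String to) {
        List<String> result = new ArrayList<>();
        for (String type : slots) {
            result.add(type.equals(from) ? to : type);
        }
        return result;
    }

    /** Normal successor of the constructor call: applies to stack slots and registers alike. */
    static List<String> afterInit(List<String> slots, int i, String className) {
        return replaceAll(slots, uninit(i), className);
    }

    /** Exception-handler successor: the stack is discarded, so only registers are rewritten. */
    static List<String> registersInHandler(List<String> registers, int i) {
        return replaceAll(registers, uninit(i), UNUSABLE);
    }

    /** Merge for the case where at least one side is an uninitialized type. */
    static String mergeWithUninit(String a, String b) {
        if (!isUninit(a) && !isUninit(b)) {
            throw new IllegalArgumentException("general type merging is outside this sketch");
        }
        // Same new instruction: keep the type. Anything else, including null: unusable.
        return a.equals(b) ? a : UNUSABLE;
    }

    public static void main(String[] args) {
        // Stack left once instruction 9 of the URL example has popped its operands (top first).
        List<String> stack = Arrays.asList(uninit(3), uninit(1), uninit(1));
        System.out.println(afterInit(stack, 3, "java/net/URL")); // [java/net/URL, uninit(1), uninit(1)]
        System.out.println(registersInHandler(Arrays.asList("int", uninit(3)), 3)); // [int, unusable]
        System.out.println(mergeWithUninit(uninit(1), uninit(3))); // unusable
    }
}
```

The first printed state corresponds to the state associated with instruction 10 of the URL example: the remaining uninit(3) becomes the initialized URL type while both uninit(1) entries are left alone.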
One might think that a problematic situation could arise if a method managed to execute the same new instruction (with instruction number i) twice without initializing the object created by the first execution of new. Then there would be two identical uninit(i) types describing two distinct uninitialized objects. If one of them were to be initialized, the verifier would think that both had been initialized and it would therefore allow an uninitialized object to be used. However, except in one case, this situation cannot occur.

Suppose a new instruction can be found at offset or instruction number i. The first time the dataflow analysis visits this instruction, there can be no reference of type uninit(i) anywhere on the stack or in the registers. Suppose the code following the new instruction leaves the uninitialized object in a register or stack location l. If the control flow gets back to the new instruction, then the typing states from the first and second visit will be merged. Whatever type location l contained during the first visit to the new instruction will be merged with uninit(i) to produce the unusable type. Thus, the object that was created by the first visit to new and that was never initialized has effectively been forgotten: it can no longer be used in any way; it cannot even be initialized. This reasoning depends on the fact that merging uninit(i) with any type but itself should produce unusable.

The case in which this reasoning does not hold is when the merging of uninitialized object types and the null type is defined to produce the uninitialized type rather than unusable. Thus, if location l initially contained null, the merging that occurs on the second visit to the new instruction will produce uninit(i) rather than unusable. As merging uninitialized types with null is useless, the simple solution is therefore to define the merge so that it does produce unusable. The alternative is to add a check to be performed whenever a new instruction is visited. When a new instruction at instruction number i is visited, the stack and registers should be searched and any occurrence of the type uninit(i) should be changed to unusable. Either way, a reference to an uninitialized object does not survive an execution path that returns to the new instruction that created it, and therefore there cannot be two distinct uninitialized objects with the same type.

Table 1. Stack state before each instruction of the URL example (top of the stack on the left)

    No.  Stack before the instruction (top on the left)                      Instruction
    1    …                                                                   new java/net/URL
    2    uninit(1)←…                                                         dup
    3    uninit(1)←uninit(1)←…                                               new java/net/URL
    4    uninit(3)←uninit(1)←uninit(1)←…                                     dup
    5    uninit(3)←uninit(3)←uninit(1)←uninit(1)←…                           ldc "http"
    6    String←uninit(3)←uninit(3)←uninit(1)←uninit(1)←…                    ldc "myhost"
    7    String←String←uninit(3)←uninit(3)←uninit(1)←uninit(1)←…             sipush 8000
    8    int←String←String←uninit(3)←uninit(3)←uninit(1)←uninit(1)←…         ldc "/dir1/dir2/page.html"
    9    String←int←String←String←uninit(3)←uninit(3)←uninit(1)←uninit(1)←…  invokespecial java/net/URL/<init>(Ljava/lang/String;Ljava/lang/String;ILjava/lang/String;)V
    10   URL←uninit(1)←uninit(1)←…                                           ldc "../index.html"
    11   String←URL←uninit(1)←uninit(1)←…                                    invokespecial java/net/URL/<init>(Ljava/net/URL;Ljava/lang/String;)V
    12   URL←…                                                               astore_1

As noted in Ref. [3], subroutine polymorphism also creates a circumstance in which the above strategies do not hold. This is because the types contained in registers that the subroutine does not touch are not propagated beyond the subroutine's exit point back to its call sites. Consider a subroutine in which a new instruction creates a new object but does not initialize it. Consider this scenario:

* The subroutine is called a first time to create a new object. The reference to it is somehow returned (on the stack or via a register).
* The reference to the uninitialized object is stored in a register over which the subroutine is polymorphic.
* The subroutine is called a second time to produce a second object. Within the subroutine, the type of the register that contains the reference to the first uninitialized object will be set to unusable (as explained above), but this will not propagate beyond the ret instruction because the subroutine is polymorphic over that register.

When the second call to the subroutine returns, we have references to two different uninitialized objects created by the same new instruction and represented by the same type. The solution proposed in Ref. [3] is to change all uninitialized types on the stack or in registers to unusable when processing jsr and ret instructions. We think a simpler and more flexible solution would be to disable polymorphism on registers that contain uninitialized types: when processing jsr instructions, the modified or touched bits should be set on all registers that contain references to uninitialized objects.
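As a rough illustration of the two checks discussed at the end of this section, the following Java sketch shows the explicit scrubbing of stale uninit(i) types when a new instruction is visited, and the marking of registers that hold uninitialized types when a jsr instruction is processed. The class StaleUninitCheck, its method names and the string encoding of types are assumptions made for this example only. A real verifier would combine the returned bits with the touched bits it already tracks for the subroutine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model: verifier types are plain strings such as "uninit(5)" or "unusable". */
final class StaleUninitCheck {

    /** Returns a copy of slots with every stale uninit(i) replaced by "unusable". */
    static List<String> scrub(List<String> slots, int i) {
        String stale = "uninit(" + i + ")";
        List<String> result = new ArrayList<>();
        for (String type : slots) {
            result.add(type.equals(stale) ? "unusable" : type);
        }
        return result;
    }

    /** Stack effect of visiting the new instruction at index i (end of list = top of stack). */
    static List<String> visitNew(List<String> stack, int i) {
        List<String> result = scrub(stack, i);
        result.add("uninit(" + i + ")");
        return result;
    }

    /** On jsr: flag as touched every register that holds an uninitialized type. */
    static boolean[] touchedOnJsr(List<String> registers) {
        boolean[] touched = new boolean[registers.size()];
        for (int r = 0; r < touched.length; r++) {
            touched[r] = registers.get(r).startsWith("uninit(");
        }
        return touched;
    }

    public static void main(String[] args) {
        // A stale uninit(5) left on the stack is invalidated before the fresh one is pushed.
        System.out.println(visitNew(Arrays.asList("uninit(5)", "int"), 5)); // [unusable, int, uninit(5)]
        // The same scrub applies to registers; other uninitialized types are untouched.
        System.out.println(scrub(Arrays.asList("uninit(5)", "uninit(7)"), 5)); // [unusable, uninit(7)]
        System.out.println(Arrays.toString(touchedOnJsr(Arrays.asList("int", "uninit(5)")))); // [false, true]
    }
}
```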

4. Constructor requirements
Constructors, also called instance initialization methods, correspond to methods named <init> in the bytecode. In Java, constructors do not declare a return type; in the bytecode, methods named <init> must always have a signature ending with “V”, which indicates a void return type. The constructor is responsible for putting the target object (its this argument) in a valid state so that it can be used as an instance of the class in which the constructor is declared. This is done by calling an alternate constructor (either from the same class or from the superclass) on the this argument, and then performing any application-specific or class-specific initialization. Constructors are not inherited by subclasses.

4.1. Rules

There are several issues to consider when verifying constructors. Register 0 in an instance method invocation normally contains the this argument, which is a reference to the object on which the method is being called. The type of that reference is the class in which the method is declared. A constructor receives a parameter in register 0 in the same way, except that the object on which the constructor is invoked is not initialized. The reference in register 0 should have some special type indicating this. The verifier must enforce the following rules for constructors:

* A constructor is required to invoke either another constructor from the current class or a constructor from its direct superclass on the reference it initially received in register 0 (the this argument). We will refer to this invocation as “invoking the alternate constructor”. Each execution path in a constructor must do this, although all paths need not call the same alternate constructor. Although a constructor is not allowed to return normally before calling another constructor on the current object, it may terminate its own execution at any point by throwing (and not catching) an exception. This rule applies to all constructors except that of java.lang.Object. In the case of Object, it can be assumed that the object received as this is an initialized object of type Object and no further constructors should be invoked.
* The reference received in register 0 (as well as any copy made of that reference) may not be used until the alternate constructor has been invoked. Here “used” has the same meaning as in Section 3.1. Once the alternate constructor has returned, the type of the uninitialized object is changed to that of the current class.
* If an exception is thrown by the invocation of the alternate constructor, then the current constructor is not allowed to return normally: it must terminate either by not catching the exception or by throwing an exception of its own (or it may loop forever). As in the general case of constructor invocation, the this object becomes unusable and no second attempt to invoke an alternate constructor is permitted.

The first rule determines what it means to properly initialize an object. Initializing an object may require all sorts of operations: from putting some values into its fields to accessing other objects and classes. The only requirement, which imposes structure and helps ensure that an object’s state is always valid, is that the current object must first be initialized by a constructor from the superclass before it can be accessed in any way. The only alternative is calling another constructor of the current class, thereby delegating the responsibility.

As explained in Section 3.1, if an exception is thrown by the invocation of a constructor, then the state of the object that was being initialized becomes undefined and the object becomes unusable. Consider the following scenario.
* Method M in class A creates a new object of type C:
    new C
    dup
    invokespecial C/<init>()V
* S is the superclass of C. The <init>()V constructor for C looks like:
    ; get reference to current (uninitialized)
    ; object
    aload_0
    ; call superclass constructor
    invokespecial S/<init>()V
  …with some additional code and an exception handler protecting the call to the superclass constructor.
* Suppose that S’s <init>()V constructor can, in some circumstances, throw an exception.

Recall from Section 3.1 that an object is assumed to have been initialized upon return of the call to its constructor. Therefore, by returning normally, a constructor indicates that the initialization was performed successfully. If C’s constructor were to catch and handle an exception thrown in S’s constructor and then return normally, the method M would have no way of knowing that the initialization failed. For this reason, the last of the three proposed rules requires that, in such a situation, a constructor must not return normally: it must throw an exception, which indicates to the caller that some problem occurred. The rules in Section 3.1 ensure that, even if the caller catches the exception, it cannot use the incorrectly initialized object. The rules in Section 3.1 apply to constructors in addition to the rules presented here. This means that if invoking the alternate constructor results in an exception, then the current object may not be used and no further initialization attempts may be made.
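
The three rules can be made concrete with a small model of what a verifier might track for the this argument along each execution path of a constructor. The Java sketch below is only an illustration under stated assumptions: the names ThisType, CtorState and their methods are hypothetical, the special case of java.lang.Object is omitted, and merging two differing this types to unusable is an assumption of the sketch. It follows the single-flag scheme described in Section 4.2.

```java
/** How a verifier might classify the this argument inside a constructor (hypothetical names). */
enum ThisType { UNINITIALIZED_THIS, INITIALIZED, UNUSABLE }

/** Abstract per-instruction state for a constructor: the type of this plus a constructed flag. */
final class CtorState {
    final ThisType thisType;
    final boolean constructed;

    private CtorState(ThisType thisType, boolean constructed) {
        this.thisType = thisType;
        this.constructed = constructed;
    }

    /** State at the first instruction: this is uninitialized and the flag is unset. */
    static CtorState entry() {
        return new CtorState(ThisType.UNINITIALIZED_THIS, false);
    }

    /** Normal return of invokespecial <init> on this: the type changes and the flag is set. */
    CtorState afterAlternateConstructor() {
        if (thisType != ThisType.UNINITIALIZED_THIS) {
            throw new VerifyError("alternate constructor invoked on " + thisType + " this");
        }
        return new CtorState(ThisType.INITIALIZED, true);
    }

    /** State merged into a handler that protects the alternate constructor call. */
    CtorState onAlternateConstructorException() {
        return new CtorState(ThisType.UNUSABLE, false);
    }

    /** Rule 2: this may be used only once it has been initialized. */
    void checkUse() {
        if (thisType != ThisType.INITIALIZED) {
            throw new VerifyError("use of " + thisType + " this");
        }
    }

    /** Rules 1 and 3: a normal return requires the constructed flag. */
    void checkReturn() {
        if (!constructed) {
            throw new VerifyError("constructor returns without a successful alternate constructor call");
        }
    }

    /** Join of two paths: the flag survives only if both paths set it. */
    static CtorState merge(CtorState a, CtorState b) {
        // Assumption of this sketch: differing this types merge to UNUSABLE.
        ThisType merged = (a.thisType == b.thisType) ? a.thisType : ThisType.UNUSABLE;
        return new CtorState(merged, a.constructed && b.constructed);
    }
}

class CtorStateDemo {
    public static void main(String[] args) {
        CtorState ok = CtorState.entry().afterAlternateConstructor();
        CtorState failed = CtorState.entry().onAlternateConstructorException();
        // A return reachable from both paths must be rejected by checkReturn().
        CtorState joined = CtorState.merge(ok, failed);
        System.out.println("constructed after join: " + joined.constructed);
    }
}
```

In this model, a path that catches the exception from the alternate constructor can never regain the constructed flag, so any return instruction it reaches is rejected, which is the behaviour the third rule asks for.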

Note that these rules are actually more permissive than those governing the high-level Java language. Like the bytecode language, Java requires a constructor to call either an alternate constructor of the current class or a constructor of the superclass. In Java, the call may be implicit: it may be absent from the source, in which case the compiler generates it. Unlike the bytecode language, the Java language requires that an explicit call to another constructor, if present, be the very first statement of a constructor’s code (Ref. [5], Section 8.6.5). This also implies that the call to another constructor cannot be protected by an exception handler. The Java bytecode, in contrast, cannot require the call to the alternate constructor to be the first thing done: some expressions may need to be evaluated to pass as arguments to the alternate constructor. Furthermore, the bytecode language does allow the call to an alternate constructor to be protected by an exception handler. This adds the complications that the third proposed rule handles.

To clarify how the type of the object being initialized changes with the return of each constructor, we go back to the above example scenario. The type of the object being initialized is seen differently by each constructor. When C’s constructor begins execution, it considers its this argument to be an uninitialized object destined to become of type C. C’s constructor then calls S’s. From the perspective of S’s constructor, its this argument is an uninitialized object destined to become of type S. A constructor from S’s superclass is then called. After its return, the this argument in S’s constructor is considered to be of type S. Note that S’s constructor cannot use the new object as an object of type C. When S’s constructor returns into C’s constructor, the type of the this argument of C’s constructor is finally considered to be C.

4.2. Enforcing the rules
When verifying a constructor, some special type must be used to represent the uninitialized this argument: this type does not refer to a new instruction. This type could be completely different from uninit or it could simply be uninit with an invalid instruction number associated with it. The invocation of the alternate constructor is characterized by having that special type as its argument. When the alternate constructor returns, the special type is changed to the type of the current class.

To enforce the special rules for constructors outlined in Section 4.1, we need to extend the state information associated with each instruction in the dataflow analysis: we add to the state information a flag named constructed. In the initial state of the method (associated with the first instruction), the flag is unset. When the constructor invokes an

appropriate alternate constructor to initialize its this argument, the constructed flag is set in the state resulting from that invokespecial instruction. A new check is added to the return instruction: within a constructor, verification must fail if the state associated with return does not have the constructed flag set.

If an exception is caught during the invocation of an initialization method, then the state merged into the first instruction of the exception handler has all references to the uninitialized object replaced by the type unusable. When this happens within a constructor (as mentioned in Section 4.1), the constructor must not be allowed to return normally. This is accomplished by not setting the constructed flag in the state that gets merged into the exception handler. When merging two states, the resulting state has the constructed flag set only if both states being merged have that flag set. This means that all paths leading to a return instruction must have called an appropriate alternate constructor for the method to be accepted. This also implies that not even one path leading to return may have caught an exception during the invocation of the alternate constructor.

An alternative is to use two flags: the constructed flag and a no_return flag. The latter forbids a constructor from returning when the call to the alternate constructor resulted in an exception. The no_return flag gets set in the state of the first instruction of the exception handler. When merging two states, no_return is set in the result if any of the states being merged has that flag set. Verification fails if no_return is set in the state of the return instruction. Using two flags allows the verifier to provide a slightly more precise description of the cause of a verification failure.

5. Partially initialized objects
We stated in Section 3.1 that, when a method calls a constructor to initialize a new object and an exception is thrown during the initialization, then the calling method cannot use the improperly initialized object. This is not completely true. This rule is simply a sort of contract between the constructor and the method calling it: the constructor signifies its failure to initialize the object by throwing an exception, and the calling method then makes all its references to the object unusable.

A distinction can be drawn between type-related initialization and application-related initialization. When a constructor’s call to an alternate constructor returns, then as far as the JVM and the verifier are concerned, the current object has been properly initialized, so its type changes from uninitialized to the type of the current class. Subsequent code in the constructor considers that the type of the this

object is the current class. From the point of view of typing and type-safety, the verifier is satisfied that the initialization is successful as soon as the alternate constructor returns. Operations performed after the return of the alternate constructor affect the state of the object, not its type: the current object (or other related objects) is put in a state that the application expects and which it considers valid.

If an exception is thrown during a constructor’s execution, then the caller’s references to the object being initialized become useless. However, it is possible for the constructor to have saved a reference to the object being initialized after the return of the alternate constructor but before causing an exception to be thrown. A constructor can “save” or “leak” a reference to the object it is initializing by storing its this reference in another object’s field or by passing it via a method call. Consider the following example code:

    public class CSaver {
        public C c;
    }

    public class C extends ASuper {
        public C(CSaver s) throws SomeException {
            super();
            s.c = this;
            // … more code here
            throw new SomeException();
        }
    }

An even more interesting strategy would have been for the constructor to save its this argument inside the exception object it is about to throw. In this example, it may be possible for the caller of C’s constructor (or for other parts of the application) to use the incompletely initialized instance of C if it can obtain the CSaver instance in which the constructor saved its this reference. From a typing point of view, the object (and the saved reference) is indeed of type C since the superclass constructor has completed successfully. The JVM, however, cannot guarantee that the object has been put in a state that the application expects.

There are good reasons for a constructor to pass its this argument to another object. Consider for example a situation in which the application needs to keep track of all instances of a given class in a global hash table. It would also be impossible for a constructor to delegate part of its initialization work to a helper method without passing it its this argument. However, developers must be careful about what a constructor does with its this argument. If a constructor leaks a reference to the current object, it must be certain that the current object’s state is acceptable to the application. The verifier is not concerned with an application’s functionality or internal consistency: it merely enforces type-safety.

Note that this situation could also occur when a constructor calls the alternate constructor: the alternate constructor could call its own alternate constructor, leak or save its this reference, and then throw an exception. If the alternate constructor was taken from the superclass, then initialization of the object will remain incomplete: the object’s dynamic type will remain that of the superclass. Although it will never be possible to finish the initialization, the object will be usable as an instance of the superclass.
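
The leak described in this section can be reproduced with ordinary Java source. The following is a minimal runnable variant of the CSaver example, offered as a sketch rather than as a definitive test case: the class names Saver, Base, Leaky and LeakDemo, the value field and the use of IllegalStateException are all hypothetical choices made for illustration.

```java
/** Holder in which the constructor saves its this reference (plays the role of CSaver). */
class Saver {
    Leaky saved;
}

class Base {
    Base() { }
}

class Leaky extends Base {
    int value; // application-level state, assigned only at the very end of the constructor

    Leaky(Saver s, boolean fail) {
        super();        // once this call returns, this is typed as Leaky
        s.saved = this; // leak the reference before application-level initialization is done
        if (fail) {
            throw new IllegalStateException("initialization failed");
        }
        value = 42;
    }
}

class LeakDemo {
    /** Returns the object leaked by a failed constructor call, or null if nothing leaked. */
    static Leaky leakFromFailedConstructor() {
        Saver s = new Saver();
        try {
            new Leaky(s, true);
        } catch (IllegalStateException e) {
            // The caller never obtained a reference from the new expression,
            // yet s.saved still points at the partially initialized object.
        }
        return s.saved;
    }

    public static void main(String[] args) {
        Leaky leaked = leakFromFailedConstructor();
        System.out.println("leaked: " + (leaked != null) + ", value = " + leaked.value);
    }
}
```

The object reached through the Saver is type-correct, so nothing prevents its use, but its value field still holds the default 0 because the constructor threw before assigning it.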

6. Related work
The Java Virtual Machine Specification [1] is the only relevant official documentation. Section 4.9 briefly presents Sun’s verifier implementation. This description is reproduced in Ref. [2]. In the JVM Specification, Section 4.9.4 in particular deals with the verification of object initialization. Some of the constraints listed in Section 4.8.2 are also relevant. The explanations presented in the specification are very brief, and it is difficult to make sense of them without further background. The specification also leaves out any information on the handling of exceptions during initialization. Finally, the specification hints at a strategy for ensuring that there are never two distinct uninitialized objects created by the same new instruction simultaneously present in a typing state. However, the specification is out of date and the verifier in current JDK versions uses a cleaner strategy (one of those suggested in this paper).

In Ref. [4], the authors propose that a type system can be used to formally describe the verification process. They present a formalization of a small subset of the Java bytecode language and use a type system to verify Java subroutines. In Ref. [3], the authors extend the framework proposed in Ref. [4] to support the verification of object initialization. This work has significantly clarified the rules for object initialization. It has also uncovered a bug caused by the interaction of the strategy used to verify subroutines and object initialization. We discussed this problem and its solution in Section 3.2. The bug was corrected in JDK version 1.1.5.

There have been other efforts that deal with the verifier in general. In Ref. [8] (Section 5), the authors present a global description of verification. In Ref. [10], the author presents a formal description of the dataflow analysis framework applied to the Java verification problem. In Ref. [11], the author uses concurrent constraint programming to specify the verifier.

The Kimera project [12] has developed an alternative implementation of the verifier. The goal of the Kimera project was to expose bugs in Sun’s verifier. An automated testing system was devised to compare the Kimera verifier and Sun’s verifier: mutated class files were fed to both verifiers, and disagreements as to whether or not to accept a class file pointed to potential bugs or design flaws. Although some bugs were discovered using this method, the Kimera team unfortunately did not release the source code for their verifier, did not give any details about their alternative implementation and did not share their insight into the verifier.

Other work, less closely related to the verifier, may also be useful. In Ref. [13], the author provides a formal description of a subset of the JVM instruction set. Other researchers have tackled the formalization of the type system of the high-level Java language [14,15]. Computational Logic Inc. is developing a JVM implementation that replaces bytecode verification by strong runtime checks [16]. The implementation uses ACL2, a formal modeling language that is also executable.

A lot of research is being conducted on Java security in general. The Secure Internet Programming group [17] is an active research group based at Princeton University. Ref. [18] studies the security implications of dynamic linking, while Refs. [19,20] deal with strategies to enforce security policies and improve the security manager. The Muri project has also published a summary of hostile applets [21]. Li Gong presents the latest security strategies implemented in JDK 1.2 in Refs. [22,23]. For more general presentations on Java security, the interested reader may consult Refs. [24–27].

7. Conclusion
In this paper, we have explained the difficulties in enforcing proper object initialization. We have covered both the creation of new objects and the responsibilities of constructors. We have clearly stated the rules that need to be enforced and we have explained them in detail. We have also discussed how verifier implementations can enforce the specified rules.

Besides providing a comprehensive and understandable discussion of the subject, our most important contribution is the treatment of initialization failures. When an exception occurs during the call to a constructor, the object that was being initialized must be discarded because it might not have been initialized properly. The verifier must take this into account when following execution paths leading from invocation of constructors into exception handlers. If, within a constructor, the call to the alternate constructor results in an exception, then the constructor must not be allowed to catch the exception and return normally: this would lead the caller to believe that the initialization was successful. To our knowledge, this issue, although very important, has not been discussed before. Current JDK versions do handle such situations correctly.

There are two other interesting aspects to our paper. The first is our discussion of two alternative strategies for ensuring that a typing state never contains two distinct objects created by the same new instruction. Our work should help to clarify matters and show how an implementation can achieve the desired result with minimum effort. The second is our discussion in Section 5 about the possibility of constructors “leaking” their this argument, thus allowing incompletely initialized objects to be used.

Although analysis of the issues relating to object initialization is rather complex, the information provided in Ref. [3] and in this paper adds up to a thorough, in-depth discussion of the subject. Given that documentation on the verifier is relatively sparse and considering that the verifier is at the basis of the Java security architecture, we believe that our work of exploring and detailing tricky aspects of the verifier will prove invaluable to implementers and security analysts.

Work in this field is by no means finished. The whole process of verifying Java bytecode needs to be explored in depth, detailed and analyzed. The work that has already been done has focused on some of the most complex issues. However, we still do not have a complete, well-organized picture of verification. As object initialization is one of the most complex aspects of the dataflow analysis, our explanations should be of great help in understanding the process of verifying Java bytecode. Many bugs have already been found and corrected in the verifier. Only through a thorough understanding of the verifier will we be able to ensure that it is secure, and thereby ensure the security of the Java platform.

Acknowledgements
We would like to thank Stephen N. Freund and John C. Mitchell (Stanford University), Raymie Stata and Martín Abadi (Compaq Research Center) for providing early access to their papers. Thanks to Stephen Freund for an enlightening discussion. Thanks to Gary McGraw (Reliable Software Technologies Inc.) for an insightful conversation.

References
[1] T. Lindholm, F. Yellin, The Java Virtual Machine Specification, Java Series, Addison Wesley, Reading, MA, 1996 (ISBN 0-201-63452-X). http://java.sun.com/docs/books/vmspec/index.html.
[2] F. Yellin, Low level security in Java, Fourth International World Wide Web Conference, MIT, December 1995. http://www.w3.org/pub/Conferences/www4/Papers/197/40.html.
[3] S.N. Freund, J.C. Mitchell, A type system for object initialization in the Java bytecode language, Proceedings of the ACM Conference on Object-Oriented Programming: Systems, Languages and Applications, OOPSLA’98, Vancouver, BC, Canada, October 1998.
[4] R. Stata, M. Abadi, A type system for Java bytecode subroutines, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, California, USA, January 19–21, 1998, ACM Press, 1998.
[5] J. Gosling, B. Joy, G. Steele, The Java Language Specification, Java Series, Addison Wesley, Reading, MA, 1996 (ISBN 0-201-63451-1). http://java.sun.com/docs/books/jls/index.html.
[6] M. Campione, K. Walrath, The Java Tutorial: Object-Oriented Programming for the Internet (Java Series), 2nd ed., Addison Wesley, Reading, MA, 1998 (ISBN 0201310074). http://java.sun.com/docs/books/tutorial/.
[7] JDK 1.2 documentation, 1998. http://java.sun.com/products/jdk/1.2/docs/index.html.
[8] J. Meyer, T. Downing, Java Virtual Machine, O’Reilly, 1997 (ISBN 1-56592-194-1).
[9] J. Meyer, Jasmin - Java assembler interface, 1997. http://cat.nyu.edu/meyer/jasmin.
[10] A. Goldberg, A specification of Java loading and bytecode verification, Kestrel Institute, 1997. http://www.kestrel.edu/~goldberg.
[11] V. Saraswat, The Java bytecode verification problem, AT&T Research, 1997. http://www.research.att.com/~vj/bcv.html.
[12] Kimera: a Java system architecture, 1997. http://kimera.cs.washington.edu/.
[13] Z. Qian, A formal specification of a large subset of Java(tm) virtual machine instructions, Universität Bremen, September 1997. http://www.informatik.uni-bremen.de/~qian/abs-fsjvm.html.
[14] S. Drossopoulou, S. Eisenbach, Java is type safe - probably, Proceedings of the 11th European Conference on Object-Oriented Programming, Jyväskylä, Finland, June 9–13, 1997. Lecture Notes in Computer Science, LNCS.
[15] T. Nipkow, D. von Oheimb, Java-light is type-safe - definitely, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, CA, USA, January 19–21, 1998, ACM Press, 1998.
[16] R.M. Cohen, Defensive Java Virtual Machine, Computational Logic Inc., University of Texas at Austin, USA, 1997. http://www.cli.com/software/djvm/.
[17] Secure internet programming: Home page, Princeton University, NJ, USA, 1998. http://www.cs.princeton.edu/sip.
[18] D. Dean, The security of static typing with dynamic linking, Fourth ACM Conference on Computer and Communications Security, Zurich, April 1–4, 1997, pp. 18–27. http://www.cs.princeton.edu/sip/pub/ccs4.html.
[19] D.S. Wallach, D. Balfanz, D. Dean, E.W. Felten, Extensible security architectures for Java, Technical Report 546-97, Department of Computer Science, Princeton University, April 1997.
[20] D.S. Wallach, E.W. Felten, Understanding Java stack inspection, Proceedings of IEEE Symposium on Security and Privacy, Oakland, CA, May 1998, IEEE.
[21] I. Shin, Hostile applet summary, Muri project, 1997. http://theory.stanford.edu/muri/hostile.html.
[22] L. Gong, M. Mueller, H. Prafullchandra, R. Schemers, Going beyond the sandbox: an overview of the new security architecture in the Java development kit 1.2, Proceedings of the USENIX Symposium on Internet Technologies and Systems, Monterey, CA, USA, December 1997, pp. 103–112.
[23] L. Gong, R. Schemers, Implementing protection domains in the Java development kit 1.2, Proceedings of the Internet Society Symposium on Network and Distributed System Security, San Diego, CA, March 1998, pp. 125–134. http://java.sun.com/people/gong/papers/jdk12impl.ps.gz.
[24] D. Dean, E.W. Felten, D.S. Wallach, D. Balfanz, Web browsers and beyond, in: D.E. Denning, P.J. Denning (Eds.), Internet Besieged: Countering Cyberspace Scofflaws, ACM Press, 1997, pp. 241–269.
[25] L. Gong, Java security architecture, Sun Microsystems Inc., March 1998. http://java.sun.com/products/jdk/1.2/docs/guide/security/index.html.
[26] G. McGraw, E. Felten, Java Security: Hostile Applets, Holes & Antidotes, Wiley, New York, 1996 (ISBN 0-471-17842-X).
[27] S. Oaks, Java Security, O’Reilly, 1998 (ISBN 1-56592-403-7).

