One of the most remarkable features of the Dalvik virtual machine (the workhorse under the Android system) is that it does not use Java bytecode. Instead, a homegrown format called DEX was introduced and not even the bytecode instructions are the same as Java bytecode instructions. There was some discussion whether this makes Dalvik a Java virtual machine at all. My personal opinion is that this is a religious and legal dispute. Dalvik opcodes are clearly designed to support only the Java language. Compiling programs to Dalvik bytecode written in a language other than Java is certainly possible, as it was demonstrated with Java but neither the Java bytecode, nor the Dalvik bytecode makes any effort to support any language other than Java. This is in contrast with the .Net virtual machine where at least a claim has been made that the VM supports multiple languages - even though there are always limitations in any virtual machine that prevents running a particular language on a particular virtual machine.
Android comes with a disassembler called dexdump. The location of this tool is not intuitive, it runs on the Linux platform that hosts Android. Launch the emulator, and issue the following commands:
In order to use the tool, one has to move the DEX file to the Android platform (e.g. adb push in case of the emulator). Then one can say:
dexdump -d classes.dex
The output of this tool is not very easy to use, however. Take for example the bytecode compiled from the following switch statement.
000418: 2b02 0c00 0000 |0000: packed-switch v2, 0000000c // +0000000c
00041e: 12f0 |0003: const/4 v0, #int -1 // #ff
000420: 0f00 |0004: return v0
000422: 1220 |0005: const/4 v0, #int 2 // #2
000424: 28fe |0006: goto 0004 // -0002
000426: 1250 |0007: const/4 v0, #int 5 // #5
000428: 28fc |0008: goto 0004 // -0004
00042a: 1260 |0009: const/4 v0, #int 6 // #6
00042c: 28fa |000a: goto 0004 // -0006
00042e: 0000 |000b: nop // spacer
000430: 0001 0300 faff ffff 0500 0000 0700 ... |000c: packed-switch-data (10 units)
The jump table used by the packed-switch instruction is not disassembled at all, it is not even dumped entirely. The same problem applies to fill-array-data tables and there are further restrictions.
I decided therefore to create a more comfortable disassembler and here is the first cut.
Access the dedexer project's page on SourceForge.
This tool is easier to use than dexdump for many reasons. For starter, it is a standard Java program that runs on the usual JVMs. Its format is much more readable and is familiar to those who know the Jasmin syntax. For example the previous fragment is disassembled like this by dedexer:
.method public calc1(I)I
ps418_422 ; case 0
ps418_426 ; case 1
ps418_42a ; case 2
In addition, individual file is created for each class, along with the directory structure representing the package structure.
This is not a full decompiler, however. One has to know the Dalvik opcodes in order to work with the tool. This opcode list has been extended and maintained as dedexer was developed and is now in sync with the disassembler. You will see some unknown opcodes in the list. I have not encountered those instructions "out in the wild" and the disassembler does not recognize them either. If you see any of those, send me the DEX file so that I can analyse it!
This is a simple tool and is not without limitations. The most painful one is that the tool does not process the debug and annotation information in the DEX file. Array data dump could also be better. I am sure that the feature most people would like to see is a bridge toward Java class files but that is far away. Jasmin will be able to generate Java class files once the backward conversion from Dalvik opcodes to Java bytecode is provided but that's a complex task so don't hold your breath. The condition I set for myself as release condition is that the tool is able to disassemble the DEX file in framework.jar. It is able to, so I guess, the tool may be of use for others too. Enjoy!