x86 Opcode generation

by Paul Hsieh
On a lark, one day I decided I wanted to write an assembler that simply sat on top of debug.com, which could do the instruction encoding for me, while I wrote a layer on top to compute address labels and handle some of the more rudimentary MASM instructions. For some reason, I always seem to want to write this kind of program in Microsoft's GWBASIC interpreter, probably because it is a simple string manipulation program, and the real bottleneck is in spawning debug (which is unavoidable given my design), not in any computation that it does.
Anyhow, it's not such a great assembler, but it can get really simple jobs done, and has replaced a86 for me. I decided that to improve it, I would have to take debug out of the loop (and move to a more respectable language like C, of course). To do that I wrote a program (in GWBASIC again) to feed debug all possible opcode combinations and simply store what it disassembles each one to. I also used a variant of this program to produce all ASCII-decodable opcode combinations (there was a contest to write the shortest functional, completely ASCII .COM file posted to comp.lang.x86; I submitted the shortest program, but it violated the originally stated rules by not executing on older machines due to self-modifying code. See "tiny assembly gems" for details.)
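The debug-driving step can be sketched in a few lines. The following is an illustrative Python sketch, not part of the original program; the helper name `make_debug_script` is hypothetical. It assumes the same five-line script that lines 40 to 80 of the BASIC listing write to oc.src: enter assemble mode, place the candidate bytes with `db`, leave assemble mode with a blank line, unassemble the result, and quit.

```python
def make_debug_script(candidate_bytes):
    """Return the text to redirect into debug for one candidate byte sequence.

    Hypothetical helper; mirrors lines 40-80 of the BASIC listing.
    """
    hex_bytes = " ".join(f"{b:02X}" for b in candidate_bytes)
    lines = [
        "a",                            # enter assemble mode
        f"db {hex_bytes} 00 00 00 00",  # candidate bytes plus zero padding
        "",                             # a blank line leaves assemble mode
        "u100 10f",                     # unassemble the bytes just written
        "q",                            # quit debug
    ]
    # DOS text files use CR LF line endings, as BASIC's PRINT# produces.
    return "\r\n".join(lines) + "\r\n"
```

The zero padding means that whatever operand bytes the instruction consumes are always zeros, so the disassembly of one candidate never depends on leftovers from the previous one.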
As I was developing this program I ran into a few very interesting stumbling blocks. First, I quickly learned that I had to detect address and immediate operands and treat each as a single encoding (otherwise I would be getting back 65536 encodings of mov ax,0000 mov ax,0001 mov ax,0002 etc.) I then had to detect relative addresses for call and jmp. Then there were the other instructions that take numeric operands, like ret, out and loop.
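Collapsing immediates into a single encoding might look roughly like this. It is a Python sketch under stated assumptions, not the author's code: the name `collapse_immediate` is hypothetical, and it assumes debug prints an immediate as 2 or 4 uppercase hex digits after the comma (optionally signed), which no register name can be mistaken for. It covers only the trailing-immediate case handled by lines 250 to 280 of the BASIC listing, not memory offsets or relative addresses.

```python
import re

# Assumption: a trailing immediate is 2 or 4 uppercase hex digits after the
# comma, optionally signed, e.g. "MOV AX,0001", "MOV AL,01" or "ADD AX,+01".
_IMMEDIATE = re.compile(r",[+-]?([0-9A-F]{4}|[0-9A-F]{2})$")

def collapse_immediate(disassembly):
    """Replace a trailing immediate operand with an imm8/imm16 placeholder."""
    match = _IMMEDIATE.search(disassembly)
    if match is None:
        return disassembly  # register or memory operand: leave unchanged
    width = "imm16" if len(match.group(1)) == 4 else "imm8"
    return disassembly[:match.start()] + "," + width
```

With this, all 65536 variants of `MOV AX,nnnn` map to the one string `MOV AX,imm16`, so a generator only needs to record the first of them.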
But the one thing that really surprised me was that debug.com turns out to have a bug. Some undefined 8087 FPU opcodes cause debug.com to output a blank disassembly instead of its usual "???" for undefined opcodes (compare the opcode D0 30 to D9 E2.) This is clearly a slip-up by Microsoft. Anyhow, I've presented the basic opcode generation program below, at the end of this page. Be warned that to run it you need a BASIC interpreter (it has been tested with QBasic and GWBASIC) or compiler, and about an hour and a half (on my Pentium) of time to kill. To speed it up you can put command.com (set COMSPEC appropriately too), debug.com and the BASIC interpreter in a ramdisk.

Important Notes:

  • If your use of the program below results in a software product of any kind, please credit me. Other than that, it is public domain; you may even modify it as you see fit.
  • I will take no responsibility for potential damage to your PC that this program may cause. The user should be particularly wary of the bug I mentioned in the description above as I have not characterized it in any way beyond what I have stated.
  • I am not naive. I realize that debug.com only gives 8088/8087 opcodes.
  • The bug I refer to above is not just debug.com's inability to handle opcodes for the 286 and above processors.
  • Yes, the code is about as hacky and unstructured as you can get. I never wrote it with elegance in mind.
  • Yes, I realize that in a sense this is redundant with this opcode site.
  • Yes, I like BASIC. Leave me alone! Wanna fight? I'll take you on in any language you choose! C'mon! Put 'em up damnit! :o)

1 '
2 ' Copyright 1996 Paul Hsieh.  All rights reserved.
3 '
4 ' This program is public domain, subject to the conditions that any use made
5 ' of it that results in a derivative work or product must credit the author,
6 ' Paul Hsieh and that this source never be distributed without these comments
7 ' appearing intact at the top of the program.
8 '
10 DIM AR[16]:KEY OFF:H$="0123456789ABCDEF"
20 A$="00 00":OPEN "INST.OUT" FOR OUTPUT AS #2
30 OPEN "oc.src" FOR OUTPUT AS #1
40 PRINT#1,"a"
50 PRINT#1,"db "+A$+" 00 00 00 00 "
60 PRINT#1,""
70 PRINT#1,"u100 10f"
80 PRINT#1,"q"
90 CLOSE #1
100 SHELL "debug < oc.src > oc.out"
110 OPEN "oc.out" FOR INPUT AS #1
130 CLOSE #1
140 B$=MID$(A$,25,255)
150 C$=MID$(A$,11,14):E$=C$:IF C$="" THEN C=2:GOTO 190
160 C=0:WHILE ASC(C$)<>ASC(" ")
170 AR[C]=VAL("&h"+LEFT$(C$,2)):C$=MID$(C$,3,255):C=C+1
180 WEND:GOSUB 250:PRINT#2,E$;":";B$
190 FOR T=C TO 16:AR[T]=&H0:NEXT
200 C=C-1:AR[C]=AR[C]+1:IF AR[C]>&HFF AND C>0 THEN AR[C]=&H0:GOTO 200
230 A$="":FOR I=0 TO 15:A$=A$+HEX$(AR[I])+" ":NEXT I
240 GOTO 30
250 OP=0:X=INSTR(B$,","):IF X=0 OR MID$(B$,X+1,1)="[" THEN GOTO 290
260 D$=MID$(B$,X+1,255)+"     ":IF INSTR(H$,MID$(D$,2,1))=0 THEN 290
270 OP=1:IF INSTR(H$,MID$(D$,4,1)) THEN B$=LEFT$(B$,X)+"imm16":OP=OP+1:GOTO 290
280 B$=LEFT$(B$,X)+"imm8"
290 X=1
300 X=INSTR(X,B$,"]"):IF X=0 THEN GOTO 340
310 IF INSTR(H$,MID$(B$,X-1,1))=0 THEN GOTO 340
320 OP=OP+1:IF INSTR(H$,MID$(B$,X-3,1))=0 THEN B$=LEFT$(B$,X-3)+"ofs8"+MID$(B$,X,255):GOTO 340
330 OP=OP+1:B$=LEFT$(B$,X-5)+"ofs16"+MID$(B$,X,255)
340 REM
350 IF X>0 THEN X=INSTR(X+1,B$,"["):IF X>0 THEN GOTO 300
360 X=INSTR(B$,":"):IF X<6 THEN GOTO 380
370 IF OP=0 AND INSTR(H$,MID$(B$,X-1,1)) AND INSTR(H$,MID$(B$,X+1,1)) THEN B$=LEFT$(B$,X-5)+" abs16:16":OP=OP+4
380 IF OP=0 AND LEFT$(B$,3)="RET" AND INSTR(H$,MID$(E$,5,1)) THEN OP=OP+2:B$=LEFT$(B$,4)+" imm16"
390 IF OP<>0 OR LEFT$(B$,1)<>"J" THEN GOTO 420
400 IF INSTR(H$,MID$(E$,5,1)) THEN OP=OP+2:B$=LEFT$(B$,3)+" rel-addr16":GOTO 420
410 IF VAL("&H"+LEFT$(E$,2))<&HFE THEN OP=OP+1:B$=LEFT$(B$,3)+" rel-addr8"
420 IF OP=0 AND LEFT$(B$,4)="CALL" THEN IF INSTR(H$,MID$(E$,5,1)) THEN OP=OP+2:B$=LEFT$(B$,4)+" rel-addr16"
430 IF OP=0 AND LEFT$(B$,3)="INT" AND INSTR(H$,MID$(E$,3,1)) THEN OP=OP+1:B$=LEFT$(B$,4)+" imm8"
440 IF OP=0 AND LEFT$(B$,2)="AA" AND INSTR(H$,MID$(E$,3,1)) THEN OP=OP+1:B$=LEFT$(B$,4)+" imm8"
450 IF OP=0 AND LEFT$(B$,4)="LOOP" AND INSTR(H$,MID$(E$,3,1)) THEN OP=OP+1:B$=LEFT$(B$,6+(MID$(B$,5,1)=CHR$(9)))+" rel-addr8"
460 IF OP=0 AND LEFT$(B$,3)="OUT" AND INSTR(H$,MID$(E$,3,1)) THEN OP=OP+1:B$=LEFT$(B$,3)+" imm8,AL"
470 REM

Copyright © 1996, Paul Hsieh All Rights Reserved.