CALM - a common assembly language for microprocessors - 1 1 Introduction Microprocessors have existed now for over ten years. But each microprocessor has its own assembly language, which differs from others in many ways. This fact is still strange because all microprocessors resemble functionally each other. But a consistent assembly language could not be defined until some years ago, because new possibilities (operations, addressing modes) are offered with the development of new microprocessors. This evolution slowed down in the 16/32 bit era, because is has become more clear of what one expects from a modern microprocessor on operations and addressing possibilities. Hence the actual 16/32 bit microprocessors differ only slightly. One observes an increasing symmetry of the addressing modes, the data lengths and the data types in an instruction. One of these universal assembly language is presented here. Its name is CALM, which is the acronym for Common Assembly Language for Microprocessors. 1.1 What is CALM ? CALM is not a new programming language, but another notation of the instructions. Many years of experience have shown, that the defined instructions and their notation is sufficient to describe about 95% of the instructions of a microprocessor. The problem is rather simple: One finds a notation, which on the one side expresses precisely what the hardware does and, on the other side, is not too complicated. CALM fulfills these requirements in almost all cases. The exceptions are often unusual operations or addressing modes, which do not justifiy their own notation and therefore are represented with a simplified notation. An additional meta language is not necessary. 1.2 Advantages of a common assembly language Today assembly programming is relatively complicated and time- consuming. Besides the increased requirement to program in assembler, artificial difficulties are added: Each manufacturer uses other terms and defines its own notation of the instructions. This different terminology makes the user unsure, prevents comparisons of processors and renders more difficult processor changes. CALM defines a common syntax for instructions and pseudo-instructions. Thus CALM forms a common base for training and communication. For instance, an assembly program can also be understood by those which do not know the processor, for which the instructions has been written for. In addition a common notation allows an objective comparison of micro- processors. Comparison tests are more efficient and easier to do, as the learning of the specific assembly language is no longer necessary. Also processor changes are easier, as the notation does not change. Therefore the training cost can be substantially lowered. This is especially interesting with "upward-compatible" microprocessors. Furthermore a common syntax simplifies the assembler. One can construct an assembler, which consists of a processor independant main part and a processor specific part. For the main part, the following functions are integrated: handling of symbols and pseudo-instructions, evaluation of expressions, file handling and general functions for the text analysis. The syntax and the semantic of the instructions and the rules for the object generation are all determined in the processor part. CALM - a common assembly language for microprocessors - 2 1.3 Programming in assembler The question is whether the assembly programming is still useful today. More and more powerful hardware and software push the assembly programming continuously back. But the assembly programming is, in many cases, irreplaceable. One may think of hardware interfaces, optimisations of any kind, single-chip microcomputers, microprogrammed units, etc. However these tasks are increasingly realized by specialists and offer a subroutine call on the higher programming level. But wouldn't it be an advantage, even in these cases, to be able to write these few programs in a common assembly language? 1.4 The origins of CALM The existence of CALM is indirectly the product of the 8080. Many people found that the correspondent assembly language, defined by Intel, was too primitive. Many alternative solutions have shot up like mushrooms. The first version of CALM was defined in 1974 at the Swiss Institute of Technology in Lausanne and has proven its efficiency by consistency and usefulness in the practical use. Other 8 bit processors have been expressed in CALM successfully. With the 16/32 bit era, CALM has been extended and improved. CALM defines especially a consistent notation for the addressing modes and the data specifiers. The CALM version, presented in this article, has been defined in 1983 with DIN [1]. Under the worldwide efforts to define a common assembly language, the working group IEEE P694 should be mentionned. The group began its work in 1977 and published in 1984 the 18th version. Temporarily an intensive experience exchange took place. However P694 was too fixed to 4 and 8 bit micropro- cessors architectures and particularly underestimated the problems of an explicit notation for addressing modes. In addition many programmers still believe that an assembly language should consist of names as short as possible and with every instruction only the absolute minimum should be noted. 2 Instruction An instruction performs a precise action, which has been defined by the manufacturer of the microprocessor. The following text shows, how an instruction is decomposed in different parts and how these are named and which notation is used. 2.1 Composition of an instruction An instruction is composed of an operation code and operands. In addition, depending on the complexity, a condition code, an address specifier and a data specifier may also be added. This modular composition of an instruction and the independance of the meaning of every element, allow any extension. However this concept requires, that all details are necessary also in those instructions with limited possibilities. CALM defines 49 main instructions (fig. 1). With these instructions, about 95% of the instructions of a microprocessor can be expressed. The rest are special instructions, where the defined operation code of the manufacturer is taken, but the CALM notation is used for the operands, the condition code, the address and data specifiers. The instruction MOVE also fixes the order of the operands: source to destination (left to right). This order must also be kept for other instructions like SUB. CALM - a common assembly language for microprocessors - 3 Data transfer instructions MOVE source,destination move source operand to destination. PUSH source push on stack (= MOVE source,{-SP}). POP destination pop from stack (= MOVE {SP+},destination). CONV source,destination perform a data type conversion. CLR destination clear the dest.op. (= MOVE #0,destination). SET destination set the dest. op. (= MOVE #-1,destination). EX operand1,operand2 exchange two locations. SWAP sourcedestination the two halves of the operand are swapped. Arithmetic instructions ADD source,sourcedestination add two operands. source1,source2,destination ADDC same as ADD add two operands and the carry bit C. SUB same as ADD subtract the first operand from the second. SUBC same as ADD sub. the 1st op. and C from the 2nd op. ACOC same as ADD add the 1's compl. of the 1st op., 1 and C to the 2nd operand. Like SUBC but flags. NEG sourcedestination negate the source operand (2's complement source,destination or subtract from zero). NEGC same as NEG negate with carry, that means subtract the operand and the carry bit C from zero. MUL source,sourcedestination multiply the 2 source op. If the source1,source2,destination dest. op. is not expl. mentioned with its size, there will be an ambiguity in the size of the dest. (double/single). DIV divisor,dividendquotient divide the 2nd operand by the first divisor,dividend,quotient one. The destination is either the divisor,dividend,quotient,rest second or the third operand. Remainder may be an additional last op. INC sourcedestination increment by one the operand. DEC sourcedestination decrement by one the operand. COMP source1,source2 compare the second op. to the first one. Same operation as SUB, but no result is generated (only the flags are updated). CHECK lowerbound,upperbound,source check a value against 2 bounds. Logical instructions AND source,sourcedestination perform a bitwise logical AND of the source1,source2,destination the source operands. OR same as AND perform a bitwise logical OR. XOR same as AND perform a bitwise logical XOR. NOT sourcedestination perform a bitwise complement (inversion) of source,destination the source operand (1's complement). Shift instructions SR sourcedestination shift the source operand to the right. The amplitude,sourcedestination MSB is replaced by zero, the amplitude,source,destination carry bit C is replaced by LSB. ASR same as SR shift the source op. with sign to the right. MSB: unchanged. Carry bit C = LSB. SL same as SR shift the source op. to the left. LSB = 0. Carry bit C = MSB. ASL same as SR like SL but overflow bit V: has to be set, if the sign of the result is changed. RR same as SR shift the source op. to the right. The MSB is replaced by the LSB. Carry bit = LSB. RRC same as SR shift the source op. with C to the right. MSB = C. Carry bit C = LSB. RL same as SR same as RR in the other direction. RLC same as SR same as RRC in the other direction. Test instructions TEST source test the sign and zero value of the source source:bitaddress operand. With bitadressing only the zero bit Z is modified (value of the bit). Fig. 1. CALM Instructions (part 1) CALM - a common assembly language for microprocessors - 4 TCLR sourcedestination test and clear the operand (bit(s)). sourcedestination:bitaddress TSET same as TCLR test and set the operand (bit(s)). TNOT same as TCLR test and complement the operand (bit(s)). Program flow instructions JUMP jumpaddress jump to the given address. JUMP,c jumpaddress jump to address, when condition true. DJ,c sourcedestination,jumpaddress decrement 1st op. and jump to the address given by the 2nd op., when the condition c is satisfied. SKIP,c skip next instr., when condition is true. CALL same as JUMP subroutine call (save return add. on stack CALL,csame as JUMP and jump to subroutine at given address). RET return from subroutine (jump to the address RET,c popped from the stack). TRAP expression special subroutine call. TRAP,c WAIT wait for interrupt. HALT halt processor until an electrical reset signal restarts it. RESET reset peripherals. NOP no operation. ION interrupt on. IOFF interrupt off. Special instructions All special instructions like DAA, LINK, LEAVE, FFS. Fig. 1. CALM Instructions (part 2) 2.2 Operation code The operation code expresses the operation which is performed. The operation code is a name, which is derived from an english action verb. In some cases abbreviations and their combinations are used. The defined CALM operation codes can also be found in the manufacturer assembly language of a microprocessor. The only differences concern those operation codes where the manufacturer anticipates an addressing mode in their names (for example: BRANCH, LEA, XLAT, ADDI). However some little differences in the orthography are often COMP instead of CMP, MOVE and MOV, etc. 2.3 Operands An operand indicates who (register, memory location, etc.) participates in the operation and how this information is addressed. The operand gives this information with a defined notation. This notation must be sufficient to express all possible (and useful) addressing modes. Refer to chapter 3 for this notation. This general notation is still unique. As proof, one can compare the notations of the manufacturers for addressing modes: a given microprocessor has only a limited number of addressing possibilities. Often this leads to a short notation, which simplifies the work of the assembler, but a concept for addressing modes is not presented. A precise notation is also necessary, as it is no longer possible to include the kind of addressing in the operation code. Also the times, where the accumulator is the middle of a microprocessor, have definitively passed. More and more powerful addressing modes need a precise notation, because they may be combined with every operation. 2.4 Condition code A condition code expresses a condition bit state or a combination of condition bit states. Condition bits are changed by instructions. If the instructions on their side depend on condition bits, a condition code separated by a comma is added to the operation code. Condition codes are names which consist usually of 2 letters (fig. 2). CALM - a common assembly language for microprocessors - 5 General: EQ Equal NE Not Equal BS Bit Set BC Bit Clear CS Carry Set CC Carry Clear VS oVerflow Set VC oVerflow Clear MI MInus PL PLus After unsigned (logical) compare: LO LOwer LS Lower or Same HI HIgher HS Higher or Same After arithmetic (2's complement) compare: LT Lower Than LE Lower or Equal GT Greater Than GE Greater or Equal Special: PE Parity Even PO Parity Odd NV NeVer AL ALways NMO Not Minus One Some condition codes are equivalent: EQ=BC, NE=BS, CS=LO, CC=HS. Fig. 2. Condition codes Condition bits are usually stored in the flag register F. Additional condition bits (I, S, T) can exist and are often stored in a special status register S (fig. 3). letter/function use C carry binary adder and binary shifter H half-carry 4 bit carry (only valid for 8 bit processors) L link only if never used as a carry bit N sign set: MSB set (negative number) V overflow set: overflow with arithmetic numbers Z zero set: result is equal to zero I interrupt set: interrupts are allowed S supervisor set: supervisor mode T trace set: trace active Fig. 3. Flag and status register bits The names of the condition codes are frequently identical to those of the manufacturers. The only way that they are appended to the operation code, distinguishes both notations: In the manufacturer's notation, the condition code is directly appended to the operation code name (often only one letter). The CALM notation is more consistent, as any operation code may be combined with any condition code. So both pieces of information are clearly separated (fig. 4). JUMP,NE jumpaddress RET,LO CALL,CS jumpaddress DJ.16,NE CX,jumpaddress SKIP,LO DJ.16,NMO D0,jumpaddress Fig. 4. Examples with condition codes 2.5 Data specifier Data specifiers became only necessary with the 16/32 bit microprocessors. With these microprocessors one must specify the length and the type of the transferred data (fig. 5). U or nothing unsigned or not specified A arithmetic (2's complement) D decimal (unsigned, 2 digits per byte: BCD) F floating point (IEEE 754 format) O offset (2n-1 bias) S signed (sign and absolute value) --- ---------- ->( . )------------------------------------------->[ number ]-> --- v v v v v v ^ ---------- ( U ) ( A ) ( D ) ( F ) ( O ) ( S ) | `-----------------------------------' Fig. 5. Data specifiers CALM - a common assembly language for microprocessors - 6 The data length indicates how many bits take part in an operation. Depending on the instruction these specifiers are given either with the operation code or with every operand individually. The data length is directly indicated in bits. A dot is used as a separation sign. The data type says something about the processed data. The specifier consists of a letter, which is inserted between the dot and the data length. If there is no data specifier, it is assumed that the data doesn't need to be processed. Therefore a data type specifier is necessary when a certain unit, like BCD or FPU adder, is concerned (fig 6). MOVE.32 R6,R1 ADD.D32 R5,R0 CONV R1.A16,R2.F64 Fig. 6. Examples with data specifiers Also the manufacturer's notation of the instructions knows data specifiers. However only the data length is indicated independantly. The data length is indicated by a letter, which is either directly or separated by a dot appended to the operation code. The use of letters is still limited: On the one side there does not only exist data lengths of 8, 16 and 32 bits, and on the other side there will be never an agreement on their assignment. For the different data types the manufacturers define different operation codes, for example IMUL for MUL.A16, ABCD for ADDX.D8 and NEGL for NEG.F64. 2.6 Address specifier Address specifiers have become more and more necessary with the powerful addressing modes. Increasingly, an address is constituted by subaddresses, which do not have the full address length. Here, one must specify which length these subaddresses have and how they are extended to the full address length. The address specifiers give all this information. The programmer can reconstruct on the paper what is happening in the microprocessor, or in other words, how the address is built. He can, therefore, clearly indicate to the assembler, which addressing mode is to be chosen. U or nothing unsigned, zero extended A arithmetic, sign extended R relative, coded value sign extended added to the PC ---------- --- -------->--------->[ number ]------->( ^ )---> | ^ ---------- ^ --- | --- | | |->( A )---------------------' | --- | |->( R )-| | --- | `->( U )-' --- Fig. 7. Address specifiers An address specifier includes an address length and an address type (fig. 7). The address length gives the number of bits which are delivered by a subaddress. The address length is also indicated in bits. The circumflex is used as a separation sign. An address specifier is placed before a subaddress in an operand (fig. 8). CALM - a common assembly language for microprocessors - 7 JUMP 32^jumpaddress CALL A16^jumpaddress JUMP,NE R8^jumpaddress MOVE.32 32^{A0}+A16^{D0}+A8^offset,D1 MOVE.16 32^{A5}+R8^label,D4 Fig. 8. Examples with address specifiers The address type indicates how the address is interpreted. It distinguishes absolute and relative addressing. If an address does not deliver the full length, then the kind of extension to the full address length is indicated. With the absolute addressing this is possible with zero or sign extension. With the relative addressing, the program counter is added to the sign extended offset value. Address specifiers exist also in the assembly languages of the manufacturers. However they can often not be recognized because of their nonuniform syntax. Different address lengths are specified by pre- or postfixes (for example: LBRA, BRA.S, disp(A0,D3.L), BNE label:W). Address types are distinguished by different operation codes or special signs (for example: BSR, label(PC), @label). Fortunately, only for jump instructions, many manufacturers are still using two different operation codes (BRANCH and JUMP) to distinguish relative and absolute addressing. However this complicates and obstructs ultimately any extensions. 3 Notation of the addressing The preceeding text clarifies, that consistent rules must be formulated to unambiguously specify the type of addressing. For this, the programmer uses a simplified programming model, that generally contains the register structure and the general architecture of the memory cells. 3.1 Register and memory cells Register and memory locations consist of a number of bits. In micro- processors, multiples of 8 bits (= 1 byte) are usual. Hence register lengths of 8, 16 and 32 bits are constructed. Consecutive memory locations are numbered and that number is called an address. If a 16 bit word is placed in two consecutive locations, two different byte orders exist (fig. 9). 7 0 7 0 ---------- ---------- [ byte ] 0 [ byte ] 0 ---------- ---------- [ ] [ ] .... .... byte order: [ ] byte order: [ ] ---------- ---------- least significant[ 16 ] n most significant [ 16 ] n - bit - - Bit - most significant [ word ] n+1 least significant[ word ] n+1 ---------- ---------- [ ] [ ] .... .... Fig. 9. Byte numbering Furthermore, if a 16 bit word is bit addressesd, the byte order, with the most significant byte first, is a disadvantage, because the bits are not numbered in the same way. In addition, the bit numbering in registers, and consecutive memory locations with the most significant byte first, is different. CALM - a common assembly language for microprocessors - 8 3.2 Address spaces Register and memory locations can be seen as part of a register and a memory address space. Often, the register address space contains only a few memory locations, but is directly placed in the microprocessor and has therefore a small access time. As there are only a few locations, and because of their technical privileged position, they are named registers. In addition each register has its own reserved name. If possible, CALM uses the register names defined by the manufacturer. Furthermore, only the listed symbols in fig. 10 are reserved in CALM. PC Program Counter at execution time SP Stack Pointer F arithmetic Flag register S Status register APC value of the Assembler PC at the beginning of the instruction where APC appears TRUE -1 (all bits ones) FALSE 0 (all bits zeros) Fig. 10. Reserved and predefined symbols On the other hand the memory address space is very big and placed externally. Memory locations are numbered and can obtain user defined symbols. On some microprocessors, additional address spaces exist: Input/Output and Data. In order to distinguish them from the memory address space, a dollar sign is placed before the address expression with the I/O address space and a percent sign with the data address space (fig. 11). MOVE B,A register -> register MOVE ADMEMORY,A memory address space -> register MOVE $ADINOUT,A input/output address space -> register MOVE %ADDATA,A data address space -> register Fig. 11. Examples: address spaces 3.3 Direct addressing As mentioned before, registers and memory locations consist of a number of bits. These bit states represent a binary word, which defines the content of the register or the memory location. It is common in assembly languages to indicate only the name of a content, instead of the content itself. One says "Increment D", instead of "Increment the content of the register D". This implicit reference is always valid in CALM and is valid for all four mentioned address spaces. The reserved name of the register or the address of the memory location is directly given. This is named direct addressing (fig. 12). ADD B,A content(B) + content(A) -> content(A) ADD ADMEMORY,A content(ADMEMORY) + content(A) -> content(A) ADD 16'1234,A content(16'1234) + content(A) -> content(A) The 2nd example is identical with the 3rd, when ADMEMORY = 16'1234. Fig. 12. Examples: direct addressing 3.4 Immediate addressing A number sign before a constant, a symbol or a complex expression cancels the implicit reference and gives the immediate built address (and not their content). This is named immediate addressing (fig. 13). The number sign corresponds to the pointer sign (^, @) in high-level languages. ADD #16'1234,A 16'1234 + content(A) -> content(A) ADD #ADMEMORY+1,A ADMEMORY + 1 + content(A) -> content(A) JUMP ADMEMORY ADMEMORY -> content(PC) MOVE #ADMEMORY,PC ADMEMORY -> content(PC) Fig. 13. Examples: immediate addressing CALM - a common assembly language for microprocessors - 9 3.5 Indirect addressing If the content of any address is again used as an address, braces enclose the corresponding address expression. This is named indirect addressing (fig. 14). ADD {HL},A content(content(HL)) + content(A) -> content(A) MOVE #16'1234,{HL} 16'1234 -> content(content(HL)) MOVE #16'1234,{ADMEMORY} 16'1234 -> content(content(ADMEMORY)) Fig. 14. Examples: indirect addressing There is no functional difference, if a register or a memory location is used for the indirect addressing. However many processors allow only an indirect addressing with registers. 3.6 Combined addressing modes With the direct, immediate and indirect addressing we have already the base elements to build any complex address. The possible operations between the individual address terms are the addition, the subtraction and the multiplication (fig. 15). MOVE #16'FE,{A0}+8 254 -> content(content(A0)+8) MOVE #10'98,{A0}-8 98 -> content(content(A0)-8) MOVE #8'177,{A0}*8 127 -> content(content(A0)*8) MOVE #2'1011,{A0}* 11 -> content(content(A0)*datalength) MOVE R0,{{SB}+5}+3 cont.(R0) -> content(content(content(SB)+5)+3) MOVE {A0}+{D0}+10,A1 cont.(content(A0)+content(D0)+10) -> cont.(A1) MOVE #{A0}+{D0}+10,A1 content(A0)+content(D0)+10 -> content(A1) Fig. 15. Examples: combined addressing modes Note that the indirect addressing is only an explicit notation of the direct addressing, but may be applied as many times as desired. The implicit reference (= content of a built address) is only applied to the whole address expression. If a number sign is before the whole address expression, this implicit reference is suspended. 3.7 Special addressing The relative addressing is in reality only a shorter notation of a combined addressing mode as shown in fig. 16. notation description designation MOVE R^ADDRESS,D0 MOVE {PC}+OFFSET,D0 relative addressing MOVE {A0+},D0 MOVE {A0},D0 automodification ADD #datalength,A0 (incrementation) MOVE D0,{-A0} SUB #datalength,A0 automodification MOVE D0,{A0} (decrementation) CLR D0:#1 bit 1 in D0 is cleared bitaddressing PUSH R1..R3|R5 PUSH R1 register list PUSH R2 PUSH R3 PUSH R5 MOVE R0,R1:#1..#4 transfers bits 0 to 3 frombit list R0 to R1 (bits 1 to 4) Fig. 16. Examples: special addressing modes Some additonal addressing modes have attained a certain importance and therefore became a unique notation. In the bit addressing, the first expression gives the byte address. The expression after the colon is the bit address. The bit address zero corresponds to the bit 0 in the addressed byte. If the byte address points to the memory address space, the bit address may exceed the byte (positive or negative). CALM - a common assembly language for microprocessors - 10 4 Pseudo-instructions Pseudo-instructions are commands to the assembler which control the code generation, the conditional assembly and listing, macros, etc. Every pseudo- instruction begins with a dot (fig. 17). Only with the pseudo-instructions .ASCII, .ASCIZ, .BLK, .FILL, .n and .STRING are preceeding labels allowed. The byte order with the pseudo- instructions .16, .32, etc. is determined by the processor (.PROC). Listing .TITLE character string start a new page with the specified title. .CHAP character string add the spec. text in the subtitle header. .END terminate the assembler program. .TEXT until an .ENDTEXT is encountered, the foll. lines are ign. by the ass. (listing copy). .ENDTEXT terminate .TEXT. .LIST expression list the subsequent instr. only if the expression is true. .LIST may be nested. .ENDLIST end of a conditional list segment. .LAYOUT define the layout param. for the listing. Assembling .BASE expression define the new default base. .START expression define the starting address. .EXPORT symbol1, symbol2, ... define the exported symbols. .IMPORT symbol1, symbol2, ... define the imported symbols. Program counter .APC expression select one of the ass. program counters. .LOC expression current ass. program counter (APC) = value. .ALIGN expression align APC to the next multiple of value. .EVEN align the APC value to the next even value. .ODD align the APC value to the next odd value. .BLK.8 expression (number) add to the value of the APC the product .BLK.8.16.32 expression of the data size times the nb. of items. File insertion .INS filename insert the mentioned source file. .REF filename insert the mentioned symbol table. .PROC filename insert the mentioned processor description. Code-generation .n expression, expression, ... insert the given values in the .8.32 exp8, exp32, exp8, ... object. .FILL.n expression (length), expression fill length with value. .ASCII "ascii text" insert the ASCII codes in the object. .ASCIZ "ascii text" as .ASCII, with ASCII code null at the end. .STRING "ascii text" as .ASCII, with the len. (8 bits) at begin. Conditional instructions and macros .IF expression the sub. instr. up to the corresp. .ELSE or .ENDIF are ass. if exp.true. May be nested. .ELSE the sub. instr. up to the corresp. .ENDIF are ass., if the exp. with .IF was false. .ENDIF end an .IF or .ELSE section. .MACRO name,parameter1,... begin of a macro definition with the macro name and the optional parameter list. .ENDMACRO end of a macro definition. .LOCALMACRO symbol1,... list of local symbols in a macro. Fig. 17. CALM pseudo-instructions 5 Use of CALM - not so easy A certain amount of time is needed for any programming language before it is accepted. But in opposition to high-level languages, a common assembly language depends naturally on the microprocessor which the language trys to describe. CALM - a common assembly language for microprocessors - 11 5.1 Documentation The CALM notation of the instructions of a processor is documented in so-called reference cards. In these reference cards, all the instructions of a processor are listed in a compact form. But additional information like instruction codes, execution times, particularities, etc. of a microprocessor can only be found in the documentation of the manufacturer. So the user is forced to know temporarily both notations. Therefore the CALM reference cards also give the operation code names of the manufacturer. 5.2 A CALM notation - many possible instruction codes There are many instructions which are identical on modern 16/32 bit microprocessors. The CALM notation shows this clearly. The question is now: which instruction code should the assembler generate? Normally the assembler chooses the most compact and fastest instruction code. The distinguishing features are given by the operands. The resulting instruction code depends on the size and the type of the operand and on the additional address and data specifiers. 5.3 Assembler Despite the advantages of CALM, programming in CALM obviously has no sense if the correspondent utilities (assembler, linker, etc.) do not exist. Actually, CALM assemblers generating machine code (without linker) are available from the author for all current microprocessors and for the operating systems PC/MS-DOS 2.11, Atari ST, and Smaky. 5.4 Object format The machine code generation still remains very complicated because each processor has been defined a special object format. This diversity makes it quite impossible to generalize the assembler. Nethertheless a common object format [2] has been defined. 6 Examples The following examples compare instructions in the CALM notation to the manufacturer's notation. These are not complete programs, but a selection of twenty instructions from the microprocessors iAPX86, M68000 and NS32000. Additional details of the assembly notation depends strongly on the used assembler. Here the information from the manufacturer assembler has been used. 6.1 iAPX86 The iAPX86 from Intel is one of the most complex microprocessors because of its segmentation. The assembler ASM-86 needs a lot of information about these segment registers (ASSUME, NEAR, declaration of the variables, etc.). Therefore the following examples are not complete because this information is not given here. Because of the exceptional architecture, a common assembly language can not be simple for this processor (fig. 18). CALM indicates the used segment register with every instruction. In addition the jump range (in or outside of the current segment) is determined by a data specifier appended to the jump instruction. Also a simplified notation is introduced for the automatic shifting of the segment register ([CS] for {CS}*16). CALM - a common assembly language for microprocessors - 12 CALM Intel MOVE.16 #16'1000,BX MOV BX,1000H MOVE.16 AX,${DX} OUT DX,AX MOVE.16 #[ES]+NEXT,DX LEA DX,ES:NEXT MOVE.8 DL,BH MOV BH,DL MOVE.8 [DS]+DATA,AL MOV AL,DATA MOVE.16 [CS]+{SI},AX MOV AX,CS:[SI] MOVE.8 AH,F SAHF PUSH.16 SF PUSHF MOVE.8 [DS]+{BX}+{AL},AL XLAT CONV_TAB MOVE.32 [DS]+TARGET,DSBX LDS BX,TARGET CONV.A8.16 AX CBW INC.8 AL INC AL DIV.A16 CX,DXAX,AX,DX IDIV CX OR.16 #ERROR,[DS]+STATUS OR STATUS,ERROR RL.8 AL ROL AL,1 TEST.16 #MASK,AX TEST AX,MASK JUMP,LO R^LOW JB LOW CALL.32 20^ROUTINE JSR FAR ROUTINE DJ.16,NE CX,ADDRESS LOOP ADDRESS TRAP.32,VS INTO Fig. 18. Examples: iAPX86 6.2 M68000 Motorola's notation of the instructions for the M68000 differs slightly from CALM (fig. 19). This processor shows that upward-compatibility is still valid for machine code (however only partly), but not for the assembly language. The additional addressing modes of the M68020 forced Motorola to change the notation of some addressing modes. CALM Motorola MOVE.32 #16'1000,D1 MOVE.L #$1000,D1 MOVE.32 #A8^WERT,D0 MOVEQ #WERT,D0 MOVE.32 #{A6}+100,A3 LEA 100(A6),A3 MOVE.8 D4,D6 MOVE.B D4,D6 MOVE.8 {A4}+4,D2 MOVE.B 4(A4),D2 MOVE.16 {A0}+A16^{D4}+2,{A2}+32^{A3}+4 MOVE 2(A0,D4),4(A2,A3.L) MOVE.16 D0,F MOVE D0,CCR PUSH.16 SF MOVE SR,-(A7) PUSH.32 #TABLE PEA TABLE PUSH.32 D0..D3|D5 MOVEM.L D0-D3/D5,-(A7) CONV.A8.16 D4 EXT.W D4 INC.8 D1 ADDQ.B #1,D1 DIV.A16 #10,D6 DIVS #10,D6 OR.16 #MASK,{A4}+A16^{D4}-4 OR #MASK,-4(A4,D4.W) RL.32 #8,D4 ROL.L #8,D4 TCLR.8 {A0+}:D4 BCLR D4,(A0)+ JUMP,LO R^LOW BLO LOW CALL U^ROUTINE JSR ROUTINE DJ.16,NMO D0,ADDRESS DBRA D0,ADDRESS TRAP,VS TRAPV Fig. 19. Examples: M68000 6.3 NS32000 In reality this is a family, called "NS32000 series" by National Semi- conductor, which actually consists of three 100% identical members (NS32032, NS32016, NS32008). The only difference is the external bus wide. The NS32000 series has a complete symmetry on the addressing possibilities for all instructions (fig. 20). The relative addressing is allowed anywhere and is coded by the assembler automatically. CALM - a common assembly language for microprocessors - 13 CALM NS MOVE.32 #16'1000,R1 MOVD H'1000,R1 MOVE.32 #A4^WERT,R0 MOVQD WERT,R0 MOVE.32 #{R6}+100,{SB}-20 ADDR 100(R6),-20(SB) MOVE.8 R4,R6 MOVB R4,R6 MOVE.8 {SB}+4,R2 MOVB 4(SB),R2 MOVE.16 {{FP}+2}+4,{SB}+{R0}*1+2 MOVW 4(2(FP)),2(SB)[R0:B] MOVE.8 R0,F LPRB R0,UPSR PUSH.16 SF SPRW PSR,TOS PUSH.32 #R^TABLE ADDR TABLE,TOS PUSH.32 R0..R3|R5 SAVE [R0,R1,R2,R3,R5] CONV R4.8,R2.32 MOVZBD R4,R2 INC.8 R1 ADDQB 1,R1 DIV.A32 #10,R6 QUOD 10,R6 OR.16 #MASK,{{SB}-2}+{R5}*8+4 ORW MASK,4(-2(SB))[R5:Q] RL.32 #-10,R4 ROTD -10,R4 TCLR {R0}+20:{{SB}+10}+4.A32 CBITD 4(10(SB)),20(R0) JUMP,LO R^LOW BLO LOW CALL U^ROUTINE JSR @ROUTINE AJ.32,NE #2,{R0}+4,ADDRESS ACBD 2,4(R0),ADDRESS TRAP,VS FLAG Fig. 20. Examples: NS32000 7 Conclusions CALM is a common assembly language, which is adapted for almost all microprocessors, but also for miniprocessors, mainframes and microprogrammed units. The CALM notation is oriented towards the future, as the notation can be expanded purposefully and functionally. The clear separation of the individual parts of an instruction allows a specific adaptation to any processor. In addition, one understands an instruction in the CALM notation better than a long description. Twenty-four processors have been checked and it was proven that the CALM notation is sufficient. CALM reference cards have been established for the following processors: Z80, 65x02, 680x, 6805, 6809, 8048, 8051, 808x, iAPXx86, NS32000, 680xx, etc. This short article tried to show in some terms the main characteristics of CALM. The assembly programming becomes not simpler in the CALM notation, but it is more comprehensible for the user, because CALM is constructed by logical rules. 8 References [1] DIN 66283, Allgem. Assemblierer-Sprache fr Mikroproz. CALM Nat. German stand.,Beuth Verlag GmbH, PB 1145, D-1000 Berlin 30 [2] "The Microprocessor Universal Format for Object Modules", Prop. Stand.: IEEE P695 Working Group. IEEE Micro, 8.1983, pp. 48-66. [3] Nicoud, J.D. and Fh, P. Common Assembly Language for Micro- processors, CALM. Internal document. LAMI-EPFL, INF-Ecublens, CH-1015 Lausanne, December 1986. English version of DIN 66283. [4] Nicoud, J.D. and Fh, P. Explanations and Comments, Related to the Common Assembly Language for Microprocessors. LAMI-EPFL. [5] Zeltwanger, H. "Genormte Assemblersprache fr Ps", ELEKTRONIK, 35. Jahrg. (1986), No. 8, pp. 66-71. [6] Nicoud, J.D. Calculatrices, Volume XIV du Trait d'Electricit, Lausanne: Presses polytechniques romandes, 1983. [7] Nicoud, J.D. and Wagner, F. Major Microprocessors, A unified approach using CALM. North Holland, 1987. [8] Strohmeier, A. Le matriel informatique, concepts et principes Lausanne: Presses polytechniques romandes, 1986. [9] Fh, P. "Die (Un-)Logik von Assemblersprachen", Elektroniker, 26. Jahrg. (1987), No. 5, pp. 97-100.