If you’ve not already done so, it might be a good idea for you to quickly skim my recent 4-Bit HRRG Computer Recap column in order to set the scene and ensure we’re all tap-dancing to the same drumbeat.
In earlier columns, we introduced the 4-Bit HRRG Computer’s Instruction Set and discussed a variety of Instruction Trade-Offs. Also, we discussed some concepts like Big-Endian, Little-Endian, and Addressing Modes. Most recently, we performed a thought experiment in which we assumed that we were working with the first computer on the planet and considered the progression from Machine Code to Assembly Language.
Well, this is where the rubber meets the road, as it were, because we’re about to pull everything together in the form of the HRRG’s assembly language. Fear not — this will be a lot easier than you think, or my name’s not “Max the Magnificent.”
As a quick reminder, my chum, Joe Farr, is creating a virtual implementation of the 4-Bit HRRG Computer. As part of this, Joe’s created a virtual HRRG terminal. He’s used a photograph of a Sperry Univac Uniscope 200 as the base, and he’s augmented this little beauty with graphical representations of 7-segment displays, a floppy disk drive, and a paper tape writer.
As we see, the terminal is running the HRRG’s cross-assembler, where a cross-assembler is an assembler that converts assembly language source code into machine code for a computer other than the one on which it runs. In this case, our cross-assembler is running on a PC while generating machine code to be executed on the 4-Bit HRRG Computer.
So, if we dare assume that time exists in the first place (see Is Time Truly an Illusion?), then the time has come to present the HRRG’s assembly language. This is one of those situations when you really need to know one thing in order to understand another, but you ideally need to know the second thing in order to wrap your brain around the first. Ah well, such is life. All we can do is take a deep breath, gird up our loins, and leap headfirst into the fray with gusto and abandon.
Dipping Our Toes in the Assembly Language Water
On the off chance this is all completely new to you, let’s begin by noting that programming in assembly language isn’t going to be anywhere near as difficult as you probably fear. There’s nothing magical about any of this (except, of course, the fact that it works). Since this is our very own assembly language and associated assembler application that we’re specifying, the way they look and the things they are capable of doing are absolutely in our hands.
Perhaps the easiest way to kick things off is by means of a simple example, commencing with the program we discussed in our previous column. We will be looking at all of this in more detail below, so let’s just skim over it now and we’ll fill in the gaps later. We’ll start with the source code program file on the left. The left-hand column is where we will find any labels. In some cases (e.g., START and HALT), these are optional and/or unused, but even optional labels can help to make a program more readable and understandable.
The .EQU (equate) directives are used to associate labels with constant values called literals, which may be specified in decimal, binary (using the ‘%’ character), or hexadecimal (using the ‘$’ character); for example, 10, %1010, and $A all represent the same numerical value — we typically use whatever form makes the most sense at the time.
Thus, in our example, the COUNTVAL and HALTFLAG labels — which we might refer to as “constant labels” — are associated with values of 10 and %0010, respectively. Whenever the assembler subsequently encounters the COUNTVAL or HALTFLAG labels, it will replace them with their assigned values.
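To make the literal notation and constant substitution concrete, here is a minimal Python sketch of how an assembler might handle them. The names `parse_literal`, `operand_value`, and `CONSTANTS` are hypothetical and are not taken from the real HRRG assembler; the sketch simply mirrors the rules described above.

```python
def parse_literal(text: str) -> int:
    """Convert an HRRG-style literal to an integer.

    A '%' prefix means binary, a '$' prefix means hexadecimal,
    and anything else is treated as decimal.
    """
    text = text.strip()
    if text.startswith("%"):
        return int(text[1:], 2)
    if text.startswith("$"):
        return int(text[1:], 16)
    return int(text, 10)


# Hypothetical table built from the example's .EQU directives.
CONSTANTS = {"COUNTVAL": parse_literal("10"), "HALTFLAG": parse_literal("%0010")}


def operand_value(token: str) -> int:
    """Replace a constant label with its value; otherwise parse a literal."""
    name = token.strip().upper()  # labels are matched case-insensitively
    if name in CONSTANTS:
        return CONSTANTS[name]
    return parse_literal(token)


for literal in ("10", "%1010", "$A"):
    print(literal, "->", parse_literal(literal))  # each line ends in 10
print(operand_value("COUNTVAL"), operand_value("HALTFLAG"))  # 10 2
```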
The .ORG (origin) directive informs the assembler as to the start of the program; that is, the memory address into which the first instruction opcode is to be loaded.
In the case of the START, LOOP, and HALT labels — which we might refer to as “address labels” — the assembler will associate them with the memory addresses of their corresponding instruction opcodes (START = MOV = $100, LOOP = DEC = $104, and HALT = OR = $10B). Whenever the assembler sees any of these labels used as operands (e.g., the LOOP label associated with the JPNZ instruction), it will replace that label with its associated address value.
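The way address labels pick up their values can be sketched as a single walk over the program with a running address counter. This Python sketch is illustrative only: `assign_addresses` is a hypothetical name, and the instruction sizes in the sample are invented so that the labels land on the addresses quoted above. The sample is not the column's actual program, and the sizes are not real HRRG encodings.

```python
from typing import Dict, List, Optional, Tuple

# Each entry: (label or None, mnemonic, size in nybbles).
Line = Tuple[Optional[str], str, int]


def assign_addresses(origin: int, program: List[Line]) -> Dict[str, int]:
    """Give each address label the address of the opcode on its line."""
    symbols: Dict[str, int] = {}
    address = origin  # the value supplied by .ORG
    for label, _mnemonic, size in program:
        if label is not None:
            symbols[label] = address
        address += size  # the next instruction starts right after this one
    return symbols


# HYPOTHETICAL sizes, chosen only so the results match the addresses in the text.
program: List[Line] = [
    ("START", "MOV", 4),
    ("LOOP", "DEC", 2),
    (None, "JPNZ", 5),
    ("HALT", "OR", 3),
]
symbols = assign_addresses(0x100, program)
print({name: f"${addr:03X}" for name, addr in symbols.items()})
# {'START': '$100', 'LOOP': '$104', 'HALT': '$10B'}
```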
The assembler will read the source file, perform its magic, and generate the executable / object file on the right (we won’t go into the differences between these terms here). This executable file contains the numerical values — a.k.a. the machine code — that will be executed by the HRRG’s central processing unit (CPU). These opcode and operand values will be loaded into the HRRG’s memory at the designated addresses.
OK, now that we’ve perused and pondered our simple example, let’s delve a little deeper into various aspects of the HRRG’s assembly language.
Just as one of the fundamental building blocks of a natural language is the sentence, the equivalent construct in a computer language is called a statement, which, for the purposes of our language, we may regard as encompassing a single thought or idea. An assembly source file is composed of a series of these statements, each of which nominally consists of four fields: a label, an instruction (or directive), one or more operands, and a comment.
In reality, not all of these fields are used all of the time. A statement may consist of a blank line, or just an instruction like NOP (no operation), or a label and an instruction (e.g., DONOTHING: NOP), or an instruction and one or more operands (e.g., MOV R1, R3), or just a comment (e.g., # This is a comment), along with a variety of other permutations.
Many computer languages (including some assembly languages) allow statements to span multiple lines, in which case they would be terminated by a special character such as a semicolon. In the case of the HRRG’s assembly language, however, each statement may occupy only a single line and there is no special termination character.
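Because each statement lives on exactly one line, splitting a line into its four optional fields is straightforward. The following Python sketch assumes a label is whatever precedes the first colon and that operands are separated by commas; for brevity it recognizes only ‘#’ as a comment character. `Statement` and `split_statement` are hypothetical names, not part of the HRRG tooling.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: Optional[str]
    operands: List[str]
    comment: Optional[str]


def split_statement(line: str) -> Statement:
    """Split a single source line into its four (all optional) fields."""
    comment: Optional[str] = None
    if "#" in line:  # this sketch only recognizes '#' as a comment character
        line, comment = line.split("#", 1)
        comment = comment.strip()
    line = line.strip()  # leading/trailing whitespace is insignificant

    label: Optional[str] = None
    if ":" in line:  # a declared label ends with a colon
        label, line = line.split(":", 1)
        label, line = label.strip(), line.strip()

    if not line:  # blank line, label-only line, or comment-only line
        return Statement(label, None, [], comment)

    # First word is the mnemonic; anything after it is a comma-separated operand list.
    parts = line.split(None, 1)
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return Statement(label, parts[0], operands, comment)


print(split_statement("DONOTHING: NOP"))
print(split_statement("MOV R1, R3   # copy R1 into R3"))
```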
CASE and WHITESPACE
Many early assemblers allowed you to create source files using only uppercase characters. In the case of the HRRG’s assembly language, you may use uppercase, lowercase, or a mixture of both. The HRRG’s assembler is case-insensitive, which means — for example — that it will consider labels such as fred, Fred, FrEd, and FRED to be identical.
Early assembly languages typically had extremely restrictive rules, such as specifying exactly on which column each field must commence. By comparison, the HRRG’s assembly language syntax is relatively free-format — you can use as many whitespace characters (spaces and tabs within a statement, blank lines between statements) as you like and be as messy as you wish. Having said this, we strongly recommend that you keep your source as neat and tidy as you can. You can follow the style we use in our examples or feel free to develop your own, but — whatever you do — try to be as consistent as possible. You will discover that consistency pays dividends in the long run when you return to blow the dust off a neglected source file sometime in the future.
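One simple way to get the case-insensitive behavior described above is to normalize every label to uppercase before storing or looking it up. A minimal Python sketch follows; `SymbolTable` is a hypothetical helper, and treating a second definition of the same label as an error is our assumption rather than a documented HRRG rule.

```python
from typing import Dict


class SymbolTable:
    """Stores labels case-insensitively by normalizing names to uppercase."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}

    def define(self, name: str, value: int) -> None:
        key = name.upper()
        # ASSUMPTION: defining the same label twice is an error.
        if key in self._values:
            raise ValueError(f"label {name!r} is already defined")
        self._values[key] = value

    def lookup(self, name: str) -> int:
        return self._values[name.upper()]


table = SymbolTable()
table.define("fred", 7)
print([table.lookup(n) for n in ("fred", "Fred", "FrEd", "FRED")])  # [7, 7, 7, 7]
```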
COMMENTS and BLANK LINES
Generally speaking, it’s a good idea to liberally sprinkle your assembly source with comments and to distinguish logically distinct portions of your program with blank lines. Hopefully this will mean that, when you return to one of your programs in the future, you will have a clue as to what’s going on.
Different assemblers use different characters to indicate the beginning of a comment. Some use a ‘#’ (a character variously known as a hash, number sign, pound sign, or sharp), others use a semicolon (‘;’), and still others use an apostrophe.
Since people tend to get comfortable doing things a certain way, and since the HRRG assembler doesn’t use any of these characters for anything else, and since we’re all about making people happy, the HRRG supports all three of the aforementioned characters to indicate the start of a comment.
A comment character may be followed by any printable text characters, including spaces, tabs, and other comment characters. A comment is terminated by the end of the line.
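Since the HRRG assembler uses ‘#’, ‘;’, and the apostrophe for nothing else, a comment can be found simply by scanning for the first occurrence of any of them. Here is a minimal Python sketch; `strip_comment` is a hypothetical name.

```python
from typing import Optional, Tuple

COMMENT_CHARS = "#;'"  # all three are accepted as comment starters


def strip_comment(line: str) -> Tuple[str, Optional[str]]:
    """Split a line into (code, comment).

    The first comment character starts the comment; everything after it,
    including further comment characters, belongs to the comment.
    """
    for index, char in enumerate(line):
        if char in COMMENT_CHARS:
            return line[:index].rstrip(), line[index + 1:].strip()
    return line.rstrip(), None


print(strip_comment("INC R0   ; bump the counter"))  # ('INC R0', 'bump the counter')
print(strip_comment("NOP # a ; b ' c"))              # ('NOP', "a ; b ' c")
```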
As we discussed in our introductory examples, our language supports two kinds of labels — constant labels and address labels — both of which support the same naming conventions.
Both types of label can consist of a mixture of alphabetic and numeric characters, but the first character must be alphabetic.
When a label is declared, it is terminated with a colon (‘:’) character, but this character doesn’t form part of the label’s name and is not used thereafter. The maximum length of a label is sixteen characters, excluding the colon character used to terminate the label.
Labels cannot be the same as any of the HRRG assembly language’s reserved words, which include directive and instruction mnemonics.
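The label rules above translate naturally into a regular expression plus a reserved-word check. In this Python sketch, `is_valid_label` is hypothetical, and `RESERVED_WORDS` is deliberately incomplete: it lists only mnemonics that appear in this column, and treating directive names as reserved even without their leading dot is our assumption.

```python
import re

# First character alphabetic, then letters or digits, 16 characters at most.
LABEL_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{0,15}")

# PARTIAL, illustrative list: only the mnemonics that appear in this column.
# ASSUMPTION: directive names are reserved without their leading dot.
RESERVED_WORDS = {
    "MOV", "INC", "DEC", "AND", "OR", "NOP", "JPNZ",
    "ORG", "ORIGIN", "EQU", "EQUATE", "RESERVE",
    "NYBBLE", "NYBBLE1", "TRYBBLE", "NYBBLE3", "INCLUDE",
}


def is_valid_label(declaration: str) -> bool:
    """Check a declaration such as 'LOOP:' against the label rules."""
    if not declaration.endswith(":"):
        return False
    name = declaration[:-1]  # the colon is not part of the name
    if LABEL_NAME.fullmatch(name) is None:
        return False
    return name.upper() not in RESERVED_WORDS  # comparison is case-insensitive


print(is_valid_label("LOOP:"), is_valid_label("1STLOOP:"), is_valid_label("mov:"))
```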
In addition to standard instructions, the HRRG’s assembly language supports special instructions called “directives” whose purpose is to direct the assembler to do certain things.
The .ORG (or .ORIGIN) directive is primarily used to inform the assembler as to the start, or origin, of the program; that is, the memory location into which it should place the program’s first opcode. Thus, this directive must have an operand in the form of a 3-nybble address, which may be presented as a literal value as illustrated in the example below, or as a constant from an .EQU directive.
The reason we used the “primarily” qualifier in the preceding paragraph is that we can have multiple .ORG directives in our source code. In some cases, this may be because we have multiple programs in the same file; in other cases, .ORG statements may be used in conjunction with other directives to reserve areas of memory into which a program might wish to store address and/or data values.
The .EQU (or .EQUATE) directive is used to associate a constant value with a label. Each .EQU directive must have a label assigned to it. These constant labels may be used in the body of the program instead of literal (numerical) values. When the assembler is assembling the program, it will automatically substitute any constant labels in the body of the program with their numerical equivalents.
Constant values can be used to represent both 1-nybble data values (e.g., the values assigned to OFFSET, COUNTVAL, and HALTFLAG) and 3-nybble addresses (e.g., the $100 assigned to STARTADDR). Observe the way in which the STARTADDR constant is subsequently used with the .ORG directive.
As seen in the COUNTVAL example, the assignment to an .EQU statement may be presented in the form of a simple expression using the addition (‘+’), subtraction (‘-’), multiplication (‘*’), and division (‘/’) operators. All expressions will be evaluated using integer arithmetic, operations will be performed from left to right (there’s no precedence of operators), and the remainders from any division operations will be truncated.
We also support the unary ‘!’ operator, which will flip the bits of whatever literal (or label) with which it is associated (once again, literals can be specified as decimal, binary, or hexadecimal values). For example, FRED: .EQU !%1010 will end up with FRED being associated with a value of %0101, while BERT: .EQU !FRED will end up with BERT being associated with a value of %1010.
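The evaluation rules (integer arithmetic, strictly left to right, truncated division, and a unary ‘!’) can be captured in a few lines of Python. This is a sketch under stated assumptions, not the HRRG assembler’s code; in particular, the column doesn’t say how wide the ‘!’ inversion is for larger values, so we assume it flips four bits for values 0 to 15 and twelve bits otherwise.

```python
import re
from typing import Mapping

# An operator, or an operand with an optional leading '!'.
TOKEN = re.compile(r"[-+*/]|!?[%$]?[0-9A-Za-z]+")


def _value(token: str, symbols: Mapping[str, int]) -> int:
    """Value of one operand: a literal or a constant label, optionally inverted."""
    invert = token.startswith("!")
    if invert:
        token = token[1:]
    if token[0] == "%":
        value = int(token[1:], 2)
    elif token[0] == "$":
        value = int(token[1:], 16)
    elif token[0].isdigit():
        value = int(token, 10)
    else:
        value = symbols[token.upper()]  # labels are case-insensitive
    if invert:
        # ASSUMPTION: invert 4 bits for values 0-15, otherwise 12 bits.
        mask = 0xF if value <= 0xF else 0xFFF
        value = ~value & mask
    return value


def _truncating_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero (drops the remainder)."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def evaluate(expr: str, symbols: Mapping[str, int]) -> int:
    """Evaluate strictly left to right: no operator precedence, integers only."""
    tokens = TOKEN.findall(expr)
    result = _value(tokens[0], symbols)
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        rhs = _value(operand, symbols)
        if op == "+":
            result += rhs
        elif op == "-":
            result -= rhs
        elif op == "*":
            result *= rhs
        else:
            result = _truncating_div(result, rhs)
    return result


print(evaluate("10 - 3 * 2", {}))           # 14, i.e. (10 - 3) * 2
print(evaluate("7 / 2", {}))                # 3
print(evaluate("!%1010", {}))               # 5, i.e. %0101
print(evaluate("!fred", {"FRED": 0b0101}))  # 10, i.e. %1010
```

Note how `10 - 3 * 2` yields 14 rather than the 4 you would get with conventional operator precedence.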
Unlike many utilities of this nature, the HRRG’s assembler supports forward referencing, which means you can employ a constant label in an expression before you’ve actually declared it (e.g., the assignment to COUNTVAL references OFFSET before OFFSET has been declared). The HRRG’s assembler makes multiple passes through the code until it has resolved any forward references.
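Multi-pass forward-reference resolution can be sketched as a loop that keeps re-evaluating any still-unresolved .EQU until a pass makes no further progress. In this Python sketch, `resolve_equates` is hypothetical, the evaluator handles only ‘+’ and ‘-’ to keep it short, and the expression for COUNTVAL and the value of OFFSET are invented (chosen so that COUNTVAL comes out at 10, as in the earlier example).

```python
import re
from typing import Dict

TOKEN = re.compile(r"[-+]|[%$]?[0-9A-Za-z]+")


def _term(token: str, known: Dict[str, int]) -> int:
    """Value of a literal, or of a label that has already been resolved."""
    if token[0] == "%":
        return int(token[1:], 2)
    if token[0] == "$":
        return int(token[1:], 16)
    if token[0].isdigit():
        return int(token)
    return known[token.upper()]  # raises KeyError if not resolved yet


def _evaluate(expr: str, known: Dict[str, int]) -> int:
    """Left-to-right evaluation; only '+' and '-' in this short sketch."""
    tokens = TOKEN.findall(expr)
    result = _term(tokens[0], known)
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        value = _term(operand, known)
        result = result + value if op == "+" else result - value
    return result


def resolve_equates(equates: Dict[str, str]) -> Dict[str, int]:
    """Make repeated passes until every .EQU label has a value."""
    known: Dict[str, int] = {}
    pending = {name.upper(): expr for name, expr in equates.items()}
    while pending:
        progress = False
        for name, expr in list(pending.items()):
            try:
                known[name] = _evaluate(expr, known)
            except KeyError:
                continue  # refers to a label we haven't resolved yet
            del pending[name]
            progress = True
        if not progress:
            raise ValueError(f"unresolvable or circular labels: {sorted(pending)}")
    return known


# In source order, COUNTVAL uses OFFSET before OFFSET is declared.
source_order = {"COUNTVAL": "OFFSET + 3", "OFFSET": "7", "HALTFLAG": "%0010"}
resolved = resolve_equates(source_order)
print(resolved)  # {'OFFSET': 7, 'HALTFLAG': 2, 'COUNTVAL': 10}
```

Raising an error when a pass makes no progress is our own design choice; it catches circular definitions instead of looping forever.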
It’s important to note that constant labels are used only by the assembler (and to make the code easier to read); they don’t appear in (or occupy any space in) the resulting machine code.
The .RESERVE directive is used to reserve a number of nybble-sized memory locations for later use; for example:
In this case, we are using the .RESERVE directive to reserve four nybbles starting at whatever address the assembler ends up associating with the TMPDATA label. In our program, we use MOV instructions to load these locations with the values 2, 0, 5, and 7 under program control. This is obviously a meaningless program, but it serves to give us the idea.
We can use as many .RESERVE directives as we like. One option is to place them at the end of the program as shown in the example above. Another common technique is to employ multiple .ORG directives as illustrated below:
Once again, this is a meaningless program, but it does demonstrate the use of multiple .ORG directives along with some .EQU directives that fall between the two .ORG directives. In the case of this example, our four data locations will be reserved starting at address $800, while our main program will commence at address $100.
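The interplay between .ORG and .RESERVE boils down to a location counter that .ORG sets and that .RESERVE (like every instruction) advances. Here is a minimal Python sketch; `lay_out` is a hypothetical name, TMPDATA comes from the example above, and the instruction sizes and the NEXT label are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (label or None, directive or mnemonic, operand): for .ORG the operand is an
# address, for .RESERVE a count of nybbles, for an instruction its size.
Stmt = Tuple[Optional[str], str, int]


def lay_out(statements: List[Stmt]) -> Dict[str, int]:
    """Follow the location counter through .ORG and .RESERVE directives."""
    symbols: Dict[str, int] = {}
    location = 0
    for label, op, value in statements:
        if op == ".ORG":
            if not 0 <= value <= 0xFFF:
                raise ValueError(f".ORG operand {value:#x} is not a 3-nybble address")
            location = value
        if label is not None:
            symbols[label] = location
        if op != ".ORG":
            location += value  # skip reserved nybbles or the instruction's size
    return symbols


program: List[Stmt] = [
    (None, ".ORG", 0x800),
    ("TMPDATA", ".RESERVE", 4),
    (None, ".ORG", 0x100),
    ("START", "MOV", 4),  # HYPOTHETICAL size
    ("NEXT", "NOP", 1),   # HYPOTHETICAL label and size
]
print({k: f"${v:03X}" for k, v in lay_out(program).items()})
# {'TMPDATA': '$800', 'START': '$100', 'NEXT': '$104'}
```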
The .NYBBLE (or .NYBBLE1) and .TRYBBLE (or .NYBBLE3) directives are more sophisticated versions of the .RESERVE directive in that they allow us to reserve memory locations that are preloaded with the required data or address values.
In this case, we end up with four nybbles — starting at address $800 and ending at address $803 — that are preloaded with values of 2, 0, 5, and 7, respectively. Our program, which starts at address $100, uses MOV instructions to copy these values into the 4-bit general-purpose registers, R0, R1, R2, and R3, respectively.
Although this program doesn’t use them, we also use the .TRYBBLE directive to reserve three 3-nybble fields — starting at address $804 and ending at address $80C — that are preloaded with values of $601, $902, and $453, respectively.
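Preloaded data can be modeled as a map from address to nybble. This Python sketch reproduces the addresses quoted above ($800 to $803 for the nybbles, $804 to $80C for the trybbles). The order in which the three nybbles of each .TRYBBLE value are stored depends on the HRRG’s endianness, which we don’t restate here, so the sketch assumes most-significant nybble first and offers a flag to flip it. `emit_data` is a hypothetical name.

```python
from typing import Dict, List


def emit_data(origin: int, nybbles: List[int], trybbles: List[int],
              big_endian: bool = True) -> Dict[int, int]:
    """Lay out .NYBBLE values (one nybble each), then .TRYBBLE values (three each)."""
    memory: Dict[int, int] = {}
    address = origin
    for value in nybbles:
        if not 0 <= value <= 0xF:
            raise ValueError(f"{value} does not fit in one nybble")
        memory[address] = value
        address += 1
    for value in trybbles:
        if not 0 <= value <= 0xFFF:
            raise ValueError(f"{value} does not fit in three nybbles")
        # ASSUMPTION: most-significant nybble first unless big_endian=False.
        digits = [(value >> shift) & 0xF for shift in (8, 4, 0)]
        if not big_endian:
            digits.reverse()
        for digit in digits:
            memory[address] = digit
            address += 1
    return memory


memory = emit_data(0x800, [2, 0, 5, 7], [0x601, 0x902, 0x453])
print(f"first ${min(memory):03X}, last ${max(memory):03X}")  # first $800, last $80C
```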
The .INCLUDE directive is used to include previously created source code files into the existing program. These source code files could contain useful subroutines, macro definitions (we’ll talk about these later), or complete programs.
The associated file name must follow the label naming conventions (up to 16 alphanumeric characters, starting with an alphabetic character) and must have an .ASM (or .TXT) extension. Such files will be stored in a special folder on the host PC.
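A host-side check of .INCLUDE file names might look like the following Python sketch. The folder name `hrrg_includes`, the helper `include_path`, and the case-insensitive treatment of the extension are all our assumptions; the column says only that such files live in a special folder on the host PC.

```python
import re
from pathlib import Path

# Label rules (letter first, up to 16 letters/digits) plus an .ASM or .TXT extension.
INCLUDE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{0,15}\.(?:ASM|TXT)", re.IGNORECASE)

# HYPOTHETICAL name for the "special folder" on the host PC.
INCLUDE_DIR = Path("hrrg_includes")


def include_path(filename: str) -> Path:
    """Validate an .INCLUDE file name and map it into the include folder."""
    if not INCLUDE_NAME.fullmatch(filename):
        raise ValueError(f"invalid .INCLUDE file name: {filename!r}")
    return INCLUDE_DIR / filename


print(include_path("MATHLIB.ASM"))
```

Because the whole name must match, path tricks such as `../` are rejected as a side effect.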
INSTRUCTIONS AND ADDRESSING MODES
This is where our previous cogitations on the HRRG’s Instruction Set meet our ruminations on Addressing Modes. Happily, as we will see, the HRRG supports only four addressing modes: implied (a.k.a. implicit), immediate, absolute (a.k.a. direct), and indirect. What makes things just a tad interesting is that we can sometimes use multiple modes in the same instruction.
Let’s start with the INC (increment) instruction. In TEST01 and TEST02 we increment the contents of the 4-bit register R0 and the 12-bit register IX, respectively. Both of these use the implied addressing mode because the location of the data to be manipulated is implied by the use of the register names.
TEST03 employs the absolute addressing mode. The use of the square brackets [ ] indicates that we are talking about the contents of a memory address, so this statement says that we wish to increment the contents of the memory at address $800.
TEST04A employs the indexed addressing mode in which the contents of the 12-bit index register (IX) are added to $800 and the result is used as the address of the memory location that is to be incremented.
TEST04B is functionally identical to TEST04A, while TEST04C shows that [IX] will auto-expand to [$000 + IX].
This is a great time to contrast TEST02, which increments the contents of the 12-bit index register (IX), with TEST04C, which increments the contents of the memory at the address specified by the 12-bit contents of IX.
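The address arithmetic behind [$800 + IX] and its [IX] shorthand, along with the contrast between INC IX and INC [IX], can be sketched in Python as follows. The helper names are hypothetical, and we assume that address sums wrap within the 12-bit address space and that incrementing a memory nybble wraps within four bits; the column doesn’t specify either behavior.

```python
def expand_memory_operand(operand: str) -> str:
    """Rewrite the [IX] shorthand as [$000 + IX]; return other operands unchanged."""
    text = operand.strip()
    inner = text[1:-1].strip()
    if inner.upper() == "IX":
        return "[$000 + IX]"
    return text


def effective_address(base: int, ix: int) -> int:
    """Address selected by [base + IX].

    ASSUMPTION: the sum wraps around within the 12-bit $000-$FFF address space.
    """
    return (base + ix) & 0xFFF


ix = 0x005
memory = {0x805: 0x3, 0x005: 0x9}

print(expand_memory_operand("[IX]"))           # [$000 + IX]
print(f"${effective_address(0x800, ix):03X}")  # $805, the nybble INC [$800 + IX] targets

# INC IX (cf. TEST02) changes the register itself,
ix_after_inc_ix = (ix + 1) & 0xFFF
# whereas INC [IX] (cf. TEST04C) changes the nybble stored at address IX.
# ASSUMPTION: a memory nybble wraps from $F back to $0.
target = effective_address(0x000, ix)
memory[target] = (memory[target] + 1) & 0xF
print(f"IX after INC IX: ${ix_after_inc_ix:03X}; memory[$005] after INC [IX]: ${memory[0x005]:X}")
```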
Next, let’s look at some examples based on the AND (logical AND) instruction. Remember that, with instructions of this type, the first operand is referred to as the source and the second operand is referred to as the target. In the case of the AND, the contents of the source are logically AND-ed bit-by-bit with the contents of the target, and the result is stored in the target. Thus, in TEST01 the contents of 4-bit register R0 are AND-ed with the contents of 4-bit register R1, and the results are stored in R1.
In TEST02 a constant value of $1 (or %0001 in binary) is AND-ed with the contents of register R1 and the results are stored in R1. Note that we can only use a constant value as the source, not the target, because there would be no way for the CPU to store the results of such an operation.
In TEST03 the contents of 4-bit register R1 are AND-ed with the 4-bit contents of the memory at location $800 and the results are stored in this memory location. By comparison, in TEST05 the 4-bit contents of the memory at location $800 are AND-ed with the contents of 4-bit register R1 and the results are stored in R1. Hopefully, the remaining tests will be self-explanatory.
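The source/target convention for AND can be modeled with a toy machine holding four 4-bit registers and a nybble-wide memory. In this Python sketch, `Machine`, the `("MEM", address)` notation for [address], and reading never-written memory as zero are all illustrative assumptions rather than HRRG definitions.

```python
from typing import Dict, Tuple, Union

# An operand is a register name ("R0"-"R3"), an int constant (immediate),
# or ("MEM", address) standing for [address] (absolute addressing).
Operand = Union[str, int, Tuple[str, int]]


class Machine:
    """Toy model covering only 4-bit operands."""

    def __init__(self) -> None:
        self.registers: Dict[str, int] = {f"R{i}": 0 for i in range(4)}
        self.memory: Dict[int, int] = {}  # address -> nybble

    def read(self, operand: Operand) -> int:
        if isinstance(operand, int):
            return operand & 0xF
        if isinstance(operand, tuple):
            return self.memory.get(operand[1], 0)  # ASSUMPTION: unwritten memory reads as 0
        return self.registers[operand]

    def write(self, operand: Operand, value: int) -> None:
        if isinstance(operand, int):
            raise ValueError("a constant cannot be the target: nowhere to store the result")
        if isinstance(operand, tuple):
            self.memory[operand[1]] = value & 0xF
        else:
            self.registers[operand] = value & 0xF

    def and_(self, source: Operand, target: Operand) -> None:
        """AND source, target: target = source AND target, bit by bit."""
        self.write(target, self.read(source) & self.read(target))


m = Machine()
m.registers["R0"], m.registers["R1"] = 0b1100, 0b1010
m.and_("R0", "R1")  # like TEST01: AND R0, R1
print(f"R1 = %{m.registers['R1']:04b}")  # R1 = %1000
m.and_(0x1, "R1")   # like TEST02: AND $1, R1
print(f"R1 = %{m.registers['R1']:04b}")  # R1 = %0000
```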
Finally, for this column, let’s consider some examples using the MOV (move, copy, or load) instruction. Once again, the first operand is the source and the second operand is the target. In TEST01, we simply copy the contents of the 4-bit register R0 into the 4-bit register R1. In this case, the CPU understands to copy four bits because it knows R0 and R1 are 4-bit registers.
In TEST02 we copy the contents of the 12-bit register IX into the 12-bit register TA. In this case, the CPU understands to copy twelve bits because it knows IX and TA are 12-bit registers.
Now look at TEST03, in which we instruct the CPU to copy the contents of the 4-bit register R1 into the 12-bit register IX. In this case, the CPU will simply copy the contents of R1 into the least-significant nybble of IX. Similarly, in TEST04 we instruct the CPU to copy the contents of the 12-bit register IX into the 4-bit register R2. In this case, the CPU will simply copy the contents of the least-significant nybble of IX into R2.
In TEST05 we load the 4-bit constant value $A (10 in decimal, %1010 in binary) into the 4-bit register R0. By comparison, in TEST06, we load the 12-bit constant value $00A (10 in decimal, %000000001010 in binary) into the 12-bit register IX.
In TEST07 we copy the contents of the 4-bit register R0 into the memory location at address $800. By comparison, in TEST08 we copy the contents of the memory location at address $800 into the 4-bit register R1 (the CPU understands to copy only one nybble of data out of memory because it knows R1 is a 4-bit register).
In TEST09 we copy the contents of the 12-bit register IX into three memory locations starting at address $900. By comparison, in TEST10 we copy the contents of three memory locations starting at address $900 into the 12-bit register TA (the CPU understands to copy three nybbles of data out of memory because it knows TA is a 12-bit register).
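The rule that the register involved determines how many bits move can be sketched as below. This Python model is hypothetical throughout: it assumes that multi-nybble values are stored most-significant nybble first, and that moving a 4-bit register into IX or TA leaves their upper two nybbles unchanged (the column says only that the least-significant nybble is written). Immediate and memory-to-memory forms are not modeled.

```python
from typing import Dict, Tuple, Union

WIDTH = {"R0": 4, "R1": 4, "R2": 4, "R3": 4, "IX": 12, "TA": 12}  # in bits

# A register name, or ("MEM", address) standing for [address].
Operand = Union[str, Tuple[str, int]]


class Machine:
    def __init__(self) -> None:
        self.reg: Dict[str, int] = {name: 0 for name in WIDTH}
        self.memory: Dict[int, int] = {}  # address -> nybble

    def _load(self, address: int, nybbles: int) -> int:
        value = 0
        for offset in range(nybbles):  # ASSUMPTION: most-significant nybble first
            value = (value << 4) | self.memory.get(address + offset, 0)
        return value

    def _store(self, address: int, value: int, nybbles: int) -> None:
        for offset in range(nybbles):
            shift = 4 * (nybbles - 1 - offset)
            self.memory[address + offset] = (value >> shift) & 0xF

    def _set(self, name: str, value: int, source_bits: int) -> None:
        if source_bits < WIDTH[name]:
            # 4-bit source into a 12-bit register: only the low nybble is written.
            # ASSUMPTION: the upper two nybbles keep their previous contents.
            self.reg[name] = (self.reg[name] & 0xFF0) | (value & 0xF)
        else:
            # Same width, or 12-bit source into a 4-bit register (low nybble kept).
            self.reg[name] = value & ((1 << WIDTH[name]) - 1)

    def mov(self, source: Operand, target: Operand) -> None:
        """MOV source, target: the register involved decides how many nybbles move."""
        if isinstance(source, tuple):    # [address] -> register
            bits = WIDTH[target]
            self._set(target, self._load(source[1], bits // 4), bits)
        elif isinstance(target, tuple):  # register -> [address]
            self._store(target[1], self.reg[source], WIDTH[source] // 4)
        else:                            # register -> register
            self._set(target, self.reg[source], WIDTH[source])


m = Machine()
m.reg["IX"], m.reg["R1"] = 0xABC, 0x7
m.mov("R1", "IX")            # like TEST03: low nybble of IX becomes $7
m.mov("IX", "R2")            # like TEST04: R2 gets the low nybble of IX
m.mov("IX", ("MEM", 0x900))  # like TEST09: three nybbles written at $900-$902
m.mov(("MEM", 0x900), "TA")  # like TEST10: three nybbles read back into TA
print(f"IX=${m.reg['IX']:03X} R2=${m.reg['R2']:X} TA=${m.reg['TA']:03X}")
# IX=$AB7 R2=$7 TA=$AB7
```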
Now things start to get a tad more interesting. In TEST11 we load the 4-bit constant value $A (10 in decimal, %1010 in binary) into the memory location at address $800. By comparison, in TEST12, we load the 12-bit constant value $ABC (2,748 in decimal, %101010111100 in binary) into three memory locations starting at address $900. The assembler can tell whether we want to load one or three nybbles based on how the constant value is written, and it generates the machine code accordingly. Of course, if we specify a small decimal value in the range 0 to 15, then the assembler will assume a 4-bit value. If we wish to load a 3-nybble area in memory with a small value like 1, then we will have to specify this value in binary (%000000000001) or hexadecimal ($001) and include the leading zeros.
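How the assembler chooses between a 4-bit and a 12-bit constant can be approximated from the way the literal is written. In this Python sketch, `constant_width` is hypothetical, and treating in-between spellings such as $1F or %10101 as 12-bit is our assumption, since the column covers only the one-nybble and three-nybble cases.

```python
def constant_width(literal: str) -> int:
    """Guess whether an immediate operand is a 4-bit or a 12-bit constant."""
    text = literal.strip()
    if text.startswith("$"):
        value, written_bits = int(text[1:], 16), 4 * len(text[1:])
    elif text.startswith("%"):
        value, written_bits = int(text[1:], 2), len(text[1:])
    else:
        value = int(text, 10)
        written_bits = 4 if value <= 0xF else 12  # decimal has no leading-zero cue
    if value > 0xFFF or written_bits > 12:
        raise ValueError(f"{literal!r} does not fit in three nybbles")
    # ASSUMPTION: anything spelled wider than one nybble is treated as 12-bit.
    return 4 if written_bits <= 4 else 12


for lit in ("$A", "$ABC", "1", "$001", "%000000000001"):
    print(lit, constant_width(lit))  # 4, 12, 4, 12, 12 respectively
```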
We will leave decoding TEST13 through TEST17 as an exercise for the reader. Also, we should note that the examples shown here demonstrate only a subset of all the possible addressing mode combinations.
In our earlier discussions on addressing modes, we alluded to two things — first, that each addressing mode requires its own opcode; second, that we might wish to use the same instruction mnemonic for different addressing modes.
The point about each addressing mode requiring its own opcode is more relevant to 8-bit processors like the 6502. In the case of the 4-bit HRRG, the opcode simply specifies the type of instruction, and it’s the operand(s) that will determine the addressing mode(s) along with any associated data.
The point about using the same instruction mnemonic for different addressing modes holds true in both cases. Just look at the examples of the INC, AND, and MOV instructions above — both we (you and me) and the assembler can work out what addressing modes we wish to use by looking at the assembly source code, while the CPU will know what we want it to do based on the machine code we feed it.
In my next HRRG column, we will introduce some of the HRRG assembler’s macro capabilities. Also, in the not-so-distant future, we will be making the HRRG Emulator, including the HRRG Terminal and HRRG Assembler, available for anyone to play with. In the meantime, as always, I welcome your comments, questions, and suggestions.