Chapter 2 BINARY ARITHMETIC

Numbers versus numeration

It is imperative to understand that the type of numeration system used to represent numbers has no impact upon the outcome of any arithmetical function (addition, subtraction, multiplication, division, roots, powers, or logarithms). A number is a number is a number; one plus one will always equal two, no matter how you symbolize one, one, and two. A prime number in decimal form is still prime if it's shown in binary form, or octal, or hexadecimal. π is still the ratio between the circumference and diameter of a circle, no matter what symbol(s) you use to denote its value. The essential functions and interrelations of mathematics are unaffected by the particular system of symbols we might choose to represent quantities. This distinction between numbers and systems of numeration is critical to understand.

The essential distinction between the two is much like that between an object and the spoken word(s) we associate with it. A house is still a house regardless of whether we call it by its English name house or its Spanish name casa. The first is the actual thing, while the second is merely the symbol for the thing.
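As a small illustration of this distinction, the following Python sketch writes the quantity thirteen in four numeration systems and checks that equality, primality, and simple addition do not depend on the symbols used. Python is chosen here only for illustration, and the helper name is_prime is hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; it depends only on the quantity n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# The same number, thirteen, written in decimal, binary, octal and hexadecimal.
thirteen_forms = [int("13", 10), int("1101", 2), int("15", 8), int("D", 16)]

all_equal = len(set(thirteen_forms)) == 1            # one quantity, four numerals
all_prime = all(is_prime(n) for n in thirteen_forms)

# One plus one equals two, however one, one and two are symbolized.
one_plus_one = int("1", 2) + int("1", 2) == int("10", 2) == 2
```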

That being said, performing a simple arithmetic operation such as addition (longhand) in binary form can be confusing to a person who's used to working with decimal numeration only. In this lesson, we'll explore the techniques used to perform simple arithmetic functions on binary numbers, since these techniques will be employed in the design of electronic circuits to do the same. You might take longhand addition and subtraction for granted, having used a calculator for so long, but deep inside that calculator's circuitry all those operations are performed "longhand," using binary numeration. To understand how that's accomplished, we need to review the basics of arithmetic.

Binary addition

Adding binary numbers is a very simple task, and very similar to the longhand addition of decimal numbers. As with decimal numbers, you start by adding the bits (digits) one column, or place weight, at a time, from right to left. Unlike decimal addition, there is little to memorize in the way of rules for the addition of binary bits:

0 + 0 = 0
1 + 0 = 1
0 + 1 = 1
1 + 1 = 10
1 + 1 + 1 = 11

Just as with decimal addition, when the sum in one column is a two-bit (two-digit) number, the least significant figure is written as part of the total sum and the most significant figure is "carried" to the next left column. Consider the following examples:

.                          11  1 <--- Carry bits ----->  11
.     1001101             1001001                     1000111
.   + 0010010           + 0011001                   + 0010110
.   ---------           ---------                   ---------
.     1011111             1100010                     1011101

The addition problem on the left did not require any bits to be carried, since the sum of bits in each column was either 1 or 0, not 10 or 11. In the other two problems, there definitely were bits to be carried, but the process of addition is still quite simple.
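The column-by-column procedure can be sketched in Python as follows. This is only an illustration under assumptions: the function name add_binary and the choice to report carries as a string (a "1" above each column that received a carry) are hypothetical conveniences, not part of the text.

```python
def add_binary(a: str, b: str):
    """Add two binary strings one column at a time, right to left.

    Returns (sum_bits, carry_bits). carry_bits has a '1' above every
    column that received a carry and a space everywhere else.
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    sum_bits = []
    carries_in = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        carries_in.append("1" if carry else " ")
        total = int(bit_a) + int(bit_b) + carry   # 0, 1, 2 (binary 10) or 3 (binary 11)
        sum_bits.append(str(total % 2))           # least significant figure stays
        carry = total // 2                        # most significant figure is carried
    if carry:                                     # a final carry opens a new column
        sum_bits.append("1")
        carries_in.append("1")
    return "".join(reversed(sum_bits)), "".join(reversed(carries_in))


# The middle example from the text: 1001001 + 0011001
total, carries = add_binary("1001001", "0011001")
print(carries)
print(total)
```

Running it on the three example problems reproduces the sums shown, with carries appearing only in the second and third problems.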

As we'll see later, there are ways that electronic circuits can be built to perform this very task of addition, by representing each bit of each binary number as a voltage signal (either "high," for a 1; or "low" for a 0). This is the very foundation of all the arithmetic which modern digital computers perform.

Negative binary numbers

With addition being easily accomplished, we can perform the operation of subtraction with the same technique simply by making one of the numbers negative. For example, the subtraction problem of 7 - 5 is essentially the same as the addition problem 7 + (-5). Since we already know how to represent positive numbers in binary, all we need to know now is how to represent their negative counterparts and we'll be able to subtract.

Usually we represent a negative decimal number by placing a minus sign directly to the left of the most significant digit, just as in the example above, with -5. However, the whole purpose of using binary notation is for constructing on/off circuits that can represent bit values in terms of voltage (2 alternative values: either "high" or "low"). In this context, we don't have the luxury of a third symbol such as a "minus" sign, since these circuits can only be on or off (two possible states). One solution is to reserve a bit (circuit) that does nothing but represent the mathematical sign:

.                        101₂ = 5₁₀    (positive)
.
.  Extra bit, representing sign (0=positive, 1=negative)
.                       |
.                       0101₂ = 5₁₀    (positive)
.
.  Extra bit, representing sign (0=positive, 1=negative)
.                       |
.                       1101₂ = -5₁₀   (negative)

As you can see, we have to be careful when we start using bits for any purpose other than standard place-weighted values. Otherwise, 1101₂ could be misinterpreted as the number thirteen when in fact we mean to represent negative five. To keep things straight here, we must first decide how many bits are going to be needed to represent the largest numbers we'll be dealing with, and then be sure not to exceed that bit field length in our arithmetic operations. For the above example, I've limited myself to the representation of numbers from negative seven (1111₂) to positive seven (0111₂), and no more, by making the fourth bit the "sign" bit. Only by first establishing these limits can I avoid confusion of a negative number with a larger, positive number.

Representing negative five as 1101₂ is an example of the sign-magnitude system of negative binary numeration. By using the leftmost bit as a sign indicator and not a place-weighted value, I am sacrificing the "pure" form of binary notation for something that gives me a practical advantage: the representation of negative numbers. The leftmost bit is read as the sign, either positive or negative, and the remaining bits are interpreted according to the standard binary notation: left to right, place weights in powers of two.

As simple as the sign-magnitude approach is, it is not very practical for arithmetic purposes. For instance, how do I add a negative five (1101₂) to any other number, using the standard technique for binary addition? I'd have to invent a new way of doing addition in order for it to work, and if I do that, I might as well just do the job with longhand subtraction; there's no arithmetical advantage to using negative numbers to perform subtraction through addition if we have to do it with sign-magnitude numeration, and that was our goal!
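To see the difficulty concretely, the sketch below decodes sign-magnitude values and then feeds them to ordinary binary addition. The function names are hypothetical, and truncating the sum to the four-bit field is an assumption made here so that the result can be decoded again.

```python
def decode_sign_magnitude(bits: str) -> int:
    """Leftmost bit is the sign (0 = positive, 1 = negative); the rest is magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def naive_add(a: str, b: str) -> str:
    """Ordinary place-weighted binary addition, truncated to the field width."""
    width = len(a)
    total = (int(a, 2) + int(b, 2)) % (1 << width)
    return f"{total:0{width}b}"


plus_seven = "0111"
minus_five = "1101"                      # sign-magnitude negative five, as in the text

raw_sum = naive_add(plus_seven, minus_five)
decoded = decode_sign_magnitude(raw_sum)
print(raw_sum, decoded)                  # the decoded value is not 7 - 5
```

Standard addition treats the sign bit as if it had a place weight of eight, so the decoded result does not equal two.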

There's another method for representing negative numbers which works with our familiar technique of longhand addition, and also happens to make more sense from a place-weighted numeration point of view, called complementation. With this strategy, we assign the leftmost bit to serve a special purpose, just as we did with the sign-magnitude approach, defining our number limits just as before. However, this time, the leftmost bit is more than just a sign bit; rather, it possesses a negative place-weight value. For example, a value of negative five would be represented as such:

.  Extra bit, place weight = negative eight
.                    |
.                    1011₂ = -5₁₀   (negative)
.
.          (1 x -8₁₀)  +  (0 x 4₁₀)  +  (1 x 2₁₀)  +  (1 x 1₁₀)  =  -5₁₀

With the right three bits being able to represent a magnitude from zero through seven, and the leftmost bit representing either zero or negative eight, we can successfully represent any integer number from negative seven (1001₂ = -8₁₀ + 1₁₀ = -7₁₀) to positive seven (0111₂ = 0₁₀ + 7₁₀ = 7₁₀).
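Reading such a number is just a weighted sum in which the leftmost weight is negative. A minimal Python sketch, with the hypothetical function name decode_negative_weight:

```python
def decode_negative_weight(bits: str) -> int:
    """Sum each bit times its place weight; only the leftmost weight is negative."""
    width = len(bits)
    total = 0
    for position, bit in enumerate(bits):
        weight = 1 << (width - 1 - position)   # 8, 4, 2, 1 for a four-bit field
        if position == 0:
            weight = -weight                   # the leftmost bit weighs -8, not +8
        total += int(bit) * weight
    return total


print(decode_negative_weight("1011"))   # (1 x -8) + (0 x 4) + (1 x 2) + (1 x 1)
```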

Representing positive numbers in this scheme (with the fourth bit designated as the negative weight) is no different from that of ordinary binary notation. However, representing negative numbers is not quite as straightforward:

zero             0000
positive one     0001          negative one     1111
positive two     0010          negative two     1110
positive three   0011          negative three   1101
positive four    0100          negative four    1100
positive five    0101          negative five    1011
positive six     0110          negative six     1010
positive seven   0111          negative seven   1001
.                              negative eight   1000

Note that the negative binary numbers in the right column, being the sum of the right three bits' total plus the negative eight of the leftmost bit, don't "count" in the same progression as the positive binary numbers in the left column. Rather, the right three bits have to be set at the proper value to equal the desired (negative) total when summed with the negative eight place value of the leftmost bit.
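Going the other way, from a desired value to its bit pattern, amounts to asking what the remaining bits must add up to once the negative weight is included. The following is a sketch under assumptions: the function name encode_negative_weight and the default four-bit field are illustrative choices.

```python
def encode_negative_weight(value: int, width: int = 4) -> str:
    """Encode an integer in a field whose leftmost bit weighs -(2 ** (width - 1))."""
    top = 1 << (width - 1)               # magnitude of the negative weight (8 for width 4)
    if not -top <= value < top:
        raise ValueError("value does not fit in the field")
    if value < 0:
        lower = value + top              # what the remaining bits must sum to
        return "1" + f"{lower:0{width - 1}b}"
    return "0" + f"{value:0{width - 1}b}"


for n in range(-8, 8):
    print(f"{n:3d}  {encode_negative_weight(n)}")
```

For negative five the remaining bits must total three (since -8 + 3 = -5), which gives 1011.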

Those right three bits are referred to as the two's complement of the corresponding positive number. Consider the following comparison:

positive number       two's complement
---------------       ----------------
001                    111
010                    110
011                    101
100                    100
101                    011
110                    010
111                    001

In this case, with the negative weight bit being the fourth bit (place value of negative eight), the two's complement for any positive number will be whatever value is needed to add to negative eight to make that positive value's negative equivalent. Thankfully, there's an easy way to figure out the two's complement for any binary number: simply invert all the bits of that number, changing all 1's to 0's and vice versa (to arrive at what is called the one's complement) and then add one! For example, to obtain the two's complement of five (101₂), we would first invert all the bits to obtain 010₂ (the "one's complement"), then add one to obtain 011₂, or -5₁₀ in three-bit, two's complement form.

Interestingly enough, generating the two's complement of a binary number works the same if you manipulate all the bits, including the leftmost (sign) bit at the same time as the magnitude bits. Let's try this with the former example, converting a positive five to a negative five, but performing the complementation process on all four bits. We must be sure to include the 0 (positive) sign bit on the original number, five (0101₂). First, inverting all bits to obtain the one's complement: 1010₂. Then, adding one, we obtain the final answer: 1011₂, or -5₁₀ expressed in four-bit, two's complement form.

It is critically important to remember that the place of the negative-weight bit must be already determined before any two's complement conversions can be done. If our binary numeration field were such that the eighth bit was designated as the negative-weight bit (10000000₂), we'd have to determine the two's complement based on all seven of the other bits. Here, the two's complement of five (0000101₂) would be 1111011₂. A positive five in this system would be represented as 00000101₂, and a negative five as 11111011₂.
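The invert-and-add-one rule, applied within a field of fixed width, might be sketched like this. The function name twos_complement is hypothetical, and the field width is taken from the length of the input string, which is an assumption of this sketch.

```python
def twos_complement(bits: str) -> str:
    """Invert every bit (one's complement), then add one, keeping the same width."""
    width = len(bits)
    ones = "".join("1" if b == "0" else "0" for b in bits)
    value = (int(ones, 2) + 1) % (1 << width)   # drop any carry out of the field
    return f"{value:0{width}b}"


print(twos_complement("0101"))       # four-bit field
print(twos_complement("00000101"))   # eight-bit field
```

Because the width is part of the input, the same quantity (five) yields different bit patterns in four-bit and eight-bit fields, which is why the position of the negative-weight bit has to be settled first.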

Subtraction

We can subtract one binary number from another by using the standard techniques adapted for decimal numbers (subtraction of each bit pair, right to left, "borrowing" as needed from bits to the left). However, if we can leverage the already familiar (and easier) technique of binary addition to subtract, that would be better. As we just learned, we can represent negative binary numbers by using the "two's complement" method and a negative place-weight bit. Here, we'll use those negative binary numbers to subtract through addition. Here's a sample problem:

Subtraction: 7₁₀ - 5₁₀         Addition equivalent:  7₁₀ + (-5₁₀)

If all we need to do is represent seven and negative five in binary (two's complemented) form, all we need is three bits plus the negative-weight bit:

positive seven = 0111₂
negative five  = 1011₂

Now, let's add them together:

.                    1111  <--- Carry bits
.                     0111
.                   + 1011
.                   ------
.                    10010
.                    |
.             Discard extra bit
.
.            Answer = 0010₂

Since we've already defined our number bit field as three bits plus the negative-weight bit, the fifth bit in the answer (1) will be discarded to give us a result of 0010₂, or positive two, which is the correct answer.

Another way to understand why we discard that extra bit is to remember that the leftmost bit of the lower number possesses a negative weight, in this case equal to negative eight. When we add these two binary numbers together, what we're actually doing with the MSBs is subtracting the lower number's MSB from the upper number's MSB. In subtraction, one never "carries" a digit or bit on to the next left place-weight.
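Subtraction through addition can be sketched in a few lines of Python. The function name subtract_by_addition is hypothetical; a bit mask is used here to stand in for the fixed-width field, which is one possible way to model discarding the extra bit.

```python
def subtract_by_addition(a: int, b: int, width: int) -> str:
    """Compute a - b as a + (two's complement of b) in a fixed-width field."""
    mask = (1 << width) - 1
    a_bits = a & mask
    neg_b_bits = (~b + 1) & mask         # invert, add one, keep only width bits
    raw = a_bits + neg_b_bits            # may be width + 1 bits long
    return f"{raw & mask:0{width}b}"     # discard the extra bit on the left


print(subtract_by_addition(7, 5, 4))
```

With a four-bit field, 0111 plus 1011 gives 10010 before masking and 0010 after the extra bit is dropped.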

Let's try another example, this time with larger numbers. If we want to add -25₁₀ to 18₁₀, we must first decide how large our binary bit field must be. To represent the largest (absolute value) number in our problem, which is twenty-five, we need at least five bits, plus a sixth bit for the negative-weight bit. Let's start by representing positive twenty-five, then finding the two's complement and putting it all together into one numeration:

+25₁₀  = 011001₂ (showing all six bits)      
One's complement of 011001₂ = 100110₂
One's complement + 1 = two's complement = 100111₂        
-25₁₀ = 100111₂

Essentially, we're representing negative twenty-five by using the negative-weight (sixth) bit with a value of negative thirty-two, plus positive seven (binary 111₂).

Now, let's represent positive eighteen in binary form, showing all six bits:

.              18₁₀  = 010010₂

Now, let's add them together and see what we get:

.                     11   <--- Carry bits
.                   100111
.                 + 010010
.                 --------
.                   111001

Since there were no "extra" bits on the left, there are no bits to discard. The leftmost bit on the answer is a 1, which means that the answer is negative, in two's complement form, as it should be. Converting the answer to decimal form by summing all the bits times their respective weight values, we get:

(1 x -32₁₀)  +  (1 x 16₁₀)  +  (1 x 8₁₀)  +  (1 x 1₁₀)  = -7₁₀

Indeed -7₁₀ is the proper sum of -25₁₀ and 18₁₀.
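The steps of this example (choose a field width, encode both numbers, add, decode) can be checked with a short Python sketch. All three function names are hypothetical. The width rule follows the text: enough bits for the largest magnitude, plus one negative-weight bit. It does not by itself guarantee that the sum will fit.

```python
def bits_needed(*values: int) -> int:
    """Magnitude bits for the largest absolute value, plus one negative-weight bit."""
    largest = max(abs(v) for v in values)
    return largest.bit_length() + 1


def to_field(value: int, width: int) -> str:
    """Two's complement bit pattern of value in a field of the given width."""
    return f"{value & ((1 << width) - 1):0{width}b}"


def from_field(bits: str) -> int:
    """Decode a two's complement bit pattern (leftmost bit has negative weight)."""
    raw = int(bits, 2)
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


width = bits_needed(-25, 18)
a = to_field(-25, width)
b = to_field(18, width)
total = to_field(int(a, 2) + int(b, 2), width)
print(width, a, b, total, from_field(total))
```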

Overflow

One caveat with signed binary numbers is that of overflow, where the answer to an addition or subtraction problem exceeds the magnitude which can be represented with the allotted number of bits. Remember that the place of the sign bit is fixed from the beginning of the problem. With the last example problem, we used five binary bits to represent the magnitude of the number, and the left-most (sixth) bit as the negative-weight, or sign, bit. With five bits to represent magnitude, we have a representation range of 2⁵, or thirty-two integer steps from 0 to maximum. This means that we can represent a number as high as +31₁₀ (011111₂), or as low as -32₁₀ (100000₂). If we set up an addition problem with two binary numbers, the sixth bit used for sign, and the result either exceeds +31₁₀ or is less than -32₁₀, our answer will be incorrect. Let's try adding 17₁₀ and 19₁₀ to see how this overflow condition works for excessive positive numbers:

.       17₁₀  = 10001₂            19₁₀  = 10011₂
.
.                           1  11  <--- Carry bits
.    (Showing sign bits)    010001
.                         + 010011
.                         -------- 
.                           100100

The answer (100100₂), interpreted with the sixth bit as the -32₁₀ place, is actually equal to -28₁₀, not +36₁₀ as we should get with +17₁₀ and +19₁₀ added together! Obviously, this is not correct. What went wrong? The answer lies in the restrictions of the six-bit number field within which we're working. Since the magnitude of the true and proper sum (36₁₀) exceeds the allowable limit for our designated bit field, we have an overflow error. Simply put, six places don't give enough bits to represent the correct sum, so whatever figure we obtain using the strategy of discarding the left-most "carry" bit will be incorrect.

A similar error will occur if we add two negative numbers together to produce a sum that is too low for our six-bit binary field. Let's try adding -17₁₀ and -19₁₀ together to see how this works (or doesn't work, as the case may be!):

.      -17₁₀  = 101111₂           -19₁₀  = 101101₂
.
.                          1 1111  <--- Carry bits
.    (Showing sign bits)    101111
.                         + 101101
.                         --------
.                          1011100 
.                          |            
.                  Discard extra bit
.
FINAL ANSWER:   011100₂   = +28₁₀

The (incorrect) answer is a positive twenty-eight. The fact that the real sum of negative seventeen and negative nineteen was too low to be properly represented with a five bit magnitude field and a sixth sign bit is the root cause of this difficulty.

Let's try these two problems again, except this time using the seventh bit for a sign bit, and allowing the use of 6 bits for representing the magnitude:

.         17₁₀ + 19₁₀                     (-17₁₀) + (-19₁₀) 
.
.          1  11                           11 1111
.         0010001                           1101111
.       + 0010011                         + 1101101
.       ---------                         ---------
.         0100100₂                         11011100₂
.                                          |
.                                  Discard extra bit
.
. ANSWERS:  0100100₂ = +36₁₀
.           1011100₂ = -36₁₀

By using bit fields sufficiently large to handle the magnitude of the sums, we arrive at the correct answers.
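The effect of field width on these two problems can be reproduced with a short Python sketch that mimics a fixed-width adder. The function name add_in_field is hypothetical, and masking is used here as a stand-in for discarding the carry out of the field.

```python
def add_in_field(a: int, b: int, width: int) -> int:
    """Add two signed integers as a width-bit two's complement adder would."""
    mask = (1 << width) - 1
    raw = ((a & mask) + (b & mask)) & mask        # discard any carry out of the field
    sign_weight = 1 << (width - 1)
    # If the leftmost bit is set, it carries a negative weight.
    return raw - (1 << width) if raw & sign_weight else raw


for width in (6, 7):
    print(width, add_in_field(17, 19, width), add_in_field(-17, -19, width))
```

With six bits the two sums come out as -28 and +28. With seven bits they come out as +36 and -36.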

In these sample problems we've been able to detect overflow errors by performing the addition problems in decimal form and comparing the results with the binary answers. For example, when adding +17₁₀ and +19₁₀ together, we knew that the answer was supposed to be +36₁₀, so when the binary sum checked out to be -28₁₀, we knew that something had to be wrong. Although this is a valid way of detecting overflow, it is not very efficient. After all, the whole idea of complementation is to be able to reliably add binary numbers together and not have to double-check the result by adding the same numbers together in decimal form! This is especially true for the purpose of building electronic circuits to add binary quantities together: the circuit has to be able to check itself for overflow without the supervision of a human being who already knows what the correct answer is.

What we need is a simple error-detection method that doesn't require any additional arithmetic. Perhaps the most elegant solution is to check for the sign of the sum and compare it against the signs of the numbers added. Obviously, two positive numbers added together should give a positive result, and two negative numbers added together should give a negative result. Notice that whenever we had a condition of overflow in the example problems, the sign of the sum was always opposite of the two added numbers: +17₁₀ plus +19₁₀ giving -28₁₀, or -17₁₀ plus -19₁₀ giving +28₁₀. By checking the signs alone we are able to tell that something is wrong.

But what about cases where a positive number is added to a negative number? What sign should the sum be in order to be correct? Or, more precisely, what sign of sum would necessarily indicate an overflow error? The answer to this is equally elegant: there will never be an overflow error when two numbers of opposite signs are added together! The reason for this is apparent when the nature of overflow is considered. Overflow occurs when the magnitude of a number exceeds the range allowed by the size of the bit field. The sum of two identically-signed numbers may very well exceed the range of the bit field of those two numbers, and so in this case overflow is a possibility. However, if a positive number is added to a negative number, the sum will always lie between the two added numbers: its magnitude cannot exceed the larger of the two original magnitudes, and so overflow is impossible.
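The sign-comparison rule can be expressed as a short Python sketch. It models the rule in software only and is not a description of any particular adder circuit; the function name add_with_overflow_flag is hypothetical.

```python
def add_with_overflow_flag(a_bits: str, b_bits: str):
    """Add two equal-width two's complement bit strings.

    Returns (sum_bits, overflow). Overflow is flagged only when both
    operands have the same sign bit and the sum's sign bit differs from it.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must share one field width")
    width = len(a_bits)
    raw = (int(a_bits, 2) + int(b_bits, 2)) % (1 << width)   # discard extra bit
    sum_bits = f"{raw:0{width}b}"
    same_sign = a_bits[0] == b_bits[0]
    overflow = same_sign and sum_bits[0] != a_bits[0]
    return sum_bits, overflow


print(add_with_overflow_flag("010001", "010011"))   # +17 plus +19, six bits
print(add_with_overflow_flag("100111", "010010"))   # -25 plus +18, six bits
```

No decimal cross-check is involved: the flag is computed from three sign bits alone, and operands of opposite sign never raise it.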

Fortunately, this technique of overflow detection is easily implemented in electronic circuitry, and it is a standard feature in digital adder circuits: a subject for a later chapter.