# CS 130 Lab #4: Representation and manipulation of textual data

## Goals

In this lab we will explore how text is represented, or encoded, as binary codes (which can then be written more conveniently using hexadecimal digits). We will look at a historical progression of increasingly sophisticated standards for encoding characters, do some conversions by hand, and then use a spreadsheet to implement some interesting conversions as "programs".

## Foundations: bits, bytes and codes

A byte is a "chunk" of binary data consisting of some number of bits taken as a whole, typically 8 bits. For example:

1 0 1 0 1 1 1 0

An 8-bit byte can be represented as above, or as 2 digits in hexadecimal (why?). For example, the byte above could also be written as:

AE

Of course, the byte can also be represented as a decimal value when considered as a number:

AE (hex) = (10 × 16) + (14 × 1) = 174 (decimal)
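
The same conversions can be checked in a few lines of Python. This is only a sketch: the variable names are illustrative and not part of the lab, and the bit pattern is the example byte used above.

```python
# The example byte, written as a string of bits.
bits = "10101110"

# Interpret the bits as a base-2 number.
value = int(bits, 2)

# Format the number as two uppercase hexadecimal digits.
hex_digits = format(value, "02X")

# Split the byte into its two 4-bit halves, one per hexadecimal digit.
high_half, low_half = bits[:4], bits[4:]

print(value)              # 174
print(hex_digits)         # AE
print(int(high_half, 2))  # 10, written as A in hexadecimal
print(int(low_half, 2))   # 14, written as E in hexadecimal
```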
Characters are symbols that we use for writing. In order to be stored, processed and displayed by computers, characters are represented, or encoded, in binary form (sequences of 1s and 0s). These representations usually use a fixed number of bits, often 8: in other words, a byte is often used to represent a single character.

Historically, however, there has been a lot of "fuss" over just how this is done. Older encodings used 5 bits (see the Baudot code below). A modern world-wide standard originating in the USA (the ASCII code, pronounced "ask-ee") uses 7 bits, but was often "rounded up" to 8 bits to fit a typical byte exactly. Extended ASCII codes were later introduced and (more or less) standardized to allow a wider variety of Western (European) languages to be encoded on computers. The modern approach to these issues is embodied in the Unicode standard, which can use even more bits (in some forms) to encode nearly all the symbols used for writing in any language in the world (and some not even of this world, in some sense).
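
To make the "more bits" point concrete, here is a small Python sketch. The three sample characters and the choice of UTF-8 (one of several Unicode encoding forms) are assumptions made for illustration; they do not come from the lab itself.

```python
# Sample characters: a plain ASCII letter, an accented Western European
# letter, and the euro sign (chosen only for illustration).
samples = ["A", "é", "€"]

summary = []
for ch in samples:
    code_point = ord(ch)             # the character's Unicode number
    utf8_bytes = ch.encode("utf-8")  # its byte sequence in UTF-8
    summary.append((code_point, code_point.bit_length(), len(utf8_bytes)))

for code_point, bits_needed, byte_count in summary:
    print(f"U+{code_point:04X}: needs {bits_needed} bits, "
          f"{byte_count} byte(s) in UTF-8")
# U+0041: needs 7 bits, 1 byte(s) in UTF-8
# U+00E9: needs 8 bits, 2 byte(s) in UTF-8
# U+20AC: needs 14 bits, 3 byte(s) in UTF-8
```

The first character fits in ASCII's 7 bits, the second needs the eighth bit, and the third needs more bits than a single byte can hold.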

Read the history of character codes listed in the first item below, toward the goal of answering the hand-written exercises in the following section. You may find some of the other links in this list useful for various parts of the exercises.

## Hand-written exercises (for demo!)

Determine how your first and last name would be written out using:
• Morse code: write the dots and dashes out on paper.

• Baudot code: write out your name, and show the corresponding codes in binary and decimal, using one decimal number per 5 bits of Baudot code.

• ASCII code: write out your name, and show the corresponding codes in binary and decimal, using one decimal number per 7 bits of ASCII code.
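
If you want to check your hand-written ASCII answers afterwards, a short Python sketch like the following may help. The name "Ada" is a hypothetical sample, and the sketch assumes the name contains only plain ASCII characters, so that Python's `ord` returns the ASCII code.

```python
name = "Ada"  # hypothetical sample name; substitute your own

rows = []
for ch in name:
    code = ord(ch)               # decimal ASCII code
    bits = format(code, "07b")   # the same code as 7 binary digits
    rows.append((ch, bits, code))

for ch, bits, code in rows:
    print(ch, bits, code)
# A 1000001 65
# d 1100100 100
# a 1100001 97
```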

Finally, what is the Unicode representation of the biohazard symbol? (You may want to use a search engine to find this information.)

For this section, you should develop a spreadsheet "program" in Excel which will convert your name (or any characters entered) into several different forms. See if you can make your spreadsheet look like this example:

The input string (Name) appears in the upper left; successive rows of the sheet then display:

• individual characters in cells (use the MID and COLUMN functions; MID allows you to select out characters from a piece of text, COLUMN allows you to know which column the current cell is in, numerically);

• the decimal codes for these characters in ASCII (use the CODE function);

• the hexadecimal pairs corresponding to the above decimal codes (use the DEC2HEX function);

• the hexadecimal codes, but with the pairs reversed in order (use the CONCATENATE function and the MID function again; CONCATENATE will allow you to put individual characters together into a longer string);

• the decimal codes for these reversed hex pairs (using HEX2DEC);

• the characters represented by these codes (using the CHAR function);

• (OPTIONAL!) the whole string of converted characters (called "Funny Name" in the picture above).
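
For comparison, the rows described above can be sketched in Python. This is not the spreadsheet solution, only a rough mirror of it under some assumptions: "pairs reversed" is read as swapping the two hexadecimal digits of each character's code, the function name `swap_hex_pairs` and the sample name "Fred" are hypothetical, and the input is assumed to be plain ASCII. For swapped codes above 127, the character Excel's CHAR displays depends on the platform's character set and may differ from what Python's `chr` returns.

```python
def swap_hex_pairs(text):
    """Mirror the spreadsheet rows for a plain ASCII input string."""
    chars = list(text)                                   # like MID + COLUMN
    codes = [ord(c) for c in chars]                      # like CODE
    hex_pairs = [format(n, "02X") for n in codes]        # like DEC2HEX
    swapped = [pair[1] + pair[0] for pair in hex_pairs]  # like CONCATENATE + MID
    swapped_codes = [int(pair, 16) for pair in swapped]  # like HEX2DEC
    new_chars = [chr(n) for n in swapped_codes]          # like CHAR
    return {
        "chars": chars,
        "codes": codes,
        "hex_pairs": hex_pairs,
        "swapped": swapped,
        "swapped_codes": swapped_codes,
        "new_chars": new_chars,
        "funny_name": "".join(new_chars),                # the optional last row
    }


rows = swap_hex_pairs("Fred")  # hypothetical sample name
print(rows["codes"])           # [70, 114, 101, 100]
print(rows["hex_pairs"])       # ['46', '72', '65', '64']
print(rows["swapped"])         # ['64', '27', '56', '46']
print(rows["swapped_codes"])   # [100, 39, 86, 70]
print(rows["funny_name"])      # d'VF
```

Many names produce unprintable control characters after the swap (for example, "A" is 41 in hexadecimal, which becomes 14, or decimal 20), so do not be surprised if some cells in the final rows look empty or odd.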