Tables and Strings in COBOL
Big Data like it's 1985
I recently came across a blog post dealing briefly with the concept of strings, tables and subscripting in COBOL. While the code in the blog post works just fine, I personally think it's overcomplicating a very simple use case (subscripting a string) and underselling a powerful COBOL feature (tables). Since I'm a deeply demented man with a lot of free time on my hands, I decided to expand a bit on the subject - if only to give myself a chance of brushing up on my own very rudimentary COBOL knowledge. Feel free to point out any errors.
Table of Contents
- Subscripting Strings
- Creating Tables
- Why call it "Tables"?
- Bad Table Practices
- Sorting Tables
- Searching Tables
- Another Dimension
- A note on pointers
- STOP RUN.
Subscripting Strings
The original blog post is correct in that strings in COBOL can't, strictly speaking, be subscripted. It's also correct in that you can create a table with an item length of one, put your string in that and then subscript it. That is, if you have the string "Hello, World!" in a table accessible as mystring and want to access the first character ("H"), you'll be able to write mystring(1).
But there's a much easier and more powerful way to do this in COBOL, called reference modification - or as many other languages call it, substrings. It's easier because you don't have to define a table and more powerful because, unlike the table solution, you can access an arbitrary length of the string in one go.
If mystring is instead an ordinary string, you can access its first character with mystring(1:1). The integer before the colon is the starting character position and the integer after it is the desired substring length. This can of course return an arbitrary length from an arbitrary position, such as mystring(2:4) (four characters starting at the second, i.e. "ello" for "Hello, World!"), and even the remainder of a string from a given starting point, such as mystring(3:), where leaving out the length returns everything from the third character onward.
In short, it's similar to substring handling in many other languages.
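To see these forms side by side, here's a small sketch. It assumes GnuCOBOL in free format (e.g. compiled with cobc -x -free); the program name and the start-pos and sub-len fields are illustrative additions, and the last DISPLAY shows that the position and length can be data items rather than literals.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. refmod-demo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 mystring  PIC X(13) VALUE "Hello, World!".
*> Hypothetical fields holding a position and a length.
01 start-pos PIC 99 VALUE 8.
01 sub-len   PIC 99 VALUE 5.
PROCEDURE DIVISION.
    *> Position 1, length 1: the first character.
    DISPLAY mystring(1:1).
    *> Four characters starting at position 2.
    DISPLAY mystring(2:4).
    *> No length after the colon: everything from position 3 on.
    DISPLAY mystring(3:).
    *> Position and length taken from data items.
    DISPLAY mystring(start-pos:sub-len).
    STOP RUN.
```

This should print H, ello, llo, World! and World on separate lines.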
Reference modification can also be used for value assignment, as in the following program:
    IDENTIFICATION DIVISION.
    PROGRAM-ID. "Strings".
    DATA DIVISION.
    LOCAL-STORAGE SECTION.
    01 string-a PIC X(10).
    01 string-b PIC X(20).
    PROCEDURE DIVISION.
        MOVE ALL "foo" TO string-a.
        DISPLAY string-a.
        MOVE ALL "bar" TO string-b.
        DISPLAY string-b.
        MOVE string-a(2:5) TO string-b(8:5).
        DISPLAY string-b.
        STOP RUN.
This will yield the following output:
    foofoofoof
    barbarbarbarbarbarba
    barbarboofoobarbarba
Doing something similar with tables alone would require a lot of shuffling of individual characters into temporary variables. Since reference modification was introduced in COBOL-85, I dare say it's going to be available on all but the most ancient of legacy systems.
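For comparison, here's a rough sketch of the table-only approach next to reference modification, reusing string-a and string-b from the program above. The REDEFINES views chars-a and chars-b and the counter i are hypothetical names added for illustration, and GnuCOBOL free format is assumed; both DISPLAYs should print the same line.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. table-vs-refmod.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 string-a PIC X(10) VALUE "foofoofoof".
*> Hypothetical: the same storage viewed as a table of single characters.
01 chars-a REDEFINES string-a.
   02 char-a PIC X OCCURS 10 TIMES.
01 string-b PIC X(20) VALUE "barbarbarbarbarbarba".
01 chars-b REDEFINES string-b.
   02 char-b PIC X OCCURS 20 TIMES.
01 i PIC 99.
PROCEDURE DIVISION.
    *> Table-only version: copy characters 2-6 of string-a into
    *> positions 8-12 of string-b, one character per iteration.
    PERFORM VARYING i FROM 2 BY 1 UNTIL i > 6
        MOVE char-a(i) TO char-b(i + 6)
    END-PERFORM.
    DISPLAY string-b.
    *> Reset string-b, then do the same copy with reference modification.
    MOVE "barbarbarbarbarbarba" TO string-b.
    MOVE string-a(2:5) TO string-b(8:5).
    DISPLAY string-b.
    STOP RUN.
```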
Creating Tables
This doesn't mean that tables aren't useful, because they provide additional constructs and abstractions for working with data. Consider the following code:
    DATA DIVISION.
    LOCAL-STORAGE SECTION.
    01 str-tbl.
       02 str3 PIC XXX OCCURS 5 TIMES.
Here, we've defined the table str-tbl, which can hold five str3 items, each with a length of three characters (that's what XXX means; it can also be written as X(3)).
Now, let's populate it with some items:
    PROCEDURE DIVISION.
        MOVE "abc" TO str3(1).
        MOVE "def" TO str3(2).
        MOVE "ghi" TO str3(3).
        MOVE "jkl" TO str3(4).
        MOVE "mno" TO str3(5).
If we wanted to pick the fifth element from a populated str-tbl, we'd subscript it by referencing the item name: str3(5). We'd then get a three-character string back, since that's how we've defined it. So, DISPLAY str3(5) would print "mno".
This subscript can be combined with reference modification, which means that DISPLAY str3(5)(1:2) would print "mn".
We can still deal with the whole table as a string, meaning DISPLAY str-tbl will print "abcdefghijklmno" and DISPLAY str-tbl(1:1) will print "a".
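Put together, the str-tbl snippets above form one complete program. This sketch assumes GnuCOBOL in free format and adds a hypothetical counter, i, to walk the table with PERFORM VARYING using an ordinary numeric subscript.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. str-tbl-demo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 str-tbl.
   02 str3 PIC XXX OCCURS 5 TIMES.
*> Hypothetical loop counter.
01 i PIC 9.
PROCEDURE DIVISION.
    MOVE "abc" TO str3(1).
    MOVE "def" TO str3(2).
    MOVE "ghi" TO str3(3).
    MOVE "jkl" TO str3(4).
    MOVE "mno" TO str3(5).
    *> A subscript, then a subscript combined with reference modification.
    DISPLAY str3(5).
    DISPLAY str3(5)(1:2).
    *> The whole table treated as one 15-character string.
    DISPLAY str-tbl.
    DISPLAY str-tbl(1:1).
    *> Visit every item in turn.
    PERFORM VARYING i FROM 1 BY 1 UNTIL i > 5
        DISPLAY i ": " str3(i)
    END-PERFORM.
    STOP RUN.
```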
Why call it "Tables"?
Things in COBOL often differ from other languages, because COBOL is, in many ways, not like other languages. That could perhaps suffice as an explanation of why tables are called tables, but I'd argue that the reason they're called tables is because they are, well, tables. They can be subdivided into multiple fields, and they can be sorted and searched in ways that are reminiscent of SQL.
Consider the following table definition:
    01 mix-tbl.
       02 mix-item OCCURS 4 TIMES.
          03 mix-num PIC 99.
          03 mix-str PIC XXX.
Here, we've defined the table mix-tbl, in which we'll store four instances of the item (or record, as the COBOL lingo goes) mix-item. The record itself consists of both a numeric mix-num field and an alphanumeric mix-str field. (Having the option of arbitrarily formatted fields in a table record means you could feed in the digits "07250" and get a nicely formatted cost back, e.g. "$72,50". How's that for the awesome power of the PIC clause?)
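Here's a rough sketch of that formatting claim: the raw digits are read as a number with an implied decimal point (V) and then moved to a numeric-edited field. The field names are hypothetical, REDEFINES stands in for a record that has been fed raw digits, and the comma decimal separator in "$72,50" assumes DECIMAL-POINT IS COMMA (GnuCOBOL free format assumed).

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. edit-demo.
ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
SPECIAL-NAMES.
    *> Make the comma the decimal separator, as in "$72,50".
    DECIMAL-POINT IS COMMA.
DATA DIVISION.
WORKING-STORAGE SECTION.
*> Hypothetical: raw digits as they might arrive in a record.
01 raw-digits PIC X(5) VALUE "07250".
*> The same five bytes read as 072.50 via the implied decimal point.
01 raw-cost REDEFINES raw-digits PIC 9(3)V99.
*> Numeric-edited: floating $, two integer digits, comma as decimal point.
01 shown-cost PIC $$9,99.
PROCEDURE DIVISION.
    MOVE raw-cost TO shown-cost.
    DISPLAY raw-digits.
    DISPLAY shown-cost.
    STOP RUN.
```

The move aligns on the decimal point and drops the leading zero, since shown-cost only has room for two integer digits.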
We can now populate this table in a number of ways, though I strongly advise always populating individual record fields. Here are a few different varieties:
    MOVE "03Aaa" TO mix-tbl.
    MOVE "11Bbb" TO mix-item(2).
    MOVE 2 TO mix-num(3).
    MOVE "Ccc" TO mix-str(3).
    MOVE 2 TO mix-num(4).
    MOVE "Ddd" TO mix-str(4).
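- On the first line, we move the value "03Aaa" to the table itself, thus populating the first record. (The remaining 15 characters of mix-tbl are padded with spaces, which the following lines then overwrite.)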
- On the second line, we assign the value "11Bbb" to the second record in the table.
- On the third and fourth lines, we populate the third record by assigning mix-num and mix-str individually. The same method is repeated on the last two lines.
We could of course also populate our table by reading from a file, but let's leave that for another time.
Bad Table Practices
It's important to note that COBOL will only format our numeric values for us if we perform atomic assignments to the individual record fields. If the first assignment above had read MOVE "3Aaa" TO mix-tbl, we'd have quite a problem on our hands, because COBOL would then happily put "3A" into our numeric mix-num field (and "aa" plus a trailing space into mix-str).
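Here's a small sketch of that failure mode, assuming the mix-tbl definition above and GnuCOBOL in free format. The IS NUMERIC class test is one way of catching such damage before the field is used in arithmetic, and assigning the fields individually repairs the record.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. bad-table.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 mix-tbl.
   02 mix-item OCCURS 4 TIMES.
      03 mix-num PIC 99.
      03 mix-str PIC XXX.
PROCEDURE DIVISION.
    *> The mistake: without the leading zero, every byte shifts one place left.
    MOVE "3Aaa" TO mix-tbl.
    DISPLAY "[" mix-str(1) "]".
    *> A class test reveals that mix-num(1) now holds "3A".
    IF mix-num(1) IS NUMERIC
        DISPLAY "mix-num(1) is numeric"
    ELSE
        DISPLAY "mix-num(1) is not numeric"
    END-IF.
    *> Field-level assignments let COBOL pad the number for us.
    MOVE 3 TO mix-num(1).
    MOVE "Aaa" TO mix-str(1).
    DISPLAY mix-item(1).
    STOP RUN.
```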
With that out of the way, let's continue on!
Sorting Tables
If we wanted to look at mix-tbl in its entirety, we could DISPLAY mix-tbl, which would give us the string "03Aaa11Bbb02Ccc02Ddd". We could also access, for example, mix-item(3), giving "02Ccc", or mix-num(3) and mix-str(3), giving "02" and "Ccc", respectively.
We can also easily sort the table using the SORT statement:
    SORT mix-item ASCENDING mix-num DESCENDING mix-str.
Note that, just like in SQL, we can sort fields in different orders and according to an arbitrary chain of precedence.
The table is now sorted in place: mix-num(3) will give us "03" and mix-str(3) will give us "Aaa".
Here's the entire program:
    IDENTIFICATION DIVISION.
    PROGRAM-ID. "Sorting Tables".
    DATA DIVISION.
    LOCAL-STORAGE SECTION.
    01 mix-tbl.
       02 mix-item OCCURS 4 TIMES.
          03 mix-num PIC 99.
          03 mix-str PIC XXX.
    PROCEDURE DIVISION.
        MOVE "03Aaa" TO mix-tbl.
        MOVE "11Bbb" TO mix-item(2).
        MOVE 2 TO mix-num(3).
        MOVE "Ccc" TO mix-str(3).
        MOVE 2 TO mix-num(4).
        MOVE "Ddd" TO mix-str(4).
        DISPLAY mix-tbl.
        DISPLAY mix-item(2).
        DISPLAY mix-str(3).
        DISPLAY mix-num(3).
        SORT mix-item ASCENDING mix-num DESCENDING mix-str.
        DISPLAY mix-tbl.
        DISPLAY mix-item(2).
        DISPLAY mix-str(3).
        DISPLAY mix-num(3).
        DISPLAY mix-str(4).
        DISPLAY mix-num(4).
        STOP RUN.
It should produce the following output:
    03Aaa11Bbb02Ccc02Ddd
    11Bbb
    Ccc
    02
    02Ddd02Ccc03Aaa11Bbb
    02Ccc
    Aaa
    03
    Bbb
    11
Searching Tables
Tables can also be searched. In order to perform a search, our table must be indexed, which we declare with the INDEXED BY clause when defining it:
    DATA DIVISION.
    LOCAL-STORAGE SECTION.
    01 product-tbl.
       02 product-item OCCURS 5 TIMES INDEXED BY idx.
          03 product-name PIC X(8).
          03 product-price PIC $ZZ.
    77 search-query PIC X(8).
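Once this table is populated, we can search it using the SEARCH construct, which follows a common pattern in COBOL. It has two kinds of sub-clauses: WHEN, which tests a condition against the current table entry, and AT END, which in the case of SEARCH means we've reached the end of the table without any entry matching the WHEN condition. Since a serial SEARCH starts at the current value of the index rather than at the first entry, the index has to be reset with SET idx TO 1 before each search. (When reading files in COBOL, you typically process each line in a NOT AT END clause, which I find both confusing and amusing.)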
    SET idx TO 1.
    SEARCH product-item
        AT END
            DISPLAY "No matches for " search-query
        WHEN product-name(idx) = search-query
            DISPLAY product-name(idx) ": " product-price(idx)
    END-SEARCH.
In this case, we're searching for a product name and when it's found, we display its price. Here's all of the code:
    IDENTIFICATION DIVISION.
    PROGRAM-ID. "Searching".
    DATA DIVISION.
    LOCAL-STORAGE SECTION.
    01 product-tbl.
       02 product-item OCCURS 5 TIMES INDEXED BY idx.
          03 product-name PIC X(8).
          03 product-price PIC $ZZ.
    77 search-query PIC X(8).
    PROCEDURE DIVISION.
        *> Populate and print our table.
        PERFORM VARYING idx FROM 1 BY 1 UNTIL idx = 6
            STRING "Product" FUNCTION CHAR(65 + idx) DELIMITED BY SIZE
                INTO product-name(idx)
            COMPUTE product-price(idx) = idx * 10
            DISPLAY product-name(idx) " : " product-price(idx)
        END-PERFORM.
        *> Search with mismatch.
        MOVE "NotFound" TO search-query.
        PERFORM Search-Table.
        *> Search with match.
        MOVE "ProductC" TO search-query.
        PERFORM Search-Table.
        STOP RUN.
    Search-Table.
        *> A serial SEARCH starts at the current index value, so reset it.
        *> Index names are changed with SET, not MOVE.
        SET idx TO 1.
        SEARCH product-item
            AT END
                DISPLAY "No matches for " search-query
            WHEN product-name(idx) = search-query
                DISPLAY product-name(idx) ": " product-price(idx)
        END-SEARCH.
The above program should output the following:
    ProductA : $10
    ProductB : $20
    ProductC : $30
    ProductD : $40
    ProductE : $50
    No matches for NotFound
    ProductC: $30
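On a sorted table, we could also perform a binary search using SEARCH ALL. That requires the table to be declared with an ASCENDING KEY (or DESCENDING KEY) clause, and its contents must actually be in that order.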
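Here's a sketch of a binary search over the product table, assuming GnuCOBOL in free format and product names loaded in ascending order. The ASCENDING KEY clause is what SEARCH ALL relies on; the letters and i fields and the Find-Product paragraph are hypothetical additions. Unlike the serial SEARCH, no SET idx TO 1 is needed beforehand.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. binary-search.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 product-tbl.
   *> SEARCH ALL needs a KEY clause, and the data must really be in that order.
   02 product-item OCCURS 5 TIMES
         ASCENDING KEY IS product-name
         INDEXED BY idx.
      03 product-name  PIC X(8).
      03 product-price PIC $ZZ.
*> Hypothetical helpers used only to load the table.
01 letters PIC X(5) VALUE "ABCDE".
01 i       PIC 9.
01 search-query PIC X(8).
PROCEDURE DIVISION.
    *> Load ProductA..ProductE, which are already in ascending order.
    PERFORM VARYING i FROM 1 BY 1 UNTIL i > 5
        STRING "Product" letters(i:1) DELIMITED BY SIZE
            INTO product-name(i)
        END-STRING
        COMPUTE product-price(i) = i * 10
    END-PERFORM.
    MOVE "ProductD" TO search-query.
    PERFORM Find-Product.
    MOVE "ProductZ" TO search-query.
    PERFORM Find-Product.
    STOP RUN.
Find-Product.
    *> Binary search: SEARCH ALL positions idx by itself.
    SEARCH ALL product-item
        AT END
            DISPLAY "No matches for " search-query
        WHEN product-name(idx) = search-query
            DISPLAY product-name(idx) ": " product-price(idx)
    END-SEARCH.
```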
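Another Dimension
Multi-dimensional tables can also be defined. We can add to our mix-tbl record a nested table, mix-sub, so that each mix-item holds three sub-num values of its own: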
    IDENTIFICATION DIVISION.
    PROGRAM-ID. "Multidimensional Tables".
    DATA DIVISION.
    LOCAL-STORAGE SECTION.
    01 mix-tbl.
       02 mix-item OCCURS 3 TIMES.
          03 mix-num PIC 99.
          03 mix-str PIC XXX.
          03 mix-sub OCCURS 3 TIMES.
             04 sub-num PIC 99.
    PROCEDURE DIVISION.
        MOVE 30 TO mix-num(3).
        *> First subscript selects the mix-item, second the mix-sub within it.
        MOVE 31 TO sub-num(3, 1).
        MOVE 32 TO sub-num(3, 2).
        MOVE 33 TO sub-num(3, 3).
        DISPLAY sub-num(3, 2).
        STOP RUN.
This will now of course output "32".
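To fill and read the whole grid, nested PERFORM VARYING loops can supply one subscript each. This sketch reuses the same mix-tbl layout and assumes GnuCOBOL in free format; the row-idx and col-idx counters are hypothetical names.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. grid-demo.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 mix-tbl.
   02 mix-item OCCURS 3 TIMES.
      03 mix-num PIC 99.
      03 mix-str PIC XXX.
      03 mix-sub OCCURS 3 TIMES.
         04 sub-num PIC 99.
*> Hypothetical loop counters.
01 row-idx PIC 9.
01 col-idx PIC 9.
PROCEDURE DIVISION.
    *> Outer loop picks the mix-item, inner loop the mix-sub inside it.
    PERFORM VARYING row-idx FROM 1 BY 1 UNTIL row-idx > 3
        PERFORM VARYING col-idx FROM 1 BY 1 UNTIL col-idx > 3
            COMPUTE sub-num(row-idx, col-idx) = row-idx * 10 + col-idx
        END-PERFORM
    END-PERFORM.
    *> Print one line per mix-item.
    PERFORM VARYING row-idx FROM 1 BY 1 UNTIL row-idx > 3
        DISPLAY sub-num(row-idx, 1) " " sub-num(row-idx, 2) " "
                sub-num(row-idx, 3)
    END-PERFORM.
    STOP RUN.
```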
A note on pointers
Another way of accessing arbitrary positions in COBOL strings is through pointers. They're not exactly of the C variety, though they have some vague similarities to the pointer arithmetic used when parsing strings in C. COBOL pointers are used together with the STRING and UNSTRING statements, via their WITH POINTER phrase, to keep track of character positions while building or splitting a string.
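A minimal sketch of both, assuming GnuCOBOL in free format: UNSTRING moves in-pos past each comma it consumes, and STRING moves out-pos past each piece it appends, so the pointer always marks where the next operation starts. All data names here are hypothetical, and FUNCTION TRIM is only there to tidy the output.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. pointer-demo.
DATA DIVISION.
WORKING-STORAGE SECTION.
*> Hypothetical input and work fields.
01 csv-line  PIC X(20) VALUE "abc,de,fghi".
01 field-val PIC X(10).
01 in-pos    PIC 99 VALUE 1.
01 out-line  PIC X(20) VALUE SPACES.
01 out-pos   PIC 99 VALUE 1.
PROCEDURE DIVISION.
    *> Each UNSTRING starts reading at in-pos and leaves it just past
    *> the delimiter it found (or past the end of csv-line).
    PERFORM 3 TIMES
        UNSTRING csv-line DELIMITED BY ","
            INTO field-val
            WITH POINTER in-pos
        END-UNSTRING
        DISPLAY "field: " FUNCTION TRIM(field-val)
                ", next position: " in-pos
    END-PERFORM.
    *> STRING writes at out-pos and advances it, so pieces are appended.
    STRING "Hello" DELIMITED BY SIZE
        INTO out-line WITH POINTER out-pos
    END-STRING.
    STRING ", COBOL" DELIMITED BY SIZE
        INTO out-line WITH POINTER out-pos
    END-STRING.
    *> out-pos now points one past the last character written.
    DISPLAY out-line(1:out-pos - 1).
    STOP RUN.
```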
STOP RUN.
That's enough COBOL for one helping. Thanks for reading and Happy Hacking!