<html>
<body>
<h2>File Formats for Punch Cards</h2>

<p>Since punch cards have a rather special layout (usually 80 columns with 12 bits each),
they cannot simply be saved as a binary data stream, the way 8-bit paper tapes are usually
stored on computers (at least with my programs). It seems that few people have run into
this problem in the past: I could find only one (!) document that addresses it.</p>

<h3>The Jones Punched Card File Format</h3>
<p>The <a href="http://www.cs.uiowa.edu/~jones/cards/format.html">punched card emulation proposal</a>
by Douglas W. Jones is the only document on the World Wide Web that deals with this problem. Jones
has written some utilities in C to work with emulated punched card decks (I simply call them
"punch card files"). It is a highly compact binary format with a 3-byte header (holding a magic
string like <tt>H80</tt> or <tt>H82</tt>) that precedes every card file. The file itself is simply a
concatenation of punch cards. Every punch card consists of a 3-byte header (filled with some
flags that are supposed to describe the design of the punch card) and the 80 columns. Two 12-bit
columns are packed into three octets in big-endian format.</p>
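<p>The packing of two 12-bit columns into three octets can be sketched in a few lines of Python.
This is only an illustration, not Jones's code: the function names <tt>pack_columns</tt> and
<tt>unpack_columns</tt> are made up here, and the sketch assumes that the first column of each
pair occupies the upper 12 bits of the big-endian 24-bit group.</p>

```python
def pack_columns(columns):
    """Pack 12-bit column values into bytes, two columns per three bytes."""
    if len(columns) % 2 != 0:
        raise ValueError("need an even number of columns")
    out = bytearray()
    for i in range(0, len(columns), 2):
        first, second = columns[i], columns[i + 1]
        if not (0 <= first <= 0xFFF and 0 <= second <= 0xFFF):
            raise ValueError("column values must fit into 12 bits")
        # Assumption: the first column goes into the high 12 bits of the group.
        group = (first << 12) | second
        out += group.to_bytes(3, "big")
    return bytes(out)


def unpack_columns(data):
    """Reverse of pack_columns: three bytes become two 12-bit column values."""
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of three bytes")
    columns = []
    for i in range(0, len(data), 3):
        group = int.from_bytes(data[i:i + 3], "big")
        columns.append(group >> 12)      # upper 12 bits
        columns.append(group & 0xFFF)    # lower 12 bits
    return columns
```

<p>With this layout, the 80 columns of one card pack into 120 bytes.</p>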

<p>This model of a punch card stack has the following advantages and disadvantages:</p>

<h4>Advantages</h4>
<ul>
    <li>It is <b>very compact</b>: one card takes only 123 bytes (3 header bytes plus 120 data bytes).</li>
    <li>I/O is <b>very fast</b> thanks to cheap bit-shifting operations that are easy to write
        in C/C++ programs.</li>
    <li>It is a well-documented <b>standard</b>, and some programs already exist that
        can work with the format.</li>
</ul>

<h4>Disadvantages</h4>
<ul>
    <li>Files cannot be edited with <b>text editors</b>; awkward hex editor handling is necessary.</li>
    <li>I/O needs bit-shifting operations that are not intuitive for the programmer,
        especially if the files are to be processed by scripts, etc.</li>
    <li>There is no place for <b>meta data</b> such as a label for a column, a translation table
        (column value to Unicode character) or additional comments.</li>
    <li>Due to the 3-byte file header, files cannot be treated as a <b>stream of punch cards</b>,
        so the usual Unix programs like <tt>cat</tt> won't work.</li>
</ul>
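<p>To illustrate the last point: plainly appending one card file to another leaves the second
file's header in the middle of the card data. A concatenation tool would have to strip it first.
The following Python sketch does that; <tt>concat_decks</tt> is a hypothetical helper, and the
sizes are simply taken from the description above (3-byte file header, 123 bytes per card).</p>

```python
FILE_HEADER_SIZE = 3      # magic string such as "H80" or "H82"
CARD_SIZE = 3 + 120       # per-card header plus 80 packed columns


def concat_decks(decks):
    """Join several card files (bytes objects) into one card file."""
    if not decks:
        raise ValueError("need at least one deck")
    magic = decks[0][:FILE_HEADER_SIZE]
    out = bytearray(magic)            # keep exactly one file header
    for deck in decks:
        header = deck[:FILE_HEADER_SIZE]
        cards = deck[FILE_HEADER_SIZE:]
        if header != magic:
            raise ValueError("decks use different file headers")
        if len(cards) % CARD_SIZE != 0:
            raise ValueError("deck does not hold a whole number of cards")
        out += cards                  # card data only, header stripped
    return bytes(out)
```

<p>The result differs from what <tt>cat</tt> would produce by exactly the dropped headers.</p>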

<h3>The Card Markup Language (XML)</h3>
<p>Modeling a punch card with XML is a contemporary idea. Basically, I think such a file could
look like this one:</p>

<pre>
&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
&lt;card-deck xmlns="http://dev.technikum29.de/2009/punch-card-markup-language"&gt;
  &lt;card&gt;
    &lt;meta name="generator" value="Punch Card Reader Xyz" /&gt;
    &lt;meta name="scanned" value="15.08.09T12:35:42" /&gt;

    &lt;property key="Reader.Error" raw="-1235"&gt;Error:Foobar&lt;/property&gt;

    &lt;property key="Jones.Color" raw="0100"&gt;cream&lt;/property&gt;
    &lt;property key="Jones.Corner" raw="0"&gt;round&lt;/property&gt;
    &lt;property key="Jones.Cut" raw="1" /&gt;
    &lt;property key="Jones.Header" value="&amp;#01;&amp;#02;&amp;#05;" /&gt;
    &lt;comment type="text/html" xmlns:html="http://www.w3.org/1999/xhtml"&gt;
      &lt;!-- easy to export thanks to the Qt rich text widget --&gt;
      &lt;html:p&gt;
        Rich text content can be written here
        without any problems.
      &lt;/html:p&gt;
    &lt;/comment&gt;
    &lt;column value="101101101010"&gt;
      &lt;label&gt;A&lt;/label&gt;
    &lt;/column&gt;
    &lt;column value="100100100100"&gt;
      &lt;label&gt;&amp;specialfoo;&lt;/label&gt;
    &lt;/column&gt;
    &lt;column value="100100100000" label="F" /&gt;
    &lt;column bit0="0" bit1="1" bit2="1" bit3="1" bit4="1" bit5="0" bit6="0" bit7="0" bit8="0" bit9="0" bit10="0" bit11="1"&gt;
    &lt;/column&gt;
  &lt;/card&gt;

  ...
&lt;/card-deck&gt;
</pre>
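<p>Reading such a file takes only a few lines with a standard XML parser. The following Python
sketch uses <tt>xml.etree.ElementTree</tt> from the standard library on a shortened sample deck.
The function name <tt>read_deck</tt> and the returned structure are assumptions made for this
illustration; the element and attribute names are the ones from the proposal above.</p>

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://dev.technikum29.de/2009/punch-card-markup-language"}

# Shortened sample deck, using only elements from the proposal above.
SAMPLE = """
<card-deck xmlns="http://dev.technikum29.de/2009/punch-card-markup-language">
  <card>
    <meta name="generator" value="Punch Card Reader Xyz" />
    <column value="101101101010"><label>A</label></column>
    <column value="100100100000" label="F" />
  </card>
</card-deck>
"""


def read_deck(xml_text):
    """Return a list of cards; each card is a list of (value, label) pairs."""
    root = ET.fromstring(xml_text)
    deck = []
    for card in root.findall("c:card", NS):
        columns = []
        for col in card.findall("c:column", NS):
            # The label may be given as an attribute or as a child element.
            label = col.get("label")
            if label is None:
                label = col.findtext("c:label", default=None, namespaces=NS)
            columns.append((col.get("value"), label))
        deck.append(columns)
    return deck
```

<p>For the sample deck, <tt>read_deck(SAMPLE)</tt> yields one card with two labelled columns.</p>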

<p>As you might notice, there are different possibilities for modeling a card. There is, for
example, the most elaborate variant (<tt>&lt;column bit0="1" bit1="0" ...</tt>). Using that
method, every punch card costs about 10&nbsp;kByte. There are cheaper methods like the
"boolean string" <tt>0101...</tt>, but then it needs to be defined which position in the string
corresponds to which position on the punch card.</p>
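<p>One way to pin down the positions of the "boolean string" is a fixed table of row names. The
following Python sketch assumes that the string lists the rows from the top of the card to the
bottom (12, 11, 0, 1 ... 9). That ordering is only an assumption for illustration, since the
proposal above does not define it; the names <tt>ROW_ORDER</tt>, <tt>punched_rows</tt> and
<tt>bit_string</tt> are made up as well.</p>

```python
# Assumed ordering: string position 0 is the top row of the card.
ROW_ORDER = ("12", "11", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9")


def punched_rows(bits):
    """Translate a 12-character string of 0s and 1s into a list of row names."""
    if len(bits) != len(ROW_ORDER) or any(ch not in "01" for ch in bits):
        raise ValueError("expected a string of twelve 0/1 characters")
    return [row for row, ch in zip(ROW_ORDER, bits) if ch == "1"]


def bit_string(rows):
    """Translate a collection of row names back into the 12-character string."""
    unknown = set(rows) - set(ROW_ORDER)
    if unknown:
        raise ValueError("unknown row names: %s" % sorted(unknown))
    return "".join("1" if row in rows else "0" for row in ROW_ORDER)
```

<p>Under this assumption, <tt>punched_rows("100100000000")</tt> gives the rows 12 and 1.</p>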

<h4>Advantages</h4>
<ul>
    <li><b>Good text editor support</b></li>
    <li>Most <b>unambiguous model</b> of a punch card</li>
    <li>Very good <b>meta data</b> support in all respects</li>
</ul>

<h4>Disadvantages</h4>
<ul>
    <li>A lot of overhead data, which blows up the file size enormously. However, files can be
        shrunk to very small sizes via compression (like on-the-fly gzip).</li>
</ul>
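<p>The effect of compression is easy to try out. The following Python sketch builds one card in
the most elaborate variant and compresses it with the <tt>gzip</tt> module from the standard
library. The helper names are made up, and a card with 80 identical columns is an artificial
test case that compresses better than real data would.</p>

```python
import gzip


def verbose_column(bits):
    """Render one column in the elaborate bit0="..." bit1="..." form."""
    attrs = " ".join('bit%d="%s"' % (i, b) for i, b in enumerate(bits))
    return "<column %s></column>" % attrs


def verbose_card(columns):
    """Render a whole card, one column element per line."""
    lines = ["<card>"] + [verbose_column(c) for c in columns] + ["</card>"]
    return "\n".join(lines)


# Artificial sample: 80 identical columns.
raw = verbose_card(["100100100000"] * 80).encode("utf-8")
packed = gzip.compress(raw)
print(len(raw), "bytes uncompressed,", len(packed), "bytes compressed")
```

<p>For transparent file access, <tt>gzip.open</tt> can be used in place of <tt>open</tt>.</p>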

<h3>Other, more exotic formats</h3>

<ul>
    <li>Any <b>text format</b> where the card is represented in a textual form like
<pre>
/12345679012
|00000100001
|00011000000
|...
</pre>
        Such formats are easy to edit with text editors and carry much less overhead than
        the XML version. They would be ideal I/O formats for scripts
        (like Perl scripts), but not for C code. Furthermore, this approach is not very clean either.</li>
    <li><b>Bitmaps</b>, yes, pixel images. This is a fancy method that came to my mind
        some time ago. It would have been a perfect idea for paper tapes (for paper tape
        fonts, etc.), but "drawing" text with holes is not very common on punch cards, so
        this format has no serious advantage over any other binary format
        (except that it can be edited with a bitmap editor, hehe).</li>
    <li><b>CSV</b>, a classic export file format if someone wants to edit punch cards
        with Excel, but not necessarily the first-class storage format for punch cards.</li>
    <li>Simple <b>80-column text</b>, likewise only for import/export to and from real punch card
        files.</li>
</ul>
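<p>A CSV export for spreadsheet editing could look like the following Python sketch, which
writes one line per column and one cell per punch position. The layout (the column headings,
the row order 12, 11, 0 ... 9) and the function name <tt>deck_to_csv</tt> are assumptions made
for this illustration only.</p>

```python
import csv
import io

# Assumed heading names for the twelve punch positions, top row first.
ROW_NAMES = ["12", "11", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]


def deck_to_csv(cards):
    """cards: list of cards, each a list of 12-character strings of 0s and 1s."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["card", "column"] + ROW_NAMES)
    for card_no, card in enumerate(cards, start=1):
        for col_no, bits in enumerate(card, start=1):
            if len(bits) != len(ROW_NAMES):
                raise ValueError("expected 12 bits per column")
            # One cell per punch position: "1" means punched.
            writer.writerow([card_no, col_no] + list(bits))
    return buf.getvalue()
```

<p>The resulting text can be saved with a <tt>.csv</tt> extension and opened in a spreadsheet program.</p>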