source: projects/punch-card/fileformats.htm @ 44

Last change on this file since 44 was 44, checked in by sven, 14 years ago

Import of the Punch Card Editor Project.
This is a C++/Qt project (using qmake) that I've started this weekend.
Of course it's supposed to be released as open source.

I've tried to start with a clean (but, of course, still empty)
directory structure. The source code for a complete
AVR ATmega microcontroller punch card device controller will follow soon.
I'm planning to finish this editing program and to implement the
communication protocol (over TTY, using some platform-independent library).
Unfortunately, that will take some time (and I don't have much time
any more...)

-- Sven @ workstation

<html>
<body>
<h2>File Formats for Punch Cards</h2>

<p>Punch cards have a rather special layout (usually 80 columns of 12 bits each), so they
cannot simply be stored as a binary byte stream the way 8-bit paper tapes usually are
on computers (at least by my programs). Few people seem to have run into this problem
before: I could find only one (!) document that addresses it.</p>

<h3>The Jones Punched Card File Format</h3>
<p>The <a href="http://www.cs.uiowa.edu/~jones/cards/format.html">punched card emulation proposal</a>
by Douglas W. Jones is the only document I could find on the World Wide Web that deals with this
problem. Jones has written some utilities in C to work with emulated punched card decks (I simply
call them "punch card files"). It is a highly compact binary format. Every card file starts with a
3-byte header that holds a magic string like <tt>H80</tt> or <tt>H82</tt>. The rest of the file is
simply a concatenation of punch cards. Every punch card consists of a 3-byte header (filled with
flags that describe the design of the punch card) followed by the 80 columns. Two 12-bit
columns are packed into three octets in big-endian order.</p>
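<p>The packing of two 12-bit columns into three octets can be sketched in a few lines of Python.
This is only an illustration and not Jones's code: the function names are made up, and it assumes
that the first column of each pair occupies the upper 12 bits of the 24-bit group.</p>

```python
def pack_pair(first, second):
    """Pack two 12-bit column values into three bytes (big-endian)."""
    if not (0 <= first < 4096 and 0 <= second < 4096):
        raise ValueError("column values must fit into 12 bits")
    # Assumption: the first column goes into the upper half of the 24-bit word.
    word = (first << 12) | second
    return word.to_bytes(3, "big")


def unpack_pair(three_bytes):
    """Inverse of pack_pair: three bytes back into two 12-bit values."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly three bytes")
    word = int.from_bytes(three_bytes, "big")
    return word >> 12, word & 0xFFF


def pack_columns(columns):
    """Pack an even number of 12-bit columns, pair by pair."""
    if len(columns) % 2:
        raise ValueError("need an even number of columns")
    return b"".join(
        pack_pair(columns[i], columns[i + 1])
        for i in range(0, len(columns), 2)
    )
```

<p>With this layout the 80 columns of one card take 40 &times; 3 = 120 bytes; the 3-byte card
header is stored in front of them.</p>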

<p>This model of a punch card stack has the following advantages and disadvantages:</p>

<h4>Advantages</h4>
<ul>
   <li>It is <b>very compact</b>: one card takes only 123 bytes (a 3-byte header plus
       80 columns packed into 120 bytes).</li>
   <li>I/O is <b>very fast</b>, because it only needs cheap bit shifting operations that
       are easy to write in C/C++ programs.</li>
   <li>It is a well-documented <b>standard</b>, and some programs that can work with
       the format already exist.</li>
</ul>

<h4>Disadvantages</h4>
<ul>
   <li>The files cannot be edited with <b>text editors</b>; cumbersome work with a hex editor
       is necessary.</li>
   <li>I/O needs bit shifting operations that are not intuitive for the programmer,
       especially if the files are to be processed by scripts, etc.</li>
   <li>There is no place for <b>meta data</b> like the label for a column, a translation table
       (column value to Unicode character) or additional comments.</li>
   <li>Because of the 3-byte file header, files cannot be treated like a <b>stream of punch cards</b>,
       so usual Unix programs like <tt>cat</tt> cannot simply be used to join decks.</li>
</ul>
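<p>The <tt>cat</tt> problem can be illustrated with a short Python sketch. It relies only on the
layout described above (a 3-byte file header followed by 123-byte card records) and does not
interpret the header flags. <tt>split_deck</tt> and <tt>concat_decks</tt> are hypothetical helper
names, not part of Jones's utilities.</p>

```python
FILE_HEADER_LEN = 3   # magic string such as b"H80", as described above
CARD_LEN = 3 + 120    # 3-byte card header + 80 packed 12-bit columns


def split_deck(data):
    """Split a card file into (file_header, list_of_card_records)."""
    if len(data) < FILE_HEADER_LEN:
        raise ValueError("file is too short to contain a header")
    header, body = data[:FILE_HEADER_LEN], data[FILE_HEADER_LEN:]
    if len(body) % CARD_LEN:
        raise ValueError("file does not contain a whole number of cards")
    cards = [body[i:i + CARD_LEN] for i in range(0, len(body), CARD_LEN)]
    return header, cards


def concat_decks(*decks):
    """Join several card files, keeping only the first file header."""
    if not decks:
        raise ValueError("no decks given")
    first_header = None
    all_cards = []
    for deck in decks:
        header, cards = split_deck(deck)
        if first_header is None:
            first_header = header
        elif header != first_header:
            raise ValueError("decks use different file headers")
        all_cards.extend(cards)
    return first_header + b"".join(all_cards)
```

<p>Plain byte concatenation would leave the second file's header in the middle of the data, which
is why <tt>split_deck</tt> rejects the output of a naive <tt>cat</tt> of two such files.</p>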

<h3>The Card Markup Language (XML)</h3>
<p>Modeling a punch card in XML is a more contemporary idea. I think such a file could
look roughly like this:</p>

<pre>
&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
&lt;card-deck xmlns="http://dev.technikum29.de/2009/punch-card-markup-language"
           xmlns:html="http://www.w3.org/1999/xhtml"&gt;
        &lt;card&gt;
                &lt;meta name="generator" value="Punch Card Reader Xyz" /&gt;
                &lt;meta name="scanned" value="15.08.09T12:35:42" /&gt;

                &lt;property key="Reader.Error" raw="-1235"&gt;Error:Foobar&lt;/property&gt;

                &lt;property key="Jones.Color" raw="0100"&gt;cream&lt;/property&gt;
                &lt;property key="Jones.Corner" raw="0"&gt;round&lt;/property&gt;
                &lt;property key="Jones.Cut" raw="1" /&gt;
                &lt;property key="Jones.Header" value="&amp;#01;&amp;#02;&amp;#05;" /&gt;
                &lt;comment type="text/html"&gt;
                        &lt;!-- easy to export thanks to the Qt rich text widget --&gt;
                        &lt;html:p&gt;
                                Rich text content can be written here
                                without any problems.
                        &lt;/html:p&gt;
                &lt;/comment&gt;
                &lt;column value="101101101010"&gt;
                        &lt;label&gt;A&lt;/label&gt;
                &lt;/column&gt;
                &lt;column value="100100100100"&gt;
                        &lt;label&gt;&amp;specialfoo;&lt;/label&gt;
                &lt;/column&gt;
                &lt;column value="100100100000" label="F" /&gt;
                &lt;column bit0="0" bit1="1" bit2="1" bit3="1" bit4="1" bit5="0"
                        bit6="0" bit7="0" bit8="0" bit9="0" bit10="0" bit11="1"&gt;
                &lt;/column&gt;
        &lt;/card&gt;

        ...
&lt;/card-deck&gt;
</pre>

<p>As you might notice, there are different possibilities for modeling a card. The most
elaborate variant uses one attribute per bit (<tt>&lt;column bit0="1" bit1="0" ...</tt>). With that
method, every punch card costs about 10&nbsp;KB. There are cheaper methods like the
"boolean string" <tt>0101...</tt>, but then it must be defined which position in the string
corresponds to which row on the punch card.</p>
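<p>Reading the "boolean string" variant could look roughly like the following Python sketch, which
uses the standard <tt>xml.etree.ElementTree</tt> module. The element and attribute names are taken
from the example above; the rule that the leftmost character is the most significant bit is purely
an assumption, since the mapping to card rows is not defined yet. <tt>parse_deck</tt> is a made-up name.</p>

```python
import xml.etree.ElementTree as ET

# Namespace from the example above, in ElementTree's "{uri}tag" notation.
NS = "{http://dev.technikum29.de/2009/punch-card-markup-language}"

SAMPLE = """
<card-deck xmlns="http://dev.technikum29.de/2009/punch-card-markup-language">
  <card>
    <column value="101101101010"><label>A</label></column>
    <column value="100100100000" label="F" />
  </card>
</card-deck>
"""


def column_bits(column):
    """Decode the 'boolean string' into an integer (leftmost char = MSB, assumed)."""
    text = column.get("value")
    if text is None or len(text) != 12 or set(text) - {"0", "1"}:
        raise ValueError("expected a value attribute with twelve 0/1 characters")
    return int(text, 2)


def column_label(column):
    """Return the label from the attribute or the child element, if any."""
    label = column.get("label")
    if label is None:
        child = column.find(NS + "label")
        label = child.text if child is not None else None
    return label


def parse_deck(xml_text):
    """Return one list of (bits, label) tuples per card."""
    root = ET.fromstring(xml_text)
    return [
        [(column_bits(c), column_label(c)) for c in card.findall(NS + "column")]
        for card in root.findall(NS + "card")
    ]
```

<p>The per-bit attribute variant would need a second decoder, which is one reason to settle on a
single representation before writing files.</p>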

<h4>Advantages</h4>
<ul>
  <li><b>Good text editor support</b></li>
  <li>Most <b>unambiguous model</b> of a punch card</li>
  <li>Very good <b>meta data</b> support in all respects</li>
</ul>

<h4>Disadvantages</h4>
<ul>
  <li>A lot of overhead that inflates the file size enormously. Compression
      (such as on-the-fly gzip) can shrink the files considerably again.</li>
</ul>
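<p>On-the-fly compression could be as simple as the following Python sketch with the standard
<tt>gzip</tt> module. The generated deck is dummy data that only mimics the repetitive structure
of the XML example, and all function names are hypothetical.</p>

```python
import gzip


def make_dummy_deck(n_cards):
    """Build a repetitive XML deck of n_cards cards with 80 columns each (dummy data)."""
    column = '    <column value="100100100000" label="F" />\n'
    card = "  <card>\n" + column * 80 + "  </card>\n"
    return "<card-deck>\n" + card * n_cards + "</card-deck>\n"


def compress_deck(xml_text):
    """Encode the XML text as UTF-8 and gzip it in memory."""
    return gzip.compress(xml_text.encode("utf-8"))


def decompress_deck(blob):
    """Inverse of compress_deck."""
    return gzip.decompress(blob).decode("utf-8")
```

<p>How much is saved depends on the real card contents; the dummy deck here is far more
repetitive than a scanned deck would be.</p>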

<h3>Other, more exotic formats</h3>

<ul>
  <li>Any <b>text format</b> where the card is represented in a textual form like
<pre>
/12345679012
|00000100001
|00011000000
|...
</pre>
     Such formats are easy to edit with text editors and have much less overhead than
     the XML version. They would be ideal I/O formats for scripts (such as Perl scripts),
     but not for C code. They are also not very clean.</li>
  <li><b>Bitmaps</b>, yes, pixel images. This is a fancy method that came to my mind
     some time ago. It would have been a perfect idea for paper tapes (for paper tape
     fonts, etc.), but "drawing" text with holes is not very common on punch cards, so
     this format has no serious advantage over any other binary format
     (except that it can be edited with a bitmap editor, hehe).</li>
  <li><b>CSV</b>, a classical export file format if someone wants to edit punch cards
     with Excel, but not necessarily the first-choice storage format for punch cards.</li>
  <li>Plain <b>80-column text</b>, likewise only for import from and export to real punch card
     files.</li>
</ul>
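<p>A CSV export could be sketched in Python as follows. The layout is only an assumption, since
none is defined here: one CSV line per card column with twelve 0/1 cells, the leftmost cell
being the highest bit. The function names are made up.</p>

```python
import csv
import io

ROWS = 12  # bits (punch positions) per card column


def deck_to_csv(columns):
    """Write one CSV line per card column, twelve 0/1 cells each."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for value in columns:
        if not 0 <= value < (1 << ROWS):
            raise ValueError("column value must fit into 12 bits")
        # Assumed order: leftmost cell is the most significant bit.
        writer.writerow([(value >> (ROWS - 1 - i)) & 1 for i in range(ROWS)])
    return buf.getvalue()


def csv_to_deck(text):
    """Read the CSV layout written by deck_to_csv back into integers."""
    columns = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != ROWS:
            raise ValueError("expected twelve cells per line")
        value = 0
        for cell in row:
            if cell not in ("0", "1"):
                raise ValueError("cells must be 0 or 1")
            value = (value << 1) | int(cell)
        columns.append(value)
    return columns
```

<p>A spreadsheet program can open such a file directly, with each card column as one row of the sheet.</p>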
© 2008 - 2013 technikum29 • Sven Köppel • Some rights reserved
Except where otherwise noted, content on this site is licensed under a Creative Commons 3.0 License