Command: ocr | Section: 1 | Source: UNIX v10 | File: ocr.1
OCR(1) General Commands Manual OCR(1)
NAME
ocr - optical character recognition
SYNOPSIS
ocr [ option ... ] [ file ]
DESCRIPTION
Ocr reads a black-and-white image of a page from file, and writes ASCII
to the standard output. If no file is specified, it reads from the
standard input.
The input is a picfile(5) image of one column of machine-printed text,
normally scanned in by cscan(1). Fonts, sizes, and line-spacings may
vary within the column, but each line should have a constant text size
and baseline. Lines should be parallel and roughly horizontal.
In the output, white space approximates the original page layout.
Words that spell(1) accepts are preferred, and words hyphenated across
lines are recombined.
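
The recombination of hyphenated words can be pictured with a short
Python sketch. It is an illustration only, not the method ocr uses: the
function name recombine is hypothetical, and the sketch joins every
line-final hyphen, whereas a real recognizer would presumably consult
the spelling check before deciding.

```python
def recombine(lines):
    """Join a word split by a line-final hyphen onto the following line."""
    out = []
    carry = ""  # first half of a hyphenated word, waiting for its second half
    for line in lines:
        words = line.split()
        if carry and words:
            words[0] = carry + words[0]
            carry = ""
        # A trailing "-" on the last word marks a split word (assumption:
        # a lone "-" is punctuation, not a hyphenation).
        if words and words[-1].endswith("-") and len(words[-1]) > 1:
            carry = words.pop()[:-1]
        out.append(" ".join(words))
    if carry:
        # No following line to join with: put the fragment back unchanged.
        out[-1] = (out[-1] + " " + carry + "-").strip()
    return out
```

Note that the sketch moves the whole word to the second line, so line
lengths change; it also discards the original spacing between words.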
The options are:
-as The alphabet is the union of symbol sets selected by characters
in string s, from among:
A ABCDEFGHIJKLMNOPQRSTUVWXYZ
a abcdefghijklmnopqrstuvwxyz
0 0123456789
. .,-:;*'"?!/&$()[]#@% (basic punctuation)
^ ^~`\|{}_ (extended punct'n)
+ +-*/<>=.Ee[] (numerical punct'n)
s §†‡¢•© ... (selected non-ASCII)
l fi fl ff ffi ffl ae oe ... (ligatures, digraphs)
g αβγδεζ ... (Greek lower case)
G ABΓΔEZ ... (Greek upper case)
The default is -aAa0.+^, the full printable-ASCII set, which may
be abbreviated as -ap. Thus, -apslgG selects all of the above.
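
As a check on the table above, the following Python sketch builds the
alphabet as a union of the selected symbol sets. The names SYMBOL_SETS
and alphabet are hypothetical, and the non-ASCII sets (s, l, g, G) are
left empty because the manual elides their contents.

```python
import string

# Symbol sets copied from the table above. The non-ASCII sets are elided
# in the manual ("..."), so they are left empty rather than guessed.
SYMBOL_SETS = {
    "A": string.ascii_uppercase,
    "a": string.ascii_lowercase,
    "0": string.digits,
    ".": ".,-:;*'\"?!/&$()[]#@%",   # basic punctuation
    "^": "^~`\\|{}_",               # extended punctuation
    "+": "+-*/<>=.Ee[]",            # numerical punctuation
    "s": "", "l": "", "g": "", "G": "",
}

def alphabet(selectors="p"):
    """Return the set of characters selected by an -a option string."""
    # "p" abbreviates the default selection, the full printable-ASCII set.
    expanded = selectors.replace("p", "Aa0.+^")
    chars = set()
    for key in expanded:
        if key not in SYMBOL_SETS:
            raise ValueError("unknown symbol set: " + repr(key))
        chars.update(SYMBOL_SETS[key])
    return chars
```

With these definitions, alphabet("p") contains the 94 printable ASCII
characters other than space, which agrees with the description of the
default.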
-c Find columns in complex nested layouts using a greedy
white-covers algorithm.
-ml[,r]
Trim the left and right margins of the image by l and r inches,
respectively, before looking for columns. If r is omitted, it
is assumed to equal l.
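
The argument form of -m can be sketched in Python as follows. The
function names and the 400 pixels/inch default are assumptions made for
the illustration; the manual does not say how ocr converts inches to
pixels.

```python
def parse_margins(arg):
    """Parse the argument of -m, "l" or "l,r", into (left, right) inches."""
    left, sep, right = arg.partition(",")
    l = float(left)
    r = float(right) if sep else l   # r defaults to l when omitted
    return l, r

def trim_bounds(width_px, arg, pixels_per_inch=400):
    """Return the pixel range [x0, x1) left after trimming both margins.

    pixels_per_inch is an assumed scan resolution, not a value read
    from the image.
    """
    l, r = parse_margins(arg)
    x0 = round(l * pixels_per_inch)
    x1 = width_px - round(r * pixels_per_inch)
    if x0 >= x1:
        raise ValueError("margins leave no image")
    return x0, x1
```

For example, on an image 3400 pixels wide, the argument 0.5,1 keeps
pixel columns 200 up to (but not including) 3000.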
-nn Find the n largest columns by analysis of a single vertical
projection. Each column should be compactly printed and separated
from the others by at least 2 ems of horizontal white space.
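
What a vertical projection is can be sketched in Python. This is a
sketch under stated assumptions, not the algorithm ocr implements: the
image is taken to be a list of equal-length rows with 1 for black and 0
for white, "largest" is taken to mean widest, and min_gap (in pixels)
stands in for the 2-em separation. All names are hypothetical.

```python
def vertical_projection(image):
    """Count the black pixels in each pixel column of the image."""
    return [sum(col) for col in zip(*image)]

def find_columns(image, n, min_gap):
    """Return the n widest text columns as (start, end) pixel ranges.

    A text column is a run of pixel columns containing ink, ended by at
    least min_gap consecutive all-white pixel columns. end is exclusive.
    """
    runs = []
    start = None      # first inked pixel column of the current run
    last_ink = None   # most recent inked pixel column
    gap = 0           # consecutive white pixel columns seen inside a run
    for x, count in enumerate(vertical_projection(image)):
        if count > 0:
            if start is None:
                start = x
            last_ink = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, last_ink + 1))
                start = None
    if start is not None:
        runs.append((start, last_ink + 1))
    widest = sorted(runs, key=lambda r: r[1] - r[0], reverse=True)[:n]
    return sorted(widest)   # report left to right
```

White gaps narrower than min_gap, such as the spaces between words, do
not split a column, which is why the columns need a clear separation.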
-pn,m Point sizes lie in the range [ n, m ]; other sizes are
discarded. The default is -p6,24.
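
To relate point sizes to scanned pixels, the following Python sketch
converts the -p range to pixel heights. It assumes 72 points per inch
and a 400 pixels/inch scan; both values, and the function names, are
assumptions for the illustration rather than facts about ocr.

```python
def parse_point_range(arg="6,24"):
    """Parse the argument of -p, "n,m", into (n, m) point sizes."""
    n, m = (float(part) for part in arg.split(","))
    if n > m:
        raise ValueError("empty point-size range")
    return n, m

def points_to_pixels(points, pixels_per_inch=400, points_per_inch=72):
    """Convert a point size to a height in pixels (assumed 72 pt/inch)."""
    return points * pixels_per_inch / points_per_inch

def size_is_kept(points, arg="6,24"):
    """True if text of this point size falls inside the -p range."""
    n, m = parse_point_range(arg)
    return n <= points <= m
```

Under these assumptions the default range of 6 to 24 points corresponds
to text roughly 33 to 133 pixels high.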
-s Defeat spelling check (but continue to favor numeric strings and
good punctuation).
-t Write troff(1) format. Each column is shown on a separate page,
lines at their original height, words at their original horizontal
location, and characters roughly original size in Times roman.
Hyphenated words are not recombined.
-u Unspellable words are prefixed with `?' or, if -t is specified,
printed boldface.
-ww Find the largest column of width w inches, within a single
vertical projection.
Fonts
Trained on over 100 Latin-alphabet book fonts in various styles
(italic, bold, etc.). Only one Greek font, without diacriticals.
Swedish and Tibetan are also available on request.
SEE ALSO
bcp(1), cscan(1), font(6), picfile(5), spell(1), troff(1)
BUGS
For best results, use images of high-contrast, cleanly-printed original
documents digitized at a resolution of 400 pixels/inch or higher. It
may help to restrict the alphabet (-a) and point sizes (-p) to those
actually present in the image.
cetus,hydra,coma OCR(1)