Scanners, OCR, and PDFs

O tech gurus:

I may need to convert a lot (more than 1,000 pages, total) of printed documents (double-sided, eight-and-a-half by eleven) to PDF format. I’d prefer that they have searchable text, not just images of text, so I need OCR.

But so far I’m not having any luck with the online PDF creation at Adobe or with the demo of Adobe Acrobat 7.0 Professional. I keep getting error messages, especially ones related to the source image not being at a high enough resolution. But my ancient scanner, an HP ScanJet 3300C that I’ve never much liked, doesn’t want to change its ways.

So I’m considering picking up a new scanner, preferably one with a document feeder that isn’t going to jam. Recommendations?

Also, any recommendations for OCR software, esp. that can handle English text with the occasional Chinese character without rejecting the whole page? (I don’t care about Chinese OCR – just that the software won’t get freaked out when dealing with something unexpected.) And, ideally, it would be able to save to PDF.

My system is a little old: Win2K, Pentium 4, 1.6 GHz, 524 MB RAM. I suppose an upgrade could be possible, if that would speed up my task considerably.

I think the system might get a little bit freaked if it’s English-only and suddenly finds a Chinese character in the middle, but I’m not sure. The new packages might be better.

I can’t recommend a specific scanner (in fact, my clunky old scanner worked just fine with the Danqing Chinese OCR package, and the new HP all-in-one chokes on it for some reason), but I’d like to hear about your experiences with Adobe, especially in terms of embedding Chinese fonts. It’s something I’d be very interested in being able to do, too.

I ended up ordering a Fujitsu ScanSnap fi-5110EOX2 from a U.S. online retailer for US$390, plus US$73 shipping. The U.S. order should also allow me to collect a US$50 rebate and a free copy of ABBYY’s basic OCR program, which makes the price about NT$13,000. The same scanner costs NT$23,000 in Taiwan, with no rebate or free software offer.
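
For anyone checking the math, here is a rough sketch of the price comparison in Python. The dollar and NT$ figures are the ones quoted above; the exchange rate is not stated anywhere in the thread, so the script only works out the rate that the "about NT$13,000" estimate implies. The variable and function names are made up for illustration.

```python
# Figures quoted in the post; the exchange rate is NOT given in the thread
# and is only inferred from the "about NT$13,000" estimate.
SCANNER_USD = 390
SHIPPING_USD = 73
REBATE_USD = 50
QUOTED_NET_TWD = 13_000    # poster's own rough estimate
TAIWAN_PRICE_TWD = 23_000  # local retail price quoted in the post


def net_cost_usd(price: int, shipping: int, rebate: int) -> int:
    """Price plus shipping, minus the mail-in rebate."""
    return price + shipping - rebate


def implied_rate(net_twd: int, net_usd: int) -> float:
    """NT$ per US$ implied by the poster's rounded estimate."""
    return net_twd / net_usd


net_usd = net_cost_usd(SCANNER_USD, SHIPPING_USD, REBATE_USD)
rate = implied_rate(QUOTED_NET_TWD, net_usd)
savings_twd = TAIWAN_PRICE_TWD - QUOTED_NET_TWD

print(f"Net cost: US${net_usd}")
print(f"Implied exchange rate: about NT${rate:.1f} per US$1")
print(f"Savings vs. buying in Taiwan: about NT${savings_twd:,}")
```

This leaves out any customs duty, which could eat into the savings.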

This seems to be very much the scanner for taking lots of documents and converting them quickly into PDFs. Adobe Acrobat Standard is bundled with the scanner.

I just hope it makes it here in one piece and that customs doesn’t want to make me pay an arm and a leg.

You can outsource this to copy shops rather than buy equipment.
Hand them a book; they hand you a .pdf.

This .pdf will be very large because it’s just a bunch of images of the pages.
Acrobat Professional has some internal OCR capability. But there will be errors.

I think OmniPage Professional (now version 15) dominates the market.

There will still be a lot of errors. You could outsource to proofreaders.
Even then there will be errors.

I thought I’d add some information now that the scanner has arrived and I’ve had a chance to play with it.

The main thing that impresses me is that this baby is fast. Put up to 50 pages into the feeder, and the scanner will spit out a PDF in just about one minute. Moreover, it scans both sides of the paper at the same time and can automatically remove blank pages from the final document.
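
As a back-of-the-envelope sketch of what that speed means for the whole job, the Python below estimates the number of feeder loads and the raw scanning time. It assumes roughly 1,000 sheets (the original post says "more than 1,000 pages"), a full 50-sheet load each time, and about one minute per load; it ignores reloading, jams, and OCR time. The function name is hypothetical.

```python
import math


def scan_estimate(total_sheets: int,
                  feeder_capacity: int = 50,
                  minutes_per_batch: float = 1.0) -> tuple[int, float]:
    """Return (number of feeder loads, raw scanning minutes).

    Assumes every load takes the same time as a full 50-sheet load,
    which slightly overestimates the last, partial load.
    """
    batches = math.ceil(total_sheets / feeder_capacity)
    return batches, batches * minutes_per_batch


batches, minutes = scan_estimate(1000)
print(f"{batches} feeder loads, about {minutes:.0f} minutes of scanning")
```

Since the scanner reads both sides in one pass, double-sided originals should not add to the number of loads.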

It also has a very small footprint.

So far, I’m very pleased.