Thursday, February 28, 2013

How to hack ScanSnap Organizer to use any PDF document

To avoid clutter, I scan everything: bank statements, business cards, brochures, receipts, anything. Then I OCR the scans (convert the images into text), store them in the cloud, share them across all my computing devices, and, on my laptop, index them heavily so I can find anything I have ever touched, or so the idea goes. (Hmm... what was the phone number of that car rental company whose brochure I picked up on my last trip to Bali? Just run a search. Done.)

Thus my pride and joy (and very nearly my single most expensive device purchase apart from a laptop or a smartphone) is my Fujitsu ScanSnap S1500. This workhorse scanner can scan double-sided at up to 50 pages per minute. For someone who scans just about everything, it is a must, and at $500 the ScanSnap is a necessary evil. (Don't try it unless you can afford to fall in love with it; it's dangerously convenient.)

Sadly, the ABBYY FineReader that comes with it (which OCRs documents better than Adobe PDF Pro) only works on documents I scan with the ScanSnap. The same goes for the ScanSnap Organizer and Viewer (which I use to rearrange PDF documents since my Adobe PDF Pro stopped working).

And since I have convinced a lot of people to send me documents already in PDF, from various scanners, cameras, etc., this is annoying. So here is the hack I found on the web (see http://www.jsilence.org/blog/2011/01/25/edit-pdf-metadata/ ) that makes MOST PDF documents usable.

The trick is to add or edit the "Creator" tag of the PDF document so that it says "ScanSnap Manager #S1500M".

Step 1. Get and install a free program called PDFtk.

The installer should put the software in

C:\Program Files (x86)\PDF Labs\PDFtk Server\bin

Step 2. Create a directory in Windows. Mine is called FixPDF.

Step 3. In the above directory, create a text file, scansnap_meta.txt, containing:

InfoBegin
InfoKey: Creator
InfoValue: ScanSnap Manager #S1500M


This file contains the metadata for the PDF; PDFtk will inject it into each PDF file to be fixed.
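If you would rather generate the metadata file than type it, here is a minimal Python sketch. The function names `build_meta` and `write_meta` are made up for illustration; the file contents are exactly the three lines shown in Step 3.

```python
from pathlib import Path

DEFAULT_CREATOR = "ScanSnap Manager #S1500M"


def build_meta(creator: str = DEFAULT_CREATOR) -> str:
    """Return the text of a pdftk update_info file that sets only the Creator key."""
    return f"InfoBegin\nInfoKey: Creator\nInfoValue: {creator}\n"


def write_meta(folder: str, creator: str = DEFAULT_CREATOR) -> Path:
    """Write scansnap_meta.txt into `folder` and return its path."""
    path = Path(folder) / "scansnap_meta.txt"
    path.write_text(build_meta(creator), encoding="ascii")
    return path
```

Calling `write_meta("FixPDF")` would create the file in a folder named FixPDF under the current directory.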
 
Step 4. Then create a script file, FixPDF.bat, in the same directory, containing:

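@echo off
rem For every PDF in this folder, stamp the Creator from scansnap_meta.txt
rem and write the result, under the same name, to the parent folder.
for %%a in (*.pdf) do (
"C:\Program Files (x86)\PDF Labs\PDFtk Server\bin\pdftk.exe" "%%a" update_info scansnap_meta.txt output "..\%%a"
)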

This script runs PDFtk on every *.pdf file in the directory and writes the fixed copy, under the same name, to the parent directory (that is what the "..\" does). The original files are left untouched.
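For anyone who prefers Python to batch files, the same loop can be sketched roughly as below. This is only an illustration: `build_commands` and `run_all` are hypothetical names, and the PDFtk path is the default install location from Step 1. Writing to the parent folder keeps the input and output paths distinct.

```python
import subprocess
from pathlib import Path

# Default install location from Step 1; adjust if PDFtk lives elsewhere.
PDFTK = r"C:\Program Files (x86)\PDF Labs\PDFtk Server\bin\pdftk.exe"


def build_commands(folder, meta_name="scansnap_meta.txt", pdftk=PDFTK):
    """Return one pdftk argument list per *.pdf file in `folder`."""
    folder = Path(folder).resolve()
    commands = []
    for pdf in sorted(folder.glob("*.pdf")):
        # Same file name, one level up, like "..\%%a" in FixPDF.bat.
        out = folder.parent / pdf.name
        commands.append([pdftk, str(pdf), "update_info",
                         str(folder / meta_name), "output", str(out)])
    return commands


def run_all(folder):
    """Run pdftk once per PDF; stop at the first failure."""
    for cmd in build_commands(folder):
        subprocess.run(cmd, check=True)
```

`build_commands` only assembles the argument lists, so you can print them and check the paths before calling `run_all`.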

Step 5. Put all PDF files to fix in the directory.

Step 6. Run the FixPDF.bat script.

Done.

17 comments:

Drachen said...

Hi Donald.

Thanks for the scripts...however must be doing something wrong cause the "new pdf file" with the same name doesn't change the "Creator" :-(

I did run some tests to understand what I was doing wrong and the conclusion is: when asked to replace the file for one with the same name, the creator doesn't change...if I choose a different name, then the script works perfectly!

Do you change the name of the new pdf files?

Donald said...

Hi Drachen,

Sorry, I just saw your posting today.

The script is designed to create a new file in the parent directory. That is what the "..\" is supposed to do.

But I'm glad you figured out that it works by changing the name.

Regards

Donald

vitalyb said...

That works great! The best part, for me, is that it lets you edit ANY pdf in "ScanSnap Organizer Viewer", which is an awesome PDF editor.

I've made the following script that allows me now to open any PDF with the ScanSnap editor:


set source=%~1
set target=%~dp1%~n12%~x1

"c:\Program Files (x86)\PDFtk\bin\pdftk.exe" "%source%" update_info "c:\Program Files (x86)\PDFtk\bin\scansnap_meta.txt" output "%target%"
move /Y "%target%" "%source%"
start "" "C:\Program Files (x86)\PFU\ScanSnap\Organizer\SsView.exe" "%source%"


To use, save it as a batch file and add it as an optional "Open with" application for PDF files.

Donald said...

That is a great idea, vitalyb!

I'll try that.

Unknown said...

Hi everyone,

Has anyone got this to work with newer software and/or ix500? Changing the creator doesn't seem to be sufficient to open the file anymore. Opening a pdf file with notepad I see there is also a creatortool field as well with some scansnap value in it but that is not supported in pdftk.

Not a big deal to me for a few documents since I can always print a pdf and rescan it, just hate to waste the paper and time. Any ideas would be appreciated. I am new to scansnap so I had not tried this older software.
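One way to see which of the two fields a given file carries is to look for both in the raw bytes, much like opening the file in Notepad. The sketch below is a diagnostic only and rests on assumptions: the Info dictionary and the XMP packet are stored uncompressed, the Creator is a plain literal string without nested parentheses, and the function name `find_creator_fields` is invented here.

```python
import re

# Info dictionary entry, e.g. /Creator (ScanSnap Manager #S1500M)
INFO_CREATOR = re.compile(rb"/Creator\s*\(([^)]*)\)")
# XMP can hold the property as an element or as an attribute.
XMP_ELEMENT = re.compile(rb"<xmp:CreatorTool>([^<]*)</xmp:CreatorTool>")
XMP_ATTRIBUTE = re.compile(rb'xmp:CreatorTool="([^"]*)"')


def find_creator_fields(data: bytes) -> dict:
    """List the Creator and xmp:CreatorTool values visible in raw PDF bytes."""
    xmp_hits = XMP_ELEMENT.findall(data) + XMP_ATTRIBUTE.findall(data)
    return {
        "info_creator": [m.decode("latin-1") for m in INFO_CREATOR.findall(data)],
        "xmp_creator_tool": [m.decode("utf-8", "replace") for m in xmp_hits],
    }
```

For example, `find_creator_fields(open("some.pdf", "rb").read())` returns both lists, so a file the newer software accepts can be compared with one it rejects.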

Die Fledermaus said...

Given the wide use of ix500 and Scansnap software, it is an aggravation, especially as even downloaded PDFs can't be modified due to this limitation, certainly tied to ABBYY licensing etc.
Having hundreds of PDFs scanned w/ old Neatscan, which I only used to scan direct to PDF and never wasted time w/ their management app, it is hard to believe in this day companies still screw us with proprietary constraints.
The Acrobat option under ScanSnap search will work if a PDF was OCR'ed, but you also can't tag/use keywords, nor find keywords added under Acrobat with ScanSnap Organizer. Jeez.

Jason said...

Thanks for this method. I used it for a while until I reformatted my PC, then did more research and discovered that the free BeCyPDFMetaEdit does this as well, and that software is just a single EXE. I wrote a simple batch file that I drop into my PDF folders and then run before conversion. I found that ScanSnap accepts the files if you simply make the creator "PFU ScanSnap Organizer" - no model # needed:

for %%F in (*.pdf) do (
"D:\BeCyPDFMetaEdit.exe" "%%~dpnxF" -Creator "PFU ScanSnap Organizer"
)

Donald said...

Hi Jason,

Thanks for your comment.

I'm testing this BeCyPDFMetaEdit.exe, and admittedly, it is nicer as it has a nice GUI.

I successfully got ScanSnap to accept the file when using the GUI, but for some reason the command line / batch mode doesn't quite do it. I have to re-open the resulting file in the GUI and simply "Save" it again (I can see that the batch file changed the creator properly), and then, miraculously, ScanSnap Organizer accepts it.

I need a few more passes at this to see what the difference is.

Donald

Unknown said...

Confirmed that editing the "Content Creator" metadata to "PFU ScanSnap Organizer" allows ex-post-facto OCRing by the Searchable PDF Converter app that comes with a ScanSnap. This is on a Mac; I used Automator to create a system service that does this with a single click. Easy and straightforward.

Unknown said...

On a Mac, the free app "PDF Attributes" can also be used to modify PDF metadata.

rbv said...

Another simple method which works is to scan in a single dummy page using the ScanSnap. Open the created pdf file and use 'add pages' to append the pages of the actual pdf file of interest to the end. Save the result, which can now be opened and converted into a searchable PDF. Essentially, you are providing a first page which really did come from your scanner.

Donald said...

Thanks rbv, that is an interesting idea. What do you use to append pages to the pdf file?

rbv said...

Acrobat pro. But I think reader can also do it.

rbv said...

Sad to say, but I tried a few free pdf merging programs, and the ones I tried did not work. Neither does the internal ScanSnap viewer. But, for sure Adobe Acrobat Pro works fine using the 'pages from file' insert tool. Provided you open the true scan file first (one dummy page from your scanner), and then insert the other pages, you can then even immediately delete the 'dummy' page, and save the result. That file can then be converted to searchable pdf via ABBYY through ScanSnap Organizer.

rbv said...

Does someone understand why BeCyPDFMetaEdit.exe needs to run as admin?

Jim said...

If you only need to do this to the occasional PDF, you can avoid installing unknown executables by editing in Notepad++ or your favorite text editor:
1. Open PDF in editor
2. Search for "/Creator"
3. If found, replace that line with "/Creator (PFU ScanSnap Organizer)" (no quotes)
4. If not found, search for "/Author" string (no quotes)
5. Add new line immediately below: "/Creator (PFU ScanSnap Organizer)" (no quotes)
6. Save. That's it. Can now be opened and edited in ScanSnap Organizer.
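
The steps above can be sketched in a few lines of Python for anyone who wants to repeat them. Treat this as an illustration under assumptions: the entries are plain-text literal strings without nested parentheses, and `set_creator` is a made-up name. Like the manual edit, it changes the length of the file.

```python
import re

CREATOR_ENTRY = re.compile(rb"/Creator\s*\([^)]*\)")
AUTHOR_ENTRY = re.compile(rb"/Author\s*\([^)]*\)")
NEW_ENTRY = b"/Creator (PFU ScanSnap Organizer)"


def set_creator(data: bytes) -> bytes:
    """Replace the first /Creator entry, or add one right after /Author."""
    if CREATOR_ENTRY.search(data):
        # Steps 2-3: an existing entry is overwritten in place.
        return CREATOR_ENTRY.sub(lambda m: NEW_ENTRY, data, count=1)
    match = AUTHOR_ENTRY.search(data)
    if match is None:
        raise ValueError("no plain-text /Creator or /Author entry found")
    # Steps 4-5: insert the new entry on its own line below /Author.
    end = match.end()
    return data[:end] + b"\n" + NEW_ENTRY + data[end:]
```

To use it, read the PDF with `open(path, "rb")`, pass the bytes through `set_creator`, and write the result to a new file so the original is kept.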

tshoang1 said...

I used Ghostscript's pdf2ps, then edited the Creator tag, then used ps2pdf, then re-OCRed. Works fine. All this can be scripted, of course.