Thursday, February 28, 2013

How to hack ScanSnap Organizer to use any PDF document

To avoid clutter, I scan everything: bank statements, business cards, brochures, receipts, anything. Then I OCR the scans (convert the images into text), store them in the cloud, share them across all my computing devices, and, on my laptop, index them heavily so I can find anything I have ever touched, or so the idea goes. (Hmm... what was the phone number of that car rental company whose brochure I picked up on my last trip to Bali? Just run a search. Done.)

Thus, my pride and joy (and nearly my single most expensive device purchase apart from a laptop or a smartphone) is my Fujitsu ScanSnap S1500. This workhorse scanner can scan double-sided at up to 50 pages per minute, and for someone who scans just about everything, that is a must. At $500, the ScanSnap is a necessary evil. (Don't try it unless you can afford to fall in love with it; it's dangerously convenient.)

Sadly, the ABBYY FineReader that comes with it (which OCRs documents better than Adobe PDF Pro) only works on documents I scan with the ScanSnap. The same goes for the ScanSnap Organizer and Viewer (which I use to rearrange PDF documents now that my Adobe PDF Pro has stopped working).

I have convinced a lot of people to send me documents already in PDF form, from various scanners, cameras, etc., so this is annoying. Here is the hack I found on the web (see http://www.jsilence.org/blog/2011/01/25/edit-pdf-metadata/ ) that lets me trick MOST PDF documents into being usable.

The trick is to add or edit the "Creator" tag of the PDF document so that it says "ScanSnap Manager #S1500M".
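
To check what a PDF currently reports as its Creator, PDFtk's dump_data operation prints the document info as InfoKey/InfoValue pairs. Here is a minimal Python sketch for reading that output, assuming Python 3.7+ is installed and pdftk is on the PATH; the function names are made up for illustration and are not part of PDFtk.

```python
import subprocess


def parse_info(dump_text):
    """Collect InfoKey/InfoValue pairs from pdftk dump_data-style text."""
    info = {}
    key = None
    for line in dump_text.splitlines():
        if line.startswith("InfoKey:"):
            key = line[len("InfoKey:"):].strip()
        elif line.startswith("InfoValue:") and key is not None:
            info[key] = line[len("InfoValue:"):].strip()
            key = None
    return info


def read_creator(pdf_path, pdftk="pdftk"):
    """Run `pdftk <file> dump_data` and return the Creator value, or None.

    Assumes pdftk is installed; pass the full path to pdftk.exe otherwise.
    """
    result = subprocess.run(
        [pdftk, str(pdf_path), "dump_data"],
        capture_output=True, text=True, check=True,
    )
    return parse_info(result.stdout).get("Creator")
```

Calling read_creator on a file before and after the fix is a quick way to confirm that the tag really changed.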

Step 1. Download and install the free software PDFtk.

The installer should put the software in:

C:\Program Files (x86)\PDF Labs\PDFtk Server\bin

Step 2. Create a directory in Windows. Mine is called FixPDF.

Step 3. In that directory, create a text file, scansnap_meta.txt, containing:

InfoBegin
InfoKey: Creator
InfoValue: ScanSnap Manager #S1500M


This file holds the PDF metadata that PDFtk will inject into each PDF file to be fixed.
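
If you would rather generate the metadata file than type it, here is a minimal Python sketch that writes the same three lines. The function name write_meta_file is hypothetical, and the default Creator value is the one from this post.

```python
from pathlib import Path


def write_meta_file(path, creator="ScanSnap Manager #S1500M"):
    """Write a pdftk update_info file that sets only the Creator field."""
    text = "InfoBegin\nInfoKey: Creator\nInfoValue: {}\n".format(creator)
    # Plain ASCII is enough for this value; non-ASCII text would need
    # different handling and is out of scope for this sketch.
    Path(path).write_text(text, encoding="ascii")
    return text
```

For example, write_meta_file("scansnap_meta.txt") run inside the FixPDF directory produces the file described in Step 3.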
 
Step 4. Then also create a script file, FixPDF.bat, containing:

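rem Stamp every PDF in this directory with the ScanSnap Creator value
rem and write the result to the parent directory under the same name.
for %%a in (*.pdf) do (
    "C:\Program Files (x86)\PDF Labs\PDFtk Server\bin\pdftk.exe" "%%a" update_info scansnap_meta.txt output "..\%%a"
)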

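This script runs PDFtk on every '*.pdf' file in the directory and writes the fixed copy, under the same name, to the parent directory (that is what the "..\" in the output path does). The original files in FixPDF are left untouched.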
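
The same loop can be sketched in Python, which makes it easier to check the commands before anything runs. This is only a sketch under assumptions: Python 3 is installed, PDFtk lives at the path from Step 1, and the function names are invented for illustration. It keeps input and output in different directories, since writing the output over the input file did not work in my experience.

```python
from pathlib import Path
import subprocess

# Install location from Step 1; adjust if PDFtk is installed elsewhere.
PDFTK = r"C:\Program Files (x86)\PDF Labs\PDFtk Server\bin\pdftk.exe"


def build_commands(src_dir, out_dir, meta_file, pdftk=PDFTK):
    """Return one pdftk update_info command per *.pdf file in src_dir."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    if src_dir.resolve() == out_dir.resolve():
        raise ValueError("output directory must differ from the source directory")
    commands = []
    for pdf in sorted(src_dir.glob("*.pdf")):
        commands.append([
            pdftk, str(pdf),
            "update_info", str(meta_file),
            "output", str(out_dir / pdf.name),
        ])
    return commands


def fix_all(src_dir, out_dir, meta_file, pdftk=PDFTK):
    """Run every command; stops at the first pdftk failure."""
    for cmd in build_commands(src_dir, out_dir, meta_file, pdftk):
        subprocess.run(cmd, check=True)
```

Something like fix_all("FixPDF", ".", "FixPDF/scansnap_meta.txt"), run from the parent directory, would mirror what FixPDF.bat does.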

Step 5. Put all the PDF files you want to fix in the directory.

Step 6. Run the FixPDF.bat script.

Done.

9 comments:

Drachen said...

Hi Donald.

Thanks for the scripts... however, I must be doing something wrong, because the "new pdf file" with the same name doesn't change the "Creator" :-(

I did run some tests to understand what I was doing wrong and the conclusion is: when asked to replace the file for one with the same name, the creator doesn't change...if I choose a different name, then the script works perfectly!

Do you change the name of the new pdf files?

Donald said...

Hi Drachen,

Sorry, I just saw your posting today.

The script is designed to create a new file in the parent directory. That is what the "..\" is supposed to do.

But I'm glad you figured out that it works by changing the name.

Regards

Donald

vitalyb said...

That works great! The best part, for me, is that it lets you edit ANY pdf in "ScanSnap Organizer Viewer" which is an awesome PDF editor.

I've made the following script that allows me now to open any PDF with the ScanSnap editor:


set source=%~1
set target=%~dp1%~n12%~x1

"c:\Program Files (x86)\PDFtk\bin\pdftk.exe" "%source%" update_info "c:\Program Files (x86)\PDFtk\bin\scansnap_meta.txt" output "%target%"
move /Y "%target%" "%source%"
start "" "C:\Program Files (x86)\PFU\ScanSnap\Organizer\SsView.exe" "%source%"


To use, save it as a batch file and add it as an optional "Open with" application for PDF files.

Donald said...

That is a great idea, vitalyb!

I'll try that.

lisa k said...

Hi everyone,

Has anyone got this to work with newer software and/or the iX500? Changing the creator doesn't seem to be sufficient to open the file anymore. Opening a pdf file with Notepad, I see there is also a creatortool field with some ScanSnap value in it, but that is not supported in pdftk.

Not a big deal to me for a few documents since I can always print a pdf and rescan it, I just hate to waste the paper and time. Any ideas would be appreciated. I am new to ScanSnap so I had not tried this older software.

Die Fledermaus said...

Given the wide use of the iX500 and ScanSnap software, it is an aggravation, especially as even downloaded PDFs can't be modified due to this limitation, certainly tied to ABBYY licensing etc.
Having hundreds of PDFs scanned w/ old Neatscan, which I only used to scan direct to PDF and never wasted time w/ their management app, it is hard to believe that in this day companies still screw us with proprietary constraints.
The Acrobat option under ScanSnap search will work if a PDF was OCR'ed, but you also can't tag/use keywords, nor find keywords added under Acrobat with ScanSnap Organizer. Jeez.

Jason said...

Thanks for this method. I used it for a while until I reformatted my PC, then did more research and discovered that the free BeCyPDFMetaEdit does this as well, and that software is just a single EXE. I wrote a simple batch file that I drop into my PDF folders and then run before conversion. I found that ScanSnap accepts the files if you simply make the creator "PFU ScanSnap Organizer" - no model # needed:

for %%F in (*.pdf) do (
"D:\BeCyPDFMetaEdit.exe" "%%~dpnxF" -Creator "PFU ScanSnap Organizer"
)

Donald said...

Hi Jason,

Thanks for your comment.

I'm testing this BeCyPDFMetaEdit.exe, and admittedly, it is nicer as it has a nice GUI.

I successfully got my ScanSnap to accept a file using the GUI, but for some reason, if I use the command line / batch mode, it doesn't quite do it. I have to re-open the resulting file with the GUI and simply "Save" it again (I note that the creator change is made properly by the batch file), and then miraculously ScanSnap Organizer accepts it.

I need a few more passes at this to see what's the difference.

Donald

Nicholas Darnton said...

Confirmed that editing the "Content Creator" metadata to "PFU ScanSnap Organizer" allows ex-post-facto OCRing by the Searchable PDF Converter app that comes with a ScanSnap. This is on a Mac; I used Automator to create a system service that does this with a single click. Easy and straightforward.