Convert HTML to PDF

As a background task for our web application BeeBole, I was looking for an easy and efficient way to produce PDF documents without being stuck with a PostScript-like syntax or being limited from a design point of view.

Last week I discovered the tool I was looking for: WKHTMLTOPDF

By leveraging the WebKit engine through the QtWebKit module, it converts HTML with full CSS support to PDF, the same way you “Save as PDF” from your browser ;)

In this article, I’ll show you the very first prototype I built of a possible WKHTMLTOPDF integration with our application.

Let’s start by installing this nifty tool the easy way (tested on Ubuntu Hardy and Jaunty, 64-bit):

wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.8.3-static.tar.bz2
tar -jxvf wkhtmltopdf-0.8.3-static.tar.bz2
sudo aptitude install ia32-libs

Next, create a symbolic link in /usr/local/bin pointing to the extracted wkhtmltopdf binary (command names are case-sensitive on Linux, so keep the name in lowercase):

sudo ln -s /full_path/wkhtmltopdf /usr/local/bin/wkhtmltopdf

Et voilà, we are ready to go!
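
Before wiring the tool into the server, a quick smoke test from the shell can confirm that the link works. This is only a minimal sketch: the function name smoke_test and the sample file names are made up for illustration, and it assumes the link is called wkhtmltopdf (pass another name as the first argument if yours differs).

```shell
#!/bin/sh
# Smoke test for a wkhtmltopdf install (sketch; names are illustrative).

# smoke_test [binary]: render a tiny HTML page and check that a
# non-empty PDF file comes out. The binary defaults to wkhtmltopdf.
smoke_test() {
    bin="${1:-wkhtmltopdf}"

    # Fail early if the binary is not reachable through PATH.
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "$bin not found in PATH" >&2
        return 1
    fi

    # Work in a throwaway directory (mktemp -d is available on Linux).
    dir=$(mktemp -d) || return 1
    printf '<html><body><h1>Hello PDF</h1></body></html>\n' > "$dir/hello.html"

    # Basic invocation: <binary> <input.html> <output.pdf>
    if "$bin" "$dir/hello.html" "$dir/hello.pdf" && [ -s "$dir/hello.pdf" ]; then
        result=0
    else
        result=1
    fi

    rm -rf "$dir"
    return "$result"
}
```

Calling smoke_test with no argument and checking its exit status is enough to know whether the rest of the article will work on your machine.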

Here is a way to post an arbitrary HTML portion of the current document to the server and have it converted to PDF on the fly. We will use the following workflow:

  1. Post the innerHTML to the server
  2. Write a temporary HTML file with the posted data (file:write_file)
  3. Convert this file to PDF using the command line WKHTMLTOPDF (os:cmd)
  4. Read the PDF file content and store it in a variable (file:read_file)
  5. Delete the temporary HTML and PDF files (file:delete)
  6. Send back the final document to the client using the ‘application/pdf’ content type header
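
Steps 2 to 5 can be tried outside the server first. The server code in this article is Erlang (linked further down); what follows is only a rough shell sketch of the same sequence, reading HTML on standard input and writing the PDF to standard output. The function name html_to_pdf and the WKHTMLTOPDF_BIN override are hypothetical, introduced here for illustration. Since a shell variable cannot safely hold binary data, step 4 streams the PDF instead of storing it.

```shell
#!/bin/sh
# Sketch of steps 2-5: HTML on stdin, PDF on stdout.
# WKHTMLTOPDF_BIN is a hypothetical override (handy for testing with a stub).

html_to_pdf() {
    bin="${WKHTMLTOPDF_BIN:-wkhtmltopdf}"

    # Private scratch directory for the two temporary files.
    base=$(mktemp -d) || return 1
    html="$base/page.html"
    pdf="$base/page.pdf"

    # Step 2: write the posted HTML to a temporary file.
    cat > "$html"

    # Step 3: convert it. The tool's own messages are discarded so that
    # only PDF bytes ever reach stdout.
    if "$bin" "$html" "$pdf" >/dev/null 2>&1; then
        # Step 4: read the PDF back (streamed rather than stored).
        cat "$pdf"
        status=0
    else
        status=1
    fi

    # Step 5: delete the temporary HTML and PDF files.
    rm -rf "$base"
    return "$status"
}
```

For example, `html_to_pdf < page.html > page.pdf` converts a saved page, and a non-zero exit status means the conversion failed.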

We will be using Mochiweb as our web server (you can follow our tutorial about Mochiweb if you feel lost).

Just create priv/www/index.html and copy the following HTML into it:

http://friendpaste.com/7Kvh0SWRMyCgTMMEOy4x2B

and here is the Erlang server code which will handle the post request (myapp_web.erl):

http://friendpaste.com/538mWzeOfBKxiE2uScDjT4

No magic trick! As for the JavaScript, I’m just filling a text input’s value with the innerHTML source and submitting the form to the server.

The Erlang code is really straightforward. We define a new case clause named ‘pdf’ under the ‘POST’ part.

To generate the temporary file names, I’m simply concatenating the current date and time. The important part here is to set the response’s content type to ‘application/pdf’.
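
The naming scheme can be illustrated from the shell. This is a sketch, not the code from the paste above: the function name tmp_names and the pdf_ prefix are invented for the example, and the process ID suffix is an addition of mine, on the assumption that a timestamp with one-second resolution could otherwise give two simultaneous requests the same file name.

```shell
#!/bin/sh
# Sketch: build a pair of temporary file names from the current date and time.

# tmp_names: print an HTML path and a PDF path (one per line) that share
# the same timestamp-based stem.
tmp_names() {
    stamp=$(date +%Y%m%d%H%M%S)

    # $$ is the PID of the running shell. It is not part of the article's
    # scheme; it is added here to reduce the chance of name collisions.
    stem="${TMPDIR:-/tmp}/pdf_${stamp}_$$"

    printf '%s.html\n%s.pdf\n' "$stem" "$stem"
}
```

Both names share the same stem, so the HTML input and its PDF output are easy to pair up and delete together once the response has been sent.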

Now, we can test our page by connecting to http://127.0.0.1:8000/pdf and clicking the ‘>>>PDF’ button (tested under Firefox).

I hope this helps you start toying with this tool, which is certainly going to replace its licensed counterparts ;)

21 thoughts on “Convert HTML to PDF”

  2. I’m really grateful to you guys for pointing this out – it will be so useful to me in my current project.

  5. One extremely nice aspect of this tool I discovered is that it will execute jQuery *before* printing – so if you have any table striping etc. it will still work.

    Probably the best HTML to PDF solution I have found.

  9. “wkhtmltopdf” is definitely a recommended tool for converting html to pdf. I tried it and it has lotsa useful features.

    But too bad our server runs on Ubuntu 6.06 LTS, which ships slightly older versions than those required by wkhtmltopdf.

  14. I ran it and so far so good, except one thing.
    The PDF output doesn’t display letters or words, only boxes.
    My server doesn’t have X Windows installed either.
    Is X Windows required in order to supply the fonts?
    Thanks
