Thursday, 3 July 2014

wkhtmltopdf - Generate PDFs of password-protected pages

Hi All,

In this post, I am going to explain how to generate PDFs of password-protected pages using wkhtmltopdf.

wkhtmltopdf is a simple but powerful open-source tool for generating PDFs from web pages. You can find more information about it on the project's website.

wkhtmltopdf loads web pages in a headless browser engine (Qt WebKit) and generates the PDF from the rendered result. Because the engine is headless, we cannot type data into the page's UI fields directly. Hence using wkhtmltopdf on password-protected pages is a challenge.

wkhtmltopdf handles this case through the --cookie-jar option (as stated in its help output):

--cookie-jar <path>             Read and write cookies from and to the
                                supplied cookie jar file

The idea is to use this option for the login request, store the cookies from the login in a cookie jar file, and then reuse that file when requesting the password-protected pages.

This task is not straightforward, and it requires a basic understanding of how HTTP requests and responses work. Whenever we log in, the browser sends the login information we entered to the server as HTTP POST parameters. The server, in turn, validates the credentials and, if they are valid, authenticates the user by sending back a cookie that identifies the logged-in session.

So, let's see how this process works with a demo site, 'http://demo.testfire.net'. We will log in to this website using wkhtmltopdf and then generate PDFs of its password-protected pages.

First, we need to find out which POST parameters the website's login operation uses. We can use the 'Tamper Data' plugin for Firefox to identify them.

Start 'Tamper Data' and perform the login operation. It lists the parameters that are posted during login. The screenshot below shows the POST parameters of 'demo.testfire.net' captured using Tamper Data:


As you can observe, username and password are not the only fields; there are additional POST parameters as well (on another website, I found 4 parameters for the login operation). Note down the 'Post Parameter Name' and 'Post Parameter Value' of every parameter.
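As a cross-check on the captured parameters, the same login request can be replayed from the command line and the response headers inspected for a Set-Cookie line. This is only a minimal sketch, assuming curl is installed. The parameter names and the login URL are the ones used in the wkhtmltopdf command in this post; the function name show_login_headers, the CURL variable and the file name curl-cookies.txt are made up for illustration.

```shell
#!/bin/sh
# Sketch: send the login form as an HTTP POST and print the response headers.
# CURL can be overridden; it defaults to the curl binary on the PATH (assumption).
CURL="${CURL:-curl}"

show_login_headers() {
    # --data sends each name=value pair in the POST body.
    # --dump-header - prints the response headers to stdout.
    # --cookie-jar stores any cookies the server sets (curl's own file, not wkhtmltopdf's).
    "$CURL" --silent --output /dev/null --dump-header - \
        --cookie-jar curl-cookies.txt \
        --data "uid=jsmith" --data "passw=demo1234" --data "btnSubmit=Login" \
        http://demo.testfire.net/bank/login.aspx
}
```

Running show_login_headers prints the headers of the login response. If no Set-Cookie header shows up, a POST parameter is probably missing or misspelled.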

Now we need to make a request to the login page of our application with all the POST parameters and save the resulting cookies in a cookie jar file. After that, we request the password-protected pages with the same cookie jar to generate the PDFs.

Let's see the wkhtmltopdf commands that perform these operations:

wkhtmltopdf --cookie-jar my.jar --post uid jsmith --post passw demo1234 --post btnSubmit Login http://demo.testfire.net/bank/login.aspx demo.pdf

The above command makes a request to the login page and passes each POST parameter with a '--post' option, which takes the parameter name followed by its value. wkhtmltopdf sends these as a POST request to the login page and saves the cookies it receives in 'my.jar'.
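It can help to confirm that the login step really wrote something to the cookie jar before moving on. The following is a minimal sketch of that check, not part of wkhtmltopdf itself: the function name login_and_check and the WKHTMLTOPDF and JAR variables are introduced here for illustration. Keep in mind that a non-empty jar only shows that the server set some cookie, not that the credentials were accepted.

```shell
#!/bin/sh
# Sketch: run the login request, then confirm that a cookie jar was written.
# WKHTMLTOPDF and JAR are illustrative variables with overridable defaults.
WKHTMLTOPDF="${WKHTMLTOPDF:-wkhtmltopdf}"
JAR="${JAR:-my.jar}"

login_and_check() {
    # Start from a clean jar so stale cookies do not hide a failed login.
    rm -f "$JAR"
    "$WKHTMLTOPDF" --cookie-jar "$JAR" \
        --post uid jsmith --post passw demo1234 --post btnSubmit Login \
        http://demo.testfire.net/bank/login.aspx demo.pdf
    # A missing or empty jar means no cookie was stored at all.
    if [ -s "$JAR" ]; then
        echo "cookies saved to $JAR"
    else
        echo "no cookies in $JAR; check the post parameters" >&2
        return 1
    fi
}
```

Opening demo.pdf is a second check: if it still shows the login form or an error message, the login did not go through.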

Now we are done!!!

From here on, pass the same cookie jar with every request to a password-protected page (as below), and wkhtmltopdf will generate the PDF of the requested page.

wkhtmltopdf --cookie-jar my.jar http://demo.testfire.net/bank/main.aspx demo1.pdf
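When several protected pages are needed, the same command can be repeated in a loop. This is a rough sketch under a few assumptions: the pages live under the /bank/ path used above and end in .aspx, and the login step has already filled my.jar. The function name render_pages and the WKHTMLTOPDF, JAR and BASE variables are made up for this example.

```shell
#!/bin/sh
# Sketch: render several protected pages with the cookies saved during login.
# WKHTMLTOPDF, JAR and BASE are illustrative variables with overridable defaults.
WKHTMLTOPDF="${WKHTMLTOPDF:-wkhtmltopdf}"
JAR="${JAR:-my.jar}"
BASE="${BASE:-http://demo.testfire.net/bank}"

# Render each named page under $BASE to <name>.pdf; stop at the first failure.
render_pages() {
    for page in "$@"; do
        "$WKHTMLTOPDF" --cookie-jar "$JAR" "$BASE/$page.aspx" "$page.pdf" || return 1
    done
}
```

Calling render_pages main writes main.pdf from main.aspx, the same page as in the command above. Further page names can be appended as extra arguments, as long as those pages exist on the site.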

Thank you!!!


2 comments:

  1. Interesting read, I am stuck on a similar problem now.
    I have a dynamic webpage where a user logs in and makes some selections, after which the page displays some charts and tables. I am planning to add an export-to-PDF button that lets the user export the full HTML content to PDF. I have wkhtmltopdf installed on the server side, and I can pass the full HTML to wkhtmltopdf when the button is clicked. Do you think this is feasible?

  2. this was really helpful, thank you!
