The other day, I was asked to do something a bit special:

Stéphane, do you know if we can print a webpage to PDF with PowerShell, so we can archive the look and feel of our web application?

Well, right now, I don’t know how to print a webpage to PDF with PowerShell. But give me a few minutes, and I’ll code you something.

In other words, I was asked to export webpages to PDF using PowerShell. Doesn’t sound that difficult, does it?

As with every project, I started by googling. Very often you can find a script to start with, or at least something that points you in the right direction.

I noticed very quickly that, to get the same output as the webpage, we needed to print it directly from Internet Explorer. Also, to export the page exactly as shown in the browser, PDFCreator seemed like the best choice to me. With these basic ideas in mind, I started my Google-fu.

The first result I got was a script on TechNet that modifies some specific registry values of the PDFCreator settings and then prints a document (available here).

I never actually got that method to work. I tried defining a print profile and specifying it for my print job, but that never worked out well for me (especially since PDFForge made a complete change in the latest version of their product and moved quite a few things around in their registry hive). After a while, I gave up and searched through the PDFCreator documentation. Not much to get me started there either. BUT I did find a folder with script ‘samples’!

As usual, there were no PowerShell examples, BUT I found 4 VBS script samples, which were perfect to get me started.

The main thing the VBS scripts do is create an instance of the PDFCreator.JobQueue COM object. This was a great start, and it led me to the final function that you can find at the end of this post.
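As a rough illustration of that starting point, here is a minimal PowerShell sketch of the create / Initialize / release pattern around the PDFCreator.JobQueue COM object. It is not the function from this post: the helper name Invoke-WithPdfCreatorQueue and its -QueueFactory parameter are hypothetical, and the ReleaseCom() call is an assumption based on the PDFCreator 2.x VBS samples, so check the samples shipped with your version.

```powershell
function Invoke-WithPdfCreatorQueue {
    <#
      Hypothetical helper (not the author's function): creates the
      PDFCreator.JobQueue COM object, initializes it, runs $Action with it,
      and always releases it afterwards, even if $Action throws.
    #>
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        # Injectable factory so the pattern can be exercised without PDFCreator installed.
        [scriptblock] $QueueFactory = { New-Object -ComObject 'PDFCreator.JobQueue' }
    )

    $queue = & $QueueFactory
    try {
        # Initialize() has to be called before the queue can receive print jobs.
        $queue.Initialize()
        # Hand the initialized queue to the caller's script block.
        & $Action $queue
    }
    finally {
        # Assumption: ReleaseCom() frees the queue, as in the PDFCreator 2.x VBS samples.
        $queue.ReleaseCom()
    }
}

# Example (requires PDFCreator 2.x on Windows):
# Invoke-WithPdfCreatorQueue -Action { param($queue) "Jobs waiting: $($queue.Count)" }
```

The factory parameter only exists so the pattern can be tried out without PDFCreator; in a real script you could just call New-Object -ComObject PDFCreator.JobQueue inline.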

Some important information about this function:

Tested with:

  • PDFCreator version 2.3.2 together with:
    • PowerShell version 3.0
    • PowerShell version 4.0
    • PowerShell version 5.0

We also tried this function on Windows Server 2008 R2 (which had PowerShell v2). We couldn’t fully install PDFCreator there: the installer needs internet access to download some additional files, which it couldn’t do because our proxy blocked it.

I then got the error message “Could not find the method ‘Initialize’” when I launched the function. The COM instance was simply missing a method.
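If you run into the same error, one way to get a clearer message is to check whether the COM instance exposes the method before calling it. The sketch below is a hypothetical helper (Test-ComMethod is not part of PDFCreator or of the original function) built on the standard Get-Member cmdlet.

```powershell
function Test-ComMethod {
    <#
      Hypothetical diagnostic helper: returns $true if the object exposes a
      method (COM, .NET or script method) with the given name, $false otherwise.
    #>
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string] $Name
    )
    # 'Methods' covers Method, CodeMethod and ScriptMethod members.
    $member = Get-Member -InputObject $InputObject -Name $Name -MemberType Methods -ErrorAction SilentlyContinue
    return [bool] $member
}

# Example (Windows, PDFCreator installed):
# $queue = New-Object -ComObject PDFCreator.JobQueue
# if (-not (Test-ComMethod -InputObject $queue -Name 'Initialize')) {
#     Write-Warning 'This PDFCreator.JobQueue instance has no Initialize() method; check the PDFCreator installation.'
# }
```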

So it is still unclear why it didn’t work on the Windows Server 2008 R2 machine: either PowerShell 3.0 is the minimum version required to make it work, or the additional files that could not be downloaded during the install are needed. I asked on the PDFCreator forum here, and I’ll update this post as soon as I have an answer.

How to automate PDFCreator to print a webpage to PDF with PowerShell

To learn how you can tweak the PDFCreator settings yourself, I recommend the following article, which covers the basics you need to know to automate PDFCreator with Windows PowerShell.

If all you need is to print webpages straight to PDF, check out the bottom of this page: you will find a complete script that does exactly that!

Some important points to keep in mind while scripting PDFCreator with PowerShell

In the script from TechNet, the author (Jishu Sengtupa) added quite a few Start-Sleep calls at some critical points of the script. In the beginning I used the same method and, surprisingly, the script was very unstable: it would work sometimes, but other times it simply wouldn’t. I got various errors depending on which website I was trying to print, such as this one:

 Trying to revoke a drop target that has not been registered (Exception from HRESULT: 0x80040100 (DRAGDROP_E_NOTREGISTERED))

You may ask yourself: “Yeah, what is that supposed to mean?” I agree! Well, it simply means that the webpage was not loaded yet, so it could not be sent to the printer.

Using ‘Start-Sleep’ was a good idea at first, but some web pages need more time to load than others: for a light web page, 2 seconds might be enough, while a heavier one might need 4 or even 6 seconds.

To fix this random behaviour, I used a simple while loop that checks the ‘readyState’ property of the Internet Explorer COM instance. According to the documentation, the value 4 indicates that the page has finished loading.

The ready state can have 5 different values, which are detailed on MSDN here. The most common ones, which I saw during my initial tests, are these 3:

1 -> Loading
2 -> Loaded
4 -> Complete
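A minimal sketch of that while loop, assuming an InternetExplorer.Application COM instance. The function name Wait-IEReady, the timeout and the extra Busy check are additions of this sketch and not necessarily what the final function does; the timeout keeps the script from hanging forever on a page that never reaches 4.

```powershell
function Wait-IEReady {
    <#
      Polls an InternetExplorer.Application instance until ReadyState is 4
      (READYSTATE_COMPLETE) instead of sleeping for a fixed amount of time.
      Throws if the page has not finished loading within $TimeoutSeconds.
    #>
    param(
        [Parameter(Mandatory)] $Browser,
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 250
    )
    $readyStateComplete = 4
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

    # Checking Busy as well is a common extra precaution alongside ReadyState.
    while ($Browser.Busy -or $Browser.ReadyState -ne $readyStateComplete) {
        if ((Get-Date) -gt $deadline) {
            throw "Page did not finish loading within $TimeoutSeconds seconds (ReadyState = $($Browser.ReadyState))."
        }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
}

# Example usage (Windows, Internet Explorer installed, PDFCreator set as default printer):
# $ie = New-Object -ComObject InternetExplorer.Application
# $ie.Navigate('https://www.example.com')
# Wait-IEReady -Browser $ie
# $ie.ExecWB(6, 2)   # OLECMDID_PRINT with OLECMDEXECOPT_DONTPROMPTUSER: print without a dialog
```

Polling every 250 ms means a light page is printed almost as soon as it is ready, while a heavy page simply gets more iterations instead of a longer hard-coded sleep.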

The print job also needs a bit more time to actually arrive in the PDFCreator queue. Until it does, you cannot process it; otherwise you will run into an error.

In the beginning, an arbitrary Start-Sleep of 5 or sometimes 8 seconds did the trick, but that was too random for me. Again, the while loop made my day: I simply checked whether the PDFCreator COM object had at least one job in its Count property. Once it did, the script could continue.
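The same idea applied to the print queue can be sketched as follows, assuming $Queue is an initialized PDFCreator.JobQueue COM object whose Count property reports the number of waiting jobs. The function name Wait-PdfCreatorJob and the timeout are hypothetical additions; the NextJob, SetProfileByGuid and ConvertTo calls in the usage comments are assumptions based on the PDFCreator 2.x VBS samples and should be checked against the samples for your version.

```powershell
function Wait-PdfCreatorJob {
    <#
      Polls the PDFCreator.JobQueue COM object until at least one print job
      has arrived (Count greater than 0), instead of a fixed Start-Sleep.
      Throws if no job shows up within $TimeoutSeconds.
    #>
    param(
        [Parameter(Mandatory)] $Queue,
        [int] $TimeoutSeconds = 60,
        [int] $PollMilliseconds = 250
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

    while ($Queue.Count -lt 1) {
        if ((Get-Date) -gt $deadline) {
            throw "No print job reached the PDFCreator queue within $TimeoutSeconds seconds."
        }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
}

# Example usage (assumed member names, verify against your PDFCreator samples):
# Wait-PdfCreatorJob -Queue $queue
# $job = $queue.NextJob
# $job.SetProfileByGuid('DefaultGuid')
# $job.ConvertTo('C:\Archive\page.pdf')   # hypothetical output path
```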

The final function is available on GitHub here. Please don’t hesitate to comment 🙂


Read more:

wkhtmltopdf -> I haven’t tried this one, but according to its documentation, it seems pretty straightforward. (If you have a blog post on this topic, please share it with us in the comments 🙂)

Internet Explorer COM object -> the MSDN help page for the Internet Explorer COM object