How New EC2 Instances Led Us to Rewrite Our PDF Tools

by Sebastien Mirolo on Mon, 11 May 2015

Amazon announced new T2 instances in July 2014. A little later it became clear that the AWS free tier for startups applied only to T2 instances. The free tier offer did not extend to the previous T1 generation (we found out on the first bill).

New and free, who could resist? An executive decision ensued: move all of DjaoDjin's infrastructure to second-generation instances. Search and replace T1 with T2. How hard could that be, really?

It didn't take long before we stumbled on the hard truth. While the previous (T1) instances relied on PV (paravirtual) virtualization, the newer T2 instances rely on HVM (hardware virtual machine) virtualization. Unless you are a virtualization buff, that means nothing. Well, not quite: making the switch forces an OS kernel update.
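In practice, the first thing to check is whether an AMI can boot on a T2 instance at all. A minimal sketch, assuming the only criterion is the AMI's VirtualizationType as reported by the AWS CLI ("hvm" for HVM images, "paravirtual" for PV images); the helper name ami_fits_t2 is hypothetical.

```shell
#!/bin/sh
# Sketch (assumption): T2 instances only boot HVM images, so an AMI is
# usable on T2 only when its VirtualizationType is "hvm".
# PV images report "paravirtual" instead.
ami_fits_t2() {
    # $1: VirtualizationType string of the AMI
    [ "$1" = "hvm" ]
}

# Try the check on both possible values.
for vtype in hvm paravirtual; do
    if ami_fits_t2 "$vtype"; then
        echo "$vtype: can boot on t2"
    else
        echo "$vtype: cannot boot on t2, pick an HVM AMI"
    fi
done
```

Against a real image, the string would come from something like aws ec2 describe-images --image-ids <ami-id> --query 'Images[0].VirtualizationType' --output text, where <ami-id> is a placeholder.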

Experience shows that kernel updates ripple through your whole software stack with unexpected changes. Has anyone seen those beautiful lasagna stacks, with clean separation of concerns, et al.? Yeah, right.

Reality looks more like dependency hell.

So now we had to figure out which HVM-compatible AMI to use. In November 2014, Fedora 21 started shipping release candidates as HVM AMIs, and a few vendors pushed out experimental CentOS images. At that point, there was no stable, official HVM AMI of a RedHat-compatible distribution.

We started deploying our web stack on a freshly provisioned Fedora 21 t2.micro instance.

$ yum install pdftk
Warning: No matches found for: pdftk
No matches found

Not found? A typo? We typed the command again. Same result. Time for a little Googling. It turns out PDFtk was removed from Fedora 21 because it has a hard dependency on libgcj, a Java runtime library implementation, which itself was removed due to the lack of a maintainer. There might also be some licensing issues related to iText.

Either way, figuring out why a package that worked perfectly before is no longer available is best left to historians. Our main problem was: how do we fill a PDF template now?

There were not many real answers. PDFtk seemed like the only viable option, and it was gone. Being resourceful, our team plunged into finding a good C++ PDF library and writing our own fillform tool. We settled on PoDoFo after reading the code in its FormTest example gave us confidence in it.

The code for the resulting PDF template filler is available on GitHub.

After developing on a local OSX laptop, we still had to compile and run podofo-flatform on the production systems. That is where we hit another hiccup. Though PoDoFo 0.9.3 was released in July 2014 and is available through MacPorts, Fedora 21 currently packages version 0.9.1.
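Because the PoDoFo version found on a server can differ from the one used during development, it can help to fail early when it is older than 0.9.3. A minimal sketch, assuming GNU coreutils' sort -V for version ordering; the helper name version_ge and the MIN_PODOFO variable are hypothetical.

```shell
#!/bin/sh
# Sketch: refuse PoDoFo versions older than 0.9.3, the version
# podofo-flatform was developed against. Versions are passed in by hand
# here; how the installed one is obtained is an assumption (see below).
MIN_PODOFO=0.9.3

# version_ge A B: succeed when dotted version A >= B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

for found in 0.9.1 0.9.3; do
    if version_ge "$found" "$MIN_PODOFO"; then
        echo "PoDoFo $found: OK"
    else
        echo "PoDoFo $found: too old, build $MIN_PODOFO from source"
    fi
done
```

On Fedora, the packaged version could be fed in with rpm -q --qf '%{VERSION}\n' podofo-devel, assuming the headers come from a package named podofo-devel.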

Linking against PoDoFo 0.9.1 showed no errors, but the resulting podofo-flatform generated blank PDFs: fields were not populated. Fortunately, compiling PoDoFo 0.9.3 from source did not pose any extraordinary challenges.

$ tar zxvf ~/podofo-0.9.3.tar.gz
$ mkdir podofo-build
$ cd podofo-build
$ sudo yum install cmake gcc-c++ zlib-devel openssl-devel libidn-devel \
    libjpeg-turbo-devel libtiff-devel libpng-devel lua-devel \
    freetype-devel fontconfig-devel
$ cmake -G "Unix Makefiles" \
    -DCMAKE_INSTALL_PREFIX="/usr/local" \
    ../podofo-0.9.3
$ make
$ sudo make install

For reference, on OSX:

$ tar zxvf ~/Downloads/podofo-0.9.3.tar.gz
$ mkdir podofo-build
$ cd podofo-build
$ sudo port install fontconfig freetype jpeg tiff lua
$ cmake -G "Unix Makefiles" \
    -DCMAKE_PREFIX_PATH=/opt/local \
    -DCMAKE_INCLUDE_PATH=/opt/local/include \
    -DCMAKE_LIBRARY_PATH=/opt/local/lib \
    -DCMAKE_FRAMEWORK_PATH=/opt/local/Library/Frameworks \
    ../podofo-0.9.3
$ make
$ sudo make install

The last hurdle was to open the PDF templates we had and save them again. This step somehow produced a cleaner PDF file, one from which podofo-flatform was able to generate flat PDFs. It may be that somewhere in the PDF Reference Manual there is some syntax we did not account for.

$ podofo-flatform --fill "City=San Francisco" --fill "Last Name=Smith" \
    --fill "First Name=Joe" template-form.pdf -
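Each --fill argument pairs a form field name with the value to write into it, and field names such as "Last Name" contain spaces. A minimal sketch of how such a "Name=Value" argument can be split, assuming the split happens at the first '='; the helpers fill_name and fill_value are hypothetical and are not the parsing code inside podofo-flatform.

```shell
#!/bin/sh
# Hypothetical helpers: split a --fill argument at the FIRST '=' so that
# field names may contain spaces and values may themselves contain '='.
fill_name()  { printf '%s\n' "${1%%=*}"; }   # drop the longest "=..." suffix
fill_value() { printf '%s\n' "${1#*=}"; }    # drop the shortest "...=" prefix

for arg in "City=San Francisco" "Last Name=Smith" "First Name=Joe"; do
    printf '[%s] -> [%s]\n' "$(fill_name "$arg")" "$(fill_value "$arg")"
done
```

Splitting at the first '=' rather than the last one keeps a value like "a=b" intact, at the cost of not allowing '=' inside field names.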

podofo-flatform is available and used in djaodjin-extended-templates, a Django app we use in production to generate PDF invoices and HTML emails. By the time we had a working PDF fillform utility on a T2 instance, Django 1.8 had been released. Long in the works, Django 1.8 broke API compatibility for the template engine. So yes, djaodjin-extended-templates only works with Django 1.7. Upgrading to Django 1.8 is bound to be another story...

More to read

If you are fascinated by the subtle interaction between business and technical decisions, and the disproportionate outcomes it often leads to, you might be interested in How we setup pylint on a git pre-receive hook or the Software-as-a-Service lightning talk.

More technical posts are also available on the DjaoDjin blog, as well as business lessons we learned running a subscription hosting platform.
