Crafting Gorgeous PDFs using Chromium and Playwright
Creating visually appealing and maintainable PDFs on the backend has always been a challenge. Traditional tools lag in support for modern HTML and CSS, leaving us with archaic table‑based designs. This guide explores using Chromium and Playwright in a Spring Boot application to produce great‑looking documents with modern HTML/CSS.
Setting Up Chromium with Playwright
Installing Chromium on the server isn’t trivial. A browser isn’t a simple dependency you can easily include in your project. Chromium, which we’ll be using, is a fully fledged browser and has to be downloaded as such. Additionally, it requires a compatible version of the Chromium driver to interface with the browser. This setup becomes even more complex in a development team using different operating systems like Windows, macOS, and Linux, as each environment demands a different set of binaries. Beyond that, we need middleware to execute commands on the browser from our code. That’s a lot to prepare and maintain…
This is where Playwright comes into play. Playwright is mainly known for e2e UI testing, but we’ll be using it for a different purpose: rendering HTML to PDF. Playwright can detect the system it’s running on and download and set up Chromium for us (even at runtime!). This saves a lot of time when you’re getting started and makes maintenance easier later on.
There are other tools like Playwright, namely Cypress or Selenium, but Playwright ships with a Java library—that means it also covers us on the middleware front.
Let’s Build It!
First, add the Playwright dependency to your project:
implementation("com.microsoft.playwright:playwright:1.40.0")
With Playwright, we can interact with web browsers easily. Our code stays simple because Playwright handles the complicated parts, like setting up the environment and talking to the browser. Here’s what it looks like:
val pdf = Playwright.create().use { playwright -> // ①
    val browser = playwright.chromium().launch() // ②
    val page = browser.newPage().also { it.setContent(html) } // ③
    page.pdf() // ④
}
Each line does something important:
① Launches a new Playwright driver process
② Launches Chromium and returns the browser instance
③ Opens a new page and injects our HTML into it
④ Renders the page to a PDF and returns it as a byte array
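The bare `page.pdf()` call uses Chromium's defaults, which typically drop CSS backgrounds and may not match the paper size you want. Here is a minimal sketch of the same four steps with a few print options applied. The function name `renderPdf` and the chosen values (A4, 20 mm margins) are only illustrative assumptions, not part of the original project.

```kotlin
import com.microsoft.playwright.Page
import com.microsoft.playwright.Playwright
import com.microsoft.playwright.options.Margin

// Hypothetical helper: renders the given HTML to PDF bytes with explicit print options.
fun renderPdf(html: String): ByteArray =
    Playwright.create().use { playwright ->                 // ① driver process
        val page = playwright.chromium().launch().newPage() // ② browser + new page
        page.setContent(html)                               // ③ inject our HTML
        page.pdf(                                           // ④ render with options
            Page.PdfOptions()
                .setFormat("A4")            // paper size (illustrative choice)
                .setPrintBackground(true)   // keep CSS backgrounds in the output
                .setMargin(Margin().setTop("20mm").setBottom("20mm"))
        )
    }
```

Calling `renderPdf("<h1>Invoice</h1>")` returns the document as a byte array, which can be written to a file or returned from a controller.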
Simple, right? But wait, there’s more to consider… let’s check the performance of our implementation 👀
Step | First run | Warmed up |
---|---|---|
① | 2196 ms | 270 ms |
② | 585 ms | 252 ms |
③ | 439 ms | 151 ms |
④ | 58 ms | 24 ms |
While testing this code, we can see those four lines take ~700 ms to execute, and even ~3 s on the first run. That’s quite a lot! Looking again at what each line is doing, we’ll notice that for each request we’re starting a new Playwright driver and a new browser (① + ②). That can’t be good for performance. The natural solution would be to start the browser once and keep it in state—problem solved! Although the idea might be decent, we have to keep in mind that Playwright isn’t designed for this kind of work. The creators even warn us about this in their documentation.
No, Playwright is not thread safe, i.e. all its methods as well as methods on all objects created by it (such as BrowserContext, Browser, Page etc.) are expected to be called on the same thread where Playwright object was created or proper synchronization should be implemented to ensure only one thread calls Playwright methods at any given time. Having said that it's okay to create multiple Playwright instances each on its own thread.
The last part is a hint we can use. We can create a pool of threads, each with its own Playwright instance. For each request, we use a worker from the pool and delegate the task to it.
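Before bringing Playwright back in, the idea can be tried with nothing but the JDK. In the sketch below, `Worker` is a hypothetical stand-in for an expensive, thread-bound resource (in our case, a Playwright instance), and `runDemo` is an illustrative name. It shows that a `ThreadLocal` initializer runs once per pool thread, no matter how many requests the pool handles.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an expensive resource that must stay on one thread.
class Worker(val id: Int)

// Runs `requests` tasks on a pool of `poolSize` threads and returns
// how many workers were created and which worker ids served the tasks.
fun runDemo(poolSize: Int, requests: Int): Pair<Int, Set<Int>> {
    val created = AtomicInteger()
    // The initializer runs lazily, once for each thread that calls get().
    val worker = ThreadLocal.withInitial { Worker(created.incrementAndGet()) }
    val pool = Executors.newFixedThreadPool(poolSize)
    try {
        val futures = (1..requests).map {
            pool.submit(Callable { worker.get().id }) // each task uses its thread's worker
        }
        val ids = futures.map { it.get() }.toSet()
        return created.get() to ids
    } finally {
        pool.shutdown()
        pool.awaitTermination(5, TimeUnit.SECONDS)
    }
}
```

With `runDemo(poolSize = 2, requests = 10)`, ten tasks are served by only two workers, one per thread. That is exactly the property we need for Playwright.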
Optimization
Thread Pool
Luckily, in Java we have ExecutorService, which can easily make the pool for us. Since I’m using Kotlin, I also convert the pool to a dispatcher to use in coroutine contexts.
val dispatcher = Executors.newFixedThreadPool(poolSize).asCoroutineDispatcher()
Using coroutines here makes a lot of sense, since quite a big chunk of the work happens outside the JVM, in the browser itself, and our threads will spend some time waiting for the browser to process data.
val page = ThreadLocal.withInitial { // one Page per pool thread
    Playwright.create().chromium().launch().newPage() // ① + ②
}
We move steps ① and ② to the thread initialization, so they don’t slow down each request. It’s a huge win—after measurements the times look like this:
Step | First run | Warmed up |
---|---|---|
① + ② | - | - |
③ | 71 ms | 3 ms |
④ | 38 ms | 7 ms |
That’s up to 70× faster on a warmed-up machine and 30× faster on a cold one. We’ve moved the heavy lifting to the app’s start-up phase. The code is a bit more complex, but the performance gain is huge.
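Putting the dispatcher and the thread-local page together could look roughly like the sketch below. The class name `PdfRenderer` and its `render` function are assumptions made for illustration; the actual project may wire things differently. Clean-up of the Playwright instances and the pool is left out to keep it short.

```kotlin
import com.microsoft.playwright.Page
import com.microsoft.playwright.Playwright
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.withContext
import java.util.concurrent.Executors

// Hypothetical wrapper: a fixed pool of threads, each owning its own Playwright page.
class PdfRenderer(poolSize: Int) {
    private val dispatcher = Executors.newFixedThreadPool(poolSize).asCoroutineDispatcher()

    // Steps ① + ②: created lazily, once per pool thread, and reused afterwards.
    private val page: ThreadLocal<Page> = ThreadLocal.withInitial {
        Playwright.create().chromium().launch().newPage()
    }

    // Steps ③ + ④ run on a pool thread. The block has no suspension points,
    // so every Playwright call happens on the thread that created the page.
    suspend fun render(html: String): ByteArray = withContext(dispatcher) {
        val threadPage = page.get()
        threadPage.setContent(html)
        threadPage.pdf()
    }
}
```

From a suspending handler this is just `renderer.render(html)`. Note that `ThreadLocal.withInitial` is lazy, so the first request served by each thread still pays the start-up cost unless the pool is warmed up beforehand.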
Baked-in Image
We can also speed things up by including the Playwright binaries in our Docker image, so we don’t have to download them every time.
To achieve this, we set an environment variable (PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=true) instructing Playwright not to download binaries at runtime, and we make certain these files are included in our application’s Docker image. We take the binaries from Playwright’s official Docker image to ensure compatibility.
FROM mcr.microsoft.com/playwright:v1.40.0-jammy as build
# Your image build phase (a separate stage with its own FROM line and build steps)
ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=true
COPY --from=build /ms-playwright /ms-playwright
Make sure the Docker image version matches the Playwright version in your project. Update both together for compatibility.
# build.gradle
implementation("com.microsoft.playwright:playwright:1.40.0")
☝️ this
# Dockerfile 👇 and that
FROM mcr.microsoft.com/playwright:v1.40.0-jammy as build
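If changing the image's environment isn't convenient, the same switch can, as far as I know, also be passed from code through `Playwright.CreateOptions`. The sketch below assumes that the driver treats an entry in this map the same way as a real environment variable; the function name is hypothetical.

```kotlin
import com.microsoft.playwright.Playwright

// Hypothetical factory: creates Playwright with the browser download disabled.
// Assumption: the env map is handed to the driver like a process environment variable.
fun createPlaywrightWithoutDownload(): Playwright {
    val options = Playwright.CreateOptions()
        .setEnv(mapOf("PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD" to "1"))
    return Playwright.create(options)
}
```

If the copied browsers are not picked up at runtime, it's worth checking `PLAYWRIGHT_BROWSERS_PATH`, the variable Playwright uses to locate browser binaries in a non-default directory such as `/ms-playwright`.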
Summary
In this guide, we’ve walked through using Playwright to generate PDFs from HTML. We enhanced performance by implementing a thread pool and incorporating Chromium binaries into our Docker image.
Check out the full working project on my repository: gitlab.com/garstecki/pdf. You can test it using the demo.html file, which has templates that IntelliJ can understand and run.