Browser automation with puppeteer

Posted by Serkan Özkan, https://www.vulniq.com founder.

I was trying to implement browser automation using puppeteer (https://github.com/GoogleChrome/puppeteer), and keeping it reliable turned out to be a serious problem. Here is how I finally reached a stable state:

The problem: 

First I created a docker image with a single js file using nodejs to run puppeteer. Nothing fancy, mostly based on example code available on the internet. You send a request to it with a url parameter and it generates a screenshot of the url using puppeteer.

This is pretty straightforward, and it runs fine in development mode. But as you process more urls, the underlying chrome instance becomes bloated and finally crashes, taking down everything with it. On more than one occasion it even helpfully took down the docker daemon and other docker containers.

The solution: 

Restarting. Should not be a shocker for anyone with experience in IT.

1- In your nodejs code, start a timer which will exit the node process every 10 minutes:
setTimeout(function () {
    isShuttingDown = true;
    setTimeout(function () {
        try {
            puppeteerInstance.close(); //<--probably useless as we don't wait for it
        } catch (error) {
            console.log(error);
        }
        process.exit(1);
    }, 2000);
}, 600000);
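
As the comment in the snippet above says, close() is not awaited, so chrome may not get a chance to shut down cleanly. One possible variation, sketched here and not something I have measured, is to wait for the close but give up after a deadline, so that a hung chrome cannot block the restart. closeWithDeadline and its parameters are hypothetical names made up for this sketch; the only puppeteer behavior it relies on is that close() returns a promise.

```javascript
// Hypothetical helper: wait for closeFn() to settle, but never longer than timeoutMs.
// Resolves to "closed", "failed" or "timeout" so the caller can log what happened.
function closeWithDeadline(closeFn, timeoutMs) {
    let timer;
    const deadline = new Promise(function (resolve) {
        timer = setTimeout(function () { resolve("timeout"); }, timeoutMs);
    });
    const closing = Promise.resolve()
        .then(closeFn) // also turns a synchronous throw into a rejection
        .then(function () { return "closed"; })
        .catch(function (error) {
            console.log(error);
            return "failed";
        });
    // Whichever settles first wins; the timer is cleared either way.
    return Promise.race([closing, deadline]).finally(function () {
        clearTimeout(timer);
    });
}

// Possible use inside the inner timer (puppeteerInstance as in the snippet above,
// the 5000 ms deadline is an arbitrary choice):
// closeWithDeadline(function () { return puppeteerInstance.close(); }, 5000)
//     .then(function () { process.exit(1); });
```

The process still exits with code 1 in every case, so the restart behavior stays the same; the only difference is that chrome gets a bounded amount of time to close first.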

2- Run the docker container with the --restart on-failure option. The non-zero exit code from process.exit(1) above counts as a failure, so docker will restart the container automatically.
    docker run --restart on-failure ...

3- In your request handler(s) return 503 if isShuttingDown is true.
app.get('/', async function (request, response) {
    if (isShuttingDown) {
        response.setHeader("Retry-After", 2); // seconds the caller should wait
        response.status(503).send("Restarting");
        return;
    }
    // ... normal request handling continues here
});

4- In your code that calls the nodejs service, if you get a 503, wait for the service to restart (the Retry-After header suggests 2 seconds) and then send the request again.
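
If the calling code is also javascript, step 4 could look roughly like the sketch below. This is not the code I run; fetchWithRestartRetry, its options and the default of 5 attempts are hypothetical names and values chosen for illustration, and it assumes a runtime with a global fetch (Node.js 18 or later) unless you pass your own fetchFn.

```javascript
// Hypothetical client helper: repeats a request while the service answers 503.
// options.fetchFn, options.maxAttempts and options.sleep exist only in this sketch.
async function fetchWithRestartRetry(url, options = {}) {
    const fetchFn = options.fetchFn || fetch; // global fetch, Node.js 18+
    const maxAttempts = options.maxAttempts || 5;
    const sleep = options.sleep || function (ms) {
        return new Promise(function (resolve) { setTimeout(resolve, ms); });
    };

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const response = await fetchFn(url);
        if (response.status !== 503) {
            return response; // anything other than "Restarting" goes back to the caller
        }
        if (attempt === maxAttempts) {
            break;
        }
        // The handler in step 3 sends Retry-After in seconds; fall back to 2 seconds.
        const seconds = Number(response.headers.get("retry-after"));
        await sleep(Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : 2000);
    }
    throw new Error("Service still restarting after " + maxAttempts + " attempts");
}
```

Note that this only covers the 503 case from step 3. While the container is actually down, the request will most likely fail with a connection error instead, which this sketch does not catch.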
