Tutorial · 9 min read

Web Scraping with Node.js: Puppeteer, Playwright, or API?

Compare Node.js web scraping approaches: Puppeteer, Playwright, and scraping APIs. Learn when to use each with practical code examples.

FineData Team


Node.js is one of the most popular platforms for web scraping. Its async-first architecture makes it naturally suited for I/O-heavy tasks like fetching thousands of web pages. But the ecosystem offers several fundamentally different approaches, and choosing the wrong one for your use case can waste weeks of development time.

This guide compares the three main approaches: Puppeteer, Playwright, and scraping APIs. We’ll cover what each is good at, where each falls short, and show real code so you can make an informed decision.

Approach 1: Puppeteer

Puppeteer is Google’s official Node.js library for controlling Chrome. It launches a headless Chrome instance and gives you full programmatic control — navigate to pages, click buttons, fill forms, take screenshots, and extract content.

Basic Puppeteer Scraping

const puppeteer = require('puppeteer');

async function scrapeWithPuppeteer(url) {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });

  try {
    const page = await browser.newPage();

    // Set a realistic viewport and user agent
    await page.setViewport({ width: 1920, height: 1080 });
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    );

    await page.goto(url, { waitUntil: 'networkidle2' });

    // Extract data from the rendered page
    return await page.evaluate(() => {
      return Array.from(document.querySelectorAll('.product-card')).map(card => ({
        title: card.querySelector('.title')?.textContent?.trim(),
        price: card.querySelector('.price')?.textContent?.trim(),
        url: card.querySelector('a')?.href,
      }));
    });
  } finally {
    // Always close the browser, even if navigation or extraction throws
    await browser.close();
  }
}

Puppeteer Strengths

  • Full browser control — Click, type, scroll, screenshot, PDF generation
  • JavaScript rendering — Handles React, Vue, Angular sites natively
  • Google-backed — Stable, well-maintained, excellent Chrome integration
  • Rich ecosystem — Large community, lots of plugins and examples

Puppeteer Weaknesses

  • Resource hungry — Each Chrome instance uses 200-500MB RAM
  • Detectable — Sites can detect Puppeteer via navigator.webdriver, missing plugins, and browser fingerprint inconsistencies
  • Chrome only — No Firefox or Safari support
  • Scaling is painful — Running 50 concurrent browsers on a server requires careful memory management
  • No built-in anti-bot bypass — You need to add stealth plugins, proxy rotation, and CAPTCHA solving yourself

Approach 2: Playwright

Playwright is Microsoft’s answer to Puppeteer. It supports Chromium, Firefox, and WebKit (the engine behind Safari), has better auto-waiting, and includes features that Puppeteer lacks.

Basic Playwright Scraping

const { chromium } = require('playwright');

async function scrapeWithPlaywright(url) {
  const browser = await chromium.launch({ headless: true });

  try {
    const context = await browser.newContext({
      viewport: { width: 1920, height: 1080 },
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    });

    const page = await context.newPage();

    await page.goto(url, { waitUntil: 'networkidle' });

    // Playwright has better selectors and auto-waiting
    return await page.$$eval('.product-card', cards =>
      cards.map(card => ({
        title: card.querySelector('.title')?.textContent?.trim(),
        price: card.querySelector('.price')?.textContent?.trim(),
        url: card.querySelector('a')?.href,
      }))
    );
  } finally {
    // Close the browser even if navigation or extraction fails
    await browser.close();
  }
}

What Playwright Improves Over Puppeteer

  • Multi-browser support — Chromium, Firefox, and WebKit (Safari)
  • Better auto-waiting — Actions automatically wait for elements to be visible and stable
  • Browser contexts — Lightweight isolation without full browser overhead
  • Network interception — Easier API mocking and request modification
  • Trace viewer — Built-in debugging tool for recording and replaying tests

Where Playwright Still Struggles

The core limitations are the same as Puppeteer because they share the same architecture — running a real browser:

// The anti-bot problem persists with Playwright too
const { chromium } = require('playwright');

async function scrapeProtectedSite(url) {
  const browser = await chromium.launch({
    headless: true,
    // Even with these flags, sophisticated anti-bot systems
    // detect Playwright through:
    // - WebDriver flag in navigator
    // - Differences in browser plugin list
    // - Missing GPU/WebGL fingerprint details
    // - TLS fingerprint mismatches
  });

  const page = await browser.newPage();
  await page.goto(url);

  // Often results in a CAPTCHA or block page
  const content = await page.content();

  if (content.includes('captcha') || content.includes('blocked')) {
    console.log('Detected and blocked');
    // Now what? You need to add:
    // 1. Stealth plugins (playwright-extra)
    // 2. Proxy rotation service
    // 3. CAPTCHA solving service
    // 4. Custom fingerprint spoofing
  }

  await browser.close();
}

Community projects like playwright-extra and puppeteer-extra-plugin-stealth help, but they’re in an arms race with anti-bot systems and often lag behind.

Approach 3: Scraping API

A scraping API offloads the browser execution, anti-bot bypass, proxy rotation, and CAPTCHA solving to a managed service. You send a URL, you get back rendered HTML.

Basic API Scraping with Node.js

const axios = require('axios');
const cheerio = require('cheerio');

const FINEDATA_API_KEY = 'fd_your_api_key';

async function scrapeWithApi(url) {
  const response = await axios.post(
    'https://api.finedata.ai/api/v1/scrape',
    {
      url,
      use_js_render: true,
      tls_profile: 'chrome124',
      use_residential: true,
      timeout: 30
    },
    {
      headers: {
        'x-api-key': FINEDATA_API_KEY,
        'Content-Type': 'application/json'
      }
    }
  );

  const $ = cheerio.load(response.data.body);

  const products = [];
  $('.product-card').each((i, card) => {
    products.push({
      title: $(card).find('.title').text().trim(),
      price: $(card).find('.price').text().trim(),
      url: $(card).find('a').attr('href'),
    });
  });

  return products;
}

No browser to launch, no memory to manage, no stealth plugins to configure. The HTML comes back fully rendered with anti-bot protections handled.

Scaling with the API

Where the API approach really shines is concurrent scraping. With Puppeteer/Playwright, each concurrent page needs a browser tab (and the RAM to go with it). With an API, concurrency is just HTTP requests:

const axios = require('axios');
const cheerio = require('cheerio');
const pLimit = require('p-limit'); // note: p-limit v4+ is ESM-only; use v3 or earlier with require()

const FINEDATA_API_KEY = 'fd_your_api_key';
const limit = pLimit(10); // 10 concurrent requests

async function scrapeUrl(url) {
  const response = await axios.post(
    'https://api.finedata.ai/api/v1/scrape',
    {
      url,
      use_js_render: true,
      tls_profile: 'chrome124',
      timeout: 30
    },
    {
      headers: {
        'x-api-key': FINEDATA_API_KEY,
        'Content-Type': 'application/json'
      }
    }
  );

  return response.data.body;
}

async function scrapeMany(urls) {
  const results = await Promise.all(
    urls.map(url => limit(() => scrapeUrl(url)))
  );

  return results.map((html, i) => {
    const $ = cheerio.load(html);
    return {
      url: urls[i],
      title: $('h1').text().trim(),
      products: $('.product-card').length,
    };
  });
}

// Scrape 100 URLs with 10 concurrency — uses ~50MB RAM
const urls = Array.from({ length: 100 }, (_, i) =>
  `https://store.example.com/category?page=${i + 1}`
);

scrapeMany(urls).then(results => {
  console.log(`Scraped ${results.length} pages`);
});

Doing this with Puppeteer would require either a beefy server (10 Chrome instances at 300MB each = 3GB RAM for just the browsers) or a distributed system with message queues and worker pools.
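If you'd rather not depend on p-limit at all (its newer releases are ESM-only), the limiter itself is small enough to hand-roll. A minimal sketch with the same shape as p-limit's API:

```javascript
// Minimal concurrency limiter: at most `max` tasks run at once.
// Queued tasks start as running ones finish; results resolve in call order
// when combined with Promise.all.
function createLimit(max) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { fn, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(fn)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // a slot freed up; start the next queued task
      });
  };

  return fn =>
    new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      next();
    });
}
```

Usage mirrors p-limit: `const limit = createLimit(10);` then `limit(() => scrapeUrl(url))`.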

Batch Scraping

For even larger workloads, FineData’s batch endpoint processes multiple URLs in a single request:

async function batchScrape(urls) {
  const response = await axios.post(
    'https://api.finedata.ai/api/v1/batch',
    {
      urls,
      use_js_render: true,
      use_residential: true
    },
    {
      headers: {
        'x-api-key': FINEDATA_API_KEY,
        'Content-Type': 'application/json'
      }
    }
  );

  const batchId = response.data.batch_id;

  // Poll for results (with a cap so a stuck batch can't loop forever)
  const maxAttempts = 60; // ~5 minutes at 5s per poll
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await axios.get(
      `https://api.finedata.ai/api/v1/batch/${batchId}`,
      { headers: { 'x-api-key': FINEDATA_API_KEY } }
    );

    if (status.data.status === 'completed') {
      return status.data.results;
    }

    await new Promise(resolve => setTimeout(resolve, 5000));
  }

  throw new Error(`Batch ${batchId} did not complete in time`);
}
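Polling at a fixed five-second interval works, but for long-running batches an exponential backoff is gentler on the API and fails fast when something is stuck. A generic sketch (the helper name is ours, not part of the FineData API):

```javascript
// Retry `check` until it returns a truthy value, doubling the wait each
// attempt (capped at 30s), and give up after `maxAttempts` tries.
async function pollUntil(check, { maxAttempts = 10, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await check();
    if (result) return result;

    const delay = Math.min(baseDelayMs * 2 ** attempt, 30000);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  throw new Error(`Gave up after ${maxAttempts} attempts`);
}
```

You would call it with a closure that fetches the batch status and returns the results once `status === 'completed'`, and falsy otherwise.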

Head-to-Head Comparison

| Factor | Puppeteer | Playwright | FineData API |
|---|---|---|---|
| Language | Node.js | Node.js, Python, Java, .NET | Any (REST API) |
| Browser support | Chrome only | Chrome, Firefox, Safari | N/A (server-side) |
| JS rendering | Yes | Yes | Yes |
| RAM per page | 200-500MB | 150-300MB | ~0 (server-side) |
| Anti-bot bypass | Poor (detectable) | Poor (detectable) | Built-in |
| CAPTCHA solving | Manual integration | Manual integration | Built-in |
| Proxy support | Manual setup | Manual setup | Built-in |
| Max concurrency | 5-20 (local) | 5-30 (local) | 100+ |
| Setup time | 15-30 min | 10-20 min | 2 min |
| Maintenance | High | Moderate | Low |
| Cost | Infrastructure | Infrastructure | Per-request |
| Page interaction | Full | Full | None |
| Screenshots/PDFs | Yes | Yes | No |

When to Use Each

Choose Puppeteer when:

  • You need to interact with pages — fill forms, click buttons, navigate multi-step flows
  • You’re building end-to-end tests alongside scraping
  • You need screenshots or PDFs
  • You’re scraping a small number of unprotected sites
  • You want to stay within the Google/Chrome ecosystem

Choose Playwright when:

  • All of the Puppeteer use cases above, plus:
  • You need cross-browser testing (Firefox, Safari)
  • You want better auto-waiting and debugging tools
  • You prefer Playwright’s API, which is generally considered more ergonomic than Puppeteer’s

Choose a Scraping API when:

  • Sites have anti-bot protection (Cloudflare, CAPTCHAs, fingerprinting)
  • You need to scrape at scale (hundreds to millions of pages)
  • You don’t want to manage browser infrastructure
  • Reliability matters — your data pipeline can’t tolerate frequent failures
  • You’re working in a team and want to minimize ops burden

The Hybrid Pattern

Many production systems combine approaches:

async function smartScrape(url, options = {}) {
  // Use Playwright for sites that need interaction
  if (options.needsInteraction) {
    return scrapeWithPlaywright(url, options.steps);
  }

  // Use API for everything else (handles anti-bot, JS, scale)
  return scrapeWithApi(url);
}

Use Playwright for the 10% of cases that need browser interaction (login flows, form submissions, file downloads) and the API for the 90% that just need rendered HTML.
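The same idea generalizes into a small escalation helper that tries strategies in order, cheapest first, and falls through on failure. The names here are hypothetical, sketching the pattern rather than any specific API:

```javascript
// Try scraping strategies in order until one succeeds.
// `strategies` is an array of async functions taking a URL, cheapest first
// (e.g. plain fetch, then the API with JS rendering, then Playwright).
async function scrapeWithFallback(url, strategies) {
  let lastError;
  for (const strategy of strategies) {
    try {
      return await strategy(url);
    } catch (err) {
      lastError = err; // remember the failure and escalate to the next tier
    }
  }
  throw lastError ?? new Error('No strategies provided');
}
```

This keeps per-page costs down: most pages succeed on the cheap path, and only the hard ones pay for a full browser.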

Performance Benchmarks

Here’s what scraping 100 product pages looks like across approaches (tested on a 4-core, 8GB RAM server):

| Metric | Puppeteer | Playwright | FineData API |
|---|---|---|---|
| Total time | 8-12 min | 6-10 min | 2-4 min |
| Peak RAM | 3.2 GB | 2.4 GB | 120 MB |
| Success rate (protected) | 35% | 40% | 92% |
| Success rate (unprotected) | 95% | 97% | 99% |
| CPU usage | 60-80% | 50-70% | 5-10% |

The API approach is significantly faster and lighter because the browser execution happens on FineData’s infrastructure, and your Node.js process just handles HTTP requests and HTML parsing.

Key Takeaways

  • Puppeteer and Playwright are powerful tools for browser automation, but they’re resource-intensive, detectable by anti-bot systems, and hard to scale past a few dozen concurrent pages.
  • Scraping APIs trade per-request costs for zero infrastructure overhead, built-in anti-bot bypass, and effortless concurrency.
  • For page interaction (forms, clicks, navigation flows), use Puppeteer or Playwright. For data extraction at scale, use an API.
  • The hybrid approach combines the best of both: Playwright for interaction-heavy flows, API for everything else.
  • At scale, the API approach uses a fraction of the RAM and CPU, letting you run on a smaller (cheaper) server.

Ready to try the API approach? Check out our getting started guide or dive into our documentation. For Python-focused scraping, see our comparison of Requests + BeautifulSoup vs API.

#nodejs #puppeteer #playwright #javascript #comparison #tutorial
