Tutorial · 6 min read

How to Scrape Job Postings with Dynamic Filters Using FineData API

Step-by-step guide to extract job listings from career sites with dynamic filters using FineData's API and Playwright rendering.

FineData Team

Job boards like Indeed, LinkedIn, and Glassdoor use dynamic filtering to load content via JavaScript. You can’t just fetch /jobs?q=software+engineer and expect to get all results. The real data comes from XHR requests triggered by filters: location, experience level, remote status, salary range.

This is not a new problem. But in 2026, even the most sophisticated scraping stacks struggle with it. Playwright helps. But when you’re building a production-grade job data pipeline, you need more than automation. You need reliability, anti-bot evasion, and structured extraction.

FineData’s API solves this with three key features: Playwright rendering, dynamic filter simulation via js_actions, and AI-powered structured extraction. This post shows how to use them together to scrape job postings with complex filters—without getting rate-limited or blocked.


The Problem: Dynamic Filters Break Traditional Scraping

You want to extract all remote software engineering roles at Google from Indeed.com.

The URL is: https://www.indeed.com/jobs?q=software+engineer&l=remote&jt=fulltime

But the page doesn’t load all jobs in the initial HTML. It makes a fetch request to https://www.indeed.com/jobs/api/collect with query parameters. The response is JSON. The DOM is empty until the request completes.

A naive requests.get() returns HTML with an empty job container. BeautifulSoup can't help either: the listings simply aren't in the markup it parses. You need JavaScript rendering.
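To see the failure concretely, here is a minimal, self-contained sketch. The markup below is a hypothetical stand-in for what the server actually returns on the first request: an empty container plus the script that fetches the real data, so counting job cards in the initial HTML yields nothing.

```python
# Hypothetical server-rendered HTML: the container is empty and the job
# cards only arrive later via the client-side fetch.
from html.parser import HTMLParser

SERVER_HTML = """
<div id="job-listing-container"></div>
<script>fetch('/jobs/api/collect')</script>
"""  # roughly what a plain requests.get() would see

class JobCardCounter(HTMLParser):
    """Count elements with class="job-card" in a chunk of HTML."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "job-card":
            self.count += 1

parser = JobCardCounter()
parser.feed(SERVER_HTML)
print(parser.count)  # 0: no job cards exist in the initial HTML
```

The same parser run against the post-render DOM would find every card, which is exactly the gap JavaScript rendering closes.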

Now, try to automate the filter selection. Clicking “Remote” in the UI triggers a state change: the query string updates and the page re-renders. But fetching that new URL directly doesn’t reproduce the filtered results; the data only appears after the client-side request fires.

This is where most pipelines break. You can’t scrape the results unless you simulate the entire user flow.


The Solution: Use js_actions + use_js_render + extract_schema

FineData’s POST /api/v1/async/scrape endpoint handles this natively.

Here’s the full working flow:

  1. Use use_js_render=true to enable Playwright rendering.
  2. Use js_actions to simulate user clicks and form input.
  3. Wait for the expected element (e.g., #job-listing-container) to appear.
  4. Extract structured data using extract_schema with a JSON Schema.
  5. Return only the relevant fields.

Step 1: Set Up the Request

import requests

url = "https://api.finedata.ai/api/v1/async/scrape"
headers = {
    "Authorization": "Bearer fd_your_api_key",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://www.indeed.com/jobs",
    "method": "GET",
    "use_js_render": True,
    "js_wait_for": "selector:#job-listing-container",
    "js_scroll": True,
    "js_actions": [
        {"type": "type", "selector": "input#text-input-what", "value": "software engineer"},
        {"type": "click", "selector": "button#filter-location"},
        {"type": "type", "selector": "input#text-input-where", "value": "remote"},
        {"type": "click", "selector": "button#filter-remote"},
        {"type": "wait", "ms": 3000},
        {"type": "scroll", "direction": "down", "amount": 500}
    ],
    "formats": ["markdown"],
    "extract_schema": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "company": {"type": "string"},
                "location": {"type": "string"},
                "posted_date": {"type": "string"},
                "salary": {"type": "string"},
                "job_url": {"type": "string"},
                "job_type": {"type": "string"}
            },
            "required": ["title", "company", "location"]
        }
    },
    "timeout": 30,
    "max_retries": 3,
    "use_antibot": True,
    "tls_profile": "chrome120",
    "use_residential": True,
    "solve_captcha": True,
    "session_id": "job-scraper-2026-04-05",
    "session_ttl": 1800
}

response = requests.post(url, json=payload, headers=headers)
job_response = response.json()

Step 2: Handle the Async Job

The response returns a job_id and status pending. You now poll:

import time

job_id = job_response['job_id']

while True:
    status_response = requests.get(
        f"https://api.finedata.ai/api/v1/async/jobs/{job_id}",
        headers={"Authorization": "Bearer fd_your_api_key"}
    ).json()

    if status_response['status'] == 'completed':
        result = status_response['result']
        break
    elif status_response['status'] == 'failed':
        raise Exception(f"Job failed: {status_response['error']}")

    time.sleep(2)
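The loop above works, but a production poller should also bound its runtime and back off between polls instead of hammering the status endpoint every two seconds. One way to sketch that (the status-dict shape matches the loop above; `fetch_status` is an injected wrapper around the GET request, which also makes the logic testable without a network call):

```python
import time

def poll_job(fetch_status, max_wait=120, base_delay=2.0, max_delay=15.0):
    """Poll until completion with exponential backoff and a hard deadline.

    fetch_status: a zero-argument callable returning the job-status dict,
    e.g. a wrapper around GET /api/v1/async/jobs/{job_id}.
    """
    deadline = time.monotonic() + max_wait
    delay = base_delay
    while time.monotonic() < deadline:
        status = fetch_status()
        if status["status"] == "completed":
            return status["result"]
        if status["status"] == "failed":
            raise RuntimeError(f"Job failed: {status.get('error')}")
        time.sleep(delay)
        delay = min(delay * 1.5, max_delay)  # back off between polls
    raise TimeoutError("Job did not complete within max_wait seconds")
```

With a 30-second render timeout and 3 retries on the server side, a `max_wait` of a couple of minutes leaves comfortable headroom.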

Step 3: Extract Structured Data

The result['data']['markdown'] contains the rendered page as markdown. But the real win is result['data']['json_extracted_data'].

[
  {
    "title": "Software Engineer I",
    "company": "Google",
    "location": "Remote",
    "posted_date": "2026-03-15",
    "salary": "$130k - $160k",
    "job_url": "https://www.indeed.com/viewjob?jk=abc123",
    "job_type": "Full-time"
  },
  ...
]

This is production-ready data. No post-processing. No regex. No brittle selectors.
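Persisting the records is equally simple. A small sketch that writes them to CSV, with field names taken from the extract_schema above; optional fields that the extractor couldn't find become empty cells rather than crashing the writer:

```python
import csv

# Column order mirrors the extract_schema properties.
FIELDS = ["title", "company", "location", "posted_date",
          "salary", "job_url", "job_type"]

def write_jobs_csv(jobs, path):
    """Write extracted job records to CSV, tolerating missing optional fields."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for job in jobs:
            # Missing keys (e.g. salary on unlisted postings) become "".
            writer.writerow({k: job.get(k, "") for k in FIELDS})
```

Swap the CSV sink for a database insert or a message queue without touching the scraping side.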


Why This Works (And Why It’s Better Than DIY)

1. Playwright Rendering Is Reliable

use_js_render=true runs a real Chrome instance. It waits for the #job-listing-container to appear. It scrolls. It handles XHR responses.

You don’t need to reverse-engineer the API. You don’t need to parse fetch calls.

2. js_actions Simulates Real User Flow

Clicking the “Remote” filter isn’t just a DOM change. It triggers a state update. The URL changes. The request fires.

js_actions handles this. It’s not just “click a button.” It’s a sequence of actions that mirror what a human would do.

This avoids the “ghost click” problem. Some sites only render after a full user gesture sequence.

3. AI-Powered Extraction Beats CSS Selectors

CSS selectors break when the site updates. A single class rename, say .job-title to .job-card-title, breaks your entire pipeline.

extract_schema uses an LLM to analyze the page and return data matching your schema. It’s resilient.

For example, if the site uses <div class="job-card"> or <article data-role="job">, the AI adapts.

Honest opinion: I prefer extract_schema over extract_rules because it’s more maintainable. CSS selectors are fragile. LLMs are resilient.


Gotchas and Anti-Patterns

1. Don’t Use js_wait_for: "networkidle" on Job Boards

networkidle waits for 500ms with no new network activity. On Indeed, the job list might load in 200ms, then a second request for related jobs fires. You’ll get incomplete data.

Use selector:#job-listing-container instead. It’s more deterministic.

2. Avoid use_residential: true on Every Request

It costs +3 tokens. If you’re scraping 1000 job pages per day, that’s 3000 extra tokens per day. At $0.10 per 1000 tokens, that’s $0.30/day.

Use it only when you hit rate limits or blocks. Otherwise, stick with tls_profile: chrome120.
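A cheap pattern is to attempt the request with the TLS profile alone and escalate to residential only when that attempt is blocked. A sketch of the decision logic (the 403/429-plus-body heuristic is an assumption; tune it for the sites you target):

```python
def escalate_payload(payload, response_status, body_text=""):
    """Return an upgraded copy of the payload only if the cheap attempt was blocked.

    Heuristic (an assumption, adjust per target): 403/429 status codes or a
    captcha marker in the body mean the tls_profile-only request failed.
    Returns None when no escalation is needed.
    """
    blocked = response_status in (403, 429) or "captcha" in body_text.lower()
    if not blocked:
        return None
    upgraded = dict(payload)  # leave the original request untouched
    upgraded["use_residential"] = True
    return upgraded
```

At 1000 pages/day this keeps the +3-token surcharge confined to the handful of requests that actually need it.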

3. session_id Is Not a Silver Bullet

session_id keeps the same proxy IP across requests. Useful for sessions that require login.

But on Indeed, you don’t need to log in to scrape jobs. The site doesn’t require authentication for public listings.

So session_id adds cost without benefit. Use it only when you need to maintain a session state.

4. solve_captcha: true Isn’t Free

It costs +10 tokens. And not all sites have captchas. Use it only when captcha_detected: true.

Check the response for captcha_detected before enabling it.
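That check-then-retry pattern can be sketched in a few lines. Where exactly `captcha_detected` appears in the result object is an assumption here; verify the field path against the FineData response schema:

```python
def maybe_enable_captcha(payload, result):
    """Build a retry payload with solve_captcha only when the first attempt
    actually hit a captcha.

    NOTE: the location of captcha_detected in the result dict is an
    assumption; check your FineData response schema before relying on it.
    Returns None when no captcha was reported.
    """
    if result.get("captcha_detected"):
        retry = dict(payload)
        retry["solve_captcha"] = True  # pay the +10 tokens only now
        return retry
    return None
```

This way the surcharge applies only to the retry, never to the routine requests that sail through.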


Next Steps

1. Build a Job Board Monitor

Use POST /api/v1/async/batch to scrape multiple filters in parallel:

  • q=software+engineer&l=remote
  • q=product+manager&l=New+York
  • q=data+scientist&l=San+Francisco

Each job in the batch uses the same js_actions flow. You get all data in one webhook.
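Building that batch body is mostly string assembly. A sketch, assuming the batch payload is a list of per-URL jobs that share the single-job options (the exact field names and the callback URL below are assumptions; check the FineData batch docs):

```python
from urllib.parse import urlencode

BASE = "https://www.indeed.com/jobs"
FILTERS = [
    {"q": "software engineer", "l": "remote"},
    {"q": "product manager", "l": "New York"},
    {"q": "data scientist", "l": "San Francisco"},
]

def build_batch(filters, shared_options):
    """Expand filter dicts into per-URL jobs sharing the same render options."""
    jobs = []
    for f in filters:
        job = dict(shared_options)               # same js_actions flow everywhere
        job["url"] = f"{BASE}?{urlencode(f)}"    # e.g. ?q=software+engineer&l=remote
        jobs.append(job)
    # callback_url is a placeholder for your own webhook receiver.
    return {"jobs": jobs, "callback_url": "https://your-app.example/webhook"}

batch = build_batch(FILTERS, {"use_js_render": True})
```

POST the resulting dict to /api/v1/async/batch and collect everything from the single webhook delivery.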

2. Automate Data Sync to Your CRM

Use the callback_url to send results to your pipeline. Then use FineData’s MCP server to connect directly to Cursor or Claude Desktop.

Your AI agent can now:

  • Read new job postings.
  • Extract key fields.
  • Suggest outreach messages.
  • Trigger outreach via your CRM.

3. Add Price Intelligence

Use the same flow to scrape salary data. Track trends over time. Compare Google vs. Meta vs. Amazon.

This is the foundation of a B2B lead generation tool. See our guide on B2B data enrichment.


Final Thoughts

Scraping job postings with dynamic filters is not just a technical challenge. It’s an anti-anti-bot battle.

You can’t win with requests + BeautifulSoup. Not in 2026.

You need:

  • JavaScript rendering (Playwright)
  • Action simulation (js_actions)
  • Resilient extraction (AI + schema)
  • Proxy rotation (residential)
  • CAPTCHA handling (auto-solve)

FineData automates all of this. The API combines Playwright rendering, AI extraction, and proxy rotation in a single request.

It’s not about speed. It’s about reliability.

You don’t need to maintain a fleet of Chrome instances. You don’t need to manage Puppeteer sessions.

Just send the request. Get structured data.

And if you’re still using Selenium or Puppeteer for this? You’re wasting time.

Non-obvious claim: The best web scraping stack in 2026 isn’t a framework. It’s an API that abstracts the complexity of anti-bot systems, JavaScript rendering, and data extraction.

Use it. You’ll thank yourself when your pipeline doesn’t break after a site update.

And if you’re building a job board scraper, start here.

#dynamic filtering #job board scraping #playwright automation #job data extraction #web scraping API
