How to Scrape Job Postings with Dynamic Filters Using FineData API
Step-by-step guide to extract job listings from career sites with dynamic filters using FineData's API and Playwright rendering.
Job boards like Indeed, LinkedIn, and Glassdoor use dynamic filtering to load content via JavaScript. You can’t just fetch /jobs?q=software+engineer and expect to get all results. The real data comes from XHR requests triggered by filters: location, experience level, remote status, salary range.
This is not a new problem. But in 2026, even the most sophisticated scraping stacks struggle with it. Playwright helps. But when you’re building a production-grade job data pipeline, you need more than automation. You need reliability, anti-bot evasion, and structured extraction.
FineData’s API solves this with three key features: Playwright rendering, dynamic filter simulation via js_actions, and AI-powered structured extraction. This post shows how to use them together to scrape job postings with complex filters—without getting rate-limited or blocked.
The Problem: Dynamic Filters Break Traditional Scraping
You want to extract all remote software engineering roles at Google from Indeed.com.
The URL is: https://www.indeed.com/jobs?q=software+engineer&l=remote&jt=fulltime
But the page doesn’t load all jobs in the initial HTML. It makes a fetch request to https://www.indeed.com/jobs/api/collect with query parameters. The response is JSON. The DOM is empty until the request completes.
A naive requests.get() returns an empty result. Even BeautifulSoup fails. You need JavaScript rendering.
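To see the failure concretely, here's a minimal sketch. The HTML below is a simplified stand-in for what a job board's initial response looks like before JavaScript runs, and the `job-card` class is a hypothetical card selector, not Indeed's real markup:

```python
from html.parser import HTMLParser

# Simplified stand-in for the initial HTML a job board returns before any
# JavaScript executes: the container exists, but no job cards are inside it.
initial_html = """
<html><body>
  <div id="job-listing-container"></div>
  <script src="/jobs/bundle.js"></script>
</body></html>
"""

class CardCounter(HTMLParser):
    """Count elements whose class attribute is 'job-card' (hypothetical selector)."""
    def __init__(self):
        super().__init__()
        self.cards = 0

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "job-card":
            self.cards += 1

parser = CardCounter()
parser.feed(initial_html)
print(parser.cards)  # 0 -- the jobs arrive later via XHR, so there is nothing to parse
```

The parse succeeds; it just finds nothing, because the data was never in the HTML to begin with.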
Now, try to automate the filter selection. Clicking “Remote” in the UI triggers a state change: the URL updates and the page re-renders. But the job data itself arrives through a fresh XHR request, not from the URL alone, so simply reloading the filtered URL won’t reproduce the results.
This is where most pipelines break. You can’t scrape the results unless you simulate the entire user flow.
The Solution: Use js_actions + use_js_render + extract_schema
FineData’s POST /api/v1/async/scrape endpoint handles this natively.
Here’s the full working flow:
- Use `use_js_render=true` to enable Playwright rendering.
- Use `js_actions` to simulate user clicks and form input.
- Wait for the expected element (e.g., `#job-listing-container`) to appear.
- Extract structured data using `extract_schema` with a JSON Schema.
- Return only the relevant fields.
Step 1: Set Up the Request
```python
import requests

url = "https://api.finedata.ai/api/v1/async/scrape"
headers = {
    "Authorization": "Bearer fd_your_api_key",
    "Content-Type": "application/json"
}

payload = {
    "url": "https://www.indeed.com/jobs",
    "method": "GET",
    "use_js_render": True,
    "js_wait_for": "selector:#job-listing-container",
    "js_scroll": True,
    "js_actions": [
        {"type": "type", "selector": "input#text-input-what", "value": "software engineer"},
        {"type": "click", "selector": "button#filter-location"},
        {"type": "type", "selector": "input#text-input-where", "value": "remote"},
        {"type": "click", "selector": "button#filter-remote"},
        {"type": "wait", "ms": 3000},
        {"type": "scroll", "direction": "down", "amount": 500}
    ],
    "formats": ["markdown"],
    "extract_schema": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "company": {"type": "string"},
                "location": {"type": "string"},
                "posted_date": {"type": "string"},
                "salary": {"type": "string"},
                "job_url": {"type": "string"},
                "job_type": {"type": "string"}
            },
            "required": ["title", "company", "location"]
        }
    },
    "timeout": 30,
    "max_retries": 3,
    "use_antibot": True,
    "tls_profile": "chrome120",
    "use_residential": True,
    "solve_captcha": True,
    "session_id": "job-scraper-2026-04-05",
    "session_ttl": 1800
}

response = requests.post(url, json=payload, headers=headers)
job_response = response.json()
```
Step 2: Handle the Async Job
The response returns a job_id and status pending. You now poll:
```python
import time

job_id = job_response['job_id']

while True:
    status_response = requests.get(
        f"https://api.finedata.ai/api/v1/async/jobs/{job_id}",
        headers={"Authorization": "Bearer fd_your_api_key"}
    ).json()
    if status_response['status'] == 'completed':
        result = status_response['result']
        break
    elif status_response['status'] == 'failed':
        raise Exception(f"Job failed: {status_response['error']}")
    time.sleep(2)
```
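The loop above polls forever at a fixed interval. A slightly hardened sketch adds a timeout and exponential backoff; `fetch_status` is any callable returning the job-status dict, so the logic can be tested without touching the network:

```python
import time

def poll_job(fetch_status, timeout=120, initial_delay=1.0, max_delay=10.0):
    """Poll until the job completes, fails, or the timeout expires.

    fetch_status: callable returning a dict like {"status": ..., "result": ...}.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while time.monotonic() < deadline:
        status = fetch_status()
        if status["status"] == "completed":
            return status["result"]
        if status["status"] == "failed":
            raise RuntimeError(f"Job failed: {status.get('error')}")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise TimeoutError("Job did not complete in time")
```

Wire it up with `poll_job(lambda: requests.get(status_url, headers=headers).json())`.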
Step 3: Extract Structured Data
The result['data']['markdown'] contains the rendered page as Markdown. But the real win is result['data']['json_extracted_data']:
```json
[
    {
        "title": "Software Engineer I",
        "company": "Google",
        "location": "Remote",
        "posted_date": "2026-03-15",
        "salary": "$130k - $160k",
        "job_url": "https://www.indeed.com/viewjob?jk=abc123",
        "job_type": "Full-time"
    },
    ...
]
```
This is production-ready data. No post-processing. No regex. No brittle selectors.
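That said, even schema-validated output benefits from a light normalization pass before it lands in your database. For instance, salary strings like "$130k - $160k" can be parsed into numeric bounds; the format here is an assumption based on the sample above, and real listings vary:

```python
import re

def parse_salary(salary: str):
    """Parse strings like '$130k - $160k' into (min, max) integers, or None."""
    matches = re.findall(r"\$(\d+(?:\.\d+)?)k", salary, flags=re.IGNORECASE)
    if not matches:
        return None
    values = [int(float(m) * 1000) for m in matches]
    return (min(values), max(values))

print(parse_salary("$130k - $160k"))  # (130000, 160000)
print(parse_salary("Competitive"))    # None
```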
Why This Works (And Why It’s Better Than DIY)
1. Playwright Rendering Is Reliable
use_js_render=true runs a real Chrome instance. It waits for the #job-listing-container to appear. It scrolls. It handles XHR responses.
You don’t need to reverse-engineer the API. You don’t need to parse fetch calls.
2. js_actions Simulates Real User Flow
Clicking the “Remote” filter isn’t just a DOM change. It triggers a state update. The URL changes. The request fires.
js_actions handles this. It’s not just “click a button.” It’s a sequence of actions that mirror what a human would do.
This avoids the “ghost click” problem. Some sites only render after a full user gesture sequence.
3. AI-Powered Extraction Beats CSS Selectors
CSS selectors break when the site updates. A single class rename—.job-title → .job-card-title—breaks your entire pipeline.
extract_schema uses an LLM to analyze the page and return data matching your schema. It’s resilient.
For example, if the site uses <div class="job-card"> or <article data-role="job">, the AI adapts.
Honest opinion: I prefer `extract_schema` over `extract_rules` because it’s more maintainable. CSS selectors are fragile. LLMs are resilient.
Gotchas and Anti-Patterns
1. Don’t Use js_wait_for: "networkidle" on Job Boards
networkidle waits for 500ms with no new network activity. On Indeed, the job list might load in 200ms, then a second request for related jobs fires. You’ll get incomplete data.
Use selector:#job-listing-container instead. It’s more deterministic.
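Concretely, the difference is a single key in the request payload (a config sketch using the fields from Step 1):

```python
# Non-deterministic: networkidle may fire between the first and second XHR burst,
# capturing an incomplete job list.
flaky = {"js_wait_for": "networkidle"}

# Deterministic: waits until the element that holds the job cards actually exists.
reliable = {"js_wait_for": "selector:#job-listing-container"}
```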
2. Avoid use_residential: true on Every Request
It costs +3 tokens. If you’re scraping 1000 job pages per day, that’s 3000 extra tokens per day. At $0.10 per 1000 tokens, that’s $0.30/day.
Use it only when you hit rate limits or blocks. Otherwise, stick with tls_profile: chrome120.
3. session_id Is Not a Silver Bullet
session_id keeps the same proxy IP across requests. Useful for sessions that require login.
But on Indeed, you can’t log in to scrape jobs. The site doesn’t require authentication for public listings.
So session_id adds cost without benefit. Use it only when you need to maintain a session state.
4. solve_captcha: true Isn’t Free
It costs +10 tokens. And not all sites have captchas. Use it only when captcha_detected: true.
Check the response for captcha_detected before enabling it.
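A cheap pattern is two-pass: send the request without solve_captcha, and only retry with it enabled when the response flags a captcha. A sketch, where `send` is any callable that submits the payload and returns the parsed response (the `captcha_detected` field is taken from the advice above):

```python
def scrape_with_captcha_fallback(send, payload):
    """First try without captcha solving; retry with it only if a captcha is detected."""
    first = send({**payload, "solve_captcha": False})
    if not first.get("captcha_detected"):
        return first
    # Captcha hit: pay the +10 tokens only on this retry.
    return send({**payload, "solve_captcha": True})
```

This keeps the common case cheap while still recovering automatically when a captcha appears.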
Next Steps
1. Build a Job Board Monitor
Use POST /api/v1/async/batch to scrape multiple filters in parallel:
- `q=software+engineer&l=remote`
- `q=product+manager&l=New+York`
- `q=data+scientist&l=San+Francisco`
Each job in the batch uses the same js_actions flow. You get all data in one webhook.
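A sketch of building that batch payload from a list of (query, location) filters. The top-level `jobs` key and the per-job fields mirror the single-request payload from Step 1, but the exact batch schema is an assumption worth checking against the API docs:

```python
def build_batch_payload(filters, base_payload):
    """One batch job per (query, location) filter, all sharing the same js_actions flow."""
    jobs = []
    for query, location in filters:
        job = dict(base_payload)  # shallow copy; js_actions etc. are shared
        job["url"] = f"https://www.indeed.com/jobs?q={query}&l={location}"
        jobs.append(job)
    return {"jobs": jobs}

filters = [
    ("software+engineer", "remote"),
    ("product+manager", "New+York"),
    ("data+scientist", "San+Francisco"),
]
payload = build_batch_payload(filters, {"use_js_render": True})
print(len(payload["jobs"]))  # 3
```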
2. Automate Data Sync to Your CRM
Use the callback_url to send results to your pipeline. Then use FineData’s MCP server to connect directly to Cursor or Claude Desktop.
Your AI agent can now:
- Read new job postings.
- Extract key fields.
- Suggest outreach messages.
- Trigger outreach via your CRM.
3. Add Price Intelligence
Use the same flow to scrape salary data. Track trends over time. Compare Google vs. Meta vs. Amazon.
This is the foundation of a B2B lead generation tool. See our guide on B2B data enrichment.
Final Thoughts
Scraping job postings with dynamic filters is not just a technical challenge. It’s an anti-anti-bot battle.
You can’t win with requests + BeautifulSoup. Not in 2026.
You need:
- JavaScript rendering (Playwright)
- Action simulation (js_actions)
- Resilient extraction (AI + schema)
- Proxy rotation (residential)
- CAPTCHA handling (auto-solve)
FineData automates all of this, combining Playwright rendering, AI extraction, and proxy rotation in a single API request.
It’s not about speed. It’s about reliability.
You don’t need to maintain a fleet of Chrome instances. You don’t need to manage Puppeteer sessions.
Just send the request. Get structured data.
And if you’re still using Selenium or Puppeteer for this? You’re wasting time.
Non-obvious claim: The best web scraping stack in 2026 isn’t a framework. It’s an API that abstracts the complexity of anti-bot systems, JavaScript rendering, and data extraction.
Use it. You’ll thank yourself when your pipeline doesn’t break after a site update.
And if you’re building a job board scraper, start here.
Related Articles
Free No-Code Web Scraper: Extract Data Without Writing Code
How to use no-code web scrapers to extract structured data from websites. Tools, workflows, and practical limitations for non-developers.
Tutorial: How to Scrape Dynamic Job Listings with Authentication in 2026
Learn how to scrape job portals with login requirements using FineData API, including session handling and secure credential management.
Tutorial: Web Scraper in Python: Build a Robust, Anti-Detection Tool with FineData API
Learn how to build a Python web scraper that bypasses anti-bot systems using FineData's API, with real code examples for Cloudflare, CAPTCHA, and JavaScript rendering.