Compatibility: Only available on Node.js.
If the pages you want to load do not need JavaScript to render, a lighter-weight option is to use CheerioWebBaseLoader instead.
Setup
npm install @langchain/community @langchain/core playwright
Usage
import { PlaywrightWebBaseLoader } from "@langchain/community/document_loaders/web/playwright";
/**
 * Loader uses `page.content()`
 * as default evaluate function
 **/
const loader = new PlaywrightWebBaseLoader("https://www.tabnews.com.br/");
const docs = await loader.load();
Options
Here’s an explanation of the parameters you can pass to the PlaywrightWebBaseLoader constructor using the PlaywrightWebBaseLoaderOptions interface:
type PlaywrightWebBaseLoaderOptions = {
  launchOptions?: LaunchOptions;
  gotoOptions?: PlaywrightGotoOptions;
  evaluate?: PlaywrightEvaluate;
};
- launchOptions: an optional object with additional options to pass to the playwright.chromium.launch() method, such as the headless flag that launches the browser in headless mode.
- gotoOptions: an optional object with additional options to pass to the page.goto() method, such as timeout (the maximum navigation time in milliseconds) or waitUntil (the event at which navigation is considered successful).
- evaluate: an optional custom function that runs against the loaded page in place of the default, page.content(). This can be useful for extracting data from the page, interacting with page elements, or handling specific HTTP responses. The function should return a Promise that resolves to a string containing the result of the evaluation.
By passing these options to the PlaywrightWebBaseLoader constructor, you can customize the behavior of the loader and use Playwright’s features to scrape and interact with web pages.
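Because evaluate must resolve to a string, structured results need to be serialized before they are returned. The sketch below shows one possible way to do that; it is not taken from the library. `PageLike` is a hypothetical structural type covering only the two Playwright `Page` methods the sketch calls (`title()` and `innerText()`), and `evaluateAsJson` is an illustrative name.

```typescript
// Hypothetical minimal shape: only the Page methods this sketch calls.
// A real Playwright `Page` should satisfy it structurally.
interface PageLike {
  title(): Promise<string>;
  innerText(selector: string): Promise<string>;
}

// Illustrative custom evaluate function: gathers the page title and the
// visible body text, then serializes both, since a string result is expected.
async function evaluateAsJson(page: PageLike): Promise<string> {
  const title = await page.title();
  const text = (await page.innerText("body")).trim();
  return JSON.stringify({ title, text });
}
```

Under these assumptions it could be passed as `new PlaywrightWebBaseLoader(url, { evaluate: evaluateAsJson })`, and the resulting `pageContent` read back with `JSON.parse`.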
Here is a basic example that loads a page with the default settings:
import { PlaywrightWebBaseLoader } from "@langchain/community/document_loaders/web/playwright";
const url = "https://www.tabnews.com.br/";
const loader = new PlaywrightWebBaseLoader(url);
const docs = await loader.load();
// raw HTML page content
const extractedContents = docs[0].pageContent;
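Since the default output is raw HTML, a common follow-up step is reducing it to plain text. The helper below is a deliberately naive sketch of that step, assuming regex-based stripping is good enough for the page at hand; `htmlToText` is a hypothetical name, it does not decode HTML entities, and a real HTML parser would be more robust.

```typescript
// Naive HTML-to-text helper (illustrative only, not part of the loader).
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style blocks
    .replace(/<[^>]+>/g, " ") // drop the remaining tags
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

const sample =
  "<html><head><style>p{color:red}</style></head><body><p>Hello <b>world</b></p></body></html>";
console.log(htmlToText(sample)); // prints "Hello world"
```

With the loader output from the example above, this could be applied as `htmlToText(docs[0].pageContent)`.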
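The next example passes all three options:
import {
  PlaywrightWebBaseLoader,
  Page,
  Browser,
} from "@langchain/community/document_loaders/web/playwright";
// Playwright's Response type, so the annotation below does not resolve to the global fetch Response
import type { Response } from "playwright";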
const loader = new PlaywrightWebBaseLoader("https://www.tabnews.com.br/", {
  launchOptions: {
    headless: true,
  },
  gotoOptions: {
    waitUntil: "domcontentloaded",
  },
  /** Custom evaluate function; it receives the page, the browser, and the navigation response. */
  async evaluate(page: Page, browser: Browser, response: Response | null) {
    await page.waitForResponse("https://www.tabnews.com.br/va/view");
    const result = await page.evaluate(() => document.body.innerHTML);
    return result;
  },
});
const docs = await loader.load();
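The `response` argument passed to evaluate can be used to handle specific HTTP responses, for instance to fail fast on error pages. The following is a minimal sketch of such a check, not the loader's own behavior: `ResponseLike` is a hypothetical structural type covering only the Playwright `Response` methods used (`ok()`, `status()`, `url()`), and `assertOkResponse` is an illustrative helper name.

```typescript
// Hypothetical minimal shape of the Playwright `Response` methods used here.
interface ResponseLike {
  ok(): boolean; // true for statuses in the 200-299 range
  status(): number;
  url(): string;
}

// Throws when navigation produced no response or a non-2xx status.
function assertOkResponse(response: ResponseLike | null): void {
  if (response === null) {
    throw new Error("Navigation returned no response");
  }
  if (!response.ok()) {
    throw new Error(
      `Request to ${response.url()} failed with status ${response.status()}`
    );
  }
}
```

Inside a custom evaluate function, calling `assertOkResponse(response)` before reading the page is one option; returning a fallback string instead of throwing is another, depending on how the surrounding pipeline should treat failed pages.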