Web

HTMLLoader

A loader that loads HTML, optionally converting it to Markdown or stripping tags.
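To make the "stripping tags" option concrete, here is a minimal sketch built only on Python's standard library. It is not raggy's implementation: `TagStripper` and `strip_tags` are hypothetical names chosen for illustration, and the choice to drop `<script>` and `<style>` contents is an assumption.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def strip_tags(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<html><head><style>p{}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
print(strip_tags(page))
```

Converting to Markdown instead of plain text would need a richer mapping from tags to Markdown syntax, which is usually delegated to a dedicated library.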

SitemapLoader

A loader that loads URLs from a sitemap.

Attributes:

include: A list of strings or regular expressions. Only URLs that match one of these will be included.
exclude: A list of strings or regular expressions. URLs that match one of these will be excluded.
url_loader: The loader to use for loading the URLs.

Examples:

Load all URLs from a sitemap:

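import asyncio
from raggy.loaders.web import SitemapLoader

async def main():  # `await` needs an async context outside a notebook
    loader = SitemapLoader(urls=["https://controlflow.ai/sitemap.xml"])
    documents = await loader.load()
    print(documents)

asyncio.run(main())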
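The include and exclude attributes described above can be pictured as a filter applied to the URLs listed in a sitemap. The sketch below uses only the standard library and is not raggy's code: `sitemap_urls` and `filter_urls` are hypothetical helpers, and treating every pattern as a regular expression searched anywhere in the URL is an assumption about the matching rule.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Union

Pattern = Union[str, re.Pattern]

SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/a</loc></url>
  <url><loc>https://example.com/docs/draft/b</loc></url>
  <url><loc>https://example.com/blog/c</loc></url>
</urlset>"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value in a sitemap, ignoring the XML namespace."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]


def _matches(url: str, patterns: Iterable[Pattern]) -> bool:
    # Assumption: a pattern matches if it is found anywhere in the URL.
    return any(re.search(p, url) for p in patterns)


def filter_urls(
    urls: Iterable[str],
    include: Optional[list[Pattern]] = None,
    exclude: Optional[list[Pattern]] = None,
) -> list[str]:
    """Keep URLs matching some include pattern and no exclude pattern."""
    kept = []
    for url in urls:
        if include and not _matches(url, include):
            continue
        if exclude and _matches(url, exclude):
            continue
        kept.append(url)
    return kept


urls = sitemap_urls(SAMPLE_SITEMAP)
print(filter_urls(urls, include=["/docs/"], exclude=[re.compile(r"/draft/")]))
```

Because plain strings are handed to `re.search` here, characters such as `.` or `?` act as regex metacharacters; wrapping a string in `re.escape` would give literal matching instead.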

URLLoader

Given a list of URLs, fetches each one and loads whatever content it finds there.

Attributes:

Name    Type         Description
urls    list[str]    The URLs to load from.

response_to_document async

Convert an HTTP response to a Document.
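As a rough illustration of what converting a response to a document can involve, the sketch below maps a response body and a little metadata onto a document object. Everything in it is an assumption: `FakeResponse` and `Document` are hypothetical stand-ins, not the HTTP client's or raggy's real types, and the metadata keys are invented for the example.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP client's response object."""

    url: str
    text: str
    headers: dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    """Hypothetical stand-in; raggy's real Document model may differ."""

    text: str
    metadata: dict[str, str] = field(default_factory=dict)


async def response_to_document(response: FakeResponse) -> Document:
    # Keep the body as the document text and record where it came from.
    return Document(
        text=response.text,
        metadata={
            "link": response.url,
            "content_type": response.headers.get("content-type", ""),
        },
    )


response = FakeResponse(
    url="https://example.com/page",
    text="<p>Hello</p>",
    headers={"content-type": "text/html"},
)
doc = asyncio.run(response_to_document(response))
print(doc.metadata["link"])
```

Keeping this step in its own async method gives subclasses a single place to change how a response body becomes document text, for example by stripping HTML first.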