raggy.loaders.web

HTMLLoader

A loader that loads HTML, optionally converting it to markdown or stripping tags.
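To illustrate what "stripping tags" can mean in practice, here is a minimal sketch using only the standard library's `html.parser`. It is not HTMLLoader's actual implementation; `TextExtractor` and `strip_tags` are hypothetical names introduced for this example, and skipping `<script>` and `<style>` content is an assumption about what a reader would want.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def strip_tags(html: str) -> str:
    """Return the text content of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><body><h1>Title</h1><p>Hello <b>world</b>.</p>"
    "<script>x=1</script></body></html>"
)
print(strip_tags(sample))
```

Joining text nodes with a single space keeps the sketch short but loses paragraph boundaries; converting to markdown instead would preserve more of the structure.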

SitemapLoader

A loader that loads URLs from a sitemap.
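As background, a sitemap is an XML document whose `<url><loc>` elements list page addresses. The sketch below shows one way to pull those addresses out with the standard library. It is an illustration of the sitemap format, not SitemapLoader's own code; `extract_urls` and `sample_sitemap` are hypothetical names, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list:
    """Return the text of every <url><loc> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc/> elements
    ]


sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/</loc></url>
</urlset>"""

print(extract_urls(sample_sitemap))
```

This sketch handles a plain `<urlset>` only; a sitemap index file (`<sitemapindex>`) that points to further sitemaps would need an extra step.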

Attributes:

Name        Type                 Description
include     list[str | Pattern]  A list of strings or regular expressions. Only URLs that match one of these will be included.
exclude     list[str | Pattern]  A list of strings or regular expressions. URLs that match one of these will be excluded.
url_loader  URLLoader            The loader to use for loading the URLs.
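The include and exclude semantics can be sketched as a small filter function. This is an assumption-laden illustration, not the library's code: it assumes plain strings match as substrings, compiled patterns match via `search`, and an empty include list means "include everything". `matches` and `filter_urls` are hypothetical helpers.

```python
import re
from typing import Iterable, Optional, Union

Matcher = Union[str, re.Pattern]


def matches(url: str, matcher: Matcher) -> bool:
    """Assumed rule: strings match as substrings, patterns via re.search."""
    if isinstance(matcher, str):
        return matcher in url
    return matcher.search(url) is not None


def filter_urls(
    urls: Iterable[str],
    include: Optional[list] = None,
    exclude: Optional[list] = None,
) -> list:
    """Keep URLs that match some include entry and no exclude entry."""
    include = include or []
    exclude = exclude or []
    kept = []
    for url in urls:
        # Assumption: an empty include list lets every URL through.
        if include and not any(matches(url, m) for m in include):
            continue
        if any(matches(url, m) for m in exclude):
            continue
        kept.append(url)
    return kept


urls = [
    "https://example.com/docs/intro",
    "https://example.com/docs/changelog",
    "https://example.com/blog/post",
]
print(filter_urls(urls, include=["/docs/"], exclude=[re.compile(r"changelog$")]))
```

In this sketch exclusion is applied after inclusion, so a URL that matches both lists is dropped.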

Examples:

Load all URLs from a sitemap:

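import asyncio

from raggy.loaders.web import SitemapLoader

async def main():
    loader = SitemapLoader(urls=["https://askmarvin.ai/sitemap.xml"])
    documents = await loader.load()  # load() is a coroutine, so it must be awaited
    print(documents)

asyncio.run(main())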

URLLoader

Given a list of URLs, loads whatever it finds there.

Attributes:

Name  Type       Description
urls  list[str]  The URLs to load from.
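Loading a list of URLs is a natural fit for concurrent I/O. The sketch below shows the general pattern with `asyncio.gather` and a concurrency limit. Whether URLLoader itself works this way is not stated here, so treat it as an assumption; `load_all`, `fake_fetch`, and `max_concurrency` are hypothetical, and the fetcher is a stand-in that performs no network access.

```python
import asyncio
from typing import Awaitable, Callable


async def load_all(
    urls: list,
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> list:
    """Fetch every URL concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch_one(url) for url in urls))


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request; yields control, then returns text."""
    await asyncio.sleep(0)
    return f"contents of {url}"


pages = asyncio.run(
    load_all(["https://example.com/a", "https://example.com/b"], fake_fetch)
)
print(pages)
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular HTTP client.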