Web
HTMLLoader
A loader that loads HTML, optionally converting it to markdown or stripping its tags.
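To make the "stripping tags" option concrete, here is a minimal standard-library sketch of what removing tags from HTML involves. It is not HTMLLoader's implementation. The names `strip_tags` and `_TextExtractor` are hypothetical, and skipping `<script>`/`<style>` bodies and collapsing whitespace are assumptions chosen for readability.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> bodies."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element
        if not self._skip_depth:
            self._chunks.append(data)


def strip_tags(html: str) -> str:
    """Return the text content of `html` with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join("".join(parser._chunks).split())
```

For example, `strip_tags("<p>Hello <b>world</b></p>")` returns `"Hello world"`. Converting to markdown instead would need a structure-aware converter, which this sketch does not attempt.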
SitemapLoader
A loader that loads URLs from a sitemap.

Attributes:
    include: A list of strings or regular expressions. Only URLs that match one of these are included.
    exclude: A list of strings or regular expressions. URLs that match one of these are excluded.
    url_loader: The loader used to load the URLs.

Examples:
    Load all URLs from a sitemap:
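import asyncio
from raggy.loaders.web import SitemapLoader

async def main():
    loader = SitemapLoader(urls=["https://controlflow.ai/sitemap.xml"])
    documents = await loader.load()  # load() is async, so await it inside a coroutine
    print(documents)

asyncio.run(main())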
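The include/exclude behavior described above can be illustrated with a small standard-library sketch that pulls `<loc>` entries out of a sitemap and filters them. This is not SitemapLoader's code. `parse_sitemap`, `filter_urls`, and `_matches` are hypothetical names. The matching rules are assumptions: plain strings match as substrings, compiled regular expressions match via `re.search`, and an empty `include` admits every URL. Nested sitemap index files are not followed.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, Union

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

MatchRule = Union[str, re.Pattern]


def parse_sitemap(xml_text: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def _matches(url: str, rule: MatchRule) -> bool:
    # Assumption: strings are substring matches; compiled patterns use search()
    if isinstance(rule, re.Pattern):
        return rule.search(url) is not None
    return rule in url


def filter_urls(
    urls: Iterable[str],
    include: Iterable[MatchRule] = (),
    exclude: Iterable[MatchRule] = (),
) -> list[str]:
    """Keep URLs that match some include rule (if any) and no exclude rule."""
    include, exclude = list(include), list(exclude)
    return [
        url
        for url in urls
        if (not include or any(_matches(url, r) for r in include))
        and not any(_matches(url, r) for r in exclude)
    ]
```

For example, `filter_urls(urls, include=["/docs/"], exclude=[re.compile(r"/old/")])` keeps documentation pages while dropping anything under an `/old/` path.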