raggy.loaders.web
HTMLLoader
A loader that loads HTML, optionally converting it to Markdown or stripping its tags.
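To make "stripping tags" concrete, here is a minimal standard-library sketch. It is not raggy's implementation, and `strip_tags` is a hypothetical helper. It keeps the text nodes, decodes character references such as `&amp;`, and drops the contents of `<script>` and `<style>` elements.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the text nodes and collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def strip_tags(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(strip_tags("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
```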
SitemapLoader
A loader that loads URLs from a sitemap.
Attributes:
Name | Type | Description |
---|---|---|
`include` | `list[str \| Pattern]` | A list of strings or regular expressions. Only URLs that match one of these will be included. |
`exclude` | `list[str \| Pattern]` | A list of strings or regular expressions. URLs that match one of these will be excluded. |
`url_loader` | `URLLoader` | The loader to use for loading the URLs. |
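The following sketch illustrates how `include` and `exclude` could interact, using a hypothetical `filter_urls` helper rather than raggy's own code. Two details are assumptions that the table above does not specify: a plain string matches when it appears anywhere in the URL (substring match), and an empty `include` list keeps every URL.

```python
from __future__ import annotations

import re


def _matches(url: str, pattern: str | re.Pattern) -> bool:
    # Assumption: compiled patterns use re.search; plain strings use substring matching.
    if isinstance(pattern, re.Pattern):
        return pattern.search(url) is not None
    return pattern in url


def filter_urls(
    urls: list[str],
    include: list[str | re.Pattern] | None = None,
    exclude: list[str | re.Pattern] | None = None,
) -> list[str]:
    """Hypothetical helper: keep URLs matching some `include` entry and no `exclude` entry."""
    kept = []
    for url in urls:
        # Assumption: an empty or missing include list means "include everything".
        if include and not any(_matches(url, p) for p in include):
            continue
        if exclude and any(_matches(url, p) for p in exclude):
            continue
        kept.append(url)
    return kept


urls = [
    "https://example.com/docs/intro",
    "https://example.com/docs/api/reference",
    "https://example.com/blog/post-1",
]
print(filter_urls(urls, include=["/docs/"], exclude=[re.compile(r"/api/")]))
```

Here exclusion wins over inclusion: `/docs/api/reference` matches the `include` string but is dropped because it also matches the `exclude` pattern.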
Examples:
Load all URLs from a sitemap:
```python
import asyncio
from raggy.loaders.web import SitemapLoader

loader = SitemapLoader(urls=["https://askmarvin.ai/sitemap.xml"])
documents = asyncio.run(loader.load())  # load() is a coroutine, so run it in an event loop
print(documents)
```
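A sitemap that follows the sitemaps.org protocol lists each page in a `<loc>` element inside a `<url>` entry. The hypothetical `urls_from_sitemap` helper sketches how those URLs could be extracted with the standard library before being handed to a URL loader. It is an illustration under that assumption, not raggy's internals, and it does not handle sitemap index files (`<sitemapindex>`).

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Hypothetical helper: return the <loc> of every <url> entry in a <urlset>."""
    root = ET.fromstring(xml_text)
    # Path is relative to the <urlset> root: each <url> child, then its <loc>.
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/intro</loc></url>
</urlset>"""
print(urls_from_sitemap(sample))
```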
URLLoader
Given a list of URLs, loads whatever content it finds at each one.
Attributes:
Name | Type | Description |
---|---|---|
`urls` | `list[str]` | The URLs to load from. |
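One plausible way to load a list of URLs is to fetch them concurrently while capping the number of requests in flight. This sketch is an assumption about the approach, not raggy's code: `load_all`, `fake_fetch`, and `max_concurrency` are hypothetical names, and `fake_fetch` stands in for a real HTTP client so the example runs offline.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def load_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> dict[str, str]:
    """Hypothetical helper: fetch every URL, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> tuple[str, str]:
        async with semaphore:  # wait for a free slot before fetching
            return url, await fetch(url)

    # gather returns results in the order of its arguments, matching `urls`.
    results = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(results)


async def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request; no network access happens here.
    await asyncio.sleep(0)
    return f"<html>content of {url}</html>"


pages = asyncio.run(load_all(["https://example.com/a", "https://example.com/b"], fake_fetch))
print(pages["https://example.com/a"])
```

Because `asyncio.gather` preserves argument order, the returned mapping keeps the original URL order; the semaphore keeps a long URL list from opening too many connections at once.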