Collections

batched

Yield batches of items from an iterable.

If size_fn is not provided, every item counts as 1, so each batch contains at most size items.

If size_fn is provided, it is called on each item, and a batch is closed before adding an item would push the batch's total size past size. A single item whose own size exceeds size is yielded as a batch of its own.

Parameters:

Name      Type                       Description                                                Default
iterable  Iterable[T]                The iterable to batch                                      required
size      int                        The size of the batch                                      required
size_fn   Callable[[T], int] | None  A function to compute the size of an item in the iterable  None

Yields:

Type            Description
tuple[T, ...]   A batch of items from the iterable

Example

Batch a list of strings by the number of characters:

from raggy.utilities.collections import batched

items = [
    "foo",
    "bar",
    "baz",
    "qux",
    "quux",
    "corge",
    "grault",
    "garply",
    "waldo",
    "fred",
    "plugh",
    "xyzzy",
    "thud",
]

batches = list(batched(items, size=10, size_fn=len))

assert batches == [
    ('foo', 'bar', 'baz'),
    ('qux', 'quux'),
    ('corge',),
    ('grault',),
    ('garply',),
    ('waldo', 'fred'),
    ('plugh', 'xyzzy'),
    ('thud',)
]
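
The batching rules described above can be sketched in plain Python. This is a minimal illustration, not raggy's actual implementation: the name batched_sketch is hypothetical, and the rule "close the batch when the next item would exceed size" is an assumption inferred from the example output above.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def batched_sketch(
    iterable: Iterable[T],
    size: int,
    size_fn: Optional[Callable[[T], int]] = None,
) -> Iterator[tuple]:
    """Hypothetical stand-in that mirrors the documented behavior of batched."""
    batch = []
    batch_size = 0
    for item in iterable:
        # Without size_fn, every item counts as 1 (assumed from the docs).
        item_size = 1 if size_fn is None else size_fn(item)
        # Close the current batch if this item would push it past the limit.
        if batch and batch_size + item_size > size:
            yield tuple(batch)
            batch, batch_size = [], 0
        # An oversized item lands in an empty batch and is flushed on the
        # next iteration, so it ends up in a batch of its own.
        batch.append(item)
        batch_size += item_size
    if batch:
        yield tuple(batch)


# Count-based batching: at most two items per batch.
print(list(batched_sketch(range(5), size=2)))

# Size-based batching with an item larger than the limit.
print(list(batched_sketch(["a", "toolongstring", "b"], size=3, size_fn=len)))
```

Under these assumptions, the oversized string is not split or dropped; it is simply emitted alone, and batching resumes with the next item.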

distinct

Yield distinct items from an iterable, keeping the first item seen for each key.

Parameters:

Name      Type                Description                               Default
iterable  Iterable[T]         The iterable to filter                    required
key       Callable[[T], Any]  A function to compute a key for each item  lambda i: i

Yields:

Type  Description
T     Distinct items from the iterable

Example

Dedupe a list of Pydantic models by a key:

from pydantic import BaseModel
from raggy.utilities.collections import distinct

class MyModel(BaseModel):
    id: int
    name: str

items = [
    MyModel(id=1, name="foo"),
    MyModel(id=2, name="bar"),
    MyModel(id=1, name="baz"),
]

deduped = list(distinct(items, key=lambda i: i.id))

assert deduped == [
    MyModel(id=1, name="foo"),
    MyModel(id=2, name="bar"),
]
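
The deduplication shown above can be approximated with a set of already-seen keys. This is a rough sketch, not raggy's actual implementation: the name distinct_sketch is hypothetical, and it assumes that keys are hashable and that the first item for each key is the one kept, as the example suggests.

```python
from typing import Any, Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def distinct_sketch(
    iterable: Iterable[T],
    key: Callable[[T], Any] = lambda i: i,
) -> Iterator[T]:
    """Hypothetical stand-in that mirrors the documented behavior of distinct."""
    seen = set()  # keys already yielded; assumes keys are hashable
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item  # first item for this key wins


# Default key: the item itself.
print(list(distinct_sketch([3, 1, 3, 2, 1])))

# Custom key: dedupe plain dicts by their "id" field.
records = [
    {"id": 1, "name": "foo"},
    {"id": 2, "name": "bar"},
    {"id": 1, "name": "baz"},
]
print(list(distinct_sketch(records, key=lambda r: r["id"])))
```

Because the sketch is a generator, it works lazily on arbitrarily long iterables, at the cost of holding every distinct key in memory.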