Collections
batched
Yield batches of items from an iterable.
If size_fn is not provided, each batch contains at most size items.
If size_fn is provided, it is called on each item to compute that item's size, and a batch is closed before the combined size of its items would exceed size. Note that if a single item is larger than the batch size, it is yielded as a batch of its own.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iterable | Iterable[T] | The iterable to batch | required |
size | int | The size of the batch | required |
size_fn | Callable[[T], int] \| None | A function to compute the size of an item in the iterable | None |
Yields:
Type | Description |
---|---|
tuple[T, ...] | A batch of items from the iterable |
Example
Batch a list of strings by the number of characters:
from raggy.utilities.collections import batched

items = [
    "foo",
    "bar",
    "baz",
    "qux",
    "quux",
    "corge",
    "grault",
    "garply",
    "waldo",
    "fred",
    "plugh",
    "xyzzy",
    "thud",
]

batches = list(batched(items, size=10, size_fn=len))

assert batches == [
    ('foo', 'bar', 'baz'),
    ('qux', 'quux'),
    ('corge',),
    ('grault',),
    ('garply',),
    ('waldo', 'fred'),
    ('plugh', 'xyzzy'),
    ('thud',),
]
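To make the two batching modes concrete, here is a minimal sketch of how such a function could be written. The name `batched_sketch` is hypothetical and this is not raggy's actual source; it only assumes the behavior described above (count-based batches without `size_fn`, size-limited batches with it, and an oversized item yielded on its own).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def batched_sketch(
    iterable: Iterable[T],
    size: int,
    size_fn: Optional[Callable[[T], int]] = None,
) -> Iterator[tuple[T, ...]]:
    """Hypothetical re-implementation for illustration only."""
    if size_fn is None:
        # Count-based mode: each batch holds at most `size` items.
        it = iter(iterable)
        while batch := tuple(islice(it, size)):
            yield batch
        return

    # Size-based mode: close the current batch before adding an item
    # would push the combined size above `size`.
    current: list[T] = []
    total = 0
    for item in iterable:
        item_size = size_fn(item)
        if current and total + item_size > size:
            yield tuple(current)
            current, total = [], 0
        # An item larger than `size` lands in an empty batch and is
        # flushed alone when the next item arrives (or at the end).
        current.append(item)
        total += item_size
    if current:
        yield tuple(current)
```

Under these assumptions, `list(batched_sketch(range(5), size=2))` gives `[(0, 1), (2, 3), (4,)]`, and a 12-character string with `size=10, size_fn=len` comes back as a single-item batch.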
distinct
Yield distinct items from an iterable. Items are compared by the value that key returns, and the first item seen for each key is kept.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iterable | Iterable[T] | The iterable to filter | required |
key | Callable[[T], Any] | A function to compute a key for each item | lambda i: i |
Yields:
Type | Description |
---|---|
T | Distinct items from the iterable |
Example
Dedupe a list of Pydantic models by a key:
from pydantic import BaseModel
from raggy.utilities.collections import distinct


class MyModel(BaseModel):
    id: int
    name: str


items = [
    MyModel(id=1, name="foo"),
    MyModel(id=2, name="bar"),
    MyModel(id=1, name="baz"),
]

deduped = list(distinct(items, key=lambda i: i.id))

assert deduped == [
    MyModel(id=1, name="foo"),
    MyModel(id=2, name="bar"),
]
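The deduplication shown above can be sketched with a set of already-seen keys. The name `distinct_sketch` is hypothetical and this is not raggy's actual source; it assumes that keys are hashable and that the first item for each key wins, which is what the example output suggests.

```python
from typing import Any, Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def distinct_sketch(
    iterable: Iterable[T],
    key: Callable[[T], Any] = lambda i: i,
) -> Iterator[T]:
    """Hypothetical re-implementation for illustration only."""
    seen: set[Any] = set()  # keys already yielded; assumes hashable keys
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            # Yield lazily, so input order is preserved and only the
            # first item per key is emitted.
            yield item
```

With the default key this behaves like an order-preserving `set`: `list(distinct_sketch([3, 1, 3, 2, 1]))` gives `[3, 1, 2]`.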