Persistent Hashing and Persistent Dictionaries
This module contains functionality for hashing with keys that remain valid across interpreter invocations, unlike Python’s built-in hash(), whose results for str and bytes are randomized per process unless PYTHONHASHSEED is fixed.
This module also provides a disk-backed dictionary that uses persistent hashing.
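The difference can be observed with the standard library alone. The sketch below does not use pytools, and its helper names are hypothetical. It computes hash("abc") in two fresh interpreters that use different PYTHONHASHSEED values, then computes a SHA-256 digest of the same string with hashlib, which depends only on the input bytes.

```python
import hashlib
import os
import subprocess
import sys


def builtin_hash_in_child(seed: str, text: str) -> int:
    """Return hash(text) as computed by a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def persistent_digest(text: str) -> str:
    """A hashlib digest depends only on the bytes fed to it, not on the interpreter."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print("built-in hash, seed 1:", builtin_hash_in_child("1", "abc"))
    print("built-in hash, seed 2:", builtin_hash_in_child("2", "abc"))
    print("sha256 (any run):     ", persistent_digest("abc"))
```

The two built-in values differ between the child processes, while the hashlib digest is the same in every run; persistent keys are built on the latter kind of function.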
- exception pytools.persistent_dict.NoSuchEntryError
Raised when an entry is not found in a PersistentDict.
- exception pytools.persistent_dict.NoSuchEntryCollisionError
Raised when an entry is not found in a PersistentDict, but the dictionary contains an entry with the same hash key (a hash collision).
- exception pytools.persistent_dict.ReadOnlyEntryError
Raised when an attempt is made to overwrite an entry in a WriteOncePersistentDict.
- exception pytools.persistent_dict.CollisionWarning
Warning emitted when a collision is detected in a PersistentDict.
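These exceptions separate a key that is simply absent from one whose hash key matches a stored entry for a different key. The following sketch is a hypothetical in-memory stand-in, not the pytools implementation. It keeps one entry per hash key, stores the original key next to the value so collisions can be detected, and uses a deliberately weak hash (the string length) so collisions are easy to trigger. The exception and warning classes are local look-alikes; their base classes, and the choice to warn inside store(), are assumptions of this sketch.

```python
import warnings
from typing import Any, Dict, Tuple


class NoSuchEntryError(KeyError):
    """Local stand-in: no entry exists for this key."""


class NoSuchEntryCollisionError(NoSuchEntryError):
    """Local stand-in: the hash key is present but belongs to a different key."""


class CollisionWarning(UserWarning):
    """Local stand-in: a store() replaced an entry belonging to a different key."""


def weak_hash(key: str) -> int:
    # Deliberately terrible hash so that collisions are easy to demonstrate.
    return len(key)


class CollisionCheckingDict:
    """Keeps exactly one (key, value) pair per hash key."""

    def __init__(self) -> None:
        self._entries: Dict[int, Tuple[str, Any]] = {}

    def store(self, key: str, value: Any) -> None:
        h = weak_hash(key)
        if h in self._entries and self._entries[h][0] != key:
            warnings.warn(f"hash collision: {key!r} vs {self._entries[h][0]!r}",
                          CollisionWarning, stacklevel=2)
        # Only one value is kept per hash key; a colliding key replaces it.
        self._entries[h] = (key, value)

    def fetch(self, key: str) -> Any:
        h = weak_hash(key)
        if h not in self._entries:
            raise NoSuchEntryError(key)
        stored_key, value = self._entries[h]
        if stored_key != key:
            # Same hash key, different actual key: report it as a collision.
            raise NoSuchEntryCollisionError(key)
        return value
```

Storing the original key costs some space, but it is what makes the collision distinguishable from a plain miss.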
- class pytools.persistent_dict.Hash(*args, **kwargs)
A protocol for the hashes from hashlib.
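Such a protocol can be expressed with typing.Protocol. The sketch below is only an approximation: this page does not list the methods the Hash protocol requires, so update(), digest() and hexdigest() are assumptions based on the interface that hashlib objects share, and HashLike and digest_of are hypothetical names.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class HashLike(Protocol):
    """Hypothetical approximation of a hashlib-style hash object."""

    def update(self, data: bytes, /) -> None: ...

    def digest(self) -> bytes: ...

    def hexdigest(self) -> str: ...


def digest_of(h: HashLike, *chunks: bytes) -> str:
    # Feeding data in pieces yields the same digest as feeding it all at once.
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


print(isinstance(hashlib.sha256(), HashLike))  # True
print(digest_of(hashlib.sha256(), b"a", b"bc") == hashlib.sha256(b"abc").hexdigest())  # True
```

Note that a runtime_checkable isinstance() check only verifies that the methods exist, not their signatures.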
- class pytools.persistent_dict.KeyBuilder
A (stateless) object that computes persistent hashes of objects fed to it. Subclassing this class permits customizing the computation of hash keys.
This class follows the same general rules as Python’s built-in hashing:
  - Only immutable objects can be hashed.
  - If two objects compare equal, they must hash to the same value.
  - Objects with the same hash may or may not compare equal.
In addition, hashes computed with KeyBuilder have the following properties:
  - The hash is persistent across interpreter invocations.
  - The hash is the same across different Python versions and platforms (but see the note on byte order below).
  - The hash is invariant with respect to PYTHONHASHSEED.
  - Hashes are computed using functionality from hashlib.
Key builders of this type are used by PersistentDict, but other uses are entirely allowable.
- rec(key_hash: Hash, key: Any) → Hash
- Parameters:
key_hash – the hash object to be updated with the hash of key.
key – the (immutable) Python object to be hashed.
- Returns:
the updated key_hash
Changed in version 2021.2: Now returns the updated key_hash.
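The rules above can be illustrated with a small recursive key builder. This is a hypothetical sketch, not pytools’ KeyBuilder: it supports only a few immutable built-in types, prefixes every value with a type tag and a length so that encodings cannot run together, and writes integers in an explicit byte order so the result does not depend on the platform. Floats are deliberately left out, because 1.0 == 1 would require them to hash exactly like ints. Like rec() as documented above, its rec() updates the hash object it is given and returns it.

```python
import hashlib
from typing import Any


class TinyKeyBuilder:
    """Hypothetical minimal persistent key builder; not pytools' KeyBuilder."""

    @staticmethod
    def new_hash() -> Any:
        # Every hash object is created here, so the algorithm lives in one place.
        return hashlib.sha256()

    @staticmethod
    def _feed(key_hash: Any, tag: bytes, data: bytes) -> None:
        # A type tag plus an explicit little-endian length prefix keeps, e.g.,
        # ("ab", "c") and ("a", "bc") from producing the same byte stream.
        key_hash.update(tag + len(data).to_bytes(8, "little") + data)

    def rec(self, key_hash: Any, key: Any) -> Any:
        """Update key_hash with a persistent encoding of key and return key_hash."""
        if key is None:
            key_hash.update(b"N")
        elif isinstance(key, int):
            # bool is a subclass of int and True == 1, so both get the same encoding.
            n = int(key)
            self._feed(key_hash, b"I",
                       n.to_bytes((n.bit_length() + 8) // 8, "little", signed=True))
        elif isinstance(key, str):
            self._feed(key_hash, b"S", key.encode("utf-8"))
        elif isinstance(key, bytes):
            self._feed(key_hash, b"B", key)
        elif isinstance(key, tuple):
            key_hash.update(b"T" + len(key).to_bytes(8, "little"))
            for item in key:
                self.rec(key_hash, item)
        elif isinstance(key, frozenset):
            # Set iteration order is not persistent, so hash elements separately
            # and feed their fixed-length hex digests in sorted order.
            digests = sorted(self(item) for item in key)
            key_hash.update(b"F" + len(digests).to_bytes(8, "little"))
            for digest in digests:
                key_hash.update(digest.encode("ascii"))
        else:
            # Mutable or unsupported objects (lists, dicts, floats, ...) are rejected.
            raise TypeError(f"cannot persistently hash {type(key).__name__}")
        return key_hash

    def __call__(self, key: Any) -> str:
        return self.rec(self.new_hash(), key).hexdigest()


print(TinyKeyBuilder()(("mesh", 3, frozenset({"a", "b"}))))
```

Because rec() returns the hash object it was given, a caller can feed several keys into one hash and read the digest at the end.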
- static new_hash()
Return a new hash instance following the protocol of the ones from hashlib. This will permit switching to different hash algorithms in the future. Subclasses are expected to use this to create new hashes. Not doing so is deprecated and may stop working as early as 2022.
Added in version 2021.2.
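Routing every hash creation through new_hash() is what lets the algorithm change in one place. In the hypothetical sketch below, the base class creates hash objects only via self.new_hash(), so a subclass that returns hashlib.blake2b() instead of hashlib.sha256() changes every key it produces without touching the hashing logic. The class names and the choice of algorithms are illustrative assumptions.

```python
import hashlib
from typing import Any


class StrKeyBuilder:
    """Hypothetical key builder for strings and tuples of strings."""

    @staticmethod
    def new_hash() -> Any:
        return hashlib.sha256()

    def rec(self, key_hash: Any, key: Any) -> Any:
        if isinstance(key, str):
            data = key.encode("utf-8")
            key_hash.update(b"S" + len(data).to_bytes(8, "little") + data)
        elif isinstance(key, tuple):
            key_hash.update(b"T" + len(key).to_bytes(8, "little"))
            for item in key:
                self.rec(key_hash, item)
        else:
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        return key_hash

    def __call__(self, key: Any) -> str:
        # Never call hashlib directly here; always go through new_hash(),
        # which resolves to the subclass override when there is one.
        return self.rec(self.new_hash(), key).hexdigest()


class Blake2KeyBuilder(StrKeyBuilder):
    """Same encoding, different algorithm: only new_hash() is overridden."""

    @staticmethod
    def new_hash() -> Any:
        return hashlib.blake2b(digest_size=32)
```

The two builders produce different keys for the same input, so data stored under keys from one cannot be looked up with the other; changing the algorithm effectively starts a fresh dictionary.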
Note
Some key building relies on the system byte order, so keys built on different systems may not match. It would be desirable to fix this, but this has not yet been done.
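The general pitfall can be seen with the struct module: native formats follow the machine running the code, while the "<" and ">" prefixes fix the byte order explicitly. This sketch only illustrates the issue and one standard-library way around it; it does not show which parts of KeyBuilder are affected.

```python
import struct
import sys

value = 1

# Native byte order ("="): the resulting bytes depend on the machine.
native = struct.pack("=i", value)

# Explicit little-endian ("<"): identical bytes on every platform.
portable = struct.pack("<i", value)

# int.to_bytes also takes an explicit byte order.
assert value.to_bytes(4, "little") == portable

print(sys.byteorder)       # little on x86-64 and most ARM systems, big on e.g. s390x
print(native == portable)  # True only on little-endian machines
print(portable)            # b'\x01\x00\x00\x00'
```

Feeding only explicitly ordered bytes into a hash is what keeps its digest identical across platforms.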
- class pytools.persistent_dict.PersistentDict(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, *, enable_wal: bool = False, safe_sync: bool | None = None)
A concurrent disk-backed dictionary.
Note
This class intentionally does not keep multiple values for the same hash key, based on the assumption that key conflicts are highly unlikely and, if they do occur, are almost always due to a bug in the hash key generation code (KeyBuilder).
- __init__(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, *, enable_wal: bool = False, safe_sync: bool | None = None) → None
- Parameters:
identifier – a filename-compatible string identifying this dictionary
key_builder – an instance of KeyBuilder or of a subclass of it
container_dir – the directory in which to store this dictionary. If None, the default cache directory from platformdirs.user_cache_dir() is used.
enable_wal – enable write-ahead logging (WAL) mode. This mode is faster than the default rollback journal mode, but it is not compatible with network filesystems.
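"Write-ahead logging" and "rollback journal" are the names of SQLite journal modes. Assuming the dictionary is stored in an SQLite database file (an assumption suggested by that terminology, not stated on this page), the sketch below shows what enabling WAL amounts to at the sqlite3 level. The open_db helper and the file naming are hypothetical, and a temporary directory stands in for container_dir.

```python
import os
import sqlite3
import tempfile


def open_db(container_dir: str, identifier: str, enable_wal: bool) -> sqlite3.Connection:
    """Open (or create) a hypothetical SQLite file for a dictionary."""
    os.makedirs(container_dir, exist_ok=True)
    conn = sqlite3.connect(os.path.join(container_dir, f"{identifier}.sqlite"))
    if enable_wal:
        # WAL mode is recorded in the database file. It relies on a shared-memory
        # index, so all connections must be on the same host, which is why it
        # does not work on network filesystems.
        conn.execute("PRAGMA journal_mode = WAL")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_db(tmp, "example-dict", enable_wal=True)
    print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # wal
    conn.close()
```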
- store(key: K, value: V, _skip_if_present: bool = False) → None
Store (key, value) in the dictionary.
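To make "disk-backed" concrete, here is a deliberately small, hypothetical stand-in built on sqlite3 and pickle. It is not the pytools implementation: keys are restricted to strings and hashed with SHA-256, only store() and a matching fetch() are provided, and the private _skip_if_present flag is not modeled. What it demonstrates is that a second instance opened on the same file finds entries stored by the first, because the hash keys do not depend on the interpreter session.

```python
import hashlib
import os
import pickle
import sqlite3
import tempfile
from typing import Any


class TinyPersistentDict:
    """Hypothetical minimal disk-backed dict keyed by a persistent hash of a string."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS entries "
            "(keyhash TEXT PRIMARY KEY, key BLOB, value BLOB)"
        )
        self.conn.commit()

    @staticmethod
    def _hash(key: str) -> str:
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def store(self, key: str, value: Any) -> None:
        # One row per hash key; storing again replaces the previous value.
        with self.conn:  # commits the transaction on success
            self.conn.execute(
                "INSERT OR REPLACE INTO entries VALUES (?, ?, ?)",
                (self._hash(key), pickle.dumps(key), pickle.dumps(value)),
            )

    def fetch(self, key: str) -> Any:
        row = self.conn.execute(
            "SELECT key, value FROM entries WHERE keyhash = ?", (self._hash(key),)
        ).fetchone()
        # The stored key is compared as well, so a hash collision is not a hit.
        if row is None or pickle.loads(row[0]) != key:
            raise KeyError(key)
        return pickle.loads(row[1])

    def close(self) -> None:
        self.conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.sqlite")
    first = TinyPersistentDict(path)
    first.store("answer", {"value": 42})
    first.close()

    second = TinyPersistentDict(path)  # e.g. a later interpreter session
    print(second.fetch("answer"))      # {'value': 42}
    second.close()
```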
- class pytools.persistent_dict.WriteOncePersistentDict(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, *, enable_wal: bool = False, safe_sync: bool | None = None, in_mem_cache_size: int = 256)
A concurrent disk-backed dictionary that disallows overwriting or deleting individual entries (but allows removing all entries).
Compared with PersistentDict, this class has faster retrieval times because it caches entries in memory using an LRU cache.
Note
This class intentionally does not keep multiple values for the same hash key, based on the assumption that key conflicts are highly unlikely and, if they do occur, are almost always due to a bug in the hash key generation code (KeyBuilder).
- __init__(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, *, enable_wal: bool = False, safe_sync: bool | None = None, in_mem_cache_size: int = 256) → None
- Parameters:
identifier – a filename-compatible string identifying this dictionary
key_builder – an instance of KeyBuilder or of a subclass of it
container_dir – the directory in which to store this dictionary. If None, the default cache directory from platformdirs.user_cache_dir() is used.
enable_wal – enable write-ahead logging (WAL) mode. This mode is faster than the default rollback journal mode, but it is not compatible with network filesystems.
in_mem_cache_size – retain an in-memory cache of up to in_mem_cache_size items (with an LRU replacement policy)
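An LRU replacement policy of the kind in_mem_cache_size controls can be written with collections.OrderedDict. This is a hypothetical sketch of the policy, not the cache WriteOncePersistentDict actually uses; the load callback stands in for the slower on-disk lookup. It also suggests why such a cache suits a write-once dictionary: since entries are never overwritten, a cached value cannot go stale unless all entries are removed.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """Keeps at most `maxsize` entries, evicting the least recently used one."""

    def __init__(self, maxsize: int = 256) -> None:
        self.maxsize = maxsize
        self._data: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K, load: Callable[[K], V]) -> V:
        if key in self._data:
            # Cache hit: mark the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        # Cache miss: fall back to the slow lookup and remember the result.
        value = load(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def clear(self) -> None:
        self._data.clear()
```

With maxsize=2, looking up 1, 2, 1, 3 evicts 2 (not 1), because 1 was used more recently.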
- clear_in_mem_cache() → None
Clear the in-memory cache of this dictionary.
Added in version 2023.1.1.
- store(key: K, value: V, _skip_if_present: bool = False) → None
Store (key, value) in the dictionary.
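The write-once contract can be modeled in memory as follows. This hypothetical class raises a local stand-in for ReadOnlyEntryError whenever a key is stored a second time and offers no per-key deletion, only clear(). Whether the real store() treats re-storing an identical value differently is not stated on this page, so the sketch simply rejects any second store of the same key.

```python
from typing import Any, Dict, Hashable


class ReadOnlyEntryError(KeyError):
    """Local stand-in for the exception documented above (base class assumed)."""


class WriteOnceDict:
    """Hypothetical in-memory model of write-once semantics."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def store(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            # Each entry may be written only once; overwriting is an error.
            raise ReadOnlyEntryError(key)
        self._data[key] = value

    def fetch(self, key: Hashable) -> Any:
        return self._data[key]

    def clear(self) -> None:
        # There is no per-key deletion; only removing all entries is allowed.
        self._data.clear()
```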
Internal stuff that is only here because the documentation tool wants it
- class pytools.persistent_dict.K
A type variable for the key type of a PersistentDict.
- class pytools.persistent_dict.V
A type variable for the value type of a PersistentDict.
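Type variables like K and V let signatures such as store(key: K, value: V) be checked against concrete key and value types. The sketch below shows the same pattern on a hypothetical TypedStore class; it makes no claim about how the pytools classes themselves are declared.

```python
from typing import Dict, Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class TypedStore(Generic[K, V]):
    """Hypothetical generic container whose signatures mirror store(key: K, value: V)."""

    def __init__(self) -> None:
        self._items: Dict[K, V] = {}

    def store(self, key: K, value: V) -> None:
        self._items[key] = value

    def fetch(self, key: K) -> V:
        return self._items[key]


# A static type checker such as mypy would reject s.store(1, "x") here,
# because the annotation binds K to str and V to int. At runtime the type
# variables are not enforced.
s: TypedStore[str, int] = TypedStore()
s.store("answer", 42)
print(s.fetch("answer"))  # 42
```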