Persistent Hashing and Persistent Dictionaries
This module contains functionality that allows hashing with keys that remain valid across interpreter invocations, unlike Python's built-in hashes.
This module also provides a disk-backed dictionary that uses persistent hashing.
- exception pytools.persistent_dict.NoSuchEntryError
Raised when an entry is not found in a PersistentDict.
- exception pytools.persistent_dict.NoSuchEntryInvalidKeyError
Raised when an entry is not found in a PersistentDict due to an invalid key file.
- exception pytools.persistent_dict.NoSuchEntryInvalidContentsError
Raised when an entry is not found in a PersistentDict due to an invalid contents file.
- exception pytools.persistent_dict.NoSuchEntryCollisionError
Raised when an entry is not found in a PersistentDict, but it contains an entry with the same hash key (hash collision).
- exception pytools.persistent_dict.ReadOnlyEntryError
Raised when an attempt is made to overwrite an entry in a WriteOncePersistentDict.
- exception pytools.persistent_dict.CollisionWarning
Warning raised when a collision is detected in a PersistentDict.
- class pytools.persistent_dict.Hash(*args, **kwargs)
A protocol for the hashes from hashlib.
- class pytools.persistent_dict.KeyBuilder
A (stateless) object that computes persistent hashes of objects fed to it. Subclassing this class permits customizing the computation of hash keys.
This class follows the same general rules as Python's built-in hashing:
Only immutable objects can be hashed.
If two objects compare equal, they must hash to the same value.
Objects with the same hash may or may not compare equal.
In addition, hashes computed with KeyBuilder have the following properties:
The hash is persistent across interpreter invocations.
The hash is the same across different Python versions and platforms.
The hash is invariant with respect to PYTHONHASHSEED.
Hashes are computed using functionality from hashlib.
Key builders of this type are used by PersistentDict, but other uses are entirely allowable.
- rec(key_hash: Hash, key: Any) -> Hash
- Parameters:
key_hash – the hash object to be updated with the hash of key.
key – the (immutable) Python object to be hashed.
- Returns:
the updated key_hash
Changed in version 2021.2: Now returns the updated key_hash.
- static new_hash()
Return a new hash instance following the protocol of the ones from hashlib. This will permit switching to different hash algorithms in the future. Subclasses are expected to use this to create new hashes. Not doing so is deprecated and may stop working as early as 2022.
New in version 2021.2.
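The rec/new_hash contract described above can be illustrated with a small standard-library sketch. This is not the pytools implementation: the class name SketchKeyBuilder, the helper _feed, the choice of SHA-256, the byte encoding, and the supported key types are all assumptions made for illustration. It only shows the general idea of feeding a key recursively into a hashlib object and returning the updated hash.

```python
import hashlib
from typing import Any


class SketchKeyBuilder:
    """Hypothetical stand-in for illustration; not pytools' KeyBuilder."""

    @staticmethod
    def new_hash():
        # Assumption: SHA-256. The documentation does not name an algorithm.
        return hashlib.sha256()

    @staticmethod
    def _feed(key_hash, tag: bytes, payload: bytes) -> None:
        # Length-prefix the payload so adjacent fields cannot run together.
        key_hash.update(tag)
        key_hash.update(len(payload).to_bytes(8, "little"))
        key_hash.update(payload)

    def rec(self, key_hash, key: Any):
        if isinstance(key, tuple):
            # Containers recurse into their items with the same hash object.
            self._feed(key_hash, b"tuple", str(len(key)).encode("ascii"))
            for item in key:
                self.rec(key_hash, item)
        elif isinstance(key, int):
            # bool is a subclass of int and True == 1, so both share one
            # encoding ("equal objects must hash to the same value").
            self._feed(key_hash, b"int", str(int(key)).encode("ascii"))
        elif isinstance(key, str):
            self._feed(key_hash, b"str", key.encode("utf-8"))
        elif isinstance(key, bytes):
            self._feed(key_hash, b"bytes", key)
        elif key is None:
            self._feed(key_hash, b"none", b"")
        else:
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        # Mirror the documented behavior: return the updated hash object.
        return key_hash

    def __call__(self, key: Any) -> str:
        return self.rec(self.new_hash(), key).hexdigest()


kb = SketchKeyBuilder()
digest = kb(("mesh", 3, None))  # a 64-character hex string
```

Because the digest depends only on the bytes fed to hashlib, it does not vary with PYTHONHASHSEED, unlike the built-in hash() of a str.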
- class pytools.persistent_dict.PersistentDict(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None)
A concurrent disk-backed dictionary.
- __init__(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None) -> None
- Parameters:
identifier – a file-name-compatible string identifying this dictionary
key_builder – an instance of KeyBuilder (or of a subclass)
container_dir – the directory in which to store this dictionary. If None, the default cache directory from platformdirs.user_cache_dir() is used
- store(key: K, value: V, _skip_if_present: bool = False, _stacklevel: int = 0) -> None
Store (key, value) in the dictionary.
- store_if_not_present(key: K, value: V, _stacklevel: int = 0) -> None
Store (key, value) if key is not already present.
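To make the difference between store and store_if_not_present concrete, here is a toy disk-backed dictionary built only on the standard library. It is a sketch under stated assumptions, not how PersistentDict works internally: the class SketchDiskDict, its fetch method, the one-pickle-file-per-entry layout, and the restriction to string keys are all hypothetical, and it does no locking, so it is not safe for concurrent use.

```python
import hashlib
import os
import pickle
import tempfile


class SketchDiskDict:
    """Hypothetical toy for illustration; not pytools' PersistentDict."""

    def __init__(self, identifier: str, container_dir: str) -> None:
        self.path = os.path.join(container_dir, identifier)
        os.makedirs(self.path, exist_ok=True)

    def _entry_file(self, key: str) -> str:
        # Assumption: string keys, hashed with SHA-256 to get a stable file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.path, digest)

    def store(self, key: str, value) -> None:
        # Keep the key next to the value so a lookup can detect collisions.
        with open(self._entry_file(key), "wb") as f:
            pickle.dump((key, value), f)

    def store_if_not_present(self, key: str, value) -> None:
        if not os.path.exists(self._entry_file(key)):
            self.store(key, value)

    def fetch(self, key: str):
        try:
            with open(self._entry_file(key), "rb") as f:
                stored_key, value = pickle.load(f)
        except FileNotFoundError:
            raise KeyError(key) from None
        if stored_key != key:
            # Same hash, different key: treat it as a missing entry.
            raise KeyError(f"hash collision for {key!r}")
        return value


with tempfile.TemporaryDirectory() as tmp:
    d = SketchDiskDict("demo", tmp)
    d.store("a", 1)
    d.store_if_not_present("a", 2)  # ignored: "a" already exists
    first = d.fetch("a")            # 1
    d.store("a", 3)                 # a plain store overwrites
    # A fresh object with the same identifier sees the same files.
    second = SketchDiskDict("demo", tmp).fetch("a")  # 3
```

Storing the key alongside the value is one way to tell a genuine hit from a hash collision, which is the situation NoSuchEntryCollisionError describes.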
- class pytools.persistent_dict.WriteOncePersistentDict(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, in_mem_cache_size: int = 256)
A concurrent disk-backed dictionary that disallows overwriting and deletion of individual entries (but allows removing all entries).
Compared with PersistentDict, this class has faster retrieval times because it uses an LRU cache to cache entries in memory.
- __init__(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, in_mem_cache_size: int = 256) -> None
- Parameters:
identifier – a file-name-compatible string identifying this dictionary
key_builder – an instance of KeyBuilder (or of a subclass)
container_dir – the directory in which to store this dictionary. If None, the default cache directory from platformdirs.user_cache_dir() is used
in_mem_cache_size – retain an in-memory cache of up to in_mem_cache_size items (with an LRU replacement policy)
- clear_in_mem_cache() -> None
Clear the in-memory cache of this dictionary.
New in version 2023.1.1.
- store(key: K, value: V, _skip_if_present: bool = False, _stacklevel: int = 0) -> None
Store (key, value) in the dictionary.
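The combination of write-once semantics and an LRU cache can be sketched as follows. This is an illustrative toy, not the pytools implementation: SketchWriteOnceDict, its fetch method, and the backing_reads counter are hypothetical names, a plain dict stands in for the on-disk storage, and KeyError stands in for ReadOnlyEntryError. The sketch relies on the observation that entries which can never be overwritten are safe to serve from memory.

```python
from collections import OrderedDict


class SketchWriteOnceDict:
    """Hypothetical toy for illustration; not pytools' WriteOncePersistentDict."""

    def __init__(self, in_mem_cache_size: int = 256) -> None:
        self._backing = {}           # stand-in for the on-disk storage
        self._cache = OrderedDict()  # most recently used entries sit at the end
        self._cache_size = in_mem_cache_size
        self.backing_reads = 0       # counts lookups that bypass the cache

    def store(self, key, value) -> None:
        if key in self._backing:
            # Stand-in for ReadOnlyEntryError in this sketch.
            raise KeyError(f"entry {key!r} is read-only")
        self._backing[key] = value

    def fetch(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.backing_reads += 1
        value = self._backing[key]
        self._cache[key] = value
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used
        return value

    def clear_in_mem_cache(self) -> None:
        self._cache.clear()


d = SketchWriteOnceDict(in_mem_cache_size=2)
d.store("a", 1)
d.fetch("a")
d.fetch("a")                          # served from the in-memory cache
reads_before_clear = d.backing_reads  # 1
d.clear_in_mem_cache()
d.fetch("a")                          # goes back to the backing store
reads_after_clear = d.backing_reads   # 2
```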
Internal stuff that is only here because the documentation tool wants it
- class pytools.persistent_dict.K
A type variable for the key type of a PersistentDict.
- class pytools.persistent_dict.V
A type variable for the value type of a PersistentDict.