Persistent Hashing and Persistent Dictionaries

This module contains functionality that allows hashing with keys that remain valid across interpreter invocations, unlike Python’s built-in hashes.

This module also provides a disk-backed dictionary that uses persistent hashing.

exception pytools.persistent_dict.NoSuchEntryError

Raised when an entry is not found in a PersistentDict.

exception pytools.persistent_dict.NoSuchEntryCollisionError

Raised when an entry is not found in a PersistentDict, but the dictionary contains an entry with the same hash key (a hash collision).

exception pytools.persistent_dict.ReadOnlyEntryError

Raised when an attempt is made to overwrite an entry in a WriteOncePersistentDict.

exception pytools.persistent_dict.CollisionWarning

Warning issued when a collision is detected in a PersistentDict.

class pytools.persistent_dict.Hash(*args, **kwargs)

A protocol for the hashes from hashlib.

update(data: ReadableBuffer) → None
digest() → bytes
hexdigest() → str
copy() → Self
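The four methods above are the ones that hashlib objects already provide. As a minimal sketch (the name HashLike is a hypothetical stand-in, not the class defined in pytools, and bytes is used in place of ReadableBuffer), a structurally equivalent protocol can be written and checked against a hashlib object like this:

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class HashLike(Protocol):
    """Hypothetical stand-in for the Hash protocol described above."""

    def update(self, data: bytes) -> None: ...
    def digest(self) -> bytes: ...
    def hexdigest(self) -> str: ...
    def copy(self) -> "HashLike": ...


h = hashlib.sha256()
h.update(b"some ")
snapshot = h.copy()      # independent copy of the state so far
h.update(b"data")        # does not affect the snapshot

# runtime_checkable only checks that the methods exist, not their signatures.
is_hash_like = isinstance(h, HashLike)

# Feeding data in pieces gives the same digest as feeding it all at once.
same_as_one_shot = h.hexdigest() == hashlib.sha256(b"some data").hexdigest()
snapshot_differs = snapshot.digest() != h.digest()
```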
class pytools.persistent_dict.KeyBuilder

A (stateless) object that computes persistent hashes of objects fed to it. Subclassing this class permits customizing the computation of hash keys.

This class follows the same general rules as Python’s built-in hashing:

  • Only immutable objects can be hashed.

  • If two objects compare equal, they must hash to the same value.

  • Objects with the same hash may or may not compare equal.

In addition, hashes computed with KeyBuilder have the following properties:

  • The hash is persistent across interpreter invocations.

  • The hash is the same across different Python versions and platforms.

  • The hash is invariant with respect to PYTHONHASHSEED.

  • Hashes are computed using functionality from hashlib.

Key builders of this type are used by PersistentDict, but other uses are entirely allowable.
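The difference between a seed-dependent built-in hash and a persistent one can be observed with the standard library alone. The following is a minimal sketch, not pytools code: it starts two child interpreters with different PYTHONHASHSEED values and compares the built-in hash() of a string with a hashlib digest of the same string. The helper name run_with_seed and the sample string are made up for this illustration.

```python
import os
import subprocess
import sys

# Child program: print the built-in hash and a hashlib digest of one string.
CHILD = (
    "import hashlib\n"
    "print(hash('persistent'))\n"
    "print(hashlib.sha256(b'persistent').hexdigest())\n"
)


def run_with_seed(seed: str) -> tuple[str, str]:
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.split()
    return out[0], out[1]


builtin_a, digest_a = run_with_seed("1")
builtin_b, digest_b = run_with_seed("2")

# The built-in str hash depends on the seed; the hashlib digest does not.
print("built-in hashes equal:", builtin_a == builtin_b)
print("hashlib digests equal:", digest_a == digest_b)
```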

__call__(key: Any) → str

Return the hash of key.

rec(key_hash: Hash, key: Any) → Hash
Parameters:
  • key_hash – the hash object to be updated with the hash of key.

  • key – the (immutable) Python object to be hashed.

Returns:

the updated key_hash

Changed in version 2021.2: Now returns the updated key_hash.
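To illustrate the calling convention (take a hash object, feed the key into it, return the same object), here is a rough sketch of a recursive update function. It is not the pytools implementation: rec_sketch and its byte encoding are invented for this example, only a handful of immutable types are handled, and hashlib.sha256 stands in for whatever new_hash() returns.

```python
import hashlib
from typing import Any


def rec_sketch(key_hash, key: Any):
    """Feed an unambiguous encoding of `key` into `key_hash` and return it."""
    if key is None:
        key_hash.update(b"none;")
    elif isinstance(key, int):
        # int(key) maps True/False to 1/0, so values that compare equal
        # are fed into the hash identically.
        key_hash.update(b"int:%d;" % int(key))
    elif isinstance(key, str):
        data = key.encode("utf-8")
        key_hash.update(b"str:%d:" % len(data) + data)
    elif isinstance(key, bytes):
        key_hash.update(b"bytes:%d:" % len(key) + key)
    elif isinstance(key, tuple):
        key_hash.update(b"tuple:%d:" % len(key))
        for item in key:
            rec_sketch(key_hash, item)   # recurse into the elements
    else:
        # Mutable or unknown types are rejected in this sketch.
        raise TypeError(f"unsupported key type: {type(key).__name__}")
    return key_hash


digest_1 = rec_sketch(hashlib.sha256(), ("mesh", 3, None)).hexdigest()
digest_2 = rec_sketch(hashlib.sha256(), ("mesh", 3, None)).hexdigest()
digest_3 = rec_sketch(hashlib.sha256(), ("mesh", "3", None)).hexdigest()
```

Here digest_1 and digest_2 agree because the inputs are equal, while digest_3 differs because the type tag and length prefix keep the integer 3 and the string "3" apart.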

static new_hash()

Return a new hash instance following the protocol of the ones from hashlib. This will permit switching to different hash algorithms in the future. Subclasses are expected to use this to create new hashes. Not doing so is deprecated and may stop working as early as 2022.

Added in version 2021.2.

class pytools.persistent_dict.PersistentDict(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, *, enable_wal: bool = False, safe_sync: bool | None = None)

A concurrent disk-backed dictionary.

Note

This class intentionally does not store all values with a certain key, based on the assumption that key conflicts are highly unlikely and that, if they occur, they are almost always due to a bug in the hash key generation code (KeyBuilder).

__init__(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, *, enable_wal: bool = False, safe_sync: bool | None = None) → None
Parameters:
  • identifier – a filename-compatible string identifying this dictionary

  • key_builder – an instance of KeyBuilder or of one of its subclasses. If None, a default KeyBuilder is used

  • container_dir – the directory in which to store this dictionary. If None, the default cache directory from platformdirs.user_cache_dir() is used

  • enable_wal – enable write-ahead logging (WAL) mode. This mode is faster than the default rollback journal mode, but it is not compatible with network filesystems.
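"Write-ahead logging" and "rollback journal" are the names of SQLite journal modes. Assuming the dictionary is stored in an SQLite database file (this section does not say so explicitly), the setting behind enable_wal can be tried out with the standard sqlite3 module. This is only a sketch of the underlying database setting, not of how PersistentDict applies it; the file name demo.sqlite is arbitrary.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.sqlite"))
    try:
        # Query the journal mode of a freshly created database file.
        default_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
        # Request WAL; SQLite reports the mode that is actually in effect.
        wal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        conn.close()

print(default_mode, wal_mode)
```

If the request cannot be honored, SQLite reports the previous mode instead of "wal", so the returned value is worth checking rather than assuming.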

__getitem__(key: K) → V

Return the value associated with key in the dictionary.

__setitem__(key: K, value: V) → None

Store (key, value) in the dictionary.

__delitem__(key: K) → None

Remove the entry associated with key from the dictionary.

clear() → None

Remove all entries from the dictionary.

store(key: K, value: V, _skip_if_present: bool = False) → None

Store (key, value) in the dictionary.

store_if_not_present(key: K, value: V) → None

Store (key, value) if key is not already present.

fetch(key: K) → V

Return the value associated with key in the dictionary.

remove(key: K) → None

Remove the entry associated with key from the dictionary.

class pytools.persistent_dict.WriteOncePersistentDict(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, *, enable_wal: bool = False, safe_sync: bool | None = None, in_mem_cache_size: int = 256)

A concurrent disk-backed dictionary that disallows overwriting and deleting individual entries (but allows removing all entries at once).

Compared with PersistentDict, this class has faster retrieval times because it uses an LRU cache to cache entries in memory.

Note

This class intentionally does not store all values with a certain key, based on the assumption that key conflicts are highly unlikely and that, if they occur, they are almost always due to a bug in the hash key generation code (KeyBuilder).

__init__(identifier: str, key_builder: KeyBuilder | None = None, container_dir: str | None = None, *, enable_wal: bool = False, safe_sync: bool | None = None, in_mem_cache_size: int = 256) → None
Parameters:
  • identifier – a filename-compatible string identifying this dictionary

  • key_builder – an instance of KeyBuilder or of one of its subclasses. If None, a default KeyBuilder is used

  • container_dir – the directory in which to store this dictionary. If None, the default cache directory from platformdirs.user_cache_dir() is used

  • enable_wal – enable write-ahead logging (WAL) mode. This mode is faster than the default rollback journal mode, but it is not compatible with network filesystems.

  • in_mem_cache_size – retain an in-memory cache of up to in_mem_cache_size items (with an LRU replacement policy)
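The LRU replacement policy mentioned for in_mem_cache_size can be sketched in a few lines. This is a generic illustration, not the cache used inside WriteOncePersistentDict: the class LRUCacheSketch and the function slow_load (standing in for a disk read) are hypothetical.

```python
from collections import OrderedDict


class LRUCacheSketch:
    """Hypothetical bounded cache with least-recently-used eviction."""

    def __init__(self, max_items: int = 256) -> None:
        self.max_items = max_items
        self._items: OrderedDict = OrderedDict()

    def get(self, key, load):
        """Return the cached value, calling load(key) only on a miss."""
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            return self._items[key]
        value = load(key)                     # e.g. a read from disk
        self._items[key] = value
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)   # evict the least recently used
        return value


disk_reads = []


def slow_load(key):
    """Stand-in for an expensive lookup; records every call."""
    disk_reads.append(key)
    return key * 2


cache = LRUCacheSketch(max_items=2)
values = [cache.get(k, slow_load) for k in (1, 2, 1, 3, 2)]
print(values)      # [2, 4, 2, 6, 4]
print(disk_reads)  # [1, 2, 3, 2]
```

The second lookup of key 1 is served from memory. Key 2 has to be loaded again at the end because it was the least recently used entry when key 3 arrived and was evicted.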

__getitem__(key: K) → V

Return the value associated with key in the dictionary.

__setitem__(key: K, value: V) → None

Store (key, value) in the dictionary.

clear() → None

Remove all entries from the dictionary.

clear_in_mem_cache() → None

Clear the in-memory cache of this dictionary.

Added in version 2023.1.1.

store(key: K, value: V, _skip_if_present: bool = False) → None

Store (key, value) in the dictionary.

store_if_not_present(key: K, value: V) → None

Store (key, value) if key is not already present.

fetch(key: K) → V

Return the value associated with key in the dictionary.

Internal stuff that is only here because the documentation tool wants it

class pytools.persistent_dict.K

A type variable for the key type of a PersistentDict.

class pytools.persistent_dict.V

A type variable for the value type of a PersistentDict.