Persistent Hashing and Persistent Dictionaries

This module contains functionality that allows hashing with keys that remain valid across interpreter invocations, unlike Python’s built-in hashes.
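To make the contrast concrete, here is a minimal standard-library sketch, not taken from pytools. Python salts the built-in `hash()` of strings per process (unless `PYTHONHASHSEED` is fixed), whereas a `hashlib` digest depends only on the bytes fed to it. The helper name `stable_key` is hypothetical.

```python
import hashlib


def stable_key(text: str) -> str:
    """Return a hex digest that depends only on the contents of *text*."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# hash("spam") may differ between interpreter runs because string hashes
# are salted per process, so it cannot name an entry in an on-disk cache.
# The digest below is the same in every run.
key = stable_key("spam")
```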

This module also provides a disk-backed dictionary that uses persistent hashing.

exception pytools.persistent_dict.NoSuchEntryError
exception pytools.persistent_dict.ReadOnlyEntryError
exception pytools.persistent_dict.CollisionWarning
class pytools.persistent_dict.KeyBuilder

A (stateless) object that computes hashes of objects fed to it. Subclassing this class permits customizing the computation of hash keys.

__call__(key)

Call self as a function.

rec(key_hash, key)
Parameters:
  • key_hash – the hash object to be updated with the hash of key.

  • key – the (immutable) Python object to be hashed.

Returns:

the updated key_hash

Changed in version 2021.2: Now returns the updated key_hash.
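The calling convention described above (update a hash object with a key, then return that same object) can be sketched with the standard library alone. The class `SketchKeyBuilder` below is a hypothetical stand-in written for illustration; it is not the pytools implementation and supports only a few immutable key types. Each piece of data is length-prefixed so that differently nested keys cannot feed identical byte streams to the hash.

```python
import hashlib


class SketchKeyBuilder:
    """Hypothetical key builder; illustrates the rec() protocol only."""

    @staticmethod
    def new_hash():
        return hashlib.sha256()

    @staticmethod
    def _feed(key_hash, data: bytes):
        # Length-prefix each chunk so ("ab", "c") and ("a", "bc") differ.
        key_hash.update(f"{len(data)}:".encode("utf-8"))
        key_hash.update(data)

    def rec(self, key_hash, key):
        # Feed the type name first so that 1 and "1" hash differently.
        key_hash.update(type(key).__name__.encode("utf-8"))
        if isinstance(key, tuple):
            key_hash.update(f"{len(key)}:".encode("utf-8"))
            for item in key:
                self.rec(key_hash, item)
        elif isinstance(key, str):
            self._feed(key_hash, key.encode("utf-8"))
        elif isinstance(key, bytes):
            self._feed(key_hash, key)
        elif isinstance(key, int):
            self._feed(key_hash, str(key).encode("utf-8"))
        else:
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        # Return the updated hash object, as the documented rec() does.
        return key_hash

    def __call__(self, key):
        return self.rec(self.new_hash(), key).hexdigest()


builder = SketchKeyBuilder()
digest = builder(("mesh", 3, b"raw"))
```

Because the digest is built only from the key's contents, the same key yields the same digest in a later interpreter run.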

static new_hash()

Return a new hash instance following the protocol of the ones from hashlib. This will permit switching to different hash algorithms in the future. Subclasses are expected to use this to create new hashes. Not doing so is deprecated and may stop working as early as 2022.
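As a rough illustration of why a hash factory makes the algorithm swappable, the sketch below feeds the same data to two `hashlib` constructors through one code path. It relies only on the `update()` and `hexdigest()` methods shared by all `hashlib` hash objects; the function name `digest_with` is hypothetical and not part of pytools.

```python
import hashlib


def digest_with(new_hash, chunks):
    """Feed byte chunks to a fresh hash object created by *new_hash*."""
    h = new_hash()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


# Only the factory changes; the hashing code stays the same.
sha256_key = digest_with(hashlib.sha256, [b"spam", b"eggs"])
blake2_key = digest_with(hashlib.blake2b, [b"spam", b"eggs"])
```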

New in version 2021.2.

class pytools.persistent_dict.PersistentDict(identifier, key_builder=None, container_dir=None)

A concurrent disk-backed dictionary.

__init__(identifier, key_builder=None, container_dir=None)
Parameters:
  • identifier – a file-name-compatible string identifying this dictionary

  • key_builder – an instance of KeyBuilder or of one of its subclasses

__getitem__(key)
__setitem__(key, value)
__delitem__(key)
clear()
store(key, value, _skip_if_present=False, _stacklevel=0)
store_if_not_present(key, value, _stacklevel=0)
fetch(key, _stacklevel=0)
remove(key, _stacklevel=0)
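The general idea of a disk-backed dictionary keyed by persistent hashes can be sketched with the standard library. The class `SketchDiskDict` below is hypothetical and deliberately simplified: it does no locking, hashes keys via `repr()`, and raises a plain `KeyError` where the real class documents its own exceptions. Its `store`, `fetch`, and `remove` methods only mirror the names listed above; it is not the pytools implementation.

```python
import hashlib
import os
import pickle
import tempfile


class SketchDiskDict:
    """Toy disk-backed mapping: one pickle file per key digest."""

    def __init__(self, identifier, container_dir):
        self.dir = os.path.join(container_dir, identifier)
        os.makedirs(self.dir, exist_ok=True)

    def _path(self, key):
        # repr() is stable for str/int/tuple keys, which suffices here.
        digest = hashlib.sha256(repr(key).encode("utf-8")).hexdigest()
        return os.path.join(self.dir, digest)

    def store(self, key, value):
        # Write to a temporary file, then rename it into place, so a
        # reader never observes a half-written entry.
        fd, tmp_path = tempfile.mkstemp(dir=self.dir)
        with os.fdopen(fd, "wb") as f:
            pickle.dump((key, value), f)
        os.replace(tmp_path, self._path(key))

    def fetch(self, key):
        try:
            with open(self._path(key), "rb") as f:
                stored_key, value = pickle.load(f)
        except FileNotFoundError:
            raise KeyError(key) from None
        if stored_key != key:
            # Two distinct keys produced the same digest.
            raise KeyError(key)
        return value

    def remove(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None


with tempfile.TemporaryDirectory() as tmp_dir:
    d = SketchDiskDict("demo", tmp_dir)
    d.store(("mesh", 3), [1.0, 2.0])
    # A second object pointing at the same directory sees the entry,
    # much as a later interpreter run would.
    reloaded = SketchDiskDict("demo", tmp_dir).fetch(("mesh", 3))
    d.remove(("mesh", 3))
```

Storing the key next to the value lets a lookup detect the case where two different keys map to the same file name.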
class pytools.persistent_dict.WriteOncePersistentDict(identifier, key_builder=None, container_dir=None, in_mem_cache_size=256)

A concurrent disk-backed dictionary that disallows overwriting/deletion.

Compared with PersistentDict, this class has faster retrieval times.

__init__(identifier, key_builder=None, container_dir=None, in_mem_cache_size=256)
Parameters:
  • identifier – a file-name-compatible string identifying this dictionary

  • key_builder – an instance of KeyBuilder or of one of its subclasses

  • in_mem_cache_size – retain an in-memory cache of up to in_mem_cache_size items

__getitem__(key)
__setitem__(key, value)
clear()
clear_in_mem_cache() → None

New in version 2023.1.1.

store(key, value, _skip_if_present=False, _stacklevel=0)
store_if_not_present(key, value, _stacklevel=0)
fetch(key, _stacklevel=0)
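One plausible reading of why write-once semantics pair well with an in-memory cache: an entry that can never change can be cached without ever going stale. The sketch below illustrates that idea with a hypothetical class, `SketchWriteOnceCache`, in which a plain `dict` stands in for the slow disk store and the cache evicts the least recently used entry once it exceeds `in_mem_cache_size`. The eviction policy and the use of `KeyError` are assumptions made for this sketch, not statements about pytools internals.

```python
from collections import OrderedDict


class SketchWriteOnceCache:
    """Hypothetical write-once store with a bounded in-memory cache."""

    def __init__(self, backing, in_mem_cache_size=256):
        self._backing = backing        # stands in for the disk store
        self._cache = OrderedDict()    # most recently used entries last
        self._size = in_mem_cache_size
        self.backing_reads = 0         # counts lookups that miss the cache

    def store(self, key, value):
        if key in self._backing:
            raise KeyError(f"entry for {key!r} already exists")
        self._backing[key] = value

    def fetch(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        value = self._backing[key]     # raises KeyError if absent
        self.backing_reads += 1
        self._cache[key] = value
        if len(self._cache) > self._size:
            self._cache.popitem(last=False)  # evict least recently used
        return value

    def clear_in_mem_cache(self):
        self._cache.clear()


d = SketchWriteOnceCache({}, in_mem_cache_size=2)
d.store("a", 1)
d.fetch("a")
d.fetch("a")                           # served from the in-memory cache
reads_before_clear = d.backing_reads
d.clear_in_mem_cache()
d.fetch("a")                           # goes back to the backing store
```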