Aleksandar Mastilovic authored
* Changes to importing configured callable functions
  * Import a callable function only if it's present as a string in config
  * Added missing return type annotations
  * Better mechanism to import symbols defined in YAML config

* Docstrings, better logging

* cache_key_fn loadable from YAML, additional refactoring
  * Cache key functions moved to workflow_utils.artifact.cache module level
  * Cache key functions renamed
  * "current" and "tstamped" cache key functions removed from workflow_utils.artifact.cache module
  * Removed CacheKeyUtil class
  * cache_key_fn now loadable from YAML configuration

* Refactoring of ArtifactCache classes
  * Removed cache_key method from FsArtifactCache and replaced it with a function as an argument to the constructor.
  * Created utility class CacheKeyUtil for different cache key function implementations.
  * Removed FsVersionedArtifactCache class, replaced it with regular FsArtifactCache with custom cache key functions.

* README.md update, docstrings updated, put method removed from FsArtifactSource
  * README.md updated to reflect the refactored classes
  * Missing docstrings updated for methods in source.py and cache.py
  * put(artifact) method removed from FsArtifactSource
  * MavenArtifactSource constructor now checks if base_uri is not None

* Recursive cache delete
  * Enabled recursive cache artifact delete
  * Added a relevant unit test

* Refactoring work done
  * Since we have no plans on supporting underlying FS libraries other than fsspec, abstract base classes ArtifactCache and ArtifactSource have been removed
  * Added URI validity check in cache and source constructors
  * FsVersionedArtifactCache refactored to accept a callable argument that provides the final component of the cache output path, instead of automatically creating "current" and tstamped paths
  * Switched to using fsspec.core.url_to_fs function to get a handle to the underlying filesystem

* Better tests, pendulum removed, performance improvement
  * tests/test_artifact.py improvements: hard-coded string for file content put into a shared variable, better fixture naming, cleaner comparison of source and cached folders/files
  * Removed dependency on pendulum library, switched to simple datetime
  * In versioned cache artifact, "current" copies from "tstamped" to improve performance

* Another linter bug

* Fix linting issue with regular expression backslash

* Introducing versioned artifact cache

In this MR we introduce a new kind of cache: a versioned cache. This cache stores copies of artifacts in two separate directories. One directory is named by a timestamp at the time of caching, in the YYYYMMDDHHmmss format, and the other directory is named "current".

So, where a normal cache would take an artifact like this:

    rootfs://artifact_root_dir
    |
    +-- artifact.file

and place it in cache like this:

    cachefs://cache_root_dir
    |
    +-- artifact.file

the versioned cache will produce a directory layout as follows:

    cachefs://cache_root_dir
    |
    +-- 20241004121212
    |   |
    |   +-- artifact.file
    |
    +-- current
        |
        +-- artifact.file

This MR also changes the way the library uses fsspec API: instead of opening a stream to write to caches, the library now uses fsspec's `copy` method that can work on both files and directories.
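For readers who want to see how the pieces described above could fit together, here is a minimal sketch, not the library's actual code. It assumes a cache key function is a zero-argument callable returning the final path component, and that a string value in config is a dotted import path. All names (`timestamp_key`, `current_key`, `import_symbol`, `resolve_cache_key_fn`, `cache_put`) are hypothetical, and `shutil` on a local directory stands in for fsspec's `copy` so the example runs with the standard library only.

```python
import importlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List, Optional


def timestamp_key(now: Optional[datetime] = None) -> str:
    """Return a YYYYMMDDHHmmss key such as 20241004121212."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y%m%d%H%M%S")


def current_key() -> str:
    """Return the fixed key used for the 'current' directory."""
    return "current"


def import_symbol(dotted_path: str):
    """Resolve 'package.module.name' to the object it names."""
    module_name, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def resolve_cache_key_fn(value):
    """Import only when config supplies a string; pass callables and None through."""
    return import_symbol(value) if isinstance(value, str) else value


def cache_put(source: Path, cache_root: Path,
              cache_key_fn: Optional[Callable[[], str]] = None) -> Path:
    """Copy a file or directory under cache_root, below the key if one is given."""
    target_dir = cache_root / cache_key_fn() if cache_key_fn else cache_root
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if source.is_dir():
        # Local stand-in for a recursive filesystem copy.
        shutil.copytree(source, target, dirs_exist_ok=True)
    else:
        shutil.copy2(source, target)
    return target


def demo() -> List[str]:
    """Cache one artifact twice and list the resulting layout."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "artifact_root_dir" / "artifact.file"
        source.parent.mkdir()
        source.write_text("payload")
        cache_root = Path(tmp) / "cache_root_dir"
        fixed = datetime(2024, 10, 4, 12, 12, 12)
        cache_put(source, cache_root, lambda: timestamp_key(fixed))
        cache_put(source, cache_root, current_key)
        return sorted(
            p.relative_to(cache_root).as_posix()
            for p in cache_root.rglob("*") if p.is_file()
        )


if __name__ == "__main__":
    print(demo())
```

Passing the key function as an argument is what lets a single cache class cover both the plain and the versioned layout: with no key function the artifact lands directly in the cache root, and with one it lands in a subdirectory named by the function's return value.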
Commit c82de8e0