Make ArtifactLocator methods work with Artifacts rather than artifact ids; use artifact.name as default cache_key
This will allow Locator implementations to be smarter about how they source or cache artifacts, because they will have richer information about the Artifact.
This is needed to work around a quirk of the Gitlab package registry, where the default URL to a package file ends in 'download', even if that file is a .tgz archive. We need more control over the final cached filename.
The final cached filename influences the behavior of e.g. Spark and Skein, both of which take 'archives' that are automatically unpacked if the filename ends in an archive file extension like .tgz. So, cached archive files need to have the appropriate file extension!
The default implementation of ArtifactCache.cache_key now uses artifact.name as the cache key, giving artifact instantiators easy control over the final cached filename. To make sure a gitlab download url works as an FsArtifactSource, one would declare:
    artifacts:
      my_conda_env_artifact-0.1.0.tgz:
        id: gitlab.wm.org/packages/..../download # gitlab package download link here
        source: fs_artifact_source
The cached filename will end up being 'my_conda_env_artifact-0.1.0.tgz'.
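As a rough illustration of the new default, here is a minimal sketch. The Artifact and ArtifactCache classes below are simplified, hypothetical stand-ins, not the real implementations; only the idea that cache_key returns artifact.name is taken from this change.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    # Hypothetical stand-in for the real Artifact class.
    name: str  # the key under 'artifacts:' in the config
    id: str    # e.g. the gitlab package download link


class ArtifactCache:
    # Hypothetical stand-in; only the default cache_key behavior is sketched.
    def cache_key(self, artifact: Artifact) -> str:
        # Use the declared artifact name, not the id, as the cache key.
        return artifact.name


artifact = Artifact(
    name="my_conda_env_artifact-0.1.0.tgz",
    id="gitlab.wm.org/packages/..../download",
)

cached_name = ArtifactCache().cache_key(artifact)

# For comparison: the last path segment of the id carries no archive extension.
id_basename = artifact.id.rsplit("/", 1)[-1]
```

Here cached_name keeps the .tgz extension from the artifact name, while id_basename is just 'download', which is why a filename derived from the id would not be recognized as an archive.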
Another option would have been to add a new optional Artifact config property, like cache_key or something, and use it for the cached filename if it is set. However, I prefer using the artifact name, as that option introduces less config.
Fixes: airflow-dags!47 (comment 6838)
Bug: T307115