search: specify analytics-hadoop in urls (!696) · Merge requests · repos / data-engineering / Airflow DAGs

Ebernhardson requested to merge work/ebernhardson/hdfs-default-cluster-url into main May 15, 2024

Both Spark and the typical HDFS tooling accept the hdfs:/// shorthand to request the default cluster, but pyarrow refuses to use those URLs. In discolytics 0.18.0, convert_to_esbulk.py was changed to use pyarrow, so these URLs now need to name the cluster explicitly (analytics-hadoop) instead of relying on the default.
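
To illustrate the kind of rewrite involved, here is a minimal sketch using only the Python standard library. It is not the code in this merge request: the helper name `with_explicit_cluster` and the example path are hypothetical, and the only value taken from the MR is the cluster name `analytics-hadoop`.

```python
from urllib.parse import urlsplit, urlunsplit

# Cluster name taken from the MR title; everything else here is illustrative.
DEFAULT_CLUSTER = "analytics-hadoop"


def with_explicit_cluster(url: str, cluster: str = DEFAULT_CLUSTER) -> str:
    """Turn the hdfs:/// shorthand into a URL that names the cluster.

    URLs that are not hdfs, or that already name a cluster, are returned
    unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "hdfs" or parts.netloc:
        return url
    # Fill in the empty authority component with the cluster name.
    return urlunsplit(parts._replace(netloc=cluster))


# Hypothetical path, used only to show the shape of the change.
print(with_explicit_cluster("hdfs:///tmp/example/output"))
```

In the DAGs themselves the URLs can simply be edited in place; a helper like this would only be worthwhile if the shorthand kept reappearing in new code.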

This should have no downsides. Relying on the default cluster wasn't buying us much of anything: we operate on a single cluster, and its name has never changed.
