Prepare for Airflow deployment

  • fuse scored topics with null ones
  • test final output
  • add pipeline docstring with a link to the expected output schema
  • no explicit output partitioning: it worsens performance and isn't needed, since downstream jobs are expected to read the full dataset on every run
  • make I/O HDFS paths relative to a working directory
  • optionally pass the working directory to the CLI, defaulting to section_topics under the current user's home directory
  • update CLI
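
The working-directory handling described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline code: the --work-dir flag name, the parse_args and resolve_paths helpers, and the "input"/"output" dataset names are all hypothetical. It relies on HDFS resolving relative paths against the current user's home directory, so a relative default of section_topics lands under that home.

```python
import argparse
import posixpath

# Relative default: HDFS resolves relative paths against the user's home directory.
DEFAULT_WORK_DIR = "section_topics"

# Hypothetical dataset names, used only for illustration.
INPUT_NAME = "input"
OUTPUT_NAME = "output"


def parse_args(argv=None):
    """Parse CLI arguments; the working directory is optional."""
    parser = argparse.ArgumentParser(description="Section topics pipeline (sketch)")
    parser.add_argument(
        "--work-dir",  # hypothetical flag name
        default=DEFAULT_WORK_DIR,
        help="HDFS working directory; I/O paths are resolved relative to it",
    )
    return parser.parse_args(argv)


def resolve_paths(work_dir):
    """Build I/O paths relative to the working directory (POSIX-style, as in HDFS)."""
    return {
        "input": posixpath.join(work_dir, INPUT_NAME),
        "output": posixpath.join(work_dir, OUTPUT_NAME),
    }
```

Calling parse_args([]) yields the default working directory, while passing --work-dir with an absolute HDFS path overrides it; resolve_paths then derives every I/O location from that single value.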

