1. 18 Aug, 2022 3 commits
  2. 05 Aug, 2022 2 commits
  3. 06 Jul, 2022 2 commits
  4. 11 Jul, 2022 5 commits
  5. 01 Jul, 2022 2 commits
  6. 30 Jun, 2022 3 commits
  7. 29 Jun, 2022 6 commits
  8. 08 Jun, 2022 2 commits
  9. 31 May, 2022 1 commit
    • pipeline optimizations · 68c841bb
      Cparle authored and Xcollazo committed
      - If we write a DataFrame to Hive and want to use it later,
      it's quicker to read the data back from Hive than to reuse the original DataFrame
      - Writing search index data one wiki at a time leaves us with
      70k partitions in the search_index tables, so write it all at once instead
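The first point of the commit message above can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name `save_and_reload` and the table name in the usage note are hypothetical, and `spark` and `df` are assumed to be a PySpark `SparkSession` and `DataFrame`.

```python
def save_and_reload(spark, df, table_name):
    """Write `df` to a Hive table, then return a DataFrame read from that table.

    Hypothetical helper: `spark` is assumed to be a SparkSession with Hive
    support and `df` a Spark DataFrame.
    """
    # Materialize the data once.
    df.write.mode("overwrite").saveAsTable(table_name)
    # Later steps read the stored rows, so they do not re-run the plan
    # that produced `df`.
    return spark.table(table_name)
```

A later pipeline step would then use the returned DataFrame, for example `stored = save_and_reload(spark, df, "some_db.some_table")`, rather than `df` itself. The same single `saveAsTable` call, applied to a DataFrame that holds every wiki, corresponds to the second point: one write instead of one write per wiki.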
  10. 25 May, 2022 1 commit
  11. 24 May, 2022 1 commit
  12. 18 May, 2022 2 commits
  13. 17 May, 2022 3 commits
  14. 12 May, 2022 3 commits
  15. 05 May, 2022 2 commits
    • pass the expected snapshot · d4ea87e8
      Marco Fossati authored and Xcollazo committed
    • remove `day < 5` heuristic · 9ddc56de
      Marco Fossati authored and Xcollazo committed
      The heuristic is not robust against possible failures of the jobs we
      depend on: for instance, if one job crashes and only completes at `day >= 5`,
      it would break ours.
      We opt for the Airflow Hive sensor to wait for all the relevant DB
      tables: an elegant solution that also gets the freshest data.
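The reasoning in the commit message above, waiting until the upstream data exists rather than guessing from the calendar day, can be illustrated in plain Python. This is only a sketch of the polling idea, not Airflow's sensor API and not the code from the commit: `wait_for_partitions`, its parameters, and the partition names in the usage note are all hypothetical.

```python
import time
from typing import Callable, Iterable


def wait_for_partitions(
    required: Iterable[str],
    partition_exists: Callable[[str], bool],
    poke_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every required partition exists, or give up after `timeout` seconds.

    `partition_exists` is a caller-supplied check (hypothetical here); in a real
    deployment it would query the Hive metastore.
    """
    pending = set(required)
    waited = 0.0
    while True:
        # Keep only the partitions that are still missing.
        pending = {p for p in pending if not partition_exists(p)}
        if not pending:
            return True
        if waited >= timeout:
            return False
        sleep(poke_interval)
        waited += poke_interval
```

Called as `wait_for_partitions(["db.table_a/snapshot=2022-05", "db.table_b/snapshot=2022-05"], exists_in_metastore)`, the job starts as soon as both partitions exist, however late an upstream job finishes, which is the property the `day < 5` check lacked.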
  16. 04 May, 2022 2 commits