Add aggregation and content gap pipeline

AikoChou requested to merge 2-timeseries-aggregation into main
  • added an aggregation pipeline (gender / sexual orientation / geographic)
  • the aggregation pipeline generates a dataset with the following schema (not prescriptive): https://docs.google.com/document/d/1Z-EpXMnfzHAp-M5vdQ-NbFkXx3tQRMQpeZyYwBaK4xU/edit?usp=sharing
  • added the content gap pipeline (previously under the interactive/ directory)
  • renamed some functions in func.py and util.py
  • removed unnecessary code
  • tested on a small dataset using spark2-submit --master local on the stat machine
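
For reviewers unfamiliar with the idea, here is a rough sketch of what a time-series aggregation over one category can look like. It is plain Python rather than Spark, and it is not the code in this merge request: the record layout, the category values, and the aggregate_by_month helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical input: (date, category value) pairs. The real pipeline's
# input schema is not shown in this merge request.
records = [
    (date(2020, 1, 15), "female"),
    (date(2020, 1, 20), "male"),
    (date(2020, 2, 3), "female"),
    (date(2020, 2, 9), "female"),
]


def aggregate_by_month(rows):
    """Count rows per (year-month, category) pair."""
    counts = Counter()
    for day, category in rows:
        # Bucket each record by calendar month, e.g. "2020-01".
        counts[(day.strftime("%Y-%m"), category)] += 1
    return dict(counts)


monthly_counts = aggregate_by_month(records)
```

The same grouping (time bucket plus category, then a count) would presumably map onto a group-by in Spark, but the actual implementation should be read from the diff.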
