research-ml merge requests
https://gitlab.wikimedia.org/fab/research-ml/-/merge_requests
2021-09-27T14:25:58Z

Moving code from GH
https://gitlab.wikimedia.org/fab/research-ml/-/merge_requests/1
2021-09-27T14:25:58Z
Fabian Kaelin
Moving code from GH. Test PR.

knowledge gap DAG
https://gitlab.wikimedia.org/fab/research-ml/-/merge_requests/2
2021-09-30T15:19:30Z
Fabian Kaelin

add content gaps files
https://gitlab.wikimedia.org/fab/research-ml/-/merge_requests/3
2021-11-23T09:25:47Z
AikoChou
content_gaps.py
* add `get_wikidata_qitems` to extract all Wikidata qitems
* add `get_wikidata_properties` to extract all property-value pairs for all Wikidata qitems
* split `aggregate_item_property` into multiple functions, one function per property
* change `get_wikipedia_revision_text`: exclude redirect pages by joining with the `mediawiki_page` table
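
The redirect filter in the last item can be illustrated with a small, self-contained sketch. It is not the code from the merge request: `sqlite3` stands in for Spark so the example runs without a cluster, the `revision_text` table and the helper names are hypothetical, and the `page_id` / `page_is_redirect` columns are assumed to follow MediaWiki's `page` table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny page and revision table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_is_redirect INTEGER);
        CREATE TABLE revision_text (page_id INTEGER, revision_text TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, 0), (2, 1), (3, 0)],  # page 2 is a redirect
    )
    conn.executemany(
        "INSERT INTO revision_text VALUES (?, ?)",
        [(1, "Article A"), (2, "#REDIRECT [[A]]"), (3, "Article C")],
    )
    return conn


def non_redirect_revisions(conn):
    """Return (page_id, revision_text) rows whose page is not a redirect."""
    query = """
        SELECT r.page_id, r.revision_text
        FROM revision_text AS r
        JOIN page AS p ON p.page_id = r.page_id
        WHERE p.page_is_redirect = 0
        ORDER BY r.page_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in non_redirect_revisions(build_demo_db()):
        print(row)
```

The same join-then-filter shape carries over to a Spark dataframe join on the page identifier, with the filter applied to the redirect flag.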
content_gaps_metrics.py
* show the workflow for building one combined dataframe (article features not included yet)
* examples of computing metrics
* examples of plotting graphs
article_features.py
* wrap multiple wikitext functions from Isaac's [example code](https://github.com/geohci/miscellaneous-wikimedia/blob/master/article-features/article_data.ipynb) into one UDF

Aiko/local mode
https://gitlab.wikimedia.org/fab/research-ml/-/merge_requests/4
2022-02-17T15:52:27Z
AikoChou
`config.py` - contains parameters, file paths, etc.
`spark.py` - contains spark configuration for running on the cluster
`script.py` - the main script to produce content gap features and pages dataframe
`func.py` - contains functions to create dataframes and transform them to content gap features
`util.py` - contains functions to download and import external data, and preprocessing for the geography gap
AikoChou
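
As a rough illustration of how a `config.py` / `spark.py` split could support both a local mode and a cluster run, here is a minimal sketch. It is not taken from the merge request: the function name `spark_settings`, the app name, the choice of YARN as cluster manager, and the resource values are all assumptions. Only the configuration keys themselves are standard Spark properties.

```python
def spark_settings(local_mode, app_name="content-gaps"):
    """Return Spark configuration key/value pairs for the chosen mode (hypothetical helper)."""
    settings = {"spark.app.name": app_name}
    if local_mode:
        # Run the driver and executors in a single local process on all cores.
        settings["spark.master"] = "local[*]"
    else:
        # Assumed cluster setup: YARN as the cluster manager, placeholder resources.
        settings["spark.master"] = "yarn"
        settings["spark.executor.memory"] = "8g"
        settings["spark.executor.cores"] = "4"
    return settings


if __name__ == "__main__":
    for key, value in sorted(spark_settings(local_mode=True).items()):
        print(f"{key}={value}")
```

Each pair could then be applied with `SparkSession.builder.config(key, value)` before calling `getOrCreate()`, which keeps the main script identical in both modes.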