Add more details on elements that can be extracted to README
Go through src/mwparserfromhtml/parse/elements.py and document the different classes in the README so folks know what's covered. Specifically, the current README.md has an example code block that says:
for article in html_dump:
    print(article.html.wikistew.get_templates())
    print(article.html.wikistew.get_categories())
    print(article.html.wikistew.get_wikilinks())
    print(article.html.wikistew.get_externallinks())
    print(article.html.wikistew.get_images())
    print(article.html.wikistew.get_references())
This is incomplete. Two changes to make:
- Make sure all the current get_... functions under the WikiStew class (code) are documented here. You can keep them in the order that they appear in the WikiStew class as that was my basic attempt at some semantic clustering (e.g., images appear alongside audio/video).
- Printing is actually kinda silly as it wouldn't show much. A more reasonable alternative is to compute the counts -- e.g., num_templates = len(article.html.wikistew.get_templates()). So for each function that we document, it should be in that form.
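For reference, here's a rough sketch of what the count-based form could look like. It only covers the six get_... functions already in the README snippet above; the rest of the WikiStew get_... functions would need to be added in class order. The helper name count_elements, the GETTERS list, and the stand-in article object are all made up for illustration (the stand-in is only there so the snippet runs without a dump file), so treat this as one possible shape rather than what the README has to use.

```python
from types import SimpleNamespace

# Only the get_* functions named in the current README snippet. The remaining
# WikiStew get_* functions are NOT listed here and would need to be appended.
GETTERS = [
    "get_templates",
    "get_categories",
    "get_wikilinks",
    "get_externallinks",
    "get_images",
    "get_references",
]


def count_elements(article):
    """Return {"num_<element>": count} for each getter (hypothetical helper)."""
    stew = article.html.wikistew
    # "get_templates" -> "num_templates", matching the form proposed above.
    return {f"num_{name[len('get_'):]}": len(getattr(stew, name)()) for name in GETTERS}


# Stand-in for a real article: the i-th getter returns a list of i dummy items.
fake_stew = SimpleNamespace(
    **{name: (lambda n=i: ["x"] * n) for i, name in enumerate(GETTERS)}
)
fake_article = SimpleNamespace(html=SimpleNamespace(wikistew=fake_stew))

counts = count_elements(fake_article)
print(counts)
```

With a real dump, the same call would go inside the `for article in html_dump:` loop. The README itself could just as well spell out one `num_... = len(...)` line per function, which is probably easier to read for newcomers.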
When this is complete, we'll go back and update the docstrings for each function (#127), so that someone asking "what's a reference? how's that different from a source?" will be able to understand the difference by inspecting each function.