add function to extract media to library
According to this manual, media files appear inside either figure or span wrapper nodes, and they all have the attribute typeof="mw:File/*". Additionally, information specific to each media type can be found inside specialized nodes, i.e.:
- Images -> <img>
- Video -> <video>
- Audio -> <audio>
However, up to HTML version 2.4.0 there was a separate mw type for each kind of file, i.e. mw:Image, mw:Audio and mw:Video. See this issue for reference.
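
As a starting point for discussion, here is a minimal sketch of what such an extraction function could look like. It is not the library's existing API: `MediaExtractor` and `extract_media` are hypothetical names, and it uses Python's standard `html.parser` only to stay self-contained, so the real implementation may well use a different parser. It assumes that a media node is any img, video or audio element nested inside a figure or span whose typeof is mw:File (optionally with a /suffix), or one of the older per-type values, which are matched case-insensitively here as a precaution. Reading `src` and `resource` from the inner node is also an assumption about which attributes are useful.

```python
from html.parser import HTMLParser

WRAPPER_TAGS = {"figure", "span"}
# Inner node name -> media kind, as listed above.
MEDIA_TAGS = {"img": "image", "video": "video", "audio": "audio"}
# Current typeof value, plus the older per-type values (compared in lowercase).
MEDIA_TYPEOF_PREFIXES = ("mw:file", "mw:image", "mw:video", "mw:audio")


def is_media_typeof(value):
    """Return True if any token of a typeof attribute marks a media wrapper."""
    for token in (value or "").lower().split():
        for prefix in MEDIA_TYPEOF_PREFIXES:
            if token == prefix or token.startswith(prefix + "/"):
                return True
    return False


class MediaExtractor(HTMLParser):
    """Hypothetical helper: collects media nodes found inside media wrappers."""

    def __init__(self):
        super().__init__()
        self.media = []
        # One entry per open figure/span: its typeof if it is a media
        # wrapper, otherwise None.
        self._wrappers = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in WRAPPER_TAGS:
            typeof = attrs.get("typeof")
            self._wrappers.append(typeof if is_media_typeof(typeof) else None)
        elif tag in MEDIA_TAGS:
            # Use the innermost enclosing wrapper that is a media wrapper.
            wrapper_typeof = next(
                (t for t in reversed(self._wrappers) if t), None
            )
            if wrapper_typeof is not None:
                self.media.append({
                    "kind": MEDIA_TAGS[tag],
                    "wrapper_typeof": wrapper_typeof,
                    "src": attrs.get("src"),            # assumed attribute
                    "resource": attrs.get("resource"),  # assumed attribute
                })

    def handle_endtag(self, tag):
        if tag in WRAPPER_TAGS and self._wrappers:
            self._wrappers.pop()


def extract_media(html):
    """Return a list of dicts describing the media nodes in an HTML string."""
    parser = MediaExtractor()
    parser.feed(html)
    parser.close()
    return parser.media
```

Calling `extract_media(html)` on an article's HTML would return one dict per media node, so images that are not wrapped in a media figure or span are skipped. The sketch assumes well-formed nesting of figure and span tags; malformed input would need a more forgiving parser.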