Support using an existing file object
This provides more flexibility; e.g. the dump no longer needs to be a local file.
An example use case is running within a Spark job, where there is no local file: there is no access to an NFS share and not enough disk space to download full dumps, but I can keep the dumps in HDFS and access them from within the job like so:
import subprocess
# Stream the dump out of HDFS through a pipe; nothing is written to local disk.
cat = subprocess.Popen(['hdfs', 'dfs', '-cat', wiki_dump_path], stdout=subprocess.PIPE)
html_dump = HTMLDump(filepath=wiki_dump_path, fileobj=cat.stdout)
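
For reviewers who want to see why a pipe is enough, here is a minimal sketch of how a reader might consume a non-seekable file object. It is not HTMLDump's actual implementation: `iter_dump_lines` and its parameters are hypothetical names, and the sketch assumes the dump is a gzip-compressed tar archive of newline-delimited JSON, which this description does not state.

```python
import io
import json
import tarfile


def iter_dump_lines(filepath=None, fileobj=None):
    """Yield one decoded JSON object per line of a .tar.gz of NDJSON files.

    Hypothetical helper: reads from `fileobj` when given, otherwise opens
    `filepath` from local disk.
    """
    if fileobj is None and filepath is None:
        raise ValueError("either filepath or fileobj is required")
    name = filepath if fileobj is None else None
    # Stream mode ("r|gz") reads strictly forward and never seeks, so it
    # also works on pipes such as a subprocess's stdout.
    with tarfile.open(name=name, fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # In stream mode each member must be read before advancing.
            for line in tar.extractfile(member):
                if line.strip():
                    yield json.loads(line)


def _make_sample_archive():
    """Build a tiny in-memory archive standing in for a real dump."""
    payload = b'{"name": "A"}\n{"name": "B"}\n'
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="sample.ndjson")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


# Any readable binary file object can be passed in, including cat.stdout.
articles = list(iter_dump_lines(fileobj=_make_sample_archive()))
```

The in-memory archive only stands in for a real dump. In the Spark case, the `fileobj` argument would be `cat.stdout` from the snippet above.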