/home/arbel/sites/lrscraper/sitemap_pdf_scraper_base.py:251: XMLParsedAsHTMLWarning: It looks like you're using an HTML parser to parse an XML document.

Assuming this really is an XML document, what you're doing might work, but you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the Python package 'lxml' installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.

If you want or need to use an HTML parser on this document, you can make this warning go away by filtering it. To do that, run this code before calling the BeautifulSoup constructor:

    from bs4 import XMLParsedAsHTMLWarning
    import warnings

    warnings.filterwarnings("ignore", category=XMLParsedAsHTMLWarning)

  soup = BeautifulSoup(html, "html.parser")