Python: BeautifulSoup - Insert tag
I’ve been scraping the Game of Thrones wiki in preparation for a meetup at Women Who Code next week. While extracting character allegiances, I wanted to insert missing line breaks to separate the different allegiances.
I initially tried creating a line break like this:
>>> from bs4 import BeautifulSoup
>>> tag = BeautifulSoup("<br />", "html.parser")
>>> tag
<br/>
It looks like it should work, but later in my script I check the 'name' attribute to work out whether I’ve got a line break, and it doesn’t return the value I expected:
>>> tag.name
u'[document]'
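The reason seems to be that the BeautifulSoup constructor returns an object representing the whole parsed document, with the line break sitting inside it as a child. Here is a minimal sketch of reaching the tag through that document object. The variable names are just for illustration:

```python
from bs4 import BeautifulSoup

# Parsing a fragment returns a BeautifulSoup object wrapping the whole document
soup = BeautifulSoup("<br />", "html.parser")
document_name = soup.name  # the special name '[document]'

# The <br/> tag itself is a child of the document object
br = soup.br
br_name = br.name

print(document_name, br_name)
```

So if you already have the parsed fragment, asking it for its first br child is one way to get hold of a tag whose name is 'br'.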
My script assumes it’s going to return the string 'br', so I needed another way of creating the tag. The following does the trick:
>>> from bs4 import Tag
>>> tag = Tag(name = "br")
>>> tag
<br></br>
>>> tag.name
'br'
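Another option, which may be preferable because the tag is created by the document it will live in, is the new_tag method on the BeautifulSoup object. The sketch below assumes a made-up snippet of markup with two allegiances run together (the real wiki HTML will look different) and inserts a line break between them with insert_after:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two allegiances with no separator between them
html = "<td><a>House Stark</a><a>House Tully</a></td>"
soup = BeautifulSoup(html, "html.parser")

# new_tag creates a Tag associated with this soup
br = soup.new_tag("br")

# Place the line break directly after the first link
first_link = soup.find("a")
first_link.insert_after(br)

result = str(soup)
print(br.name)
print(result)
```

A tag created this way is rendered by the html.parser builder as a self-closing br rather than as an open and close pair.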
That’s all for now, back to scraping for me!