RegEx match open tags except XHTML self-contained tags
I need to match all of these opening tags:
<p>
<a href="foo">
But not these:
<br />
<hr class="foo" />
I came up with this and wanted to make sure I've got it right. I am only capturing the a-z.
<([a-z]+) *[^/]*?>
I believe it says:
- Find a less-than, then
- Find (and capture) a-z one or more times, then
- Find zero or more spaces, then
- Find any character zero or more times, greedy, except /, then
- Find a greater-than
Do I have that right? And more importantly, what do you think?
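One way to check the reading above is to run the pattern against the four sample tags. The following is a minimal Python sketch, not part of the original question; the `samples` list, the `tag_name` helper, and the extra URL case are assumptions added for illustration. Two things it may help surface: `*?` is the lazy (non-greedy) form of `*`, and `[^/]` rejects a slash anywhere after the tag name, including inside attribute values.

```python
import re

# The pattern from the question, unchanged.
OPEN_TAG = re.compile(r'<([a-z]+) *[^/]*?>')

samples = [
    '<p>',                 # should match
    '<a href="foo">',      # should match
    '<br />',              # should not match
    '<hr class="foo" />',  # should not match
]

def tag_name(tag):
    """Return the captured tag name, or None if the pattern rejects the tag."""
    m = OPEN_TAG.fullmatch(tag)
    return m.group(1) if m else None

for tag in samples:
    print(f'{tag!r:24} -> {tag_name(tag)}')

# Assumed extra case: a slash inside an attribute value also blocks the match,
# because [^/] excludes every slash, not only the one before '>'.
print(tag_name('<a href="/index.html">'))  # None
```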
3 Answers
If you only want the tag names it should be possible to do this via regex.
<([a-zA-Z]+)(?:[^>]*[^/] *)?>
should do what you need. That said, I think the solution by "moritz" is already fine; I did not see it at first.
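A quick way to try this pattern is to run it over a short fragment. This is a rough Python sketch under assumptions: the `fragment` string is made up for illustration, and `re.findall` returns only the captured tag names because the pattern has a single capturing group.

```python
import re

# The pattern from this answer: the character just before '>' (ignoring
# trailing spaces) must not be '/', so self-closing tags are skipped.
OPEN_TAG = re.compile(r'<([a-zA-Z]+)(?:[^>]*[^/] *)?>')

# Hypothetical fragment, invented for this example.
fragment = '<p>Hello<br />world<a href="/x">link</a>'

names = OPEN_TAG.findall(fragment)
print(names)  # ['p', 'a']
```

Unlike a pattern built on `[^/]`, this one tolerates a slash inside an attribute value, since only the character before the closing `>` is checked. It can still be confused by a `>` inside a quoted attribute value.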
For all downvoters: in some cases it simply makes sense to use regex, because it can be the easiest and quickest solution. I agree that in general you should not parse HTML with regex. But regex can be a very powerful tool when you have a subset of HTML whose format you know and you just want to extract some values. I have done that hundreds of times and almost always achieved what I wanted.
- answered 8 years ago
- Sunny Solu
If you're simply trying to find those tags (without ambitions of parsing) try this regular expression:
/<[^\/]*?>/g
I wrote it in 30 seconds, and tested here: http://gskinner.com/RegExr/
It matches the types of tags you mentioned, while ignoring the types you said you wanted to ignore.
- answered 8 years ago
- G John
While it is true that asking regexes to parse arbitrary HTML is like asking Paris Hilton to write an operating system, it's sometimes appropriate to parse a limited, known set of HTML.
If you have a small set of HTML pages that you want to scrape data from and then stuff into a database, regexes might work fine. For example, I recently wanted to get the names, parties, and districts of Australian federal Representatives, which I got from the Parliament's website. This was a limited, one-time job.
Regexes worked just fine for me, and were very fast to set up.
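For a fixed, known layout, the kind of scraping described here might look roughly like the sketch below. Everything in it is assumed for illustration: the row markup, the field order, and the placeholder values are invented, and the real Parliament pages are not reproduced.

```python
import re

# Hypothetical markup: one table row per member, always in this exact shape.
html = """
<tr><td>Jane Example</td><td>Party A</td><td>District One</td></tr>
<tr><td>John Sample</td><td>Party B</td><td>District Two</td></tr>
"""

# One capture group per cell; [^<]* keeps each group inside its own cell.
ROW = re.compile(r'<tr><td>([^<]*)</td><td>([^<]*)</td><td>([^<]*)</td></tr>')

members = [
    {'name': name, 'party': party, 'district': district}
    for name, party, district in ROW.findall(html)
]

for member in members:
    print(member)
```

This only holds while the markup stays exactly as assumed; any change in the page layout would need a new pattern, which is the trade-off for a one-time job.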
- answered 8 years ago
- Gul Hafiz