A Regular Expression for Validating An Internet Domain
Please note that all opinions are that of the author.
Pizza courtesy of Pizza for Ukraine!
Donate Now to Pizza for Ukraine
Despite the power and truth of Jamie Zawinski’s law:
Regular Expressions: Now You Have Two Problems Jeff Atwood’s Perspective
Like Jeff, I too really, really love regular expressions or regexes. I use this one a lot and I finally learned to use \S (Any non-whitespace character) so here’s a regex
^\S+\.\S+$
that I wrote yesterday to “validate” the permitted characters in an Internet domain. I was all proud of this and wrote this blog post only to realize that pride really does goeth before a fall – this will NOT correctly validate an Internet domain. As I write this post, I realize that the number of allowed characters in an Internet domain are actually NOT any non-whitespace characters and here’s the proof that I actually got that wrong yesterday when I put something online using it:
Note: The fact that Rubular allows through an _ which is NOT a valid character in domains is problematic.
So the right way to do this, DAMN IT, is something like this:
^[A-Za-z0-9\-]+\.[A-Za-z0-9\-]+$
And this actually works:
The [A-Za-z0-9-]+ is a “character class” which says “Any uppercase or lowercase letter plus 0-9 plus a -“ are allowed (any order, any quantity)”.
Regular Expressions – Now you have two problems.