The source of url_verify.php.
If I wrote this code, then it is licensed under the GPL. If someone else wrote it, then please ask them if you want to use the code.
This function verifies that an HTTP URL is formatted properly, returning
either false or the URL as fully qualified as possible.
I used RFC 2396 (URI: Generic Syntax) as my guide when creating the
regular expression. For all the details see the comments below.
For a short version of the verifyurl() function without all the
regular expression notes, see the bottom of the script.
verifyurl() will assume all filenames without extensions are folders,
thus '' becomes ''
Last Edited: August 29th 2003
Author: Rod Apeldoorn -
License: You may use any or all of this script as long
as I am credited in either your code or credits
//if (!$urladdr) $urladdr="#top";
function verifyurl( $urladdr ){
// Here is the regular expression split up; I have it all on one line below.
$regexp = "^(https?://)?"; // http:// or https://
$regexp .= "(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?"; // username:password@
$regexp .= "("; // begin domain/ip section
$regexp .= "((([12]?[0-9]{1,2}\.){3}[12]?[0-9]{1,2})"; // IP-
$regexp .= "|"; // allows either IP or domain
$regexp .= "(([0-9a-z_!~*'()-]+\.)*"; // tertiary domain(s)- www.
$regexp .= "([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\."; // second level domain
$regexp .= "(com|net|org|edu|mil|gov|int|aero|coop|museum|name|info|biz|pro|[a-z]{2}))"; // top level domains- .coms or .museums or country codes
$regexp .= ")"; // end domain/ip section
$regexp .= "(:[1-6]?[0-9]{1,4})?"; // port number- :80 or :8080 or :12480
$regexp .= "((/?)|"; // a slash isn't required if there is no file name
$regexp .= "(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$"; // filename/queries/anchors
1- http:// may or may not be there
2- https:// is also allowed
3- "username:password@" functionality. The password is not required allowing
"username@". I allowed the following characters:
"0-9" "a-z" "_" "!" "~" "*" "'" "(" ")" "." "&" "=" "+" "$" "%" "-"
4- The IP is 4 groups of 1 to 3 digits separated by periods
5- OR - either an IP address or a domain name is needed
6- tertiary domains may use the following unreserved characters:
"a-z" | "0-9" | "_" | "!" | "~" | "*" | "(" | ")" | "-" | "'" | "."
NOTE: I made it so periods could not be first nor side by side
7- secondary domains must be alphanumeric
Dashes are allowed but not at the beginning or end
must be between 1 and 63 characters
8- top level domains must be either 2 alpha characters (for country codes) or a recognized TLD (including .com and .info)
9- port number, a colon ":" followed by 1 to 5 digits
A- The final slash "/" is needed only if anything follows the domain / ip / port
NOTE: This includes fragments and queries which may not be proper, but unlikely
B- Filenames and directories may contain the unreserved characters
along with the following special characters which can be used
for queries, fragments, and escaping extended characters
";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | "," | "%" | "#"
NOTE: I made it so forward slashes could not be side by side, but periods can be
NOTE: I left this section as broad as possible since there are so many
ways to combine files, queries and fragments. Generally it should be:
// The whole expression on one line. Its sections, read left to right,
// correspond to notes 1 through B above.
$regexp = "^(https?://)?(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?((([12]?[0-9]{1,2}\.){3}[12]?[0-9]{1,2})|(([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.(com|net|org|edu|mil|gov|int|aero|coop|museum|name|info|biz|pro|[a-z]{2})))(:[1-6]?[0-9]{1,4})?((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$";
//qualified domain
if (eregi( $regexp, $urladdr )){
// No http:// at the front? Let's add it.
if (!eregi( "^https?://", $urladdr )) $urladdr = "http://" . $urladdr;
// If it's a plain domain or IP there should be a / on the end
if (!eregi( "^https?://.+/", $urladdr )) $urladdr .= "/";
// If it's a directory on the end we should add the proper slash
// We should first make sure it isn't a file, query, or fragment
if ((eregi( "/[0-9a-z~_-]+$", $urladdr)) && (!eregi( "[\?;&=+\$,#]", $urladdr))) $urladdr .= "/";
return ($urladdr);
}
else return false; // The domain didn't pass the expression
} // END Function verifyurl
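The IP and port parts of the expression are purely textual: `[12]?[0-9]{1,2}` accepts octets up to 299 and `[1-6]?[0-9]{1,4}` accepts ports up to 69999. If stricter numeric limits matter, a follow-up check along these lines could be run on the string verifyurl() returns. This is only a sketch: the name url_numbers_in_range() and its extraction pattern are my own assumptions and not part of the script.

```php
<?php
// Hypothetical helper, not part of the original script.
// Expects a URL that already starts with http:// or https://.
function url_numbers_in_range( $urladdr ){
    // Capture the host and the optional port, skipping any "user:pass@" part.
    $pattern = '`^https?://(?:[^/@]*@)?([^/:]+)(?::([0-9]+))?(?:/|$)`i';
    if (!preg_match( $pattern, $urladdr, $m )) return false;

    // If the host is a dotted quad, every octet must be 0 to 255.
    if (preg_match( '`^[0-9]+(\.[0-9]+){3}$`', $m[1] )) {
        foreach (explode( '.', $m[1] ) as $octet) {
            if ((int) $octet > 255) return false;
        }
    }

    // If a port was given, it must be 1 to 65535.
    if (isset( $m[2] ) && $m[2] !== '') {
        $port = (int) $m[2];
        if ($port < 1 || $port > 65535) return false;
    }
    return true;
}
```

For example, url_numbers_in_range( 'http://192.168.0.1:8080/' ) returns true, while 'http://299.1.1.1/' and 'http://example.com:69999/' both return false.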
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">
<html xmlns="">
<title>Url Verification using PHP and Regular Expressions</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<h2>Url Verification using PHP and Regular Expressions</h2>
<form action="<?php echo htmlspecialchars( $_SERVER['PHP_SELF'] ); ?>" method="post">
<p>Enter the URL address to test:<br />
<input type="text" name="urladdr" value="<?php if (isset($_POST['urladdr'])) echo htmlspecialchars( $_POST['urladdr'] ); ?>" size="40" /><br />
<br />
<input type="submit" value="Check URL Address" />
// If the form has already been submitted, send the text to verifyurl()
if (isset($_POST['urladdr'])) {
$urladdr = verifyurl( $_POST['urladdr'] );
// Now that we know the URL is fully qualified, we can play with it
if ($urladdr) {
echo '
<h3><a href="', $urladdr, '">', $urladdr, '</a></h3>
<b>Here it is all split up:</b><br />';
$urladdrparsed = parse_url( $urladdr );
while (list ($key, $value) = each ($urladdrparsed)) echo "$key => $value<br />\n";
}
else echo '<h3>Invalid URL</h3>';
} // End if (isset($_POST['urladdr'])) {
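each() was deprecated in PHP 7.2 and removed in PHP 8.0, and the loop above prints the parsed parts without HTML escaping. A minimal sketch of the same listing using foreach and htmlspecialchars() might look like this. The helper name render_url_parts() is hypothetical and not part of the script.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns the "key => value" listing as an HTML string instead of echoing it.
function render_url_parts( $urladdr ){
    $parts = parse_url( $urladdr );
    if ($parts === false) return ''; // parse_url() gives false on seriously malformed input

    $html = '';
    foreach ($parts as $key => $value) {
        // Escape both sides so characters such as & or ' are safe in the page.
        $html .= htmlspecialchars( $key ) . ' =&gt; '
               . htmlspecialchars( (string) $value ) . "<br />\n";
    }
    return $html;
}
```

It could be called as echo render_url_parts( $urladdr ); in place of the while loop.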
<hr />
<p><em><font color="#999999">If you find any bugs in this code, please <a href="">email the author</a>.</font></em></p>
function verifyurl( $urladdr ){
// Regular Expression To Verify Url
$regexp = "^(https?://)?(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?((([12]?[0-9]{1,2}\.){3}[12]?[0-9]{1,2})|(([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.(com|net|org|edu|mil|gov|int|aero|coop|museum|name|info|biz|pro|[a-z]{2})))(:[1-6]?[0-9]{1,4})?((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$";
//qualified domain
if (eregi( $regexp, $urladdr )){
// No http:// at the front? Let's add it.
if (!eregi( "^https?://", $urladdr )) $urladdr = "http://" . $urladdr;
// If it's a plain domain or IP there should be a / on the end
if (!eregi( "^https?://.+/", $urladdr )) $urladdr .= "/";
// If it's a directory on the end we should add the proper slash
// We should first make sure it isn't a file, query, or fragment
if ((eregi( "/[0-9a-z~_-]+$", $urladdr)) && (!eregi( "[\?;&=+\$,#]", $urladdr))) $urladdr .= "/";
return ($urladdr);
}
else return false; // The domain didn't pass the expression
} // END Function verifyurl
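eregi() and the other POSIX regex functions were deprecated in PHP 5.3 and removed in PHP 7.0, so the function above will not run on current PHP. Below is a rough sketch of the same logic on preg_match(), using the same expression unchanged. I am assuming backtick delimiters (the expression already uses most other punctuation), the "i" modifier to mirror eregi(), and the "D" modifier so that "$" only matches at the very end of the string. The name verifyurl_pcre() is hypothetical and not part of the script.

```php
<?php
// Hypothetical PCRE port of verifyurl(); not part of the original script.
function verifyurl_pcre( $urladdr ){
    // Same expression as above, wrapped in backticks with the i and D modifiers.
    $regexp = "`^(https?://)?(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?((([12]?[0-9]{1,2}\.){3}[12]?[0-9]{1,2})|(([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.(com|net|org|edu|mil|gov|int|aero|coop|museum|name|info|biz|pro|[a-z]{2})))(:[1-6]?[0-9]{1,4})?((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$`iD";

    // The URL didn't pass the expression
    if (!preg_match( $regexp, $urladdr )) return false;

    // No http:// at the front? Add it.
    if (!preg_match( "`^https?://`i", $urladdr )) $urladdr = "http://" . $urladdr;

    // A plain domain or IP gets a / on the end
    if (!preg_match( "`^https?://.+/`i", $urladdr )) $urladdr .= "/";

    // A trailing directory (not a file, query, or fragment) gets a / too
    if (preg_match( "`/[0-9a-z~_-]+$`iD", $urladdr )
        && !preg_match( "`[?;&=+$,#]`", $urladdr )) $urladdr .= "/";

    return $urladdr;
}
```

With this sketch, verifyurl_pcre( 'www.example.com' ) returns 'http://www.example.com/', verifyurl_pcre( 'http://example.com/docs' ) returns 'http://example.com/docs/', and verifyurl_pcre( 'not a url' ) returns false.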