The source of url_verify.php
If I wrote this code, then it is licensed under the GPL. If someone else wrote it, then please ask them if you want to use the code.
<?php
/*
This function verifies that an HTTP URL is formatted properly, returning
either false or the URL as fully qualified as possible.
I used RFC 2396 (URI: Generic Syntax) as my guide when creating the
regular expression. For all the details see the comments below.
For a short version of the verifyurl() function without all the
regular expression notes, see the bottom of the script.
Note:
verifyurl() assumes all filenames without extensions are folders,
thus 'http://abc.com/news' becomes 'http://abc.com/news/'
Last Edited: August 29th 2003
Author: Rod Apeldoorn - rod@canowhoopass.com
License: You may use any or all of this script as long
as I am credited in either your code or credits
page.
*/
//if (!$urladdr) $urladdr="#top";
function verifyurl( $urladdr ){
// Here is the regular expression split up, I have it all on one line below.
$regexp = "^(https?://)?"; // http:// or https://
$regexp .= "(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?"; // username:password@
$regexp .= "("; // begin domain/ip section
$regexp .= "(([12]?[0-9]{1,2}\.){3}[12]?[0-9]{1,2})"; // IP- 199.194.52.184 (one fewer "(" than before, so the pieces balance and match the one-line version)
$regexp .= "|"; // allows either IP or domain
$regexp .= "(([0-9a-z_!~*'()-]+\.)*"; // tertiary domain(s)- www.
$regexp .= "([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\."; // second level domain
$regexp .= "(com|net|org|edu|mil|gov|int|aero|coop|museum|name|info|biz|pro|[a-z]{2}))"; // top level domains- .coms or .museums or country codes
$regexp .= ")"; // end domain/ip section
$regexp .= "(:[1-6]?[0-9]{1,4})?"; // port number- :80 or :8080 or :12480
$regexp .= "((/?)|"; // a slash isn't required if there is no file name
$regexp .= "(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$"; // filename/queries/anchors
/*
1- http:// may or may not be there
2- https:// is also allowed
3- "username:password@" functionality. The password is not required allowing
"username@". I allowed the following characters:
"0-9" "a-z" "_" "!" "~" "*" "'" "(" ")" "." "&" "=" "+" "$" "%" "-"
4- The IP is 4 groups of 1 to 3 digits separated by periods
5- OR - either an IP address or a domain name is needed
6- tertiary domains may use the following unreserved characters:
"a-z" | "0-9" | "_" | "!" | "~" | "*" | "(" | ")" | "-" | "'" | "."
NOTE: I made it so periods could not be first nor side by side
7- secondary domains must be alphanumeric
Dashes are allowed but not at the beginning or end
must be between 1 and 63 characters
8- top level domains must be either 2 alpha characters (for country codes) or a recognized TLD (including .com and .info)
9- port number, a colon ":" followed by 1 to 5 digits
A- The final slash "/" is needed only if anything follows the domain / ip / port
NOTE: This also applies to fragments and queries, which may not be strictly proper, but it is unlikely to matter
B- Filenames and directories may contain the unreserved characters
along with the following special characters which can be used
for queries, fragments, and escaping extended characters
";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | "," | "%" | "#"
NOTE: I made it so forward slashes could not be side by side, but periods can be
NOTE: I left this section as broad as possible since there are so many
ways to combine files, queries and fragments. Generally it should be:
/directory/file?query&newquery#fragment
HERE IT IS ON ONE LINE */
// Segments 1 through B of the expression below appear in the same order as the numbered notes above.
$regexp = "^(https?://)?(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?((([12]?[0-9]{1,2}\.){3}[12]?[0-9]{1,2})|(([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.(com|net|org|edu|mil|gov|int|aero|coop|museum|name|info|biz|pro|[a-z]{2})))(:[1-6]?[0-9]{1,4})?((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$";
// 1 2 3 4 5 6 7 8 9 A B
//qualified domain
// eregi() was removed in PHP 7, so preg_match() with the "i" modifier is used instead.
// Backticks delimit the patterns because the expression itself contains "/" and "#".
if (preg_match( '`' . $regexp . '`iD', $urladdr )){
// No http:// at the front? Let's add it.
if (!preg_match( '`^https?://`i', $urladdr )) $urladdr = "http://" . $urladdr;
// If it's a plain domain or IP there should be a / on the end
if (!preg_match( '`^https?://.+/`i', $urladdr )) $urladdr .= "/";
// If it's a directory on the end we should add the proper slash
// We should first make sure it isn't a file, query, or fragment
if ((preg_match( '`/[0-9a-z~_-]+$`iD', $urladdr)) && (!preg_match( '`[?;&=+$,#]`', $urladdr))) $urladdr .= "/";
return ($urladdr);
}
else return false; // The domain didn't pass the expression
} // END Function verifyurl
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Url Verification using PHP and Regular Expressions</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
<h2>Url Verification using PHP and Regular Expressions</h2>
<form action="<?php echo htmlspecialchars( $_SERVER['PHP_SELF'] ); ?>" method="post">
<p>Enter the URL address to test:<br />
<input type="text" name="urladdr" value="<?php if (isset($_POST['urladdr'])) echo htmlspecialchars( $_POST['urladdr'] ); ?>" size="40" /><br />
<br />
<input type="submit" value="Check URL Address" />
</p>
</form>
<?php
// If the form has already been submitted, send the text to verifyurl()
if (isset($_POST['urladdr'])) {
$urladdr = verifyurl( $_POST['urladdr'] );
// Now that we know the URL is fully qualified, we can play with it
if ($urladdr) {
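echo '
<h3><a href="', htmlspecialchars( $urladdr ), '">', htmlspecialchars( $urladdr ), '</a></h3>
<b>Here it is all split up:</b><br />';
$urladdrparsed = parse_url( $urladdr );
// each() was removed in PHP 8, so iterate with foreach and escape the output
foreach ($urladdrparsed as $key => $value) echo htmlspecialchars( "$key => $value" ), "<br />\n";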
}
else echo '<h3>Invalid URL</h3>';
} // End if (isset($_POST['urladdr'])) {
?>
<hr />
<p><em><font color="#999999">If you find any bugs in this code, please <a href="mailto:rod@canowhoopass.com">email the author</a>.</font></em></p>
</body>
</html>
<?php
/*
function verifyurl( $urladdr ){
// Regular Expression To Verify Url
$regexp = "^(https?://)?(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?((([12]?[0-9]{1,2}\.){3}[12]?[0-9]{1,2})|(([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.(com|net|org|edu|mil|gov|int|aero|coop|museum|name|info|biz|pro|[a-z]{2})))(:[1-6]?[0-9]{1,4})?((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$";
//qualified domain
// eregi() was removed in PHP 7, so preg_match() with backtick delimiters is used instead
if (preg_match( '`' . $regexp . '`iD', $urladdr )){
// No http:// at the front? Let's add it.
if (!preg_match( '`^https?://`i', $urladdr )) $urladdr = "http://" . $urladdr;
// If it's a plain domain or IP there should be a / on the end
if (!preg_match( '`^https?://.+/`i', $urladdr )) $urladdr .= "/";
// If it's a directory on the end we should add the proper slash
// We should first make sure it isn't a file, query, or fragment
if ((preg_match( '`/[0-9a-z~_-]+$`iD', $urladdr)) && (!preg_match( '`[?;&=+$,#]`', $urladdr))) $urladdr .= "/";
return ($urladdr);
}
else return false; // The domain didn't pass the expression
} // END Function verifyurl
*/
?>
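The script above describes how verifyurl() qualifies a URL: it adds a missing http://, adds a slash after a bare domain, and treats an extension-less last segment as a folder. The following is only a minimal sketch of those steps for PHP 7 and later, written with preg_match(). The name verifyurl_preg() is hypothetical and not part of the original script, the backtick delimiter and the D modifier are choices made for this sketch, and the expression is the one documented above rebuilt from single-quoted pieces.

```php
<?php
// Hypothetical name: a preg_match() sketch of verifyurl(), not the author's own code.
// Returns the qualified URL as a string, or false when the input does not match.
function verifyurl_preg( string $urladdr ) {
    // Same pieces as the notes in the script, in the same order
    $regexp  = '^(https?://)?';                                             // http:// or https://
    $regexp .= '(([0-9a-z_!~*\'().&=+$%-]+:)?[0-9a-z_!~*\'().&=+$%-]+@)?';  // username:password@
    $regexp .= '((([12]?[0-9]{1,2}\.){3}[12]?[0-9]{1,2})';                  // IP address
    $regexp .= '|(([0-9a-z_!~*\'()-]+\.)*';                                 // or tertiary domain(s)
    $regexp .= '([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.';                      // second level domain
    $regexp .= '(com|net|org|edu|mil|gov|int|aero|coop|museum|name|info|biz|pro|[a-z]{2})))'; // TLD
    $regexp .= '(:[1-6]?[0-9]{1,4})?';                                      // port number
    $regexp .= '((/?)|(/[0-9a-z_!~*\'().;?:@&=+$,%#-]+)+/?)$';              // path, query, fragment

    // Backticks delimit the pattern because it contains "/" and "#";
    // "i" ignores case, "D" makes "$" match only at the very end of the string.
    if (!preg_match( '`' . $regexp . '`iD', $urladdr )) {
        return false;
    }
    // No scheme at the front? Add http://
    if (!preg_match( '`^https?://`i', $urladdr )) {
        $urladdr = 'http://' . $urladdr;
    }
    // A plain domain or IP gets a trailing slash
    if (!preg_match( '`^https?://.+/`i', $urladdr )) {
        $urladdr .= '/';
    }
    // An extension-less last segment with no query or fragment is treated as a folder
    if (preg_match( '`/[0-9a-z~_-]+$`iD', $urladdr ) && !preg_match( '`[?;&=+$,#]`', $urladdr )) {
        $urladdr .= '/';
    }
    return $urladdr;
}

// Usage: print each input next to its result
foreach (['abc.com', 'http://abc.com/news', 'http://abc.com/index.html', 'not a url'] as $input) {
    $result = verifyurl_preg( $input );
    echo $input, ' => ', ($result === false ? 'false' : $result), "\n";
}
// abc.com => http://abc.com/
// http://abc.com/news => http://abc.com/news/
// http://abc.com/index.html => http://abc.com/index.html
// not a url => false
```

Like the original expression, this sketch only checks the shape of the URL: an address such as 299.299.299.299 or a port such as 69999 still passes, so stricter range checks would have to be added separately.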