URLs that a script builds from external data, such as user input or an RSS feed, may produce a “404 (Not Found)” error in Apache, especially when they contain url encoded slashes. Here’s why…
By default, the Apache HTTP server immediately returns a “404 (Not Found)” error when it encounters url encoded path separators in a URL (Uniform Resource Locator). These are %2F for the forward slash (/) and, on systems where the backslash is also a path separator, %5C for the backslash (\).
The server simply rejects the URL without invoking mod_proxy or mod_rewrite.
This behavior was introduced to close a CGI security hole in which scripts that use PATH_INFO with poor or no security checks could be tricked into accessing sensitive files.
While this was a legitimate thing to do, it causes problems for scripts that depend on receiving encoded values, which may contain slashes, in URLs.
Here are five (5) ways to solve the problem.
1. Turn on the “AllowEncodedSlashes” directive in Apache
This directive may be set in the server config file (e.g. httpd.conf) and may appear inside <VirtualHost> containers to affect only certain websites. It is not allowed in .htaccess files.
<VirtualHost *:80>
    AllowEncodedSlashes On
</VirtualHost>
Turning on this directive tells the web server to allow encoded slashes in URLs.
One advantage of using this solution is that you don’t have to modify anything in your scripts to process the URL successfully.
Unfortunately, this option is not always available. Some shared hosting services do not allow access to their server config files, and some might even have Apache installations older than 2.0.46, the first version to support the directive.
See the Apache HTTP Server documentation for the AllowEncodedSlashes directive for more info.
Anyway, there are better and more appropriate alternatives for dealing with the problem. Though they require knowledge of PHP scripting and url encoding, implementing them will make your web site less dependent on the web server environment.
2. Replace %2F with %252F and %5C with %255C after url encoding
Consider the PHP code below.
$url = 'http://www.example.com/books/';
$title = 'the lamp, linux/apache/mysql/php solution';
// replace all spaces with underscores (_) in the $title
$title = str_replace(' ', '_', $title);
// urlencode the $title and replace all occurrences of encoded slashes in one line
$url .= str_replace(array('%2F','%5C'), array('%252F','%255C'), urlencode($title)) . '.html';
The url will become:
http://www.example.com/books/the_lamp%2C_linux%252Fapache%252Fmysql%252Fphp_solution.html
%25 is actually the url encoded equivalent of percent (%) so the code above is actually url encoding the slashes twice.
3. Url encode the string twice
Using the same variables as in #2, the code below,
$url .= urlencode(urlencode($title)) . '.html';
will result in the url
http://www.example.com/books/the_lamp%252C_linux%252Fapache%252Fmysql%252Fphp_solution.html
Note that other url encoded characters are transformed as well. Notice the %252C. It is changed from %2C which is the encoded equivalent of comma (,).
Both #2 and #3 will work, but they will make your URLs longer, uglier and harder to read.
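Whichever of the two you pick, the receiving script has to undo the extra encoding. The sketch below puts #2 and #3 side by side together with a matching decoder for each. The function names are made up for illustration; only urlencode(), urldecode() and str_replace() are PHP built-ins. How many decoding steps are actually left for your script depends on how much the web server and PHP have already decoded, so treat this as a starting point rather than a drop-in solution.

```php
<?php
// Hypothetical helper names; not part of the original article.

// Method #2: url encode once, then encode the percent sign of each encoded slash.
function encode_slashes_twice(string $segment): string
{
    return str_replace(['%2F', '%5C'], ['%252F', '%255C'], urlencode($segment));
}

// Reverse of method #2: decode once, then restore the slashes that are still encoded.
function decode_slashes_twice(string $encoded): string
{
    return str_replace(['%2F', '%5C'], ['/', '\\'], urldecode($encoded));
}

// Method #3: url encode the whole string twice.
function encode_all_twice(string $segment): string
{
    return urlencode(urlencode($segment));
}

// Reverse of method #3: decode the whole string twice.
function decode_all_twice(string $encoded): string
{
    return urldecode(urldecode($encoded));
}

// Same title as in #2, with spaces already replaced by underscores.
$title = 'the_lamp,_linux/apache/mysql/php_solution';

echo encode_slashes_twice($title), "\n";
echo encode_all_twice($title), "\n";
```

Note that a title which already contains the literal text %2F would not survive the #2 round trip, because after decoding it cannot be told apart from an encoded slash.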
4. Use unencoded slashes
As a good practice, we urlencode strings which may contain slashes, for use in a url. To get back the slashes, we need to replace %2F with slash (/) character.
Using the same variables as in #2, the following code,
$url .= str_replace('%2F', '/', urlencode($title)) . '.html';
will result in the url
http://www.example.com/books/the_lamp%2C_linux/apache/mysql/php_solution.html
The slashes look very nice in the URL. However, it reads as though you have a page named “php_solution.html” inside the mysql folder. And, in fact, if you happen to have one, that page will be displayed instead.
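To see why the server reads this as a nested path, it can help to split the generated URL into its path segments. The snippet below is only an illustration using the example title from #2; parse_url() and explode() are standard PHP functions, and the variable names are arbitrary.

```php
<?php
// Same title as in #2, with spaces already replaced by underscores.
$title = 'the_lamp,_linux/apache/mysql/php_solution';

// Method #4: url encode, then turn the encoded slashes back into real ones.
$url = 'http://www.example.com/books/' . str_replace('%2F', '/', urlencode($title)) . '.html';

// Split the path part of the URL on slashes to show what the server sees.
$path = parse_url($url, PHP_URL_PATH);
$segments = explode('/', trim($path, '/'));

print_r($segments);
```

The title no longer occupies a single segment after "books"; it is spread over four, and the last one looks like a file name.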
5. Replace slashes with underscores (_)
The safest and most practical way of handling slashes is to replace them with underscores before url encoding.
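Using the variables in #2, the following PHP code (which also replaces commas),
$url .= urlencode(str_replace(array('/',','), '_', $title)) . '.html';
will result in the url
http://www.example.com/books/the_lamp__linux_apache_mysql_php_solution.html
A more meaningful URL.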
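If you go this route, the replacement can be wrapped in a small helper. The version below is a sketch that goes slightly beyond the one-liner above: it also handles spaces and backslashes and collapses repeated underscores, so its output differs a little from the URL shown. The function name title_to_segment is hypothetical.

```php
<?php
// Hypothetical helper; not part of the original article.
function title_to_segment(string $title): string
{
    // Replace spaces, commas, forward slashes and backslashes with underscores.
    $clean = str_replace([' ', ',', '/', '\\'], '_', $title);

    // Collapse runs of underscores left behind by adjacent replaced characters.
    $clean = preg_replace('/_+/', '_', $clean);

    // Drop leading and trailing underscores, then encode whatever is left.
    return urlencode(trim($clean, '_'));
}

$url = 'http://www.example.com/books/'
     . title_to_segment('the lamp, linux/apache/mysql/php solution')
     . '.html';

echo $url, "\n";
```

Keep in mind that this conversion is one-way: the original title cannot be rebuilt from the URL, so the receiving script would have to look the page up by the converted string.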
The solutions mentioned above may not be applicable to the project you are currently working on. But remember, when selecting a solution to implement, choose the one that works best with your development platform and style. And, keep things simple.
- To capture the title part of the output url above using mod_rewrite:
RewriteRule ^books/(.*?)\.html$ /index.php?title=$1
- A quick method for checking whether a web site is running on Apache http server without looking at the server header field:
- Open a browser window, and enter the address http://www.apache.org// (two forward slashes at the end). You will get the home page.
- Replace the forward slash at the end with %2F (url encoded slash) so address becomes http://www.apache.org/%2F. The web server will now respond with a “404 (Not Found)” response code!
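Returning to the RewriteRule note above: if mod_rewrite is not available, roughly the same capture can be done in PHP with a regular expression. This is a minimal sketch that assumes the script is handed the request path as a string; the function name extract_title is hypothetical, and the pattern simply mirrors the one in the rule.

```php
<?php
// Hypothetical helper; mirrors the pattern ^books/(.*?)\.html$ from the RewriteRule.
function extract_title(string $path): ?string
{
    // Allow an optional leading slash, since request paths usually start with one.
    if (preg_match('#^/?books/(.*?)\.html$#', $path, $matches) === 1) {
        return $matches[1];
    }

    // The path does not look like a book URL.
    return null;
}

var_dump(extract_title('/books/the_lamp__linux_apache_mysql_php_solution.html'));
var_dump(extract_title('/about.html'));
```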