Form Handling with PHP

Lesson 7 - Wrapping It Up


Okay, I mentioned earlier that I'd address a couple of other issues in later lessons. Now, we'll get to those.

Protecting Your Script From Spammers

First is protecting your script from spammers. Let's talk about that now. As I mentioned, most browsers will send the HTTP_REFERER header when they request a page from the server. However, it's not entirely reliable. So what can we do to ensure that the page was submitted from our server? The most reliable and secure method is to use a session. How do we do that? I'm glad you asked because that's what we're going to talk about now.

As you may or may not know, HTTP is stateless. What does that mean? It simply means that the browser sends a request for a page. The server receives that request, locates the page, and sends it to the browser. It then forgets everything it's done. Each request is handled as if it were the first. The server does not maintain its "state" from one request to the next. Normally, that's a good thing. Sometimes, however, you need to maintain some values between requests. If you think about a password protected portion of a web site, the server needs to be able to tell, from one request to the next, that the user is a valid, logged in user. Values like that are preserved by using sessions.

Sessions allow you to preserve values, so the easiest and most secure way to ensure that your script was, in fact, called from your server is to start a session when the form loads and test for the existence of a session value when the form is submitted. It's easy enough to do. How you do it will depend on whether you are processing the form in the same file that contains the form, or you are using a separate file to process the form. Let's assume, for the moment, that you're processing the form in the same file that displays the form.

Add the following code to the very top of your form page, before everything else, including the DOCTYPE:

<?php
ini_set('session.use_only_cookies',1);
session_start(); ?>

That's simple enough. All that does is tell PHP to use only cookies to store the session id and to begin a session. This does, however, require the user's browser to support cookies for them to be able to use the form. It's very rare that a user would have session cookies turned off, so that's a risk I'm willing to take.

Now, in the portion of the file that actually displays the form, include the following:

<?php
   $_SESSION["SESSION"] = true;
?>

This just stores a session value named "SESSION" in the $_SESSION array. Now, in the portion of the file that processes the form, add this to the validation portion of the code:

if (!isset($_SESSION["SESSION"])){
   $errors[] = "Invalid form submission";
}

All that does is check for the existence of the session value and, if it does not exist, adds an error to our errors array. An automated "bot" attempting to submit your form normally won't store cookies, so the session won't exist. If you haven't read it already, the error array refers to the validation we talked about in Lesson 2. Once you've implemented this method of testing for a legal form submission, you can remove any testing you were doing using the HTTP_REFERER value.
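
If a separate file processes the form, the same idea still applies: the form page sets the session value and the processing page checks it. Below is a minimal sketch of that arrangement, not a definitive implementation. The function names and the 'form_loaded' key are made up for illustration, and both pages are assumed to call session_start() before using them. The session array is passed in as an argument so the functions can be tried without a live session.

```php
<?php
// Hypothetical helpers; the names and the 'form_loaded' key are assumptions.

// Call from the page that displays the form.
function mark_form_loaded(array &$session)
{
    $session['form_loaded'] = true;
}

// Call from the page that processes the form.
// Returns the errors array, with one more entry if the form was never loaded.
function check_form_loaded(array $session, array $errors = array())
{
    if (empty($session['form_loaded'])) {
        $errors[] = "Invalid form submission";
    }
    return $errors;
}

// In the form page, after session_start():
//    mark_form_loaded($_SESSION);
// In the processing page, after session_start():
//    $errors = check_form_loaded($_SESSION, $errors);
```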

One other thing that you should test is to make sure that no illegal characters are inserted in the user submitted values. The reason for this is a technique of abusing e-mail forms called e-mail injection. You can read more about what e-mail injection is at the Secure PHP Wiki. I won't reinvent the wheel by going over what's already posted all over the Internet. I will, however, suggest a simple workaround to avoid the vulnerability: ensure that no newline characters are in any fields you use in the e-mail headers. One way to do that is:

if(preg_match("/\r|\n/",urldecode($from))){
   $errors[] = "Invalid form submission";
}else{
   $headers = "From: $from";
}

What's that do? It simply tests the value of the $from variable to see if a newline character is contained in it. If so, it creates the error that we handle later. If not, it uses the value as the "from" address in the e-mail.
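
If more than one field ends up in the headers, it may be handy to put the test in a function. The following is just a sketch of that idea: is_safe_header_value() is a hypothetical name, and the sample addresses are made up.

```php
<?php
// Hypothetical helper: true when $value contains no newline characters.
function is_safe_header_value($value)
{
    // Decode first so URL-encoded newlines (%0A, %0D) are caught as well.
    return preg_match("/\r|\n/", urldecode($value)) === 0;
}

// Made-up sample values: one harmless, two injection attempts.
$candidates = array(
    "visitor@example.com",
    "visitor@example.com\r\nBcc: victim@example.com",
    "visitor@example.com%0ABcc: victim@example.com",
);

foreach ($candidates as $from) {
    echo is_safe_header_value($from) ? "accepted" : "rejected", "\n";
}
```

The first value is accepted and the other two are rejected, because each of them contains a newline once it is decoded.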

Magic Quotes

The other issue I mentioned would be covered later was the use of backslashes to escape quotation marks in form data. To restate the problem, PHP uses backslashes to escape quote marks in form data when it is submitted. What that means is that, if you have a text input field in a form, and the user enters a string that includes quotation marks, PHP will "escape" those quotation marks with backslashes. All that really means is that PHP will insert a backslash character immediately before the quote mark to indicate that it is part of the data and not an actual string delimiter. Consider this input:

   Doc "Hollywood" Jones

If the quote marks were not escaped, your string would include the quote marks and PHP would not be able to tell that they were not intended to delimit the string. It would look like this:

   $_POST['var']="Doc "Hollywood" Jones";
	

See the problem? Here, you have $_POST['var']="Doc " which makes a complete PHP assignment. However, you still have more on the line: Hollywood" Jones";. PHP will not be able to parse that because it's a nonsense command. PHP will generate a parse error and the command will not execute. If you want to embed quotation marks in a string, they must be escaped with the backslash character. A properly formed command would then look like this:

   $_POST['var']="Doc \"Hollywood\" Jones";
	

The backslash tells PHP that the following character is not a delimiter, but is instead just part of the value. PHP versions before 5.4, in a "normal" installation, default to automatically including the backslash character in any form data that is passed to them. This is controlled by the magic_quotes_gpc configuration setting. When this is turned on, any double quote, single quote, backslash, or NUL character in the user input will be escaped with the backslash character. The "gpc" simply means GET, POST, and COOKIE.
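
To see the escaping for yourself without relying on the server's configuration, you can call addslashes(), which applies essentially the same escaping that magic quotes performs. This is only a small illustration, and the sample value is made up.

```php
<?php
// Sample value, made up for illustration: what the user typed into the form.
$typed = 'Doc "Hollywood" Jones';

// addslashes() puts a backslash before each quote, backslash, and NUL byte.
$escaped = addslashes($typed);

echo $escaped, "\n";                 // Doc \"Hollywood\" Jones
echo stripslashes($escaped), "\n";   // Doc "Hollywood" Jones
```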

As I mentioned, the normal default is for this to be turned on. It is possible for a particular installation to turn magic quotes off. This could lead to unexpected results when using the stripslashes() function. Consider what would happen if the user typed a backslash character in the input. Say, for example, the user typed: stop\go. When magic quotes are turned on, that backslash character will be escaped with another backslash and the result will be stop\\go. There are two backslash characters there. The first one tells PHP that the second one is not an escape character, but it is really part of the data. If you apply the stripslashes() function to it, it will then look like it's supposed to: stop\go. Now, think about what would happen if magic quotes were turned off. When you get the value, it will look like this: stop\go. If you apply the stripslashes() function to it, it turns into this: stopgo. Clearly, this would not be good.
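
The stop\go example can be reproduced in a few lines. This sketch hard-codes the two values the script would receive rather than reading them from a form, and the variable names are mine.

```php
<?php
// In single-quoted PHP strings, \\ stands for one literal backslash.
$received_with_magic_quotes    = 'stop\\\\go';  // stop\\go (backslash already escaped)
$received_without_magic_quotes = 'stop\\go';    // stop\go  (data untouched)

echo stripslashes($received_with_magic_quotes), "\n";     // stop\go (correct)
echo stripslashes($received_without_magic_quotes), "\n";  // stopgo  (backslash lost)
```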

Fortunately, the developers of PHP realized that you need to be able to detect whether magic quotes are turned on or off. For this, they included the get_magic_quotes_gpc() function. It returns true (1) for ON, or false (0) for OFF. You can use this to detect whether or not you should apply the stripslashes() function. The implementation of this might look something like this:

   if(get_magic_quotes_gpc()){
      $var1=stripslashes($_POST['var1']);
      $var2=stripslashes($_POST['var2']);
   }else{
      $var1=$_POST['var1'];
      $var2=$_POST['var2'];
   }
	

Easy enough, right? If you have several items to get, you can wrap them all inside a single call to get_magic_quotes_gpc(), or if you have only a few you can do them like this:

$var=get_magic_quotes_gpc()?stripslashes($_POST['var']):$_POST['var'];
	

All that does is test the value returned from the get_magic_quotes_gpc() function and, if true, assign the value following the question mark (?); otherwise, it assigns the value following the colon (:). The result is exactly the same, regardless of which technique you use. Like I said, if you have a lot of variables to assign, the first is slightly more efficient because it only calls the get_magic_quotes_gpc() function once instead of calling it repeatedly for every variable you assign.
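
If you would rather not repeat the test for each field, the two techniques can be folded into a small helper. This is only a sketch: read_fields() is a hypothetical name, having missing fields default to an empty string is my assumption, and the magic quotes flag is passed in as an argument so the function can be tried on any PHP version.

```php
<?php
// Hypothetical helper: fetch several fields, undoing magic-quotes escaping
// only when $magic_quotes_on is true. Missing fields become '' (an assumption).
function read_fields(array $input, array $names, $magic_quotes_on)
{
    $values = array();
    foreach ($names as $name) {
        $raw = isset($input[$name]) ? $input[$name] : '';
        $values[$name] = $magic_quotes_on ? stripslashes($raw) : $raw;
    }
    return $values;
}

// Possible use; the function_exists() guard covers PHP versions
// where get_magic_quotes_gpc() is no longer available:
//    $magic  = function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc();
//    $fields = read_fields($_POST, array('var1', 'var2'), $magic);
```

The flag is worked out once and reused for every field, which keeps the single-call advantage of the first technique.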

In Conclusion

So that's it! We've come to the conclusion of our little dissertation. While it may not address every possible situation you might encounter, I hope it's given you a solid foundation that you will be able to draw upon if your needs exceed the scope of this tutorial. As I said at the beginning, our goal was to help you learn how it works, so that you can create your own custom script that does exactly what you want. I hope I've succeeded and you've found it helpful.
