Chapter 17
Voting Booths
CONTENTS
- Voting Booths-Gathering and Managing Opinions
- Starting Simple-A Low-Level Voting Booth
- Bad Voting Booth, Good Code-Technical Merits of greenegg.cgi
- Use of Associative Arrays for the Handling of GET/POST Method Data
- Immediate Error Trapping Using the Statement Modifier Form of if
- Slurping Data with @array = <FILEHANDLE>;
- Use of Output Filter to Provide stdin to /usr/sbin/sendmail and sendmail -t Flag
- Use of print FH<<END; ... END Syntax When Outputting Formatted Sections
- A Voting Booth Wish List
- Multistate CGI Programs-More Than Just a URL
- Generating Meaningful Reports
- Summary
While there is no special technical distinction to gathering information in a "voting booth" style, voting booths are used quite often on the Web. A large part of this popularity is due to the social familiarity with surveys, polling, and voting. Also, the structure of a voting booth lends itself well to the technology of forms. The measure of a voting booth is more a function of the database back end than the complexity of the form that creates the data. This chapter presents examples of simple and complicated voting booth CGI programs.
Voting Booths-Gathering and Managing Opinions
Since the Web began, people have been using it to gather information. An interesting turn-around can be engineered with CGI programming: Web pages can use people to gather information. There are many different strategies for accomplishing this: the soliciting of orders, the active tracking of users, and the creation of threaded discussion areas are a few examples you'll find in this book. In addition to these methods, a very popular scheme for gathering information from users is the voting booth.
The concept of a voting booth should be intuitively obvious, but at the same time the words "intuitively obvious" seem like the computer-industry platitude equivalent to "at least you still have your health." With this in mind, I'll take a second to describe the procedure of on-line voting:
- A user loads an HTML page with a form area.
- The user answers questions posed by the form.
- The user submits the form through either a SUBMIT button or an image button.
- The form data is sent to a CGI program for parsing and processing.
- Reports based on the submitted data are generated. These reports could be in the form of dynamically generated Web pages, e-mail, server-based data files, images created on-the-fly, and so on.
Starting Simple-A Low-Level Voting Booth
Though obvious, it must be stated that there are both good and bad voting booths on the Web. I think the difference between the two comes down to:
- The quality of the report generated by the CGI program
- The degree of data management capability the voter and administrator have over the votes
http://www.anadas.com/cgiunleashed/voting/greenegg.html
Listing 17.1. greenegg.html-The HTML half of a simple voting booth system.
<HTML>
<HEAD><TITLE>Are green eggs and ham meant for you?</TITLE></HEAD>
<BODY>
<H2>Funny Foods Grillhouse Customer Survey</H2>
<P>
Funny Foods is conducting a survey of our restaurant goers to test the
marketability of a new main course we're planning.
<P>
<FORM METHOD=GET ACTION=exe/greenegg.cgi>
Do you like green eggs and ham?<BR>
<SELECT SIZE=2 NAME=disposition>
<OPTION SELECTED VALUE="dislike">I do not like them, Sam I am</OPTION>
<OPTION VALUE="like">Yes, I'll try them, Sam I am</OPTION>
</SELECT>
<P>
If yes, what is the most you would pay for green eggs and ham at a
restaurant?<BR>
<SELECT NAME=pay>
<OPTION SELECTED>[select an amount]</OPTION>
<OPTION>$6</OPTION>
<OPTION>$9</OPTION>
<OPTION>$12</OPTION>
<OPTION>$15</OPTION>
<OPTION>$18</OPTION>
<OPTION>$21</OPTION>
</SELECT>
<P>
<INPUT TYPE=SUBMIT VALUE="Click here when finished">
</FORM>
</BODY></HTML>
This voting booth isn't particularly good. In my humble, objective, and removed opinion, it's a fine computer program. However, it doesn't live up to the two points I made previously. I'll discuss its technical merits and also examine how a better voting booth could be constructed. For your viewing delight, and to save you from booting up your browser, I've provided you with a screen shot of my simple voting booth page in Figure 17.1.
Figure 17.1 : The entry page of my simple voting booth.
Listing 17.2. greenegg.cgi-A simple voting booth CGI program.
#!/usr/bin/perl
#
# The following lines are "initializations" of one sort or another.
#
# Redirecting standard error to /dev/null is handy when dealing with
# the 'sendmail' UNIX command since sendmail has an odd habit of having
# errors it may encounter being piped to the HTTPd. This is the only
# circumstance where I've seen standard out (STDOUT) and standard error
# (STDERR) get confused with each other.
#
chop($time = `date`); # backticks run the UNIX date command and capture its output
$refpage = 'http://www.anadas.com/cgiunleashed/voting/greenegg.html';
$admin_email = 'rdice@anadas.com';
open(STDERR,"> /dev/null");
# -- start of my standard GET/POST method handler --
#
# At the end of this section of code, I'll have the %tokens associative
# array with the 'name' information from the submitting form as keys and
# 'value' information as values. Hex-encoded special characters will be
# restored to their original characters. Note that many browsers will
# return annoying DOS-style CRLF at line end in textarea boxes. This
# code will _not_ strip ^M. If this is desirable, use
# $tokens{$field} =~ s/\cM/ /g;
# $tokens{$field} =~ s/\n//g;
# as the last lines in the foreach loop.
#
if ( $ENV{REQUEST_METHOD} eq 'POST' ) {
read(stdin,$input,$ENV{CONTENT_LENGTH});
} elsif ( $ENV{REQUEST_METHOD} eq 'GET' ) {
$input = $ENV{QUERY_STRING};
} else {
print "Content-type: text/html\n\n";
print "This program doesn't support the <b>$ENV{REQUEST_METHOD}</b> httpd";
print " request method.\n";
exit 1;
}
$input =~ tr/+/ /;
@fields = split(/\&/,$input);
$input = '';
foreach $i (@fields) {
($field,$data) = split(/=/,$i);
$field =~ s/%(..)/pack("c",hex($1))/ge;
$data =~ s/%(..)/pack("c",hex($1))/ge;
$tokens{$field} = $data;
}
@fields = (); # delete the @fields array
# -- end of my standard GET/POST method handler --
#
# The next 3 lines trap errors. The first defuses improper accesses to
# greenegg.cgi -- this CGI program -- while the other two trap for
# illogical form submissions. The format is:
# goto subroutine if error condition
# The various subroutines print to the HTTPd and then abort the program.
#
&refpage_error if $ENV{HTTP_REFERER} ne $refpage;
&dislike_error if ( ($tokens{'disposition'} eq 'dislike') &&
($tokens{'pay'} ne '[select an amount]') );
&like_error if ( ($tokens{'disposition'} eq 'like') &&
($tokens{'pay'} eq '[select an amount]') );
#
# To reach this point, there must be no input errors in the program.
# Now, open the database file of historical responses
#
open(DB,"greenegg.dat");
chop(@dataline = <DB>);
close(DB);
#
# parse each line, update as appropriate (Didn't like or Liked and
# pay amount)
#
foreach ( @dataline ) {
($name,$count) = split(/:\t/);
if ( (($name eq 'Didn\'t like') && ($tokens{'disposition'} eq 'dislike'))
|| (($name eq 'Liked') && ($tokens{'disposition'} eq 'like'))
|| ( $tokens{'pay'} eq $name ) ) {
$count += 1;
$_ = join(":\t",$name,$count);
}
}
#
# write new version of data to file
#
open(DB,"> greenegg.dat");
foreach ( @dataline ) {
print DB "$_\n";
}
close(DB);
chmod 0660, "greenegg.dat";
#
# Construct a formatted email to send to the administrator of the Web site.
# The email is composed of 4 sections: The header (parsed by sendmail),
# environment info, info regarding the immediate form submission, and
# historical submission info.
#
open(EM,"| /usr/sbin/sendmail -t");
print EM <<END;
To: $admin_email
From: "Funny Food Form"
Subject: Submission of Funny Food Form

The Funny Food Grillhouse Web questionnaire has just received a submission!
Time of Submission: $time
Using Browser: $ENV{HTTP_USER_AGENT}
From Host: $ENV{REMOTE_HOST}
END
if ( $tokens{'disposition'} eq 'like' ) {
print EM "The respondent likes green eggs and ham and would be willing ";
print EM "to pay $tokens{'pay'} for a green eggs and ham main course.\n";
} else {
print EM "The respondent doesn't like green eggs and ham...",
" yet.\n";
}
print EM "\nHistorical Responses\n";
print EM "--------------------\n\n";
foreach ( @dataline ) {
print EM "$_\n";
}
close(EM);
#
# Jump to a subroutine which will send formatted HTML to the HTTPd. The
# message thanks the user for the form submission, including a report
# on their submission information and also historical information.
#
&thank_user;
exit 0;
sub refpage_error {
print "Content-type: text/html\n\n";
print <<END;
<HTML>
<HEAD><TITLE>Improper referring page!</TITLE></HEAD>
<BODY>
<P>
The page which invoked this CGI program was not
<A HREF=$refpage><B>$refpage</B></A>. Please re-submit this form using
that page.
</BODY></HTML>
END
exit 1;
}
sub dislike_error {
print "Content-type: text/html\n\n";
print <<END;
<HTML>
<HEAD><TITLE>Improper Input!</TITLE></HEAD>
<BODY>
<P>
If you don't like green eggs and ham then it doesn't make sense to say
you'd buy them for ANY price. Please press the BACK button on your browser and re-enter your reply.
</BODY></HTML>
END
exit 1;
}
sub like_error{
print "Content-type: text/html\n\n";
print <<END;
<HTML>
<HEAD><TITLE>Improper Input!</TITLE></HEAD>
<BODY>
<P>
Since you said you like green eggs and ham it is required that you provide
a price you'd buy them at. Please press the BACK button on your browser
and re-enter your reply.
</BODY></HTML>
END
exit 1;
}
sub thank_user {
print "Content-type: text/html\n\n";
print <<END;
<HTML>
<HEAD><TITLE>Thank you!</TITLE></HEAD>
<BODY>
<P>
Thank you for taking the time to fill out Funny Food's questionnaire. Your
input has been added to the database. Now, you can see how your replies
stack up against what other people have submitted:
END
print "<P>You said that you <B>$tokens{'disposition'}</B> green eggs",
" and ham.\n";
if ( $tokens{'disposition'} eq 'like' ) {
print "<P>You would pay as much as <B>$tokens{'pay'}</B> for a " ,
"green eggs and ham main course.\n";
}
print "<H3>Historical Information</H3><HR><PRE>\n";
foreach ( @dataline ) { print "$_\n"; }
print "</PRE>\n";
print "</BODY></HTML>\n";
}
This code handles the fairly simple form data generated by greenegg.html. The specific actions of this voting booth program are as follows:
- Get and decode the form data.
- Check for errors. If any are found, display an error message to the Web and abort the voting process.
- Update the database file with the newly submitted information.
- Compose an e-mail containing information on the current submission and the historical totals, and send it to the CGI system maintainer.
- Output a message to the Web thanking the user for voting. This message contains a report on both the current vote submission and votes in the past (as shown in Figure 17.2).
This example has managed to capture the essentials of a voting booth. The process asks you voting/polling-style questions, and it generates a report that contains historical information regarding all votes submitted to date. Still, I find the "feel" produced by this voting booth to be lacking. Yes, the questions are phrased in a vote-like way, but really there isn't much difference between this voting booth and a page that asks you whether or not you'd like to purchase a charity gift-basket and how much you'd like to pay for it, yet that page would be considered an on-line ordering form.
Usually, there would be no utility in providing users with a report of how many other people had bought charity baskets and a payment breakdown. So in this sense, Funny Foods has managed to create a bona fide voting booth. The display of the voting booth is fairly primitive, though, and it's questionable as to how much useful information can be gleaned from this report.
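The database-update step in the list of actions above reads greenegg.dat and then rewrites it, so two submissions arriving at the same moment could overwrite each other's counts. Here is a minimal sketch of one way to serialize that step, assuming the same "name:<TAB>count" line format used by the listing. The subroutine name bump_counts is hypothetical, and flock is only advisory, so it helps only if every program that writes the file uses it:

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Hypothetical helper: add 1 to the count of every line in $file whose
# name appears in @names. Lines look like "Liked:\t12".
sub bump_counts {
    my ($file, @names) = @_;
    my %wanted = map { $_ => 1 } @names;

    open(my $db, '+<', $file) or die "can't open $file: $!";
    flock($db, LOCK_EX)       or die "can't lock $file: $!";

    my @lines = <$db>;
    chomp(@lines);
    foreach my $line (@lines) {
        my ($name, $count) = split(/:\t/, $line);
        next unless defined $count && $wanted{$name};
        $line = join(":\t", $name, $count + 1);
    }

    # Rewrite the file in place while still holding the lock.
    seek($db, 0, SEEK_SET) or die "can't rewind $file: $!";
    truncate($db, 0)       or die "can't truncate $file: $!";
    print $db "$_\n" foreach @lines;
    close($db)             or die "can't close $file: $!";
    return @lines;
}
```

For a "like" vote, the call might look like bump_counts('greenegg.dat', 'Liked', $tokens{'pay'}); the returned list plays the role of @dataline in the report.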
Bad Voting Booth, Good Code-Technical Merits of greenegg.cgi
I've already talked about how I feel that greenegg.cgi falls short as a voting booth. Of course, I did this on purpose; my hope is that you'll learn how to build a better voting booth from my (intentional) mistakes. However, one aspect of greenegg.cgi I will not compromise is its coding. Bad philosophy can be examined, but bad code should never be duplicated.
Having said that, I will state that there is one aspect of this code that could be improved upon in theory: modularity. What I refer to as my standard GET/POST method handler could be put in a subroutine, called as &subroutine_name and defined by sub subroutine_name { ... }. Even better, it could be encapsulated as a subroutine in an external file. Within the Perl program, the subroutine would still be called as &subroutine_name, but its definition would be within another file. The Perl interpreter/compiler would be instructed to find this file by placing the following line at the top of your Perl program, immediately following #!/usr/bin/perl:
require "external_file_name.pl";
In other words, I could have re-created cgi-lib.pl. This is a very popular package that does essentially the same job as my standard GET/POST handler, but it also has a lot of other useful CGI-handling features. This Perl library file can be found at
http://www.bio.cam.ac.uk/cgi-lib/
Caution: Please don't take this observation as an unmitigated endorsement! Though cgi-lib.pl is very popular, it does have its detractors. For a critical analysis of cgi-lib.pl, you can go to
http://www.perl.com/perl/info/www/!cgi-lib.html
For more modularity, the section within the main portion of the program that handles the creation of the e-mail could also be brought within a subroutine.
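As a sketch of the modularity described above, the handler could be wrapped in a subroutine that returns the associative array. The name parse_form is hypothetical, and passing the raw query string in as an argument (rather than reading %ENV inside the subroutine) is a choice made here so that the routine can be exercised outside a Web server:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the chapter's GET/POST decoding logic.
# Takes a raw "name=value&name=value" string and returns a hash.
sub parse_form {
    my ($input) = @_;
    my %tokens;
    $input =~ tr/+/ /;                      # '+' encodes a space
    foreach my $pair (split(/&/, $input)) {
        my ($field, $data) = split(/=/, $pair, 2);
        $data = '' unless defined $data;    # "name" with no "=value"
        foreach ($field, $data) {
            s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/ge;   # undo %XX escapes
        }
        $tokens{$field} = $data;
    }
    return %tokens;
}

my %tokens = parse_form('disposition=like&pay=%2412');
print "$tokens{'disposition'} $tokens{'pay'}\n";    # prints "like $12"
```

In a CGI program the argument would be $ENV{QUERY_STRING} for GET, or the data read from standard input for POST, exactly as in the listing.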
Now, let's look at the positive points of greenegg.cgi:
Use of Associative Arrays for the Handling of GET/POST Method Data
Perl's associative array feature allows data to be arranged in an array that is indexed by character strings rather than integers. This lets the programmer easily preserve the name-value pair structure that forms pass to CGI programs through GET and POST. If you wanted to keep this structure without associative arrays, as you might if you programmed in C, you could build parallel arrays and supply a backwards-indexing scheme. Me, I'll stick to Perl.
Caution: Associative arrays have a slight amount of difficulty when presented with name/value pairs generated by <INPUT TYPE=CHECKBOX> or <SELECT MULTIPLE>. These sorts of form elements can create several name-value pairs that share the same name. Using the standard handler I provided, each new value overwrites the previous one stored under that name, so only the last value survives. Schemes to avoid this problem can be built as best suits the situation at hand.
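One such scheme, sketched here under the assumption that a NUL character never appears in legitimate form data, is to join repeated values with "\0" and split them apart when they are needed. The subroutine name add_pair and the field name 'topping' are hypothetical:

```perl
use strict;
use warnings;

my %tokens;

# Hypothetical helper: store a decoded name/value pair, appending to
# any earlier value for the same name instead of overwriting it.
sub add_pair {
    my ($field, $data) = @_;
    if (exists $tokens{$field}) {
        $tokens{$field} .= "\0" . $data;
    } else {
        $tokens{$field} = $data;
    }
}

add_pair('topping', 'ham');
add_pair('topping', 'eggs');
add_pair('pay', '$9');

my @toppings = split(/\0/, $tokens{'topping'});
print scalar(@toppings), " toppings: @toppings\n";   # prints "2 toppings: ham eggs"
```

Single-valued fields such as 'pay' are unaffected, so code that expects one value per name keeps working.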
Immediate Error Trapping Using the Statement Modifier Form of if
Programmers who are familiar with C will easily recognize the meaning of the following code:
if (comparison) {
statement 1;
...
statement n;
}
This structure is perfectly valid in Perl, too. However, a simple C if structure could be written as follows:
if (comparison) statement;
This is not the case in Perl; Perl doesn't allow dangling conditionals, so the braces are required even around a single statement. The Perl equivalent of this code would be
if (comparison) { statement; }
To "make up" for this (non) deficiency, Perl has a statement-modifier if structure:
statement if comparison;
(statement) if comparison;
Many people who migrate to Perl from C find this style of if structure somewhat disconcerting the first time they see it. Many of those people eventually learn to love it, though. I am a member of both these sets. It's a clear, concise, and intuitive way of writing an if statement.
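A small sketch of the two forms side by side, using the same kind of test that greenegg.cgi applies to the pay field. The subroutine names check_block and check_modifier are hypothetical, and both are meant to behave identically:

```perl
use strict;
use warnings;

# Block form: braces are required even for a single statement.
sub check_block {
    my ($pay) = @_;
    if ($pay eq '[select an amount]') { return 'error'; }
    return 'ok';
}

# Statement-modifier form: the action comes first, the test second.
sub check_modifier {
    my ($pay) = @_;
    return 'error' if $pay eq '[select an amount]';
    return 'ok';
}

print check_block('$9'), ' ', check_modifier('[select an amount]'), "\n";   # prints "ok error"
```

The modifier form accepts only a single statement on its left-hand side; anything longer belongs in the block form.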
Slurping Data with @array = <FILEHANDLE>;
In more traditional languages, one would write a loop that would read a file line by line until EOF was encountered. This can certainly be done in Perl, as well:
while ( $line = <FILEHANDLE> ) {
$array[$i++] = $line;
}
This sort of loop is essential in Perl if you want to do more than just read lines into an array; for example, if you want some sort of if (condition) to decide whether or not to include $line in the array. But if not, you can accomplish the same in less code and less run-time with
@array = <FILEHANDLE>;
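To make the trade-off concrete, here is a sketch that reads the same data both ways. It assumes a file in the greenegg.dat format; a temporary file with made-up contents stands in for it so that the example is self-contained, and grep is shown as one further way to filter an array after slurping:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small stand-in for greenegg.dat (the contents are made up).
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "Didn't like:\t3\nLiked:\t5\n\$9:\t2\n";
close($tmp);

# Line-by-line loop: needed when a condition decides what to keep.
my @prices;
open(my $fh, '<', $path) or die "can't open $path: $!";
while (my $line = <$fh>) {
    push(@prices, $line) if $line =~ /^\$/;   # keep only the price lines
}
close($fh);

# Slurp: one statement reads every line into the array.
open($fh, '<', $path) or die "can't open $path: $!";
my @all = <$fh>;
close($fh);

# A slurped array can still be filtered afterwards.
my @also_prices = grep { /^\$/ } @all;

print scalar(@all), " lines, ", scalar(@prices), " price line\n";   # prints "3 lines, 1 price line"
```

Slurping holds the whole file in memory at once, which is fine for a small tally file but worth reconsidering for very large ones.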
Note: In the case of a file attached to a filehandle, it's easy to see where the <FILEHANDLE> will stop being read into the array: at the end-of-file. However, you can also use the default standard input filehandle, STDIN, in this context. I'm not sure you'd want to, though. How does @array know when EOF has been reached? When standard input encounters a
Use of Output Filter to Provide stdin to /usr/sbin/sendmail and sendmail -t Flag
E-mail gateways are often used in CGI programming because they are among the few standard UNIX tools that lend themselves to CGI development. The preferred UNIX options for sending e-mail composed by a CGI program are mail and sendmail. I prefer sendmail because it is the more powerful of the two. sendmail takes its data from stdin and sends the message after it reaches EOF or a line with a single . on it.
One method of utilizing sendmail from Perl would be to write to-be-e-mailed data to a temporary file and then cat it and pipe the output to sendmail, as follows:
open(EMAIL,"> tempfile$$.txt");
print EMAIL "stuff to be emailed... la la la...\n";
close(EMAIL);
system("cat tempfile$$.txt | /usr/sbin/sendmail $tokens{'email'}");
system("rm tempfile$$.txt");
The $$ in the name tempfile$$.txt is a special variable in Perl that represents the process identification number (PID) of the Perl process. I do this so that tempfile isn't accidentally overwritten if two people invoke this CGI program simultaneously; there are two unique tempfiles with this method. This is a Good Idea that can be used in many applications. In this particular case, there are better ways to handle the situation, though.
The sendmail command finds its destination e-mail address from the associative array element $tokens{'email'}. You can assume that this variable is obtained from the user putting this value into his or her form submission. Without any safeguards, this is a Bad Thing. Consider the malicious hacker who provides the following "e-mail address" to your Web form:
noone@nowhere.net ; cd / ; rm -R *
The semicolons are shell metacharacters that signify the end of a UNIX command to the shell interpreter. Should any Webmaster be foolish enough to run their httpd as root, that example would delete the entire file system. While most Webmasters haven't set up their systems for this kind of disaster, lesser evils can be committed if unchecked user input is allowed to reach the shell.
This problem is avoided in my greenegg.cgi code in two ways. First, I avoid the intermediate tempfile and instead open a filehandle directly to the pipe that sendmail uses to receive standard input. Second, I use the -t option of sendmail that instructs it to parse standard input to look for an e-mail address rather than get the address from the command line. With this, no user input is brought to the attention of the shell at all.
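The same idea can be taken one step further in Perl 5.8 and later with the list form of a piped open, which hands the command and its arguments straight to the operating system without starting a shell. The sketch below rests on several assumptions: compose_mail is a hypothetical helper, the address is a placeholder, and the actual pipe is left commented out so that nothing is sent when the example is run. Because sendmail -t reads its recipients from the headers, the helper also strips line breaks from header values, which is a precaution added here rather than something greenegg.cgi does:

```perl
use strict;
use warnings;

# Hypothetical helper: build the text that sendmail -t will parse.
# A blank line must separate the headers from the body.
sub compose_mail {
    my ($to, $subject, $body) = @_;
    # Strip CR/LF so a form value can't smuggle in extra header lines.
    s/[\r\n]+/ /g foreach ($to, $subject);
    return "To: $to\nSubject: $subject\n\n$body\n";
}

my $message = compose_mail('admin@example.com', 'Form submission',
                           'A vote was received.');
print $message;

# Delivery, assuming sendmail lives at /usr/sbin/sendmail:
# open(my $em, '|-', '/usr/sbin/sendmail', '-t') or die "can't fork: $!";
# print $em $message;
# close($em) or warn "sendmail exited with status $?";
```

Building the message as a string first also makes it easy to log or inspect exactly what would be handed to sendmail.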
Tip: If user input must go to the shell, then you might consider using the following Perl subroutine on it first:
sub shell_proof {
local(@strings);
@strings = @_;
foreach $string ( @strings ) {
# delete control characters, shell metacharacters (including ;), and high-bit bytes
$string =~ s/[\001-\011\013-\014\016-\037\041-\052\057\073-\077\133-\136\140\173-\377]//g;
}
@strings;
}
This code will remove shell metacharacters and binary characters from strings. Its syntax of use is
@string_list = &shell_proof(@string_list);
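An alternative worth considering is to list the characters you will accept instead of the ones you reject, so that anything unanticipated is dropped by default. Here is a minimal sketch, assuming the value is supposed to be an e-mail address. The subroutine name keep_safe and the choice of permitted characters are illustrative assumptions, not a complete definition of a valid address:

```perl
use strict;
use warnings;

# Hypothetical allow-list filter: delete every character that is not a
# letter, digit, or one of . _ @ + -
sub keep_safe {
    my @strings = @_;
    foreach my $string (@strings) {
        $string =~ s/[^A-Za-z0-9._\@+-]//g;
    }
    return @strings;
}

my ($address) = keep_safe('noone@nowhere.net ; cd / ; rm -R *');
print "$address\n";   # prints "noone@nowhere.netcdrm-R"
```

The filtered result is harmless to the shell but is no longer a meaningful address, so a program may prefer to reject any input that changes under filtering rather than use the cleaned-up value silently.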
Use of print FH<<END; ... END Syntax When Outputting Formatted Sections
Perl has a strong built-in facility for printing the formatted output of a large number of consecutive lines of text.
$things = 'variables';
print <<END_DESCRIPTOR;
Now you type
what you want to
on as many lines as you'd like and you can even include
$things which will be interpolated to their values.
END_DESCRIPTOR
This syntax is taken from UNIX Bourne shell programming, where it is very useful when writing batch jobs. I find this feature of Perl programming to be particularly useful when writing HTML within Perl.
Note: Emacs, the ultimate UNIX-based text editor for hacking, has built-in modes for dealing with many different programming languages: C, C++, FORTRAN, Lisp, Prolog, and Perl to name a few. Among other things, these modes will automatically format your code as you type it. This is useful for discovering certain sorts of errors before you even touch the compiler. Unfortunately, emacs' Perl mode has a tough time figuring out how to format your source following a print <<END_DESCRIPTOR; statement. We can hope that this bug will one day be fixed.
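The quoting of the terminator controls interpolation, which matters when the text contains characters such as @ or $. A short sketch with made-up text:

```perl
use strict;
use warnings;

my $things = 'variables';

# Bare or double-quoted terminator: variables are interpolated, so a
# literal @ must be escaped, as in the Reply-To line of a mail header.
my $interpolated = <<"END";
Expanding $things for rdice\@anadas.com
END

# Single-quoted terminator: the text is taken exactly as written.
my $literal = <<'END';
Not expanding $things for rdice@anadas.com
END

print $interpolated, $literal;
```

The single-quoted form is handy for blocks of HTML or mail text that contain no variables at all, since nothing in them needs escaping.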
A Voting Booth Wish List
I think that my main complaint about greenegg.cgi is that it didn't go far enough. Let's all try closing our eyes and clicking our heels together three times, thinking about what we'd most want in a voting booth CGI system...
- The voting booth would recognize individual users, each having a unique e-mail address.
- Users could not only vote but re-vote should they change their minds in the future.
- There would be an extensible database of vote-able objects.
- There would be different classes of objects within this database.
- The reports generated by the voting booth would be meaningful.
Figure 17.3 shows the vote.cgi entry page showing the various options a user would have when reaching it: registering as a new user, logging in as a current user, and viewing the vote patterns of all users. I have added my e-mail address into the field that shows how a new user is registered.
Figure 17.3: The vote.cgi entry page.
I've really put myself on the line with this example. vote.cgi gives the whole world an opportunity to tell me what they think about my taste in music. (Hey, cut me some slack. I happened to have a tab-delimited ASCII file of my music handy, and I needed to give myself a voting booth-cum-database project, so....) The Perl source code for vote.cgi is shown in Listing 17.3.
Listing 17.3. vote.cgi-A complex voting booth CGI program.
#!/usr/bin/perl
#
# This program written by Richard Dice of Anadas Software Development
# as part of the Sams Net "CGI Programming Unleashed" book. The author
# intends this code to be used for instructional purposes and not for
# resale or commercial gain.
#
# Any questions or comments regarding this program are welcome. You
# may contact the author by Internet email: rdice@anadas.com
#
#
# minor set-up --> seed the random number generator (used in issuing
# passwords) and define the $ref_url, the only URL that this program
# will accept FORM-based submissions from. I redirect standard error
# so that sendmail won't crash the system if it can't find an address
#
$ref_url = 'http://www.anadas.com/cgiunleashed/voting/exe/vote.cgi';
srand;
open(STDERR,"> /dev/null");
$admin = 'rdice@anadas.com';
# -- start of my standard GET/POST method handler --
if ( $ENV{REQUEST_METHOD} eq 'POST' ) {
read(stdin,$input,$ENV{CONTENT_LENGTH});
} elsif ( $ENV{REQUEST_METHOD} eq 'GET' ) {
$input = $ENV{QUERY_STRING};
} else {
print "Content-type: text/html\n\n";
print "This program doesn't support the <b>$ENV{REQUEST_METHOD}</b>",
" httpd request method.\n";
exit 1;
}
$input =~ tr/+/ /;
@fields = split(/\&/,$input);
$input = '';
foreach $i (@fields) {
($field,$data) = split(/=/,$i);
$field =~ s/%(..)/pack("c",hex($1))/ge;
$data =~ s/%(..)/pack("c",hex($1))/ge;
$tokens{$field} = $data;
}
@fields = (); # delete the @fields array
# -- end of my standard GET/POST method handler --
#
# MULTISTATE DETERMINATION
# ------------------------
# Switch to one of the possible actions given the value of the 'submit'
# name
#
# * if the submitting URL isn't this URL and yet there is form input,
# likely someone is trying to hack the system
# * if no form input is encountered, it's the first access to this page
# * view historical statistics
# * allow a registered user to vote
# * process form following voting
#
if ( ($ENV{HTTP_REFERER} ne $ref_url) && %tokens ) {
&refer_error;
}
&entry_page if !defined($tokens{'submit'});
&new_user($tokens{'email'}) if $tokens{'submit'} eq 'Register a New User';
&view_stats if $tokens{'submit'} eq 'View Historical Statistics';
&reg_user if $tokens{'submit'} eq 'Registered Users Proceed';
&process_votes if $tokens{'submit'} eq 'Submit these Votes';
exit 0;
#
# Some URL other than $ref_url attempted to make a form submission
#
sub refer_error {
print "Content-type: text/html\n\n";
print <<END;
<HTML><HEAD><TITLE>Referring page error</TITLE></HEAD>
<BODY>
<H2>Referring Page Error</H2>
<P>
The page which is being used to access this CGI program is not permitted
to invoke the program. Please use this program via:
<P>
<A HREF=$ref_url>$ref_url</A>
</BODY></HTML>
END
exit 1;
}
#
# Entry without form submission : throw up an HTML page which presents
# a user with 3 area options. Note that a different VALUE is associated
# with each INPUT TYPE=SUBMIT button. The Multistate switcher in the
# main of this program looked for that value to determine which state to
# employ
#
sub entry_page {
print "Content-type: text/html\n\n";
print <<END;
<HTML>
<HEAD><TITLE>Welcome to Richard's Vote-able Music Database</TITLE></HEAD>
<BODY>
<H3>Choose one of the following options for interacting with Richard's
Vote-able Music Database:</H3>
<HR>
<FORM METHOD=POST ACTION=$ref_url>
<INPUT TYPE=SUBMIT NAME="submit" VALUE="View Historical Statistics">
</FORM>
<HR>
<FORM METHOD=POST ACTION=$ref_url>
<TABLE>
<TR><TD VALIGN=TOP ALIGN=LEFT COLSPAN=1><B>Email Address:</B></TD>
<TD VALIGN=TOP ALIGN=LEFT COLSPAN=1><INPUT TYPE=TEXT NAME="email"></TD>
<TD VALIGN=TOP ALIGN=LEFT COLSPAN=1>
<INPUT TYPE=SUBMIT NAME="submit" VALUE="Register a New User">
</TD></TR></TABLE>
</FORM>
<HR>
<FORM METHOD=POST ACTION=$ref_url>
<TABLE>
<TR><TD VALIGN=TOP ALIGN=LEFT COLSPAN=1><B>Email Address:</B></TD>
<TD VALIGN=TOP ALIGN=LEFT COLSPAN=2><INPUT TYPE=TEXT NAME="email"></TD>
<TD VALIGN=CENTER ALIGN=LEFT COLSPAN=1 ROWSPAN=2>
<INPUT TYPE=SUBMIT NAME="submit" VALUE="Registered Users Proceed"></TD></TR>
<TR><TD VALIGN=TOP ALIGN=LEFT COLSPAN=1><B>Password:</B></TD>
<TD VALIGN=TOP ALIGN=LEFT COLSPAN=1><INPUT TYPE=PASSWORD NAME="password"></TD>
</TR></TABLE>
</FORM>
<HR>
</BODY></HTML>
END
}
#
# a user has asked for a new account. Gives a fairly random 6 digit
# number to them as their password via email
# Also, creates an empty .cdd file -- this file will store this user's
# vote information
#
sub new_user {
local ($email);
($email) = @_;
#
# Trap for no email address entry. It's still very possible for someone
# to feed the program a nonexistent email address, or a random string of
# characters for that matter, but it won't crash the program and it won't
# do the offending person any good, either, as their password will be
# provided to them by email anyhow
#
$email =~ s/\s//g; # eliminates all whitespace in email address
if ( $email eq '' ) {
print "Content-type: text/html\n\n";
print <<END;
<HTML><HEAD><TITLE>No Email Address was entered</TITLE></HEAD>
<BODY>
<P>
Either nothing was entered into the email address field or only whitespace
characters were entered. Return to the <A HREF=$ref_url>home page</A> of this
site to re-start the process.
</BODY></HTML>
END
} else {
#
# create a password of 6 mostly random digits... check to see if a data file
# with that password already exists... if so, add one to the password
# and check to see if that exists... repeat until available password is found
#
$pswd = rand; # put a random number into pswd
substr($pswd,0,2) = ''; # remove the leading "0." from its front
# to this number, add the number of seconds since 1970 & the current PID
$pswd = $pswd + time + $$;
$pswd = reverse $pswd; # reorder string back to front
substr($pswd,6) = ''; # keep only the first 6 characters
while ( -e ($pswd . '.cdd') ) { $pswd++; } # checks for file existence
#
# create the datafile with name "PASSWORD.cdd"
# Store email address and password as data fields
#
$passfilename = $pswd . '.cdd';
open(DF,"> $passfilename");
print DF "Email Address\t$email\n";
print DF "Password\t$pswd\n";
close(DF);
chmod 0660, $passfilename;
#
# Send email to new subscriber telling them what their password is
#
open(EM,"| /usr/sbin/sendmail -t");
print EM <<END;
To: $email
Cc: $admin
From: "Richard's Music Database Program"
Reply-To: "Richard Dice" <rdice\@anadas.com>
Subject: Welcome, new subscriber!

You are now a subscriber to Richard's Music Database. This allows you to
vote on what you think of his music collection.
To access your account, go to:
$ref_url
Email Address : $email
Password : $pswd
Hope it's fun for you!
END
close(EM);
#
# Output a message to the Web providing further instructions
#
print "Content-type: text/html\n\n";
print <<END;
<HTML>
<HEAD><TITLE>Welcome, New Registered User!</TITLE></HEAD>
<BODY>
<P>
<H3>Welcome, New Registered User!</H3>
<P>
You will be receiving an email shortly which will tell you your password.
Once you have that information, please go back to the
<A HREF=$ref_url>home page</A> and enter as a registered user.
</BODY></HTML>
END
}
}
sub view_stats {
$num_votes = 0;
#
# construct @line array of all cd.txt datafile lines which have voted-upon
# albums, and @shortlist array of all such lines which are also marked
# as being on the short list (that is, the 3rd datafield is a "*").
# Also, create running total of the number of votes recorded and the total
# 0-10 votes submitted
#
open(CD,"cd.txt");
while ( <CD> ) {
chop;
@field = split(/\t/,$_,5);
if ( $field[3] != 0 ) {
push(@line,$_);
$vote_total += $field[4];
$num_votes += $field[3];
push(@shortlist,$_) if $field[2] eq '*';
}
}
close(CD);
if ($num_votes != 0 ) {
$average = $vote_total / $num_votes;
} else { $average = 0; }
#
# order both @line and @shortlist arrays by descending order of album
# average vote score
#
(@line = sort by_average @line) if @line;
(@shortlist = sort by_average @shortlist) if @shortlist;
#
# output reports to the Web -- first is total # of voters, then Average Vote,
# then table of @line-related information... all pretty straightforward stuff
#
print "Content-type: text/html\n\n";
print <<END;
<HTML>
<HEAD><TITLE>Historical Voting Record</TITLE></HEAD>
<BODY>
<P>
Here is the historical record of all votes taken regarding Richard's Music.
<P>
END
print "<B>Number of votes cast: $num_votes <BR>\n";
printf("Average Score of Album: %5.2f </B>\n",$average);
print "<H3>All Albums</H3>\n";
print "<TABLE BORDER WIDTH=100%>\n";
print "<TR><TH>Artist</TH><TH>Album</TH><TH># of Votes</TH>",
"<TH>Vote Ave.</TH></TR>\n";
foreach ( @line ) {
@field = split(/\t/,$_,5);
print "<TR><TD ALIGN=LEFT VALIGN=TOP>$field[0]</TD>\n";
print "<TD ALIGN=LEFT VALIGN=TOP>$field[1]</TD>\n";
if ( $field[2] eq '*' ) {
$shortcut = &hash($field[0],$field[1]);
print "<TD ALIGN=CENTER VALIGN=TOP COLSPAN=2><A HREF=\"#$shortcut\">",
"See Short List</A></TD></TR>\n";
} else {
print "<TD ALIGN=LEFT VALIGN=TOP>$field[3]</TD>\n";
printf("<TD ALIGN=LEFT VALIGN=TOP>%5.2f</TD></TR>\n",
$field[4]/$field[3]);
}
}
print "</TABLE>\n";
#
# generate "short-list" table... same as above, but also includes standard
# deviation in votes. Also, contains <A NAME> information which is referred
# to by links in the standard list. This hashing scheme of
# "ARTIST _ ALBUM" and then remove all characters not in the range a-z, A-Z,
# 0-9 and _ is used as a standard throughout this program to uniquely
# identify any album.
#
#
# create an array with the file names of all .cdd files within
#
@cddfile = <*.cdd>;
print "<H3>The Short List</H3>\n";
print "<TABLE BORDER WIDTH=100%><TR>\n";
print "<TR><TH>Artist</TH><TH>Album</TH><TH># Votes</TH><TH>Vote Ave.</TH>",
"<TH>Std. Dev.</TH></TR>\n";
foreach ( @shortlist ) {
@field = split(/\t/,$_,5);
#
# Create the standard deviation for an entry in the short list. This is
# done using the formula:
# StDev = ( (1/n) * sigma(i=1,i=n,(VOTE_i - Average Vote)^2) ) ^ 1/2
#
# VOTE_i is determined by parsing each and every .cdd file and checking for
# references to the album currently being parsed for
#
$stdev = 0;
$n = 0;
$average = $field[4]/$field[3];
foreach $cddf ( @cddfile ) {
open(CDD,$cddf);
while ( <CDD> ) {
chop;
@shortline = split(/\t/);
if ( ($shortline[0] eq $field[0]) &&
($shortline[1] eq $field[1]) ) {
$stdev += ($shortline[2] - $average)**2;
$n++;
}
}
close(CDD);
}
$stdev = ($stdev / $n)**0.5;
$shortcutname = &hash($field[0],$field[1]);
print "<TD ALIGN=LEFT VALIGN=TOP><A NAME=\"$shortcutname\">$field[0]</A>",
"</TD>\n";
print "<TD ALIGN=LEFT VALIGN=TOP>$field[1]</TD>\n";
print "<TD ALIGN=LEFT VALIGN=TOP>$field[3]</TD>\n";
printf("<TD ALIGN=LEFT VALIGN=TOP>%5.2f</TD>\n",$field[4]/$field[3]);
printf("<TD ALIGN=LEFT VALIGN=TOP>%7.4f</TD></TR>\n",$stdev);
}
print "</TABLE>\n";
#
# Now, output a graph of how many votes an artist got... I'm not sure
# what the statistical significance of this would be, but it'll look cool,
# and also show how graphs can be produced
#
print "<H3>Graph of Vote Points Per Artist</H3>\n<PRE>\n";
$maxkeylength = -1;
foreach ( @line ) {
@field = split(/\t/,$_,5);
$votes{$field[0]} += $field[4];
$maxkeylength = length($field[0]) if length($field[0]) > $maxkeylength;
}
$maxvotes = -1;
foreach ( values(%votes) ) { $maxvotes = $_ if $_ > $maxvotes; }
foreach ( sort by_votes keys(%votes) ) {
printf("%${maxkeylength}s : ",$_);
$i = 1;
$numstars = 30 * $votes{$_} / $maxvotes;
$numstars = int(++$numstars) if ( ($numstars - int($numstars)) >= 0.5);
while ( $i <= $numstars ) {
print "*";
$i++;
}
print " $votes{$_} vote points\n";
}
print <<END;
</PRE>
<P>
Return to <A HREF=$ref_url>Home Page</A>
</BODY></HTML>
END
}
sub reg_user {
#
# - open user file corresponding to the email address / password pair provided
# - display error page if file can't be found or email/password don't match
# - create @userfile array containing each line in the .cdd data file
#
$userfilename = $tokens{'password'} . '.cdd';
&password_error if ( !(-e $userfilename) );
open(UF,$userfilename);
chop(@userfile = <UF>);
close(UF);
($email,$passwd) = splice(@userfile,0,2); # remove 1st 2 entries
($discard,$keep) = split(/\t/,$email);
&password_error if ( $tokens{'email'} ne $keep);
@userfile = sort @userfile;
#
# Construct @line array of all entries and @shortlist array for shortlisted
# entries. Then, sort these two arrays alphabetically
#
open(CD,"cd.txt");
while ( <CD> ) {
chop;
@field = split(/\t/,$_,5);
push(@line,$_);
push(@shortlist,$_) if $field[2] eq '*';
}
close(CD);
@line = sort @line;
@shortlist = sort @shortlist;
print "Content-type: text/html\n\n";
print "<HTML>\n<HEAD><TITLE>Here is your voting profile</TITLE></HEAD>\n" ,
"<BODY>\n<H3>Voting Profile of user <I>$tokens{'email'}</I></H3>\n" ,
"<P>To re-cast or remove votes, use the appropriate selection menus " ,
" and submit the form.\n";
print "<FORM ACTION=$ref_url METHOD=POST>\n";
print "<INPUT TYPE=HIDDEN NAME=\"email\" VALUE=$tokens{'email'}>\n";
print "<INPUT TYPE=HIDDEN NAME=\"password\" VALUE=$tokens{'password'}>\n";
print "<INPUT TYPE=SUBMIT NAME=\"submit\" VALUE=\"Submit these Votes\">\n";
print "<INPUT TYPE=RESET VALUE=\"Reset form to its original values\">\n";
print "<P><TABLE BORDER>\n";
print "<TR><TH VALIGN=TOP NOWRAP COLSPAN=3>Short-listed Albums currently",
" Voted Upon</TH></TR>\n";
print "<TR><TH VALIGN=TOP NOWRAP>Artist</TH><TH VALIGN=TOP NOWRAP>Album",
"</TH><TH VALIGN=TOP>Your Vote</TH>\n";
#
# The following loop goes through all entries in the userfile which
# correspond with short-listed entries in the album database
#
foreach $userline ( @userfile ) {
@field = split(/\t/,$userline);
$shortflag = 0;
foreach $shortentry ( @shortlist ) {
@shortfield = split(/\t/,$shortentry,5);
if ( ($shortfield[0] eq $field[0]) && ($shortfield[1] eq $field[1]) ) {
$shortflag = 1;
$shortentry = 'DONE';
$userline = 'DONE';
}
}
next if $shortflag == 0;
print "<TR>\n<TD ALIGN=LEFT VALIGN=CENTER NOWRAP>$field[0]</TD>\n";
print "<TD ALIGN=LEFT VALIGN=CENTER NOWRAP>$field[1]</TD>\n";
$namehash = &hash($field[0],$field[1]);
print "<TD ALIGN=LEFT VALIGN=CENTER><SELECT NAME=$namehash>\n";
print "<OPTION>[remove vote]</OPTION>\n";
$select{$field[2]} = ' SELECTED';
for $i (0..10) {
print "<OPTION$select{$i}>$i</OPTION>\n";
}
undef(%select);
print "</SELECT></TD></TR>\n";
}
print "<TR><TH VALIGN=TOP NOWRAP COLSPAN=3>Other Short-listed Albums",
"</TH></TR>\n";
print "<TR><TH VALIGN=TOP NOWRAP>Artist</TH><TH VALIGN=TOP NOWRAP>Album",
"</TH><TH VALIGN=TOP>Your Vote</TH>\n";
#
# do the same for each short-list item not already encountered, as marked
# by the DONE flag... notice that now I don't have to use any special
# code to determine what OPTION to mark as SELECTED
#
foreach ( @shortlist ) {
next if $_ eq 'DONE';
@field = split(/\t/,$_,5);
print "<TR>\n<TD ALIGN=LEFT VALIGN=CENTER NOWRAP>$field[0]</TD>\n";
print "<TD ALIGN=LEFT VALIGN=CENTER NOWRAP>$field[1]</TD>\n";
$namehash = &hash($field[0],$field[1]);
print "<TD ALIGN=LEFT VALIGN=CENTER><SELECT NAME=$namehash>\n";
print "<OPTION SELECTED>[no current vote]</OPTION>\n";
for $i (0..10) {
print "<OPTION>$i</OPTION>\n";
}
print "</SELECT></TD></TR>\n";
}
print "<TR><TH VALIGN=TOP NOWRAP COLSPAN=3>Long-listed Albums currently",
" Voted Upon</TH></TR>\n";
print "<TR><TH VALIGN=TOP NOWRAP>Artist</TH><TH VALIGN=TOP NOWRAP>Album",
"</TH><TH VALIGN=TOP>Your Vote</TH>\n";
#
# do the same for already voted-upon long-listed entries... all short-listed
# entries will be listed as DONE so skip
#
foreach $userline ( @userfile ) {
next if $userline eq "DONE";
@field = split(/\t/,$userline);
$userline = "DONE";
foreach $entry ( @line ) {
@entryfield = split(/\t/,$entry,5);
if ( ($entryfield[0] eq $field[0]) && ($entryfield[1] eq $field[1]) )
{ $entry = "DONE"; }
}
print "<TR>\n<TD ALIGN=LEFT VALIGN=CENTER NOWRAP>$field[0]</TD>\n";
print "<TD ALIGN=LEFT VALIGN=CENTER NOWRAP>$field[1]</TD>\n";
$namehash = &hash($field[0],$field[1]);
print "<TD ALIGN=LEFT VALIGN=CENTER><SELECT NAME=$namehash>\n";
print "<OPTION>[remove vote]</OPTION>\n";
$select{$field[2]} = ' SELECTED';
for $i (0..10) {
print "<OPTION$select{$i}>$i</OPTION>\n";
}
undef(%select);
print "</SELECT></TD></TR>\n";
}
print "<TR><TH VALIGN=TOP NOWRAP COLSPAN=3>Other Long-listed Albums",
"</TH></TR>\n";
print "<TR><TH VALIGN=TOP NOWRAP>Artist</TH><TH VALIGN=TOP NOWRAP>Album",
"</TH><TH VALIGN=TOP>Your Vote</TH>\n";
#
# do the same for not voted-upon long-listed entries... all short-listed
# and voted-upon entries will be listed as DONE so skip... note, no fancy
# code to determine which OPTION to mark as SELECTED
#
foreach $entry ( @line ) {
next if $entry eq "DONE";
@field = split(/\t/,$entry,5);
next if $field[2] eq '*';
print "<TR>\n<TD ALIGN=LEFT VALIGN=CENTER NOWRAP>$field[0]</TD>\n";
print "<TD ALIGN=LEFT VALIGN=CENTER NOWRAP>$field[1]</TD>\n";
$namehash = &hash($field[0],$field[1]);
print "<TD ALIGN=LEFT VALIGN=CENTER><SELECT NAME=$namehash>\n";
print "<OPTION SELECTED>[no current vote]</OPTION>\n";
for $i (0..10) {
print "<OPTION>$i</OPTION>\n";
}
undef(%select);
print "</SELECT></TD></TR>\n";
}
print "</TABLE>\n";
print "<P><INPUT TYPE=SUBMIT NAME=\"submit\" VALUE=\"Submit these",
" Votes\">\n";
print "<INPUT TYPE=RESET VALUE=\"Reset form to its original values\">\n";
print "</FORM>\n";
}
sub process_votes {
#
# remove all entries from tokens which amount to no voting information
# for that hashed artist/album pair
#
foreach ( keys %tokens ) {
delete $tokens{$_} if $tokens{$_} eq '[no current vote]';
delete $tokens{$_} if $tokens{$_} eq '[remove vote]';
}
#
# read in old vote information for all albums from cd.txt file
#
open(CD,"cd.txt");
while ( $line = <CD> ) {
chop $line;
@field = split(/\t/,$line,5);
$hash = &hash($field[0],$field[1]);
$albums{$hash,'artist'} = $field[0];
$albums{$hash,'album'} = $field[1];
$albums{$hash,'shortlist'} = $field[2];
$albums{$hash,'num_votes'} = $field[3];
$albums{$hash,'vote'} = $field[4];
push(@hashes,$hash);
}
close(CD);
@hashes = sort @hashes;
#
# read in old vote information for this user from .cdd data file
#
$userfile = $tokens{'password'} . '.cdd';
open(UF,"$userfile");
while ( $line = <UF> ) {
chop($line);
@field = split(/\t/,$line,3);
next if (($field[0] eq 'Email Address') &&
($field[1] eq $tokens{'email'}));
next if (($field[0] eq 'Password') &&
($field[1] eq $tokens{'password'}));
$hash = &hash($field[0],$field[1]);
$albums{$hash,'vote'} -= $field[2];
$albums{$hash,'num_votes'} -= 1;
}
close(UF);
#
# re-write the album list reflecting the new votes
#
open(CD,"> cd.txt");
foreach $hash ( @hashes ) {
print CD "$albums{$hash,'artist'}\t";
print CD "$albums{$hash,'album'}\t";
print CD "$albums{$hash,'shortlist'}\t";
if (defined($tokens{$hash})) {
$albums{$hash,'num_votes'}++;
$albums{$hash,'vote'} += $tokens{$hash};
}
print CD "$albums{$hash,'num_votes'}\t";
print CD "$albums{$hash,'vote'}\n";
}
close(CD);
chmod 0660,"cd.txt";
#
# rewrite the .cdd data with the new vote information
#
open(UF,"> $userfile");
print UF "Email Address\t$tokens{'email'}\n";
print UF "Password\t$tokens{'password'}\n";
delete $tokens{'email'};
delete $tokens{'password'};
delete $tokens{'submit'};
foreach $hash ( keys %tokens ) {
print UF "$albums{$hash,'artist'}\t";
print UF "$albums{$hash,'album'}\t";
print UF "$tokens{$hash}\n";
}
close(UF);
chmod 0660, "$userfile";
print "Content-type: text/html\n\n";
print <<END;
<HTML>
<HEAD><TITLE>Thank you for voting!</TITLE></HEAD>
<BODY>
<H3>Your vote has been received. Please go to the <A HREF=$ref_url>home
page</A> to view statistics.</H3>
</BODY></HTML>
END
}
#
# for use in sorting tabular output in View Statistics option
#
sub by_average {
@line1 = split(/\t/,$a,5);
@line2 = split(/\t/,$b,5);
return ( ($line2[4]/$line2[3]) <=> ($line1[4]/$line1[3]) );
}
#
# for use in sorting bar graphs
#
sub by_votes {
return $votes{$a} <=> $votes{$b};
}
sub password_error {
print "Content-type: text/html\n\n";
print <<END;
<HTML>
<HEAD><TITLE>Invalid Email / Password Pair</TITLE></HEAD>
<BODY>
That email address / password pair is invalid. Please return to the
<A HREF=$ref_url>home page</A> and try again.
</BODY></HTML>
END
exit 1;
}
#
# standard hashing scheme used throughout the program to make a hash string
# out of artist & album strings
#
sub hash {
local($artist,$album);
($artist,$album) = @_;
$artist .= " _ $album";
$artist =~ tr/a-zA-Z0-9_//cd;
return $artist;
}
The raw programming power and technology exhibited by this program isn't all that different from greenegg.cgi. Sure, there's a lot more of it, but the commands and techniques are pretty much the same. The major difference lies in the organization of data structures and external files. The program also introduces a few new concepts, which the following sections walk through.
Multistate CGI Programs-More Than Just a URL
With the greenegg.cgi example, I first included greenegg.html, the file used to invoke the CGI program. This isn't the way it has to be. Using the same &subroutine if (condition); structure that traps errors in greenegg.cgi, I have made vote.cgi a multistate CGI program.
The word "state" is thrown around quite a bit in the field of computer science, but I've never really seen a good definition of the term. So, I'll have to make one up myself: state is the property that tells a system that can assume many different forms which of those forms is the one to manifest.
What does this mean in the context of vote.cgi, though? Well, give it a try! Go to
http://www.anadas.com/cgiunleashed/voting/exe/vote.cgi
When you first enter the page, it presents you with a number of different options. If you then choose one, the form calls vote.cgi, but this time vote.cgi doesn't react by displaying the same entry page. The value of NAME in the <INPUT TYPE=SUBMIT NAME=whatever> tag provides vote.cgi with the information necessary to choose an appropriate state: displaying historical statistics, sending an e-mail with a generated password, and so on. Figure 17.4 shows the interpreted HTML output that vote.cgi produces when displaying historical voting statistics.
Figure 17.4: A portion of the screen showing historical voting information. Notice that the URL is the same as before.
Multistate CGI programs have ups and downs. On the plus side, people must follow your chain of events to reach the "page" you want them on, and you can bundle a whole Web site into one file if you want to; keeping everything in one place can be handy. On the minus side, it's harder to maintain HTML embedded in a CGI program, especially a compiled one.
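As a rough sketch of the state selection described above, the following routine picks a state from the names of the submitted form fields. Only the `submit` button name appears in the listing; the other button names (`view_stats`, `new_user`, `reg_user`) and the state labels are assumptions made up for illustration, not vote.cgi's actual names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decide which "state" the program should take, based on which
# submit button (if any) came back with the form data.
# All names except 'submit' are hypothetical.
sub choose_state {
    my (%tokens) = @_;
    return 'statistics' if defined $tokens{'view_stats'};
    return 'new_user'   if defined $tokens{'new_user'};
    return 'reg_user'   if defined $tokens{'reg_user'};
    return 'process'    if defined $tokens{'submit'};
    return 'entry';     # no button pressed: show the entry page
}

print choose_state(), "\n";                         # entry
print choose_state('view_stats' => 'View'), "\n";   # statistics
```

Because every state is reached through the same URL, the only thing that distinguishes one request from the next is the form data itself.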
Generating Meaningful Reports
The game of voting, on the Web as in the rest of life, will eventually reduce itself to playing with numbers. The problem becomes devising a way to make those numbers meaningful.
Sorting Lists in Perl
In vote.cgi, I have decided to show output in tabular form. The columns are artist, album, number of votes submitted, and average vote. The observant user of this list will notice that the album with the highest vote point average is at the top of the list, and that vote point averages descend from there. This is not a coincidence.
Sorting a list is one of the canonical problems of computer science. The methods people come up with on the spot are often quite bad and usually turn out to be a variant on either the bubble sort or the selection sort. What I mean by "bad" in this context is that they have an order n-squared time complexity. This is compsci mumbo-jumbo meaning that the number of logical operations required to complete the process is proportional to the square of the number of elements in the list. Think of it this way: If there are 10 albums in my database that have to be reported on, the sort will take 10*10 = 100 units of time to finish. If there are 20 albums to be sorted, then it will take 20*20 = 400 units of time. This sort of quadratic growth in the time it takes to solve a computer problem absolutely must be avoided in real-world applications where tens of millions of data elements might need to be sorted.
Fortunately, there are better options when it comes to sorting. Two algorithms, Mergesort and Quicksort, are both n*log(n) algorithms (in Quicksort's case, on average). Taking logarithms to base 10, this would mean that the 10 element list would take 10*log(10) = 10 new-and-improved arbitrary time units to complete, while the 20 element list would take 20*log(20) = 26 arbitrary time units. Note that the arbitrary time units I summon up here aren't the same as before, so we can't compare these two algorithms directly. However, we do notice that there is a factor of 4 difference between the 10 and 20 element lists in the first case, while there is only a factor of 2.6 in the second. n*log(n) algorithms make sorting large lists manageable.
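The arithmetic above is easy to reproduce. This small sketch prints both cost estimates for the 10 and 20 element lists; the helper name `log10` is my own, defined here because Perl's built-in `log` is the natural logarithm.

```perl
use strict;
use warnings;

# Base-10 logarithm built from Perl's natural log.
sub log10 { return log($_[0]) / log(10); }

foreach my $n (10, 20) {
    printf("n=%2d  n*n=%3d  n*log(n)=%4.1f\n", $n, $n * $n, $n * log10($n));
}
```

The ratio between the two rows is 4 for the n-squared column and about 2.6 for the n*log(n) column, which is the difference the text points out.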
Now for your wake-up call: In Perl, you don't have to sort things directly. Instead, you can take advantage of Perl's built-in n*log(n) sorting function, reasonably called sort.
@destination_array = sort by_criteria @source_array;
@source_array holds your list of elements you want to be sorted. @destination_array will receive the sorted list. Note that they can both be the same array.
by_criteria is a subroutine you must build that will provide the sort command with its basis of comparing two elements with each other. Its general structure is
sub by_criteria {
$atemp = $a;
$btemp = $b;
(statements regarding $atemp and $btemp, ultimately producing $value,
a variable with an integer numeric value)
return $value;
}
Note that you can name by_criteria whatever you find appropriate for the circumstance.
sort is a special case when it comes to passing information to its by_criteria function. While Perl subroutines usually receive arguments through the @_ special list variable, sort will pick two elements from @source_array and call them $a and $b. The value returned should be negative if $a belongs before $b, positive if it belongs after, and zero if the two are equivalent. $atemp and $btemp aren't strictly necessary, but they can be very useful. Anything that is done within the subroutine that changes $a and $b will change values within @source_array itself! The inclusion of by_criteria is optional; if it isn't supplied, sort will do its job using the standard ASCII collating sequence as its criterion.
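Here is a small, self-contained sketch of a comparison subroutine in action. It follows the same idea as the by_average routine in the listing; the three sample records are made-up data in the cd.txt field order (artist, album, short-list flag, number of votes, total vote points).

```perl
use strict;
use warnings;

# Sample records (illustrative data only), tab-separated as in cd.txt.
my @line = (
    "Beastie Boys\tCheck Your Head\t*\t1\t8",
    "Bowie, David\tThe Rise and Fall of Ziggy Stardust\t*\t1\t9",
    "Prince\tPurple Rain\t\t2\t14",
);

# Compare two records by average vote, highest average first.
# The records are split into private copies so $a and $b stay untouched.
sub by_average {
    my @x = split(/\t/, $a, 5);
    my @y = split(/\t/, $b, 5);
    return ($y[4] / $y[3]) <=> ($x[4] / $x[3]);
}

my @sorted = sort by_average @line;
foreach my $record (@sorted) {
    my @f = split(/\t/, $record, 5);
    printf("%-40s %5.2f\n", $f[1], $f[4] / $f[3]);
}
```

Note that this sketch assumes every record has at least one vote; a record with zero votes would need to be filtered out or special-cased before dividing.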
Tip: I often find it difficult to predict which order I'm going to end up sorting an array in using sort. As often as not, I end up sorting the list in exactly the opposite way I was intending. To fix this, I could hunt through my statements and negate the logic within, but more often, I just use return ($value * -1);.
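Besides negating the return value, two other standard Perl idioms flip a sort order: swapping $a and $b in the comparison, or applying the built-in reverse to the sorted list. The sketch below shows all three on a made-up list of scores; these are alternatives to the tip's approach, not something taken from vote.cgi.

```perl
use strict;
use warnings;

my @scores = (7, 10, 3);   # illustrative data

my @ascending  = sort { $a <=> $b } @scores;
my @negated    = sort { ($a <=> $b) * -1 } @scores;   # the tip's approach
my @swapped    = sort { $b <=> $a } @scores;          # swap the operands
my @reversed   = reverse sort { $a <=> $b } @scores;  # reverse afterwards

print "@ascending\n";   # 3 7 10
print "@negated\n";     # 10 7 3
print "@swapped\n";     # 10 7 3
print "@reversed\n";    # 10 7 3
```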
Average and Standard Deviation-Statistics 101
Everyone is familiar with the concept of the simple average. For some reason, we all feel very comfortable assigning meaning to an average value. I have decided to perform the tabular sorts on the criterion of descending average vote.
When dealing with vast amounts of numerical data, we mere humans can be easily overwhelmed with the quantity of numbers involved, and we start to lose track of the meaning behind those numbers. Sometimes, we don't even know the meaning of the numbers in the first place! This situation has given rise to the science of statistics. Statistics, put simply, are numbers that are used to represent other numbers. A good statistic is one that allows us a greater understanding of the scenario at hand with a lesser amount of sheer numbers.
An average of a set of numbers is an example of a very popular statistic. Not only does it have an intuitive meaning associated with it, but there's a wealth of mathematical study that shows that averages do a very good job of representing a great deal of information in a compact form. However, determining the average of a set of numbers isn't enough. Consider the following two number sets, both with average 5: (5,5,5,5) and (0,0,10,10). Think of these sets as being representative of a score on a scale of 0 to 10. The first set would show us that the thing being measured is wholly average: there is unanimous consent that the thing being polled on is middle-of-the-road. The second set, also averaging to 5, tells an entirely different story: You either love it or hate it.
Representing these two sets with just an average would be misleading; in doing so, we would have lost a great deal of information in the process. To help preserve the information relating to the agreement within the set on the average, a new statistic is introduced: standard deviation.
The concept of standard deviation is almost as intuitive as that of an average. Essentially, once you've determined your average, the standard deviation measures the typical distance between the average value and the members of the set of data points (strictly speaking, the square root of the average squared distance). Stated mathematically, this is written as
$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
$s$ represents the standard deviation, $n$ the number of data elements in the set, $\bar{y}$ the average of the data set, and $y_i$ the ith data element. $\Sigma$ is the summation symbol, mathematical shorthand for "add up everything to the right of me, from the value of i that's below me to the value that's on top of me."
I have included the standard deviation of the average of votes for "short-listed" albums in the table reporting on those albums.
Applying this formula, we see that the standard deviation, s, for the set (5,5,5,5) is 0, while for (0,0,10,10) it is 5. We interpret this to mean that there is no disagreement between the numbers in the first set and the average of the first set, while the typical disagreement between the numbers in the second set and their average is 5.
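The two example sets can be checked with a few lines of Perl. This is a minimal sketch of the same calculation vote.cgi performs for short-listed albums (average first, then the square root of the mean squared difference); the subroutine name `std_dev` is my own.

```perl
use strict;
use warnings;

# Standard deviation: square root of the mean squared distance from the average.
sub std_dev {
    my @votes = @_;
    my $n = @votes;
    return 0 unless $n;                 # nothing to measure

    my $sum = 0;
    $sum += $_ for @votes;
    my $average = $sum / $n;

    my $squares = 0;
    $squares += ($_ - $average) ** 2 for @votes;
    return sqrt($squares / $n);
}

printf("%.1f\n", std_dev(5, 5, 5, 5));     # 0.0
printf("%.1f\n", std_dev(0, 0, 10, 10));   # 5.0
```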
Tip: There is an alternative (yet equivalent) procedure used to calculate standard deviation:
$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2}$$
Depending on how your data is stored, this might be the better formula for your program, both in terms of simplicity of programming and speed of execution.
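One well-known equivalent form of the standard deviation needs only three running totals: the count, the sum of the values, and the sum of their squares. The sketch below assumes that is the kind of alternative the tip has in mind; the subroutine name `std_dev_one_pass` is hypothetical. Its appeal is that each vote is looked at only once, so the totals could be accumulated while reading the data.

```perl
use strict;
use warnings;

# Standard deviation from running totals:
#   s = sqrt( (sum of squares)/n - (average)^2 )
sub std_dev_one_pass {
    my ($n, $sum, $sum_sq) = (0, 0, 0);
    foreach my $vote (@_) {
        $n++;
        $sum    += $vote;
        $sum_sq += $vote ** 2;
    }
    return 0 unless $n;

    my $average  = $sum / $n;
    my $variance = $sum_sq / $n - $average ** 2;
    $variance = 0 if $variance < 0;   # guard against tiny negative rounding error
    return sqrt($variance);
}

printf("%.1f\n", std_dev_one_pass(5, 5, 5, 5));     # 0.0
printf("%.1f\n", std_dev_one_pass(0, 0, 10, 10));   # 5.0
```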
Visual Output-Creating Graphs
Creating a graph isn't all that mathematically intense. The issues to keep in mind are
- Keeping the left margin of the graph aligned, given that your labels will be of different lengths.
- Imposing an upper limit to the length of any given bar in the bar graph. To do this, first make a pass through all your data and determine which is the greatest element. Then, divide all your output by that number.
- Making sure that you "round to the nearest unit" when outputting your bars. Simply relying on your loop structure to print out bar elements can leave you open to bars that are 9 units in length when they should be 10, because your loop has an integer index. This will cause the loop to ignore the fact that your data element was actually 9.9 units. You'll have to do this rounding by hand.
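The three points above can be sketched in one short routine: pad labels to the longest one, scale every bar against the largest value, and round to the nearest star. The vote totals are made-up sample data, and `bar_lines` is a hypothetical name; vote.cgi does the same job inline with a 30-character maximum bar.

```perl
use strict;
use warnings;

# Build the lines of an ASCII bar graph, largest value first.
sub bar_lines {
    my ($width, %votes) = @_;
    my ($max_key, $max_votes) = (0, 0);
    foreach my $artist (keys %votes) {
        $max_key   = length($artist)  if length($artist)  > $max_key;
        $max_votes = $votes{$artist}  if $votes{$artist}  > $max_votes;
    }
    return () unless $max_votes > 0;    # avoid dividing by zero

    my @lines;
    foreach my $artist (sort { $votes{$b} <=> $votes{$a} || $a cmp $b } keys %votes) {
        # adding 0.5 before int() rounds to the nearest whole star
        my $stars = int($width * $votes{$artist} / $max_votes + 0.5);
        push @lines, sprintf("%*s : %s %d vote points",
                             $max_key, $artist, '*' x $stars, $votes{$artist});
    }
    return @lines;
}

# Illustrative data only.
my @graph = bar_lines(30, 'Beastie Boys' => 8, 'Bowie, David' => 9, 'Yes' => 5);
print "$_\n" for @graph;
```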
Tip: Graphical (rather than ASCII) bar graphs can be easily created thanks to features found in modern Web browsers. Here's the procedure:
- Create a number of single-pixel GIF images, each in a different color.
- Write code that determines how wide (x) and tall (y) you want your bars to be.
- Output to the Web statements of the form <IMG SRC=appropriate_pixel.gif BORDER=0 WIDTH=x HEIGHT=y>.
Now, you have a graphical bar graph that took next to no time to transmit across the Internet (single pixels aren't large image files) and could be either horizontal or vertical with equal ease.
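A minimal sketch of the procedure, assuming a single-pixel image named red_pixel.gif already exists on the server (the file name, the 300-pixel maximum width, and the `bar_tag` helper are all hypothetical):

```perl
use strict;
use warnings;

# Return an <IMG> tag that stretches a one-pixel GIF into a horizontal bar.
sub bar_tag {
    my ($value, $max_value, $max_width, $height, $pixel) = @_;
    my $width = $max_value > 0
              ? int($max_width * $value / $max_value + 0.5)   # round to nearest pixel
              : 0;
    return "<IMG SRC=\"$pixel\" BORDER=0 WIDTH=$width HEIGHT=$height>";
}

print bar_tag(8, 9, 300, 12, 'red_pixel.gif'), "<BR>\n";
print bar_tag(9, 9, 300, 12, 'red_pixel.gif'), "<BR>\n";
```

Swapping the roles of WIDTH and HEIGHT gives vertical bars instead of horizontal ones.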
Handling Data Internally and Externally
Imagine a world where computer memory is virtually free, fast as light, and doesn't disappear when you turn off your computer. In that world, there wouldn't be any need to make a distinction between internal and external memory. A hard drive? A CD-ROM? What are those? Memory is...just memory!
The world I describe would justifiably be called "Utopia," a word coined from the Greek for "no place." Compare this to where we live: Memory exists in internal and external states. Internal memory is fast, operating on a time scale that corresponds with CPU speed. It can also be accessed asynchronously by multiple systems or user processes. However, it's also expensive and vanishes when you turn off your computer. External memory, exemplified by a hard drive, is cheap and relatively permanent but painfully slow. It's also stored sequentially even if it can be read (more or less) randomly. If you are a programmer whose programs end up "swapping to virtual memory," you could end up a "marked man" in many computer labs. I speak from experience!
The difference in characteristics between internal and external memory is one of the great problems of computer science. The solutions are often no more than workarounds. Even in a program as simple as my voting booth example, vote.cgi, I had to create different data representations depending on whether the context is disk based or internal. This section discusses the structures I built and explains my thinking behind making the choices I did in their design.
External Data Files
vote.cgi is an engine, but the accompanying data files are its fuel. Here are the first few lines of the main database file, cd.txt:
Alice Donut[tab]Donut Comes Alive[tab][tab]0[tab]0
Beastie Boys[tab]Check Your Head[tab]*[tab]1[tab]8
Beastie Boys[tab]Ill Communication[tab][tab]0[tab]0
Blind Melon[tab]Blind Melon[tab][tab]0[tab]0
Bowie, David[tab]Outside[tab][tab]0[tab]0
Bowie, David[tab]Sound + Vision[tab][tab]0[tab]0
Bowie, David[tab]The Rise and Fall of Ziggy Stardust[tab]*[tab]1[tab]9
A line in cd.txt has five tab-separated fields: artist, album, shortlist status, number of votes, and total vote points. An * is used to flag a short-listed album. Note that I'm using [tab] to represent an actual tab character in the preceding code snippet. I hope this makes it easier to read.
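To make the five-field layout of cd.txt concrete, here is a small sketch that splits one line into named fields. The `parse_cd_line` helper and the hash keys are names chosen for this illustration; vote.cgi itself works with the plain @field array.

```perl
use strict;
use warnings;

# Split one cd.txt line into its five tab-separated fields.
sub parse_cd_line {
    my ($line) = @_;
    chomp $line;
    my ($artist, $album, $flag, $num_votes, $points) = split(/\t/, $line, 5);
    return {
        artist      => $artist,
        album       => $album,
        shortlisted => ($flag eq '*') ? 1 : 0,   # '*' marks a short-listed album
        num_votes   => $num_votes,
        points      => $points,
    };
}

my $entry = parse_cd_line("Beastie Boys\tCheck Your Head\t*\t1\t8\n");
print "$entry->{artist}: $entry->{album} ($entry->{points} points)\n";
```

The limit of 5 passed to split matters: it keeps the empty short-list field from throwing off the positions of the fields that follow it.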
In addition to this main data file, each voter is given an individual data file, named ######.cdd, where ###### is their random 6-digit password. A .cdd file will have the following sort of structure:
Email Address[tab]rdice@anadas.com
Password[tab]905430
Prince[tab]Purple Rain[tab]7
Yes[tab]90125[tab]5
Simon, Paul[tab]Graceland[tab]8
Talking Heads[tab]Remain in Light[tab]8
The first two lines of this file are header information. The following lines relate directly to the albums in cd.txt. The first two fields of any album-related line directly match those fields in cd.txt. The fields relating to the short-list and number of votes cast aren't needed in this file, as only one vote can be cast on any given album by an individual, and the short-list information in cd.txt is sufficient for vote.cgi to operate.
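The same structure can be read back with a few lines of Perl. This sketch takes the file's lines as a list rather than opening a real .cdd file, so it can be tried without any data on disk; `parse_cdd` is a hypothetical helper, not part of vote.cgi.

```perl
use strict;
use warnings;

# Separate the two header lines of a .cdd file from the vote lines.
sub parse_cdd {
    my @lines = @_;
    chomp @lines;
    my (undef, $email)    = split(/\t/, shift @lines);   # "Email Address<tab>..."
    my (undef, $password) = split(/\t/, shift @lines);   # "Password<tab>..."
    my @votes = map { [ split(/\t/, $_, 3) ] } @lines;   # [artist, album, vote]
    return ($email, $password, \@votes);
}

my ($email, $password, $votes) = parse_cdd(
    "Email Address\trdice\@anadas.com\n",
    "Password\t905430\n",
    "Prince\tPurple Rain\t7\n",
    "Yes\t90125\t5\n",
);
print "$email has cast ", scalar(@$votes), " votes\n";
```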
My main consideration when designing both external data file structures was human readability. The main reason behind this was I wanted to be able to test portions of my code without having the whole program available to me. For instance, I wanted to test the display mechanisms before I'd written the input routines. So, I had to edit the data files by hand, and I wanted files I would understand while doing that. Figure 17.5 shows what the voting booth looks like to a registered user who logs in as such. Essentially, they are presented with information on their current voting status and given the option to amend their previous votes, as well as cast new ones.
Figure 17.5: The registered user section of vote.cgi.
Internal Data Representation
In most instances, vote.cgi simply reads in cd.txt line by line and places it in an array called @line. I decided to call it @line because I think that there is a special ring to the statement foreach (@line) { ... }, which is used quite often throughout the program.
While reading in each line of cd.txt, vote.cgi performs a test to see if the third field (referenced by $field[2]) of the current line being read is a *. If so, that line is directly copied into @shortlist.
When arranging the ASCII bar graph of artists and their votes, I use an associative array with the name of the artist as the key field, and the value is the running total of the votes received by that artist across all their albums in my collection. This is easy to accomplish by parsing through @line and finding the values of each field in lines, using $field[0] to represent the artist name and $field[4] as the total votes for that artist's album. The code for this is very straightforward and yet surprisingly powerful:
foreach ( @line ) {
@field = split(/\t/,$_,5);
$votes{$field[0]} += $field[4];
}
The split command is told to perform its splitting of $_, the value of the @line element being looked at in the current loop iteration, on tabs, represented by \t.
One last programming technique I'd like to comment on is my use of a one-way hashing function to create a key that can be used as a "shorthand" album identifier. When an element of @line is split on tabs into a @field array, $field[0] corresponds to the artist and $field[1] the album name. I have produced a function that will combine these two fields, slightly modified, into a single string. This hash is often useful when I require a simplified way of identifying an album uniquely. For instance, in the registered voter section, <SELECT> tags are used to allow a registered voter to vote on an album. The hash is used to create the <SELECT NAME=hash> specifier. The hash is used more elegantly in the routine that updates data files with new voting information as a way to identify which elements of @line need to be updated.
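The hashing routine is short enough to try on its own. The version below is a restatement of the listing's hash subroutine with lexical variables; the sample albums come from the cd.txt excerpt.

```perl
use strict;
use warnings;

# Join artist and album with " _ ", then delete every character that is
# not a letter, a digit, or an underscore.
sub hash {
    my ($artist, $album) = @_;
    my $key = "$artist _ $album";
    $key =~ tr/a-zA-Z0-9_//cd;
    return $key;
}

print hash('Bowie, David', 'Sound + Vision'), "\n";    # BowieDavid_SoundVision
print hash('Beastie Boys', 'Check Your Head'), "\n";   # BeastieBoys_CheckYourHead
```

Because punctuation and spaces are thrown away, the result is safe to use as a form field name or an anchor name. The flip side is that two albums differing only in discarded characters would produce the same key, so the scheme relies on the collection not containing such pairs.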
Summary
I think I've covered about all the theory there is to cover on voting booths in this chapter, and I've given two rather specific examples of how minor and extensive voting booth systems can get. Before I close this chapter, I want to leave you with a few examples of voting booths to be found on the World Wide Web.
Though I made quite a show of panning my first voting booth, I think a lot of that had to do with its lack of interesting subject matter. Two dirt-simple voting booths I quite enjoy are the WWWF Grudge Match and Horus' History Poll.
WWWF - http://www.cheme.cornell.edu/~slevine/
Horus' - http://www.kaiwan.com/~lucknow/pollbook/pollbk.html
The WWWF Grudge Match is a voting booth that asks your opinion on who would be the victor in some of the most unlikely contests imaginable: for instance, a battle to the death between "A Rottweiler vs. A Rottweiler's Weight in Chihuahuas." This is some seriously funny stuff.
The Horus' History Poll takes a more serious look at Web polling when it asks the question: What was the most important military battle in history? Though the question isn't difficult, the answers supplied are often thought provoking and insightful.
A slightly more extensive voting system can be found at
http://www.georgemag.com/cgi-unprot/poll.pl
This is the George On-Line Magazine Weekly Poll. The poll asks for your opinion on a number of topical subjects and keeps an extensive history of past replies, with some minor statistical analysis on the side.
Because 1996 is an American presidential election year, be on the look-out for an explosion of voting booths on this topic. If you see any really good ones, please let me know. My e-mail address is easy to find. Cheers!