One of my Perl scripts here at work used the Add_Delta_Days subroutine from the Date::Calc module to do some calendar date arithmetic. I'm in the process of building a new machine on which this script will run, and I don't have access to an external network. Unfortunately, the install process for Date::Calc is fairly difficult. The module relies on a C library which must be compiled with the same compiler as was used to build the local Perl install. To make matters worse, the modules that Date::Calc is dependent on have similar requirements. As a result, I decided to skip installing this non-standard module, and instead use a home-brew replacement. It turns out that Add_Delta_Days is fairly straightforward to replace:

use Time::Local; # Standard module

sub addDaysToDate
{
    my ($y, $m, $d, $offset) = @_;

    # Convert the incoming date to epoch seconds. Using noon
    # instead of midnight keeps us clear of DST-transition edge
    # cases when we add whole days below.
    my $TIME = timelocal(0, 0, 12, $d, $m-1, $y-1900);

    # Convert the offset from days to seconds and add
    # to our epoch seconds value
    $TIME += 60 * 60 * 24 * $offset;

    # Convert the epoch seconds back to a legal 'calendar date'
    # and return the date pieces
    my @values = localtime($TIME);
    return ($values[5] + 1900, $values[4] + 1, $values[3]);
}

You call this subroutine like this:

my $year = 2009;
my $month = 4;
my $day = 22;

my ($nYear, $nMonth, $nDay) = addDaysToDate($year, $month, $day, 30);
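
For this example, ($nYear, $nMonth, $nDay) comes back as (2009, 5, 22).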

This subroutine isn't a one-to-one replacement, obviously. Unlike Date::Calc, my home-brew subroutine suffers from the Year 2038 problem (at least on systems with a 32-bit time_t). Nor can it reach dates far before the Unix epoch; I'm bound to the range representable in epoch seconds. However, this workaround saves me a bunch of setup time, and works just as well for my needs.

It's been quite a while since my last programming tips grab bag article, and it's high time for another. As promised, I'm discussing PHP this time around. Although simple, each of these tips is geared towards writing cleaner code, which is always a good thing.

1. Use Helper Functions to Get Incoming Data

Data is typically passed to a given web page through either GET or POST requests. To make things easy, PHP gives us a superglobal array for each of these request types: $_GET and $_POST, respectively. I prefer to use helper functions to poke around in these superglobal arrays; it results in cleaner looking code. Here are the helper functions I typically use:

// Helper function for getting $_GET data
function getGet($key)
{
    if(isset($_GET[$key]))
    {
        if(is_array($_GET[$key]))
            return $_GET[$key];
        else
            return (trim($_GET[$key]));
    }
    else
        return null;
}

// Helper function for getting $_POST data
function getPost($key)
{
    if(isset($_POST[$key]))
    {
        if(is_array($_POST[$key]))
            return $_POST[$key];
        else
            return (trim($_POST[$key]));
    }
    else
        return null;
}

Calling these functions is super simple:

$someValue = getGet('some_value');

If the some_value parameter is set, the variable gets the appropriate value; if it's not set, the variable gets assigned null. So all that's needed after calling getGet or getPost is a test to make sure the variable is non-null:

if(! is_null($someValue))
{
    // ... do something
}

Note that these functions also handle the case where the incoming data may be an array (useful when processing lots of similar data fields at once). If the data is simply a scalar value, I run it through the trim function to make sure there's no stray whitespace on either side of the incoming value.
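
To illustrate the array case: if a page is requested as page.php?items[]=1&items[]=5 (the page and parameter names here are made up), getGet hands back the whole array:

$items = getGet('items');

if(! is_null($items) && is_array($items))
{
    foreach($items as $item)
    {
        // ... handle each incoming value
    }
}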

2. Write Your Own SQL Sanitizer

The first and most important rule when accepting data from a user is: never trust the user, even if that user is you! When incoming data is going to be put into a database, you need to sanitize the input to avoid SQL injection attacks. As with the superglobal helpers above, I like using a helper function for this task:

function dbSafe($string)
{
    global $db; // MySQLi extension instance
    return "'" . $db->escape_string($string) . "'";
}

In this example, I'm making use of the MySQLi extension. The $db variable is an instance of this extension, which gets created in another file. Here's an example of creating that instance, minus all the error checking (which you should do); the constants used as parameters should be self explanatory, and are defined elsewhere in my code:

$db = new mysqli(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME);

Back to our dbSafe function, all I do is create a string value: a single quote, followed by the escaped version of the incoming data, followed by another single quote. Let's assume that my test data is the following:

$string = dbSafe("Isn't this the greatest?");

The resulting value of $string becomes 'Isn\'t this the greatest?'. Nice and clean for insertion into a database! Again, this helper makes writing code faster and cleaner.
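
For example, here's how dbSafe might fit into an INSERT statement (the comments table and its column names are hypothetical):

$author = getPost('author');   // Hypothetical form fields
$comment = getPost('comment');

$query = "INSERT INTO comments (author, body) VALUES (" .
    dbSafe($author) . ", " . dbSafe($comment) . ")";
$db->query($query);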

3. Make a Simple Output Sanitizer

If you work with an application that displays user-generated content (and after all, isn't that what PHP is for?), you have to deal with cross-site scripting (XSS) attacks as well. All such data that is to be rendered to the screen must be sanitized. The htmlentities and htmlspecialchars functions provide us with the capability to encode HTML entities, thus making our output safe. I prefer using the latter, since it's a little safer when working with UTF-8 encoded data (see my article Unicode and the Web: Part 1 for more on that topic). As before, I wrap the call to this function in a helper to save me some typing:

function safeString($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', FALSE);
}

Everything here should be self explanatory (see the htmlspecialchars manual entry for explanations on the parameters to that function). I make sure to use this any time I display user-generated content; even content that I myself generate! Not only is it important from an XSS point of view, but it helps keep your HTML validation compliant.
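
A quick usage example, assuming $comment holds some user-submitted text:

echo '<p>' . safeString($comment) . '</p>';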

4. Use Alternate Conditional Syntax for Cleaner Code

Displaying HTML based on a certain condition is incredibly handy when working with any web application. I used to write this kind of code like this:

<?php
if($someCondition)
{
    echo "\t<div class=\"myclass\">Some element to insert</div>\n";
}
else
{
    echo "\t<div class=\"myclass\"></div>\n"; // Empty element
}
?>

Not only do the backslashed double quotes look bad, the whole thing is generally messy. Instead, I now make use of PHP's alternative syntax for control structures. Using this alternative syntax, the above code is modified to become:

<?php if($someCondition): ?>
    <div class="myclass">Some element to insert</div>
<?php else: ?>
    <div class="myclass"></div>
<?php endif; ?>

Isn't that better? The second form is much easier to read, arguably making things much easier to maintain down the road. And no more backslashes!

Giant Grocery Portions

Apr 20, 2009

It's no surprise to anyone that obesity in America is getting worse every year. This animated map shows the progression in the US between 1985 and 2007, and it's quite a depressing sight. Lots of factors are contributing to everyone's weight gain: poor eating habits, no exercise, etc., but part of the blame certainly lies with food manufacturers. In recent times, food portions have increased by an incredible amount, and they only seem to be getting bigger. Not only are the larger portions contributing to our weight gain, they are also making it much harder for people like me to shop in the grocery store.

Before I go much farther, I must confess that I'm not a big eater. Growing up, I knew guys who could eat two or three times as much as I do at each meal. And there are plenty of my peers today who can do the same thing. So I realize that I'm already starting out on the low side of the curve. However, this doesn't change the fact that food manufacturers have gotten out of control with portion management.

Shopping for one is difficult enough to begin with, but I've noticed that it's gotten more so in recent times. While at the grocery store recently, I picked up some potato chips for lunch through the week. The bag I bought had "20% more chips free," making it even larger than the normal bag (which is a little too big to begin with). A sign below the bags of chips offered the following deal: buy 2 get 2 free. So, you have to buy four bags of chips to get a deal! Who in their right mind eats four, full-sized bags of potato chips? Even in reasonably sized families, that's an insane number of chips to buy at once.

Similarly, doughnut manufacturer Krispy Kreme apparently no longer sells their half-dozen doughnut boxes. Instead, they offer a new box of 9. Every once in a while (maybe once every two months), I used to pick up a half-dozen doughnuts and eat them through the week with my breakfast. By the end of that week, the last doughnuts had nearly grown stale, but were still good enough to reheat. A box of 9 would certainly go stale before I could finish it.

There are plenty of other examples, but these two stick out in my mind since I encountered them recently. If food manufacturers would provide smaller portions, at somewhat lower prices, I would be able to enjoy their products more often and I wouldn't be wasting perfectly good food. As an added bonus, I wouldn't eat as much, and would feel better as a result. Does anyone else feel the way I do?

In a surprising move, Time Warner Cable is scrapping the bandwidth cap tests for Rochester, NY. Not only that, but it looks like TWC will be shelving the tiered pricing tests across the board, while "the customer education process continues." Maybe the internet works after all.

Digg Gets Shady

Apr 16, 2009

Jeffrey Zeldman has pointed his readers to an interesting article entitled Digg, Facebook Stealing Content, Traffic, Money From Publishers? The article focuses on the recently launched DiggBar, and the negative effects it's having on the web. I gave up on Digg long ago, and this just furthers my intent to stay away from the site. With shady practices like this, it doesn't deserve my attention.

It was great news to hear that Captain Richard Phillips was rescued yesterday. I'm amazed that snipers could hit someone on a boat, from another boat, at a distance of nearly 100 feet.

As a result of this hostage situation, there has been a lot of news about pirate attacks around Somalia. It's clearly a big business for these people, seeing as their country is essentially a nonexistent entity. The problem is that these pirates have yet to be punished for their actions. Every one of their ransom demands (up until now) has been paid in full. In other words, they always win.

The other day, it occurred to me how we can solve this problem. All we need is a throwback to the days of World War 2. It's fairly apparent that these pirates have little naval power. They aren't heavily armed, they attack in small boats, and they haven't (yet) appeared in large numbers. As a direct result, this is a perfect opportunity to employ the convoy system.

All we need to do is establish a perimeter around the problem area. If you want to go inside this perimeter, even if you're just passing through, you have to be a part of a convoy. Multiple convoys would leave daily, protected by the various naval ships that are already patrolling the area. This would make attacks much harder and much less frequent, and would (I claim) largely put a stop to the piracy going on.

It looks like Time Warner is expanding their broadband bandwidth caps to new markets. One of those new markets is in Greensboro, NC, about 1 hour from where I live. To add insult to injury, it looks like prices are going up as well. The 40GB tier will cost $55, which is $5 more than what I pay today. As they say, this stuff is getting real.

The Office has really seen a resurgence in quality over the past several weeks. I lamented once before about a perceived downward spiral for the show, but tonight's episode was a real throwback to the good old days of seasons 2 and 3. Other recent episodes have been equally strong, making the show a joy to watch again. Michael's tension with his new boss Charles is truly palpable, allowing the viewer to share those awkward moments that made the early seasons so fun.

It looks as if the writers are setting things up for a huge season finale. We get to see Ryan for the first time in a while in the next episode (which is two weeks from tonight), and hilarity should ensue as Michael and Pam set out to start a new paper company. Will Michael hire Ryan as an employee in his new company? Could Holly make another appearance at the end of the season? Will the folks at Dunder-Mifflin realize that, despite his antics, Michael Scott is the right man for their company?

I can't wait to find out.

Back in the spring of 2005, after graduating from college, I went looking for a job. I got the chance to interview with Microsoft, though I'm not sure what I would have ended up doing had I gotten the job (they never really told me). My interview was conducted entirely over the phone, and consisted of the typical "brain teaser" type questions that Microsoft is famous for. Needless to say, I performed very poorly and was instantly rejected. The guy on the phone said he'd let me know and, 10 minutes later via email, I knew.

One of the questions they asked me stumped me beyond belief, and I butchered my answer terribly. Not only was I embarrassed for myself, I was embarrassed for the interviewer, who had to listen to me patiently. Anyway, here's a retelling of the question I was asked:

Given a large NxN tic-tac-toe board (instead of the regular 3x3 board), design a function to determine whether any player is winning in the current round, given the current board state.

I realize now that I misinterpreted the question horribly. The interviewer stated the question quite differently than I have it written above; I believe he used something along the lines of "given a tic-tac-toe board of N dimensions ..." I assumed that the bit about dimensionality meant delving into the realm of 3 or more physical dimensions; essentially something like 3-D tic-tac-toe. Obviously, solving such a problem is much more difficult than solving on an NxN 2-D board.

Tonight, for whatever reason, I recalled this question and the fact that I never found an answer for myself. Happily, I subsequently stumbled upon someone else's answer (see question 4), which is quite clever. It's good to finally resolve this problem.
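
I won't try to reproduce the linked answer exactly, but here's a sketch of the counting approach commonly given for this problem (my own reconstruction, not necessarily the author's wording): keep a running sum for each row, each column, and the two diagonals, adding +1 for one player and -1 for the other. A move wins as soon as one of its sums reaches ±N, so each check is O(1):

class TicTacToe
{
    private $n;
    private $rows;
    private $cols;
    private $diag = 0;
    private $antiDiag = 0;

    function __construct($n)
    {
        $this->n = $n;
        $this->rows = array_fill(0, $n, 0);
        $this->cols = array_fill(0, $n, 0);
    }

    // $player is +1 or -1; returns true if this move wins the game
    function move($row, $col, $player)
    {
        $this->rows[$row] += $player;
        $this->cols[$col] += $player;
        if($row == $col)
            $this->diag += $player;
        if($row + $col == $this->n - 1)
            $this->antiDiag += $player;

        $target = $player * $this->n;
        return $this->rows[$row] == $target
            || $this->cols[$col] == $target
            || $this->diag == $target
            || $this->antiDiag == $target;
    }
}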

I know interviewing candidates for a job can be tricky, but asking these kinds of questions is silly. Does someone's ability to answer this kind of question really prove they are a better programmer than someone who can't? In the end, I'm eternally glad I didn't get hired by Microsoft; I now realize it's one of the companies I would least like to work for. My current employer seemed much more concerned with real-world problems, my previous employment experience, and the (increasingly rare) ability to program in C++. For that, I am oh-so-grateful.

When I added the favorite photos feature to my photo album software, I wanted a way to randomly show a subset of said favorites on the albums display page. I initially thought about implementing my own means of doing this through PHP. Ultimately, I wanted random selection without replacement, so that viewers would not see multiple copies of the same image in the 'Favorites Preview' section. Thankfully, MySQL saved the day!

When sorting a MySQL query, you can opt to sort randomly:

SELECT {some columns} FROM {some tables}
WHERE {some condition} ORDER BY rand()

MySQL's rand() function essentially gives you random selection without replacement for free! How great is that? It was an easy solution to a not-so-simple problem, and saved me a lot of programming time.
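
Tacking a LIMIT clause onto that ORDER BY makes the "random subset" part trivial, since each row can appear at most once in the shuffled result. Here's roughly what such a query looks like from PHP (the table and column names, and the mysqli-style $db handle, are assumptions for illustration):

// Grab 4 distinct favorites at random
$result = $db->query(
    "SELECT filename FROM photos
     WHERE is_favorite = 1
     ORDER BY RAND() LIMIT 4");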

Update: I have since learned that the ORDER BY rand() call is horribly inefficient for large data sets. As such, it should ideally be avoided. There's a great article describing ways to work around these performance limitations.

Reading With Franz

Mar 17, 2009

My dad stumbled upon an incredibly well produced video entitled "Reading With Franz." In it, we learn how Franz, a puppet representing a person with a disability, is able to read books with a simple switch device and Tar Heel Reader. For those who may not know, Tar Heel Reader is a website my dad started a while back with an emphasis on providing books for beginning readers. There are over 3000 books on the website as of this writing, with more being added every day. Over 2200 visitors surf the site every week, with nearly 300,000 weekly page views. This map of readers shows that visitors are coming in from all over the world (a total of 80 countries so far). If you know a beginning reader, particularly one with a disability, be sure to check out the site.

About this time last year, I noted that our build machines at work were way out of sync in their respective local times. As a result, we were seeing a bunch of "clock skew" warnings when building our code. To fix the problem, I figured out how to use NTP on a private network. Imagine my surprise when, while performing a build today, I noticed more clock skew warnings! I checked our setup, and NTP was still functioning as expected. The problem, it turns out, was that some of our build machines had not yet changed over to Daylight Saving Time (DST), something NTP doesn't assist with. Only the oldest machines were affected, which wasn't surprising, seeing as Congress feels the need to change the DST rules every few years.

Thankfully, updating time zone information is easy to do. Here's how:

Step 1: Verify Your System Settings

Clearly, we should first check to see if we even need to update our system. To do this, we can issue this command, replacing 2009 with the appropriate year:

zdump -v /etc/localtime | grep 2009

The reported DST times should correspond with your local area. In my case, the reported date on the broken systems was April 5, not March 8. So this particular system needed updating. See the end of this article for a note on potential zdump problems on 64-bit systems.

Step 2: Obtain the Latest Time Zone Information

The latest time zone data can be obtained via the tz FTP distribution website. You'll want to get the tzdata{year}{version}.tar.gz file. In my case, the filename was tzdata2009c.tar.gz. Copy this file to the system to be updated, and unpack it in a temporary location (I put it in a subfolder in /tmp).

Step 3: Compile the New Time Zone Data

We now need to compile the new time zone data. This can be done through use of the handy zic command:

zic -d {temp_dir} {file_to_compile}

In my case, I used the name zoneinfo for the {temp_dir} parameter, and I wanted to compile the northamerica file, seeing as that's where I live:

zic -d zoneinfo northamerica

Upon completing this compilation step, a new zoneinfo directory was created in the temporary location where I unpacked the time zone data.

Step 4: Copy the Newly Built Files

Now that the appropriate files have been built, we'll need to copy the specific region files to the right location. By default, Linux time zone information lives in the /usr/share/zoneinfo directory. Since I live in the Eastern time zone, I copied the EST and EST5EDT files to the aforementioned location (I didn't know which file I really needed, so I just grabbed both). These files will overwrite the existing versions, so you may want to back those old versions up, just to be safe. In addition to these 'global time zone' files, you'll want to copy the appropriate specific time zone data file to the right place. In my case, I copied the America/New_York file to the corresponding location in the /usr/share/zoneinfo directory. Again, you'll be overwriting an existing file, so make backups as necessary.
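
For my Eastern-time example, the copy step looked something like this (run from the temporary directory where I compiled the data; these exact paths are an assumption, so adjust them for your zone and back up the originals first):

cp zoneinfo/EST zoneinfo/EST5EDT /usr/share/zoneinfo/
cp zoneinfo/America/New_York /usr/share/zoneinfo/America/New_York
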
Step 5: Update the localtime Link in /etc

The file /etc/localtime should theoretically be a soft link to the appropriate specific time zone data file in the /usr/share/zoneinfo directory. On a few of the machines I had to update, this was not the case. To create the soft link, issue the standard command:

ln -s /usr/share/zoneinfo/{path_to_zone_file} /etc/localtime

Here's the command I issued for my example:

ln -s /usr/share/zoneinfo/America/New_York /etc/localtime

Step 6: Validate the Changes

Now that we have installed the new time zone information file, we can verify that the data has been updated properly, again by using the zdump command:

zdump -v /etc/localtime | grep 2009

This time, the dates shown should be correct. If you issue a date command, your time zone should also now be correct.

There is one word of warning I can provide to you. On some older 64-bit systems, the zdump command will seg-fault when you run it. This is a bug with the installed glibc package. I found this RedHat errata page covering the issue (at least, it refers to the package version that fixes this issue). Thankfully, I was able to compile and install the new time zone information without having to update glibc (I simply validated my changes by issuing a date command). It seems that only the zdump command exhibits the seg-fault on those older systems. Your mileage may vary.

I ran into a weird problem in one of our build scripts at work today. We compile our tools across a number of platforms and architectures, and I ran across this issue on one of our oldest boxes, running RedHat 9. Here's the horrible error that I got when linking:

/usr/bin/ld: myFile.so: undefined versioned symbol name std::basic_string<char, std::char_traits<char>, std::allocator<char> >& std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace_safe<char const*>(__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, char const*, char const*)@@GLIBCPP_3.2
/usr/bin/ld: failed to set dynamic section sizes: Bad value

It seems as if the standard C++ libraries on this system were compiled with gcc 3.2, while the version we're using to build our tools is 3.2.3. Unfortunately, the 3.2 compiler isn't installed on the system, and I'm not sure where we would find it for RH9 anyway. Thankfully, I found a workaround for this problem. Our link step originally looked like this:

gcc -shared -fPIC -lstdc++ -lrt -lpthread -o myFile.so {list_of_object_files}

I found out that by moving the standard libraries to the end of the line, the problem disappeared. Here's the new link step:

gcc -shared -fPIC -o myFile.so {list_of_object_files} -lstdc++ -lrt -lpthread

I don't fully understand why ordering should matter during the link step, but by putting the standard libraries last, we were able to get rid of this error. My best guess is that the linker resolves symbols in command-line order, so a library listed before the object files that reference it may be skipped; but that's only a guess. If you understand the root cause of this, please leave a comment explaining. I'd love to know more about why changing the order makes a difference.

Burrito Filling

Mar 8, 2009
  • 1 can (15-oz) kidney beans
  • 1 can (15-oz) pinto beans
  • 1 can (15-oz) black beans
  • 1 large carrot
  • 1 stalk celery
  • 1 medium onion
  • 1 garlic clove
  • 2 green chili peppers (optional)
  • 1 tsp chili powder
  • 1 tsp cumin
  • 1 tsp vegetable seasoning
  • 1/4 tsp kelp
  • 1/8 tsp thyme
  • Dash of cayenne pepper
  • 1 can (5-oz) tomato juice

Place all three cans of beans into a colander; rinse and drain thoroughly. Pour the beans into a large skillet and mash them. Place the carrot, celery, onion, garlic, chili peppers, and tomato juice in a blender, and blend well. Pour the blended ingredients into the beans, mixing them together. Add the rest of the ingredients, again mixing well. Simmer over low heat for 10 to 15 minutes, or until the mixture is warmed to your liking, stirring occasionally.

In my recent post on analyzing bandwidth usage, I promised an update once February was done. Seeing as it's now March, it's time for said update. Here's the graph of my bandwidth usage for the month of February:

I didn't break the 40 GB barrier, but I wasn't far from it this month at 37 GB. The highest daily total was 3304 MB on February 2, though several other days came close to that total. This is the first month that I haven't noticed any interesting trends, but it's still enjoyable to chart my activity. As I predicted, my daily average seems higher this month, thanks to my Roku player and Netflix Watch Instantly. If I break the barrier in March, I'll be sure to let everyone know. It appears Time Warner has done their homework on their proposed upper limit...

Ground Zero

Feb 27, 2009

Gizmodo pointed me this morning to an oh-so-wrong yet oh-so-fun Google Maps mashup that allows you to nuke the city of your choice. Simply search for your favorite (or least-favorite) city, select your weapon, and nuke it! It was interesting to compare the blast radius of Little Boy with those of the more modern nuclear weapons. Suffice it to say that today's weapons are awfully scary.

My favorite, however, is the asteroid impact. Most. Destruction. Ever.

If I Ran the Oscars

Feb 22, 2009

If I ran the Academy Award ceremony:

  • The host would be a news reporter, chosen specifically for their inability to make lame jokes.
  • Said host would read the award category, the nominations, and the winner, without any pauses or cuts to montages of said nominations.
  • Award presentations that no one cares about (best sound editing, best art direction, best makeup, etc) wouldn't be televised.
  • Award winners would receive their award on a side stage with no podium or microphone, thereby removing their ability to give an acceptance speech.
  • The entire award ceremony would be 30 minutes long.
  • Nielsen ratings for the event would be at an all-time high.

Hold your applause, please.

A PHP Include Pitfall

Feb 22, 2009

I ran into an interesting problem with the PHP include mechanism last night (specifically, with the require_once variant, but this discussion applies to all of the include-style functions). Suppose I have the following folder structure in my web application:

myapp/
 |-- includes.php
 +-- admin/
      |-- admin_includes.php
      +-- ajax/
           +-- my_ajax.php

Let's take a look at the individual PHP files in reverse order. These examples are bare bones, but will illustrate the problem. First, my_ajax.php:

<?php
// my_ajax.php
require_once("../admin_includes.php");

some_generic_function();
?>

Here's the code for admin_includes.php:

<?php
// admin_includes.php
require_once("../includes.php");
?>

And finally, includes.php:

<?php
// includes.php
function some_generic_function()
{
    // Do something here
}
?>

When I go to access the my_ajax.php file, I'll get a "no such file or directory" PHP error. At first, this doesn't make much sense, but a quick glance at the PHP manual clears things up:

Files for including are first looked for in each include_path entry relative to the current working directory, and then in the directory of the current script. If the file name begins with ./ or ../, it is looked for only in the current working directory.

The important part is in that last sentence: if your include or require statement starts with ./ or ../, PHP will only look in the current working directory. So, in our example above, the working directory when accessing the AJAX script is /myapp/admin/ajax. The require_once within the admin_includes.php file will therefore fail, since there's no ../includes.php relative to that working directory.

This is surprising behavior and should be kept in mind when chaining includes. A simple workaround is to use the following code in your include statements:

require_once(dirname(__FILE__) . "/../../some/relative/path.php");

It's not the most elegant solution in the world, but it gets around this PHP annoyance.
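
Applied to the example above, admin_includes.php would pull in includes.php like this, regardless of which script was originally requested:

require_once(dirname(__FILE__) . "/../includes.php");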

TF2 Scout Update

Feb 21, 2009

It looks like I'll have a reason to get back into Team Fortress 2 next week: the official Scout update is nearly here! So far, Valve has released information on the following:

There are still two days of updates left to be unveiled. One of them, if I recall correctly, is a new payload map, and the other is undoubtedly the new primary unlockable weapon (replacing the scattergun). Very exciting!

Watchmen Review

Feb 16, 2009

Reading Watchmen is, for me, akin to looking at the Mona Lisa. In my heart of hearts, I know it's a masterpiece, but I just don't like it. My main problem with Watchmen, and a problem I'm increasingly having with LOST (which I'm trying to catch up on), is that there's no hope for the characters. I have absolutely no reason to root for the characters in Watchmen; they're the saddest group of people in the world. The story is overly complex, the pacing erratic, and the tone is way too preachy for my liking.

I know lots of folks out there adore this story, but I say 'skip it.'