Posts Tagged “programming”

Virtual Environment Helper

Published on February 11, 2020

I use Python virtual environments a bunch at work, and this morning I finally put together a small helper script, saved as a Gist at GitHub, that makes enabling and disabling virtual environments a lot easier. I'm not sure why I didn't do this a lot earlier. Simply type work to enable the virtual environment, and work off to disable it. This script should be in your PATH, if it's not already obvious.

Here's the script itself:

@echo off

if exist "%cd%\venv" (
    if "%1" == "off" (
        echo Deactivating virtual environment
        call "%cd%\venv\Scripts\deactivate.bat"
    ) else (
        echo Activating virtual environment
        call "%cd%\venv\Scripts\activate.bat"
    )
) else (
    echo No venv folder found in %cd%.
)

Permanent Redirects Get Cached

Published on April 20, 2019

I maintain multiple tools at work that all run in Docker containers on the same machine. The overall setup looks like the following diagram:

[Figure: Tool Network Diagram]

The router container on top (nginx) routes traffic to the various application containers based on the hostname seen in each request (each tool has its own internal domain name). Each application has an nginx container for serving static assets, and a gunicorn container to serve the dynamic parts of the application (using the Django framework).

Earlier this week, I was trying to add a redirect rule to one of my application containers (at the application nginx layer), because a URL was changing. As a convenience for users, I wanted to redirect them to the new location so they don't get the annoying "404: Not Found" error. I set up the redirect as a permanent redirect using a rewrite rule in nginx. For some strange reason, the port of the application's nginx layer, which should never be exposed to the outside world, was being appended to the redirect!

Adding the port_in_redirect off; directive to my nginx rules made no difference (or so I thought), and I struggled for an entire day on why this redirect wasn't working properly. At the end of the day, I learned that permanent redirects are aggressively cached by the browser! This annoyance means you need to clear your browser's cache to remove bogus redirects. I wasted an entire day because my stupid browser was using a bogus cached reference. Ugh!
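One way to sidestep this while a rule is still being tested is to start with a temporary redirect, which browsers do not cache by default, and switch to a permanent one only after the target has been verified. The sketch below shows the idea in Python rather than in nginx configuration. The helper name build_redirect and the example path are made up for illustration and are not part of the setup described above.

```python
from http import HTTPStatus


def build_redirect(location, permanent=False):
    """Return (status_code, headers) for a redirect response.

    Hypothetical helper for illustration only.
    """
    # A 301 is cacheable by default; a 302 is not cached unless
    # response headers explicitly allow it.
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    return int(status), {"Location": location}


# While testing: temporary redirect
print(build_redirect("/new/location/"))
# Once verified: permanent redirect
print(build_redirect("/new/location/", permanent=True))
```

With this approach, a mistake made during testing doesn't linger in the browser after the rule is fixed.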


Published on January 3, 2019

One of the web comics I follow is Saturday Morning Breakfast Cereal. The official RSS feed for this comic only includes the comic itself and the associated hover-text joke. To see the extra joke, you have to visit the SMBC website. But no longer!

I've just created a new project on GitHub that fixes this issue. It's another RSS feed generator, and the feed that it generates contains the daily comic, the hover-text joke, and the hidden joke, all inline.

As always, there's room for improvement in a place or two. Let me know if you spot any issues.

King Features Comics Feeds

Published on October 6, 2018

Since I no longer subscribe to my local newspaper, I now primarily read daily comic strips through RSS feeds. carries the vast majority of the strips I read, but several key strips are not included. It turns out that these missing strips are all owned by King Features which, frustratingly, doesn't provide RSS feeds to their strips.

I have now fixed that.

My new project, comics-rss, is now available for users interested in creating RSS feeds to the comic strips provided by King Features. The project is admittedly brittle at the moment, but it has worked well for me so far. A number of improvements are planned:

  1. The script currently caches the comic strips locally, linking to the cached copy. I'd like to provide an option to use direct links instead, skipping the cache altogether.
  2. Cached strips are not currently cleaned up, so the folder into which they are stored will grow each day. I'll be adding an "expired" configuration option to clean things up.
  3. Error checking in the configuration file isn't very robust, and needs to be improved.

I would be interested in any feedback you might have on this project. If you find bugs or have suggestions for improvement, be sure to file them on the project issues board.

A Subtle Python Bug

Published on February 23, 2018

I recently had a very subtle bug with an OrderedDict in my Python code at work. I constructed the contents of this object from a SQL query that was output in a specific order (can you spot the bug?):

qs = models.MyModel.objects.all().order_by("-order")
data = OrderedDict({ for x in qs})

My expectation was output like the following, which I was seeing on my development system (Python 3.6):

OrderedDict([(4, 'Four'), (3, 'Three'), (2, 'Two'), (1, 'One')])

However, on my official sandbox test system (which we use for internal testing, running Python 3.5), I was seeing output like this:

OrderedDict([(1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four')])

There are actually two issues in play here, and it took me a while to figure out what was going on.

  1. First, I'm constructing the OrderedDict element incorrectly. I'm using a dictionary comprehension as the initialization data for the object's constructor. Dictionaries are (until recently) not guaranteed to preserve insertion order when iterated over. This is where my order was being screwed up.
  2. Second, the above behavior for dictionary order preservation is an implementation detail that changed in Python 3.6. As of 3.6 (in the CPython implementation), dictionaries now preserve the insertion order when iterated over. My development system, running on 3.6, was therefore outputting things as I expected them. The sandbox system, still running 3.5, did not. What an annoyance!

I've learned two valuable lessons here: (a) make sure you're running on the same levels of code in various places, and (b) don't initialize an OrderedDict with a dictionary comprehension.
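Here is a minimal sketch of a construction that keeps the order on any Python 3 version: pass the key/value pairs to OrderedDict directly, so no intermediate dictionary is ever built. The Row tuple and its order and name fields are stand-ins for the real query results, which aren't shown above.

```python
from collections import OrderedDict, namedtuple

# Stand-in for the rows returned by the ordered query (hypothetical fields)
Row = namedtuple("Row", ["order", "name"])
rows = [Row(4, "Four"), Row(3, "Three"), Row(2, "Two"), Row(1, "One")]

# Pass (key, value) pairs directly; insertion order is preserved
data = OrderedDict((row.order, row.name) for row in rows)

print(list(data.items()))
```

The generator expression hands the pairs over one at a time in query order, so the result no longer depends on how plain dictionaries happen to iterate.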

Born Geek on GitHub

Published on March 21, 2014

I have uploaded the source of both CoLT and Googlebar Lite to GitHub.

This should make it way easier for folks to submit new ideas and bug reports for each extension, provide patches (if you feel so inclined), and view sample code for Firefox extension development. I've already posted a few issues to the CoLT repo, and a number should be appearing for Googlebar Lite as well.

Things I Learned Using Stack Overflow

Published on February 2, 2012

In my last post, I complained about my initial experience with Stack Overflow. I decided to give myself 30 days with the service, to see whether or not I warmed up to it. Now that those 30 days are over, I will be posting several of my thoughts and observations. This first post won't be about the site itself; instead, it will cover some of the things I learned during my 30 days. A second upcoming post will cover some problems I think exist with the Stack Overflow model, and my final post will provide a few suggestions for how I think things can be improved.

Let me first say that I learned a lot simply by browsing the site. Reading existing questions and their answers was fascinating, at least for the programming topics I care about. Some of what I learned came through mistakes I made attempting to answer open questions. Other bits of information just came through searching the web for the solution to someone's problem (something that a lot of people at Stack Overflow are apparently too lazy to do). Without further ado, here's a list of stuff I learned, in no particular order (each item lists the corresponding language):

C (with GNU Extension), PHP (5.3+)
The true clause in a ternary compare operation can be omitted. In this case, the first operand (the test) will be returned if true. This is a bizarre shortcut, and one I would never personally use. Here's a PHP example (note that there's no space between the question mark and the colon; in C, a space is necessary):

$a = $b ?: $c; // No true clause (too lazy to type it, I guess)
$a = $b ? $b : $c; // The above is equivalent to this

Regular Expressions (Perl, PHP, possibly others)
The $ in a regular expression doesn't literally match the absolute end of the string; it can also match a new-line character that is the last character in the string. Pattern modifiers are usually available to modify this behavior. This fact was a surprise to me; I've had it wrong all these years!

Bash
I found a terrific article that details the differences between test, [, and [[.

Firefox Extensions (XUL, JS)
You can use the addTab method in the global browser object to inject POST data to a newly opened tab.

Perl
The way I learned to open files for output in Perl (over a decade ago) is now not advised. It's going to take a lot of effort on my part to change to the new style; old habits, and all that.

# Old way of doing it (how I learned)
open OUT, "> myfile.txt" or die "Failed to open: $!";

# The newer, recommended way (as of Perl 5.6)
open my $out, '>', "myfile.txt" or die "Failed to open: $!";
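As a quick check of the regular-expression item above, here's a minimal sketch in Python, whose re module follows the same convention for $. The sample string is made up; \Z is Python's anchor for the absolute end of the string.

```python
import re

text = "foo\n"

# $ matches at the very end, or just before a trailing newline
dollar_matches = re.search(r"foo$", text) is not None

# \Z matches only at the absolute end of the string
strict_matches = re.search(r"foo\Z", text) is not None

print(dollar_matches)   # True
print(strict_matches)   # False
```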

Getting Form Data With PHP

Published on September 19, 2011

A couple of years ago, I blogged about two helper functions I wrote to get HTML form data in PHP: getGet and getPost. These functions do a pretty good job, but I have since replaced them with a single function: getData. Seeing as I haven't discussed it yet, I thought I would do so today. First, here's the function in its entirety:

/**
 * Obtains the specified field from either the $_GET or $_POST arrays
 * ($_GET always has higher priority using this function). If the value
 * is a simple scalar, HTML tags are stripped and whitespace is trimmed.
 * Otherwise, nothing is done, and the array reference is passed back.
 * @return The value from the superglobal array, or null if it's not present
 * @param $key (Required) The associative array key to query in either
 * the $_GET or $_POST superglobal
 */
function getData($key)
{
    if(isset($_GET[$key]))
    {
        if(is_array($_GET[$key]))
            return $_GET[$key];
        else
            return (strip_tags(trim($_GET[$key])));
    }
    else if(isset($_POST[$key]))
    {
        if(is_array($_POST[$key]))
            return $_POST[$key];
        else
            return (strip_tags(trim($_POST[$key])));
    }
    else
        return null;
}

Using this function prevents me from having to do two checks for data, one in $_GET and one in $_POST, and so reduces my code's footprint. I made the decision to make $_GET the tightest binding search location, but feel free to change that if you like.

As you can see, I first test to see if the given key points to an array in each location. If it is an array, I do nothing but pass the reference along. This is very important to note. I've thought about building in functionality to trim and strip tags on the array's values, but I figure it should be left up to the user of this function to do that work. Be sure to sanitize any arrays that this function passes back (I've been bitten before by forgetting to do this).

If the given key isn't found in either the $_GET or $_POST superglobals, I return null. Thus, a simple if(empty()) test can determine whether or not a value has been provided, which is generally all you care about with form submissions. An is_null() test could also be performed if you so desire. This function has made handling form submissions way easier in my various work with PHP, and it's one tool that's worth having in your toolbox.

MySQL and Localhost Performance

Published on April 5, 2011

I ran into an interesting phenomenon with PHP and MySQL this morning while working on a web application I've been developing at work. Late last week, I noted that page loads in this application had gotten noticeably slower. With the help of Firebug, I was able to determine that a 1-second delay was consistently showing up on each PHP page load. Digging a little deeper, it became clear that the delay was a result of a change I recently made to the application's MySQL connection logic.

Previously, I was using the IP address as the connection host for the MySQL server:

$db = new mysqli("127.0.0.1", "myUserName", "myPassword", "myDatabase");

I recently changed the string to localhost (for reasons I don't recall):

$db = new mysqli("localhost", "myUserName", "myPassword", "myDatabase");

This change yielded the aforementioned 1-second delay. But why? The hostname localhost simply resolves to 127.0.0.1, so where is the delay coming from? The answer, as it turns out, is that IPv6 handling is getting in the way and slowing us down.

I should mention that I'm running this application on a Windows Server 2008 system, which uses IIS 7 as the web server. By default, in the Windows Server 2008 hosts file, you're given two hostname entries:

127.0.0.1 localhost
::1 localhost

I found that if I commented out the IPv6 hostname (the second line), things sped up dramatically. PHP bug #45150, which has since been marked "bogus," helped point me in the right direction to understanding the root cause. A comment in that bug pointed me to an article describing MySQL connection problems with PHP 5.3. The article dealt with the failure to connect, which happily wasn't my problem, but it provided one useful nugget: namely that the MySQL driver is partially responsible for determining which protocol to use. Using this information in my search, I found a helpful comment in MySQL bug #6348:

The driver will now loop through all possible IP addresses for a given host, accepting the first one that works.

So, long story short, it seems as though the PHP MySQL driver searches for the appropriate protocol to use every time (it's amazing that this doesn't get cached). Apparently, Windows Server 2008 uses IPv6 routing by default, even though the IPv4 entry appears first in the hosts file. So, either the initial IPv6 lookup fails and it then tries the IPv4 entry, or the IPv6 route incurs additional overhead; in either case, we get an additional delay.

The easiest solution, therefore, is to continue using 127.0.0.1 as the connection address for the database server. Disabling IPv6, while a potential solution, isn't very elegant and it doesn't embrace our IPv6 future. Perhaps future MySQL drivers will correct this delay, and it might go away entirely once the world switches to IPv6 for good.

As an additional interesting note, the PHP documentation indicates that a local socket gets used when the MySQL server name is localhost, while the TCP/IP protocol gets used in all other cases. But this is only true in *NIX environments. In Windows, TCP/IP gets used regardless of your connection method (unless you have previously enabled named pipes, in which case it will use that instead).
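To see which addresses a client would have to choose between for a given host name, a small lookup helps. This is a minimal sketch in Python using the standard socket module; the function name candidate_addresses is hypothetical, 3306 is MySQL's default port, and the output depends entirely on the machine's hosts file and resolver.

```python
import socket


def candidate_addresses(host, port=3306):
    """List the (family, address) pairs a client could try for host."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    results = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        entry = (names.get(family, str(family)), sockaddr[0])
        if entry not in results:
            results.append(entry)
    return results


for host in ("localhost", "127.0.0.1"):
    try:
        print(host, candidate_addresses(host))
    except socket.gaierror as exc:
        print(host, "could not be resolved:", exc)
```

On a system with both hosts entries, localhost would be expected to list an IPv6 and an IPv4 candidate, while the literal IPv4 address yields exactly one. That single candidate is why connecting by address avoids the extra attempt.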

PHP and Large File Sizes

Published on March 28, 2011

It's incredible to me that in 2011, programming languages still have problems with files larger than 2GB in size. We've had files that size for years, and yet overflow problems in this arena still persist. At work, I ran into this problem trying to get the file size of very large files (between 3 and 4 GB in size). The typical filesize() call, as shown below, would return an overflowed result on a very large file:

$size = filesize($someLargeFile);

Because PHP uses signed 32-bit integers to represent some file function return types, and because a 64-bit version of PHP is not officially available, you have to resort to farming the job out to the OS. In Windows, the most elegant way I've found so far is to use a COM object:

$fsobj = new COM("Scripting.FileSystemObject");
$f = $fsobj->GetFile($file);
$size = $f->Size;

Uglier hacks involve capturing the output of the dir command from the command line. There are two bug reports filed on this very issue: 27792 and 34750. The newest of these was filed in late 2005; a little more than 5 years ago! It's sad to see a language as prolific as PHP struggling with a problem so basic. Perhaps this issue will finally get fixed in PHP 6.
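The overflow itself is easy to reproduce on paper. The sketch below (in Python, with a made-up helper name) shows what happens when a byte count is squeezed into a signed 32-bit integer. It illustrates the wrap-around arithmetic only; it is not a reproduction of PHP's internals.

```python
def as_signed_32(value):
    """Interpret value as a signed 32-bit integer, wrapping on overflow."""
    return (value + 2**31) % 2**32 - 2**31


three_gib = 3 * 1024**3          # 3,221,225,472 bytes
print(as_signed_32(three_gib))   # -1073741824
print(as_signed_32(2**31 - 1))   # 2147483647, the largest value that fits
```

Anything past 2,147,483,647 bytes (just under 2 GB) no longer fits, which is why the trouble starts at that size.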

Calls to system() in Windows

Published on March 22, 2011

I recently ran into a stupid problem using the system() call in C++ on Windows platforms. For some strange reason, calls to system() get passed through the cmd /c command. This has some strange side effects if your paths contain spaces, and you try to use double quotes to allow those paths. From the cmd documentation:

If /C or /K is specified, then the remainder of the command line after the switch is processed as a command line, where the following logic is used to process quote (") characters:

  1. If all of the following conditions are met, then quote characters on the command line are preserved:
    • no /S switch
    • exactly two quote characters
    • no special characters between the two quote characters, where special is one of: &<>()@^|
    • there are one or more whitespace characters between the two quote characters
    • the string between the two quote characters is the name of an executable file
  2. Otherwise, old behavior is to see if the first character is a quote character and if so, strip the leading character and remove the last quote character on the command line, preserving any text after the last quote character.

As you can see from this documentation, if you have any special characters or spaces in your call to system(), you must wrap the entire command in an extra set of double quotes. Here's a working example:

string myCommand = "\"\"C:\\Some Path\\Here.exe\" -various -parameters\"";
int retVal = system(myCommand.c_str());
if (retVal != 0)
{
    // Handle the error
}

Note that I've got a pair of quotes around the entire command, as well as a pair around the path with spaces. This requirement isn't apparent at first glance, but it's something to keep in mind if you ever find yourself in this situation.
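The same wrapping rule can be captured in a small helper. Here's a minimal sketch in Python; the function name wrap_for_cmd is hypothetical, and it only builds the string, it doesn't run anything. In Python this would matter mainly when going through os.system(), which hands the string to the same C system() call.

```python
def wrap_for_cmd(executable, arguments=""):
    """Build a command string that survives cmd /c quote stripping."""
    inner = '"{}"'.format(executable)
    if arguments:
        inner += " " + arguments
    # The outer pair is the one cmd strips, leaving the inner quotes intact
    return '"' + inner + '"'


command = wrap_for_cmd(r"C:\Some Path\Here.exe", "-various -parameters")
print(command)   # ""C:\Some Path\Here.exe" -various -parameters"
```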

Disliking Java

Published on September 21, 2010

If you were to ask me which programming language I hated, my first answer would most certainly be Lisp (short for "Lots of Stupid, Irritating Parentheses"). On the right day, my second answer might be Java. But seeing as hate is such a strong word, I'll opt for the statement that I dislike Java instead.

For the first time in probably 7 or 8 years, I'm having to write some Java code for a project at work. In all fairness, one of the main reasons I dislike the language is that I'm simply not very familiar with it. I'm sure that if I spent more time writing Java code, I might warm up to some of its quirks. But there are too many annoyances out of the gate to make me want to write stuff in Java for fun. Jumping back into Java development reminds me just how lucky I am to work with Perl and C++ code on a daily basis. Here are a few of my main gripes:

  1. It's a little ridiculous that the language requires the filename containing a class to exactly match the name of the class (so, a class named MyClass has to be placed in a file named "MyClass.java"). Other than making it easy to find where certain code resides, what's the benefit of this practice? The compiler simply translates your human-readable code into bytecode; filenames get lost in the translation!
  2. It pains me to have to write System.out.println("Some string"); to print some text, when in Perl it's simply print "Some string";. This leads me to my next major gripe:
  3. Java is way too verbose. I have to write 100 lines of code in Java to do what can be done in 10 lines of Perl. My time is worth something and I'm spending too much of it dealing with Java boilerplate code. In C++, I can use the public: keyword once, and everything that follows is public (until either another similar control keyword is reached or we come to the end of the block). It doesn't look like that's allowed in Java. Instead, I have to place the public keyword in front of each and every member variable and function. Ugh!
  4. Surprisingly, Java's documentation is pretty poor. Examples are few and far between and varying terminology makes it unclear when to use what function. For example, in some list-based data structure classes, getting a count of the items in said list might be getSize(), it might be getLength(), it could be just length(), or it might even be getNumberOfItems(). There's apparently no standard. Every other language manual I've ever used, be it PHP, Perl, or even the official C++ manual, has examples throughout, and relatively sane naming conventions. I can find no such help in Java-land.
  5. Automatic memory management can be handy, but it can also be a bother. I know for a fact that there are folks out there who make competent Java programmers who wouldn't last 10 minutes with C++ code. Pointers still matter in the world of computing. That Java hides all of those concepts from programmers, especially young programmers learning the trade, seems detrimental to me. It pays to know how memory allocation works. Trusting the computer to "just handle it" for you isn't always the best solution.
  6. Nearly all Java IDEs make Visual Studio look like the greatest thing on the planet; and Visual Studio sucks!

All that being said, the language does have a few redeeming features. Packages are a nice way to bundle up chunks of code (I wish C++ had a similar feature). It's also nice that the language recognizes certain data types as top-level objects (strings being one; again, C++ really hurts in this department, and yes I know about STL string which has its own set of problems).

I know there are folks who read this site that make a living writing Java code, so please don't take offense at my views. It's not that I hate Java; it's just that I don't like it.

Requiring Code Block Braces

Published on March 3, 2010

One of the things I most appreciate about Perl is that it requires code blocks to be surrounded by curly braces. In my mind, this is particularly important with nested if-else statements. Many programming languages don't require braces to surround code blocks, so nested conditionals can quickly become unreadable and much harder to maintain. Let's take a look at an example:

if (something)
    if (another_thing) {
        if (yet_another_thing) {
            do_something();
        }
    }

Note that the outer if-statement doesn't have corresponding curly braces. As surprising as it may seem, this is completely legal code in many languages. In my opinion, this is a dangerous programming practice. If I wanted to add additional logic to the contents of the outer if block, I would have to remember to put the appropriate braces in place.

Had I attempted to use this code in a Perl script, the interpreter would have complained immediately, even if warnings and strict parsing were both disabled! This kind of safety checking prevents me from shooting myself in the foot. Some may complain that requiring braces makes programming slightly more inefficient from a productivity standpoint. My response to that is that any code editor worth its salt can insert the braces for you. My favorite editor, SlickEdit, even supports dynamic brace surrounding, a feature I truly appreciate. It's a shame that more programming languages don't enforce this kind of safety net. Hopefully future languages will keep small matters like this in mind.

A Funny Look at Programming Languages

Published on May 11, 2009

An article entitled A Brief, Incomplete, and Mostly Wrong History of Programming Languages offers a very humorous glimpse into the world of programming. My absolute favorite snippet from the article:

1987 - Larry Wall falls asleep and hits Larry Wall's forehead on the keyboard. Upon waking Larry Wall decides that the string of characters on Larry Wall's monitor isn't random but an example program in a programming language that God wants His prophet, Larry Wall, to design. Perl is born.

It's funny because it's true. (Hat tip to Dustin for the pointer to this article.)

Replacement for Add_Delta_Days

Published on April 22, 2009

One of my Perl scripts here at work used the Add_Delta_Days subroutine from the Date::Calc module to do some calendar date arithmetic. I'm in the process of building a new machine on which this script will run, and I don't have access to an external network. Unfortunately, the install process for Date::Calc is fairly difficult. The module relies on a C library which must be compiled with the same compiler as was used to build the local Perl install. To make matters worse, the modules that Date::Calc is dependent on have similar requirements. As a result, I decided to skip installing this non-standard module, and instead use a home-brew replacement. It turns out that Add_Delta_Days is fairly straightforward to replace:

use Time::Local; # Standard module

sub addDaysToDate
{
    my ($y, $m, $d, $offset) = @_;

    # Convert the incoming date to epoch seconds. Noon is used instead of
    # midnight so a daylight saving shift can't land us on the wrong day.
    my $TIME = timelocal(0, 0, 12, $d, $m-1, $y-1900);

    # Convert the offset from days to seconds and add
    # to our epoch seconds value
    $TIME += 60 * 60 * 24 * $offset;

    # Convert the epoch seconds back to a legal 'calendar date'
    # and return the date pieces
    my @values = localtime($TIME);
    return ($values[5] + 1900, $values[4] + 1, $values[3]);
}

You call this subroutine like this:

my $year = 2009;
my $month = 4;
my $day = 22;

my ($nYear, $nMonth, $nDay) = addDaysToDate($year, $month, $day, 30);

This subroutine isn't a one-to-one replacement, obviously. Unlike Date::Calc, my home-brew subroutine suffers from the Year 2038 problem (at least on 32-bit operating systems). It likewise can't go back in time by incredible amounts (I'm bound to the deltas around the epoch). However, this workaround saves me a bunch of setup time, and works just as well.
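For comparison, and as a way to double-check the result of the call above, here's a rough equivalent in Python using only the standard datetime module. The function name add_days_to_date is made up to mirror the Perl subroutine.

```python
from datetime import date, timedelta


def add_days_to_date(year, month, day, offset):
    """Return (year, month, day) shifted by offset days."""
    shifted = date(year, month, day) + timedelta(days=offset)
    return shifted.year, shifted.month, shifted.day


print(add_days_to_date(2009, 4, 22, 30))   # (2009, 5, 22)
```

Because date objects carry no time of day, this version works on whole calendar days and never touches epoch seconds.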

A Stupid Interview Question

Published on March 20, 2009

Back in the spring of 2005, after graduating from college, I went looking for a job. I got the chance to interview with Microsoft, though I'm not sure what I would have ended up doing had I gotten the job (they never really told me). My interview was conducted entirely over the phone, and consisted of the typical "brain teaser" type questions that Microsoft is famous for. Needless to say, I performed very poorly and was instantly rejected. The guy on the phone said he'd let me know and, 10 minutes later via email, I knew.

One of the questions they asked me stumped me beyond belief, and I butchered my answer terribly. Not only was I embarrassed for myself, I was embarrassed for the interviewer, having to patiently listen to me. 😳 Anyway, here's a retelling of the question I was asked:

Given a large NxN tic-tac-toe board (instead of the regular 3x3 board), design a function to determine whether any player is winning in the current round, given the current board state.

I realize now that I misinterpreted the question horribly. The interviewer stated the question quite differently than I have it written above; I believe he used something along the lines of "given a tic-tac-toe board of N dimensions ..." I assumed that the bit about dimensionality meant delving into the realm of 3 or more physical dimensions; essentially something like 3-D tic-tac-toe. Obviously, solving such a problem is much more difficult than solving on an NxN 2-D board.

Tonight, for whatever reason, I recalled this question and the fact that I never found an answer for myself. Happily, I subsequently stumbled upon someone else's answer (see question 4), which is quite clever. It's good to finally resolve this problem.
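For the 2-D reading of the question, a straightforward approach is to collect every row, every column, and both diagonals, then look for one filled entirely by a single mark. The sketch below is in Python and is not the linked answer; it assumes a win means filling a complete line of N cells, and the names find_winner and board are made up.

```python
def find_winner(board):
    """Return the mark filling a whole row, column, or diagonal, else None.

    board is a list of N lists of N cells; empty cells are None.
    """
    n = len(board)
    if n == 0:
        return None
    lines = [list(row) for row in board]                              # rows
    lines += [[board[r][c] for r in range(n)] for c in range(n)]      # columns
    lines.append([board[i][i] for i in range(n)])                     # main diagonal
    lines.append([board[i][n - 1 - i] for i in range(n)])             # anti-diagonal
    for line in lines:
        first = line[0]
        if first is not None and all(cell == first for cell in line):
            return first
    return None


board = [
    ["X", "O", None, None],
    ["O", "X", None, None],
    [None, "O", "X", None],
    [None, None, "O", "X"],
]
print(find_winner(board))   # X
```

This checks 2N + 2 lines of N cells each, so the work grows with the square of the board's side length.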

I know interviewing candidates for a job can be tricky, but asking these kinds of questions is silly. Does someone's ability to answer this kind of question really prove they are a better programmer than someone who can't? In the end, I'm eternally glad I didn't get hired by Microsoft; I now realize they are one of the companies I would least like to work for. My current employer seemed much more concerned with real-world problems, my previous employment experience, and the (increasingly rare) ability to program in C++. For that, I am oh-so-grateful.

Randomizing MySQL Query Results

Published on March 18, 2009

When I added the favorite photos feature to my photo album software, I wanted a way to randomly show a subset of said favorites on the albums display page. I initially thought about implementing my own means of doing this through PHP. Ultimately, I wanted random selection without replacement, so that viewers would not see multiple copies of the same image in the 'Favorites Preview' section. Thankfully, MySQL saved the day!

When sorting a MySQL query, you can opt to sort randomly:

SELECT {some columns} FROM {some tables}
WHERE {some condition} ORDER BY rand()

The rand() function in MySQL essentially gives you random selection without replacement for free! How great is that? It was an easy solution to a not-so-simple problem, and saved me a lot of programming time.

Update: I have since learned that the ORDER BY rand() call is horribly inefficient for large data sets. As such, it should ideally be avoided. There's a great article describing ways to work around these performance limitations.
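For a small table, the idea is easy to try out. The sketch below uses Python's built-in sqlite3 module with an in-memory database instead of MySQL. Note that SQLite spells the function RANDOM() where MySQL uses RAND(), and the favorites table and its contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO favorites (title) VALUES (?)",
    [("photo-{}".format(i),) for i in range(1, 11)],
)

# Shuffle the rows, then keep the first three: no row can appear twice
preview = conn.execute(
    "SELECT id, title FROM favorites ORDER BY RANDOM() LIMIT 3"
).fetchall()
print(preview)
```

Each row appears once in the shuffled result, so taking the first few rows with LIMIT can never return the same photo twice.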

Copyright © 2004-2020 Jonah Bishop. Hosted by DreamHost.