Born Geek
Posts Tagged "programming"

King Features Comics Feeds

October 6, 2018

Since I no longer subscribe to my local newspaper, I now primarily read daily comic strips through RSS feeds. comicsrss.com carries the vast majority of the strips I read, but several key strips are not included. It turns out that these missing strips are all owned by King Features which, frustratingly, doesn’t provide RSS feeds to their strips.

I have now fixed that.

My new project, comics-rss, is now available for users interested in creating RSS feeds to the comic strips provided by King Features. The project is admittedly brittle at the moment, but it has worked well for me so far. A number of improvements are planned:

  1. The script currently caches the comic strips locally, linking to the cached copy. I’d like to provide an option to use direct links instead, skipping the cache altogether.
  2. Cached strips are not currently cleaned up, so the folder into which they are stored will grow each day. I’ll be adding an “expired” configuration option to clean things up.
  3. Error checking in the configuration file isn’t very robust, and needs to be improved.

I would be interested in any feedback you might have on this project. If you find bugs or have suggestions for improvement, be sure to file them on the project issues board.

A Subtle Python Bug

February 23, 2018

I recently had a very subtle bug with an OrderedDict in my Python code at work. I constructed the contents of this object from a SQL query that was output in a specific order (can you spot the bug?):

qs = models.MyModel.objects.all().order_by("-order")
data = OrderedDict({x.id: x.name for x in qs})

My expectation was output like the following, which I was seeing on my development system (Python 3.6):

OrderedDict([(4, 'Four'), (3, 'Three'), (2, 'Two'), (1, 'One')])

However, on my official sandbox test system (which we use for internal testing, running Python 3.5), I was seeing output like this:

OrderedDict([(1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four')])

There are actually two issues in play here, and it took me a while to figure out what was going on.

  1. First, I’m constructing the OrderedDict element incorrectly. I’m using a dictionary comprehension as the initialization data for the object’s constructor. Dictionaries are (until recently) not guaranteed to preserve insertion order when iterated over. This is where my order was being screwed up.
  2. Second, the above behavior for dictionary order preservation is an implementation detail that changed in Python 3.6. As of 3.6 (in the CPython implementation), dictionaries now preserve the insertion order when iterated over. My development system, running on 3.6, was therefore outputting things as I expected them. The sandbox system, still running 3.5, did not. What an annoyance!

I’ve learned two valuable lessons here: (a) make sure you’re running on the same levels of code in various places, and (b) don’t initialize an OrderedDict with a dictionary comprehension.

Comments Off on A Subtle Python Bug Tags:

Born Geek on GitHub

March 21, 2014

I have uploaded the source of both CoLT and Googlebar Lite to GitHub:

This should make it way easier for folks to submit new ideas and bug reports for each extension, provide patches (if you feel so inclined), and view sample code for Firefox extension development. I’ve already posted a few issues to the CoLT repo, and a number should be appearing for Googlebar Lite as well.

Using the FormHistory Module

March 4, 2014

In recent times, Mozilla has deprecated the nsIFormHistory2 and nsIFormHistory interfaces (see bug 878677 for more information), replacing both with the FormHistory.jsm module. This module provides an asynchronous way to store form history items, which is good for performance. Like many of the Mozilla interfaces, however, documentation is nearly non-existent. I like learning by example, and I’ve figured out how the FormHistory module works. Here are a few examples showing how to use it:

// Import the module
Components.utils.import('resource://gre/modules/FormHistory.jsm');

// Remove all stored history for a specific field
// ('GBL-Search-History' in this example)
FormHistory.update({op: "remove", fieldname: "GBL-Search-History"});

// Add specific terms to a specific field
FormHistory.update({op: "bump", fieldname: "GBL-Search-History", 
                    value: termsToStore});

Update: This article previously indicated to use the add operation to add a term to a specific field. That function, however, will result in duplicate entries as of this writing. The bump operation is now what I recommend to use. End Update

As you can see here, the update() function is the one I most care about. This function has several operations (specified by the “op:” property above) available to it:

  • add
  • update
  • remove
  • bump

The header comment in the FormHistory.jsm file details the specifics for these operations, as well as other functions. Hopefully these simple examples will help someone out. It wasn’t immediately clear to me how to use this module when I first switched over.

Things I Learned Using Stack Overflow

February 2, 2012

In my last post, I complained about my initial experience with Stack Overflow. I decided to give myself 30 days with the service, to see whether or not I warmed up to it. Now that those 30 days are over, I will be posting several of my thoughts and observations. This first post won’t be about the site itself; instead, it will cover some of the things I learned during my 30 days. A second upcoming post will cover some problems I think exist with the Stack Overflow model, and my final post will provide a few suggestions for how I think things can be improved.

Let me first say that I learned a lot simply by browsing the site. Reading existing questions and their answers was fascinating, at least for the programming topics I care about. Some of what I learned came through mistakes I made attempting to answer open questions. Other bits of information just came through searching the web for the solution to someone’s problem (something that a lot of people at Stack Overflow are apparently too lazy to do). Without further ado, here’s a list of stuff I learned, in no particular order (each item lists the corresponding language):

C (with GNU Extension), PHP (5.3+)
The true clause in a ternary compare operation can be omitted. In this case, the first operand (the test) will be returned if true. This is a bizarre shortcut, and one I would never personally use. Here’s a PHP example (note that there’s no space between the question mark and the colon; in C, a space is necessary):

$a = $b ?: $c; // No true clause (too lazy to type it, I guess)
$a = $b ? $b : $c; // The above is equivalent to this
Regular Expressions (Perl, PHP, possibly others)
The $ in a regular expression doesn’t literally match the absolute end of the string; it can also match a new-line character that is the last character in the string. Pattern modifiers are usually available to modify this behavior. This fact was a surprise to me; I’ve had it wrong all these years!
Bash
I found a terrific article that details the differences between test, [, and [[.
Firefox Extensions (XUL, JS)
You can use the addTab method in the global browser object to inject POST data to a newly opened tab.
Perl
The way I learned to open files for output in Perl (over a decade ago) is now not advised. It’s going to take a lot of effort on my part to change to the new style; old habits, and all that.

# Old way of doing it (how I learned)
open OUT, "> myfile.txt" or die "Failed to open: $!";

# The newer, recommended way (as of Perl 5.6)
open my $out, '>', "myfile.txt" or die "Failed to open: $!";
Comments Off on Things I Learned Using Stack Overflow Tags: ,

Using the nsIFind Interface

October 3, 2011

In a recent update to Googlebar Lite, I made a number of improvements to the search term highlighting feature, fixing several bugs along the way. This feature uses the nsIFind interface available in Firefox, which is poorly documented in my opinion. Unable to find any decent examples, I picked apart another extension I found that uses this interface, and I now better understand how it works. As such, I thought I’d provide an example of how to use this interface so that future developers won’t have to dig down in the source like I did.

Read the rest of this entry »

Getting Form Data With PHP

September 19, 2011

A couple of years ago, I blogged about two helper functions I wrote to get HTML form data in PHP: getGet and getPost. These functions do a pretty good job, but I have since replaced them with a single function: getData. Seeing as I haven’t discussed it yet, I thought I would do so today. First, here’s the function in its entirety:

/**
 * Obtains the specified field from either the $_GET or $_POST arrays
 * ($_GET always has higher priority using this function). If the value
 * is a simple scalar, HTML tags are stripped and whitespace is trimmed.
 * Otherwise, nothing is done, and the array reference is passed back.
 * 
 * @return The value from the superglobal array, or null if it's not present
 * 
 * @param $key (Required) The associative array key to query in either
 * the $_GET or $_POST superglobal
 */
function getData($key)
{
    if(isset($_GET[$key]))
    {
        if(is_array($_GET[$key]))
            return $_GET[$key];
        else
            return (strip_tags(trim($_GET[$key])));
    }
    else if(isset($_POST[$key]))
    {
        if(is_array($_POST[$key]))
            return $_POST[$key];
        else
            return (strip_tags(trim($_POST[$key])));
    }
    else
        return null;
}

Using this function prevents me from having to do two checks for data, one in $_GET and one in $_POST, and so reduces my code’s footprint. I made the decision to make $_GET the tightest binding search location, but feel free to change that if you like.

As you can see, I first test to see if the given key points to an array in each location. If it is an array, I do nothing but pass the reference along. This is very important to note. I’ve thought about building in functionality to trim and strip tags on the array’s values, but I figure it should be left up to the user of this function to do that work. Be sure to sanitize any arrays that this function passes back (I’ve been bitten before by forgetting to do this).

If the given key isn’t found in either the $_GET or $_POST superglobals, I return null. Thus, a simple if(empty()) test can determine whether or not a value has been provided, which is generally all you care about with form submissions. An is_null() test could also be performed if you so desire. This function has made handling form submissions way easier in my various work with PHP, and it’s one tool that’s worth having in your toolbox.

MySQL and Localhost Performance

April 5, 2011

I ran into an interesting phenomenon with PHP and MySQL this morning while working on a web application I’ve been developing at work. Late last week, I noted that page loads in this application had gotten noticeably slower. With the help of Firebug, I was able to determine that a 1-second delay was consistently showing up on each PHP page load. Digging a little deeper, it became clear that the delay was a result of a change I recently made to the application’s MySQL connection logic.

Previously, I was using the IP address 127.0.0.1 as the connection host for the MySQL server:

$db = new mysqli("127.0.0.1", "myUserName", "myPassword", "myDatabase");

I recently changed the string to localhost (for reasons I don’t recall):

$db = new mysqli("localhost", "myUserName", "myPassword", "myDatabase");

This change yielded the aforementioned 1-second delay. But why? The hostname localhost simply resolves to 127.0.0.1, so where is the delay coming from? The answer, as it turns out, is that IPv6 handling is getting in the way and slowing us down.

I should mention that I’m running this application on a Windows Server 2008 system, which uses IIS 7 as the web server. By default, in the Windows Server 2008 hosts file, you’re given two hostname entries:

127.0.0.1 localhost
::1 localhost

I found that if I commented out the IPV6 hostname (the second line), things sped up dramatically. PHP bug #45150, which has since been marked “bogus,” helped point me in the right direction to understanding the root cause. A comment in that bug pointed me to an article describing MySQL connection problems with PHP 5.3. The article dealt with the failure to connect, which happily wasn’t my problem, but it provided one useful nugget: namely that the MySQL driver is partially responsible for determining which protocol to use. Using this information in my search, I found a helpful comment in MySQL bug #6348:

The driver will now loop through all possible IP addresses for a given host, accepting the first one that works.

So, long story short, it seems as though the PHP MySQL driver searches for the appropriate protocol to use every time (it’s amazing that this doesn’t get cached). Apparently, Windows Server 2008 uses IPV6 routing by default, even though the IPV4 entry appears first in the hosts file. So, either the initial IPV6 lookup fails and it then tries the IPV4 entry, or the IPV6 route invokes additional overhead; in either case, we get an additional delay.

The easiest solution, therefore, is to continue using 127.0.0.1 as the connection address for the database server. Disabling IPV6, while a potential solution, isn’t very elegant and it doesn’t embrace our IPV6 future. Perhaps future MySQL drivers will correct this delay, and it might go away entirely once the world switches to IPV6 for good.

As an additional interesting note, the PHP documentation indicates that a local socket gets used when the MySQL server name is localhost, while the TCP/IP protocol gets used in all other cases. But this is only true in *NIX environments. In Windows, TCP/IP gets used regardless of your connection method (unless you have previously enabled named pipes, in which case it will use that instead).

PHP and Large File Sizes

March 28, 2011

It’s incredible to me that in 2011, programming languages still have problems with files larger than 2GB in size. We’ve had files that size for years, and yet overflow problems in this arena still persist. At work, I ran into this problem trying to get the file size of very large files (between 3 and 4 GB in size). The typical filesize() call, as shown below, would return an overflowed result on a very large file:

$size = filesize($someLargeFile);

Because PHP uses signed 32-bit integers to represent some file function return types, and because a 64-bit version of PHP is not officially available, you have to resort to farming the job out to the OS. In Windows, the most elegant way I’ve found so far is to use a COM object:

$fsobj = new COM("Scripting.FileSystemObject");
$f = $fsobj->GetFile($file);
$size = $file->Size;

Uglier hacks involve capturing the output of the dir command from the command line. There are two bug reports filed on this very issue: 27792 and 34750. The newest of these was filed in late 2005; a little more than 5 years ago! It’s sad to see a language as prolific as PHP struggling with a problem so basic. Perhaps this issue will finally get fixed in PHP 6.

Calls to system() in Windows

March 22, 2011

I recently ran into a stupid problem using the system() call in C++ on Windows platforms. For some strange reason, calls to system() get passed through the cmd /c command. This has some strange side effects if your paths contain spaces, and you try to use double quotes to allow those paths. From the cmd documentation:

If /C or /K is specified, then the remainder of the command line after the switch is processed as a command line, where the following logic is used to process quote (“) characters:

  1. If all of the following conditions are met, then quote characters on the command line are preserved:
    • no /S switch
    • exactly two quote characters
    • no special characters between the two quote characters, where special is one of: &<>()@^|
    • there are one or more whitespace characters between the two quote characters
    • the string between the two quote characters is the name of an executable file
  2. Otherwise, old behavior is to see if the first character is a quote character and if so, strip the leading character and remove the last quote character on the command line, preserving any text after the last quote character.

As you can see from this documentation, if you have any special characters or spaces in your call to system(), you must wrap the entire command in an extra set of double quotes. Here’s a working example:

string myCommand = "\"\"C:\\Some Path\\Here.exe\" -various -parameters\"";
int retVal = system(myCommand.c_str());
if (retVal != 0)
{
    // Handle the error
}

Note that I’ve got a pair of quotes around the entire command, as well as a pair around the path with spaces. This requirement isn’t apparent at first glance, but it’s something to keep in mind if you ever find yourself in this situation.