Movable AND/OR Statements

Recently at work I wrote a messy bit of JavaScript (jQuery, really) to create movable AND/OR statements. I learned how to code less than a year ago and haven't done any intense programming (yet, just wait), so this is probably the most difficult, headache-inducing, neck-wringing problem I've worked on to date. The last time I encountered this much trouble was when I learned about recursion and tries. I figure I'll outline my thought process here in case it's useful to anyone else.

The assignment

The goal of these AND/OR statements was to let users perform searches with custom parameters. The right side of the page would hold two lists: one containing the available parameters (Location, Category, Subcategory) and the other containing the two operators (AND, OR). The left two-thirds of the page would be the workspace where the user would drag the parameters and operators to build a full search statement.

Full, well-formed search statements would have this format: Location OR Category AND Subcategory

Making elements sortable is fairly simple. Two words: jQuery UI. No, that wasn't the hard part at all. The hard part was following the rules given to me, which involved much tweaking of the sortable interaction and, as always, fighting with the DOM.

The rules:

  • Parameters should never be next to parameters, and operators should never be next to operators, e.g., it shouldn't be possible to build statements like Location Category AND OR Subcategory.

  • Operators should never be at the beginning or end of a statement, so a statement like AND Category AND Subcategory OR shouldn't be possible. Pretty much anything that doesn't make sense in natural language shouldn't be possible.

  • When a parameter or operator (for brevity's sake I'll call them "items" from now on) is dragged from the sidebar into a statement, the sidebar should repopulate with the missing item.

  • Parameters can be dragged into statements, and operators can be dragged into statements. All other dragging between lists should not be allowed. In other words, a parameter can't be dragged into the list of operators, an operator can't be dragged into the list of parameters, and items in statements can't be dragged back into the lists (to be honest, I implemented this last rule to make it slightly easier for me). Also, items can't be dragged from one statement to another.

  • Since items in statements can't be dragged back into the list, there needs to be a way to clear a statement and start over.

  • Each statement begins with an empty dotted box, to indicate that a new item can be placed there. As items are placed in the statement, the dotted box should move over to the right until the statement reaches its item limit. The item limit is five, as there are only three parameters and two operators.

  • Items can be dragged within a statement, but again, parameters should never be next to parameters, and operators should never be next to operators.

  • Within a statement, if an item is dragged over a like item, the two should switch places. For example, if "Location" is dragged over "Category," Location and Category should swap positions.

  • When parameters are placed into a statement, they should become editable input boxes, so that the user can enter her preferred location, category, and subcategory.

  • When a user types into the "Location" input box, this should trigger an autocomplete place search. (I'd been playing with the Google Places API the week prior to this for my WeHo Eat Mo project, so this was fairly easy to implement.)

That's about it. I created some of the rules as I went along based on the issues I faced while interacting with the interface. Bugs just kept on cropping up. (In fact, while writing this, I discovered a few more. Code is never finished.)
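The like-item swap rule relies on a property of well-formed statements: parameters and operators strictly alternate. That means two positions hold the same kind of item exactly when they are an even distance apart. Below is a minimal Python sketch of that idea, independent of jQuery UI. `swap_like_items` is a hypothetical helper that works on a plain list and is not code from the project.

```python
def swap_like_items(items, from_index, to_index):
    """Swap two items in an alternating statement if they are the same kind.

    In a statement like Location OR Category AND Subcategory, parameters sit
    at even positions and operators at odd ones, so an even distance between
    two positions means both hold the same kind of item.
    Always returns a new list; if the swap isn't allowed, the copy is unchanged.
    """
    if (from_index - to_index) % 2 != 0:
        # Unlike items: reject the move (the UI equivalent of sortable('cancel'))
        return list(items)
    swapped = list(items)
    swapped[from_index], swapped[to_index] = swapped[to_index], swapped[from_index]
    return swapped


statement = ["Location", "OR", "Category", "AND", "Subcategory"]
print(swap_like_items(statement, 0, 2))  # ['Category', 'OR', 'Location', 'AND', 'Subcategory']
print(swap_like_items(statement, 0, 1))  # ['Location', 'OR', 'Category', 'AND', 'Subcategory']
```

The function returns a copy instead of mutating the original. That keeps a rejected move trivially reversible, much as sortable('cancel') restores the previous order.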

If, else, and nothing else

The code I ended up writing is almost entirely made up of if–else blocks. Here's the nested conditional monster that controls the "sortability" of items (originalParent, originalIndex, originalText, and originalHTML are set at the beginning of the sort):

function sortRules(event, ui) {
        targetParent = ui.item.parent();
        targetParent.children('.ui-state-dotted').remove();

        // If the to-list is a statement and the from-list is an item list, add the new item to the statement
        // (the else-if branch below handles items rearranged within a single statement)
        // Items cannot be dragged from one statement to another or from a statement to an item list
        if (targetParent.hasClass('fullStatement') && (originalParent.hasClass('piece'))) {
                // Add a dotted box (implying that another item can be added) if the total statement length is shorter than five items
                // If the total statement length is equal to five items, don't add another dotted box, but don't cancel the sort, either
                if (targetParent.children('li:not(.delete):not(.ui-state-dotted)').length < 5) {
                        targetParent.append('<li class="ui-state-dotted"></li>');
                // Cancel the sort if the statement length has six or more items (including the current item)
                } else if (targetParent.children('li:not(.delete):not(.ui-state-dotted)').length >= 6) {
                        $(this).sortable('cancel');
                }

                // Add a delete button if the statement does not already have it
                if (targetParent.children('.delete').length === 0) {
                        $('<li class="delete">x</li>').insertBefore(targetParent.children('li:first-child'));
                }

                // Prevent like items from being placed next to each other,
                // i.e., operators should never be next to operators, and parameters should never be next to parameters
                // Cancel the sort if the item has the same class as the item before it or the item after it,
                // if the item is an operator and (1) the previous item is not a parameter (e.g., an operator or nothing),
                // (2) the next item is an operator, or (3) there is no next item,
                // or if the item was dropped right after the dotted box
                // This prevents operators from being placed next to each other or at the beginning or end of a statement,
                // but allows them to appear before a dotted box
                if ((ui.item.attr('class') === ui.item.prev().attr('class')) || (ui.item.attr('class') === ui.item.next().attr('class')) || (ui.item.hasClass('operator') && (!ui.item.prev().hasClass('parameter') || ui.item.next().hasClass('operator') || ui.item.next().attr('class') === undefined)) || ui.item.prev().hasClass('ui-state-dotted')) {
                        $(this).sortable('cancel');
                        checkCanceled(targetParent);
                // Otherwise, if the sort arrangement is allowed, continue with the sort
                } else {
                        if (ui.item.hasClass('parameter')) {
                                ui.item.html('<form class="handler"><input type="text" class="search-input" name="search-input" value=""></form>');
                                ui.item.css('padding', '5px');
                        }
                        if (originalText === 'Location') {
                                ui.item.children('form').children('input').attr({
                                        'id': 'locationInput',
                                        'placeholder': 'Enter your city'
                                });
                                locationSearch();
                        } else {
                                ui.item.children('form').children('input').attr('placeholder', originalText);
                        }

                        originalParent.append('<li class="' + ui.item.attr('class') + '">' + originalHTML + '</li>');
                        deleteParameter(targetParent);
                }
        } else if (targetParent.attr('id') === originalParent.attr('id')) {
                // Grab the new target index of the item (after the sort ends)
                newIndex = ui.item.index();
                // Swap the current item with the item at its target index
                // if the target item is an even number away from the current item,
                // so that only like items are swapped with each other
                if (Math.abs(originalIndex - newIndex) % 2 === 0 && (ui.item.hasClass('operator') && !ui.item.prev().hasClass('operator'))) {
                        ui.item.next().insertAfter(originalParent.children('li').eq(originalIndex));
                } else {
                        $(this).sortable('cancel');
                        checkCanceled(targetParent);
                }
        // Prevent items from being dragged from statements back into item lists
        } else {
                $(this).sortable('cancel');
                checkCanceled(targetParent);
        }
}

Yes, this function consists of one if–else block with...I don't even know how many nested if–else blocks inside of each of those conditions. It's terrifying. But it does the trick. Eventually I'd like to refactor this using a hash table, if that's possible. To see my progress on this project, check out my source code on GitHub.
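Here's one way the hash-table idea could look. The adjacency rules become a dictionary that maps each kind of item to the kinds allowed to follow it, and a single loop replaces the nested conditionals. This is a hypothetical Python sketch that validates a plain list of item kinds. It isn't wired into jQuery UI, and the names `ALLOWED_NEXT`, `MAX_ITEMS`, and `is_valid_statement` are mine.

```python
# Hypothetical lookup table: which kind of item may follow which.
# None stands for the edge of the statement (its start or its end).
ALLOWED_NEXT = {
    None: {"parameter"},              # a statement must open with a parameter
    "parameter": {"operator", None},  # a parameter is followed by an operator or the end
    "operator": {"parameter"},        # an operator must be followed by a parameter
}

MAX_ITEMS = 5  # three parameters plus two operators


def is_valid_statement(kinds, finished=True):
    """Check a sequence of item kinds against the statement rules.

    With finished=False, the end-of-statement check is skipped, so a trailing
    operator is allowed, mirroring an operator placed before the dotted box.
    """
    if len(kinds) > MAX_ITEMS:
        return False
    # Append None as an end marker only when the statement is complete
    sequence = list(kinds) + ([None] if finished else [])
    previous = None
    for kind in sequence:
        if kind not in ALLOWED_NEXT[previous]:
            return False
        previous = kind
    return True


print(is_valid_statement(["parameter", "operator", "parameter", "operator", "parameter"]))  # True
print(is_valid_statement(["parameter", "operator"]))                  # False
print(is_valid_statement(["parameter", "operator"], finished=False))  # True
```

Adding a rule then means editing the table rather than threading another condition through the if–else chain.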


Read-Only Text: How to Write Every Day

Three months ago I decided to start writing consistently.

Since then, I've worked on my novel every single day without fail. In the past, I would have used any sort of negative feeling – laziness, exhaustion, simply "not wanting to write" – as my cue to step back and let writer's block take over, but now I fight through it. All I have to do is start.

Before, I thought of writing every day as some monstrous, impossible feat. But I've found that it's just the opposite. Writing only when you're inspired is ridiculously hard compared to writing daily. Inspiration doesn't strike often, and if you rely on it to dictate your writing schedule, then you're not going to write very often. And when you don't practice, you stagnate.

So if you want to write, don't let yourself stagnate. Here are the rules I came up with to jumpstart my writing habit.

RULE #1: Write a sentence every day

When creating a habit, start small. One sentence is easy to do, and it's a fluid unit: it can be anywhere from one to thousands of words. It doesn't even have to be grammatically correct.

In seven days, up your daily number of sentences by one. Repeat for each week that passes. When you've found your ideal number of daily sentences, stop adding to it, and use that number as your permanent minimum.

Reasoning: This rule helps you to build a writing habit, however small, into your daily schedule. The key is that you don't skip any days. Skipping even one day makes it easier to skip the next.

RULE #2: Don't even think about editing

This is extremely important. Unless you're Thomas Pynchon, you will write terrible sentences. And it's okay. That doesn't mean you're a terrible writer – it's just a natural consequence of writing so frequently. Some days your output will be brilliant. Some days you'll look at what you wrote before and want to kill it with fire. But whatever you do, don't touch any of your previous days' work.

You're free to edit what you've written today. However, when 24 hours is up, so is your editing access to those sentences. I like to think of work from previous days as being read-only.

What if you've written a bunch of sentences in the past few days that don't fit with your new vision of the story? Forget about them! Go forth and write new sentences that do fit with your vision. You can align them when you're done with the entire story.

Reasoning: One of the fastest ways to kill your writing habit is to get caught in a line-edit loop. I had this problem for many years. I would write something, hate it, and rewrite it ad nauseam until it was "perfect." I couldn't finish my stories because I kept destroying my ideas right after creating them. For more on this topic, read Scott H. Young's excellent How to Fuel a Creative Flow.

This rule helps you to reach the finish line. When you've finished an entire draft – and only when you've finished an entire draft – you can edit to your heart's desire.

RULE #3: Begin where you left off the previous day

Treat your scenes as if they were read-only, too. You can't write a new sentence in the middle of a paragraph from a previous day. If you choose to do this anyway, then that sentence doesn't count.

Reasoning: Rule #2 prevents destructive editing; this rule prevents constructive editing. It ensures that each new sentence is subsequent, which increases the likelihood that the new sentence will push the story forward.


I've been following these rules for twelve weeks. Today is the first day of week 13, which means I'll write a minimum of 13 sentences every day this week. At this rate, I should have written 546 sentences by now. Instead I've written 34,043 words, which, at an average of 15 words per sentence, translates to about 2270 sentences, over four times what I projected.
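The projection follows from Rule #1. In week n the daily minimum is n sentences, so twelve weeks of seven days each add up to 7 × (1 + 2 + ... + 12) = 7 × 78 = 546. Here's a quick Python check of that arithmetic and of the comparison with the actual word count:

```python
WEEKS_COMPLETED = 12
DAYS_PER_WEEK = 7

# Rule #1: during week n, the daily minimum is n sentences
projected = sum(week * DAYS_PER_WEEK for week in range(1, WEEKS_COMPLETED + 1))

# Convert the actual word count to sentences at about 15 words per sentence
words_written = 34043
words_per_sentence = 15
actual = round(words_written / words_per_sentence)

print(projected)                     # 546
print(actual)                        # 2270
print(round(actual / projected, 1))  # 4.2
```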

For reference, before I started this habit, the novel clocked in at 49,091 words. It took me well over a year to write that much. Within the past three months, I've written nearly 70% of that amount, and despite my focus on quantity, I've also made strides in quality.

I highly recommend this sentence-based system to anyone who wants to write. For more ideas on how to write every day, check out these articles:


Analyzing Rap Lyrics with Python

On Friday, my company held a personal project hack day. I used the opportunity to run a quick-and-dirty experiment based on a question I've had on my mind for months: What's the most beloved car brand in hip-hop?

In this post, I'll explain how I came up with a possible answer to that question. To skip straight to the results, jump ahead to "Who won?"

Why hip-hop / why cars

Hip-hop is my favorite genre of music. On the road, I tune the radio to 93.5 KDAY; otherwise, I'm plugged into my Pandora account, which is heavy on Golden Age, Midwestern, East Coast, Bay Area, and (recently) indie rap. Rap is also lyrically dense, which makes it a great source for text mining.

I got curious about cars because rappers talk about them--a lot. So I decided to find out once and for all which brand they love the most, with Chevy (the brand I kept hearing) as my hypothesis.

The lyrics mining process

To find the answer, I needed a sizeable sample of car brand names, a searchable database of hip-hop lyrics, a script to search for the brands within the lyrics database, and a way of displaying the results. Fortunately, all the necessary tools were readily available:

Gathering the data

Wikipedia blocks web scrapers, understandably. So I manually downloaded all of Wikipedia's lists of car manufacturers and brands by country. To reduce the amount of HTML I would have to parse, I selected the Mobile View for each page.

I downloaded ten files in total: a general list of car manufacturers plus individual lists for China, France, Germany, Italy, Japan, Spain, Sweden, the United Kingdom, and the United States.

Cleaning the data

Wikipedia's Mobile View pages contain search bars at the top and the usual "See also," "References," and "Read in another language" links. I started by erasing these (manually again), reducing the files to structural markup and the lists themselves.

With the Python Beautiful Soup library, I extracted the brand name tags by selecting all list elements. Then I used NLTK to remove the HTML tags and regular expressions to remove excess whitespace and text (such as notes about the brand, the brand's years of operation, etc.). This is what I came up with:

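import re

import nltk
from bs4 import BeautifulSoup, SoupStrainer


def clean_wikilist(filename):
    # open saved html file
    with open(filename) as html_file:
        html = html_file.read()

    # collect bulleted items only
    bullets = SoupStrainer("li")

    # make soup out of the bulleted items and pretty-print it as one string
    soup = BeautifulSoup(html, 'lxml', parse_only=bullets).prettify()

    # remove html from soup (nltk.clean_html exists only in NLTK 2.x;
    # NLTK 3 dropped it and points users to Beautiful Soup's get_text())
    raw = nltk.clean_html(soup)

    # remove extra lines
    raw = re.sub(r'\n \n \n \n \n', r'\n', raw)
    raw = re.sub(r'\n \n \n', r'\n', raw)

    # create tokens and strip leading whitespace
    tokens = raw.split('\n')
    tokens = [re.sub(r'^\s+(?=[\S]+)', r'', token) for token in tokens]
    # drop tokens containing footnote markers or parenthetical notes,
    # and tokens made up only of whitespace, brackets, or digits
    tokens = [token for token in tokens if not re.findall(r'\[[0-9]+\]|\([\S\s]+[\(\)]?|^\s+$|^[\s\[\]\(\)0-9]+$', token)]
    # remove duplicates
    tokens = list(set(tokens))

    return tokens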

After running my script on all ten lists, I had a whopping 2599 brand names. So I decided to limit the set to Germany, Japan, the UK, and the US. The pages for Germany, the UK, and the US separate current brands from defunct brands, so for those countries I used current brands only. The Japan page mixes current and defunct brands into one list; to save time, I used all of them.

These four pages have slightly different structures. The clean_wikilist() script worked nicely for Japan, but captured too much information on the others, so I wrote three additional scripts. Here's the one for Germany:

import re

import nltk  # NLTK 2.x, for clean_html
from bs4 import BeautifulSoup


def autos_ge():
    # open saved html file
    with open('autos-ge.html') as html_file:
        html = html_file.read()

    # create soup object from the entire page
    soup = BeautifulSoup(html, 'lxml')

    # select current major manufacturers (headlines in that section only)
    majors = soup.select('span.mw-headline')
    majors = [w for w in majors if w.parent.parent.previous_sibling.contents[0]['id'] == 'Current_major_manufacturers']
    major_tokens = [nltk.clean_html(str(w)) for w in majors]
    # remove leftover single-character bracketed text
    major_tokens = [re.sub(r'\[\s\S\s\]', r'', token) for token in major_tokens]

    # select current minor manufacturers (list items in that section only)
    minors = soup.select('li')
    minors = [w for w in minors if w.parent.parent.previous_sibling.contents[0]['id'] == 'Current_minor_manufacturers']
    minor_tokens = [nltk.clean_html(str(w)) for w in minors]
    # remove short parenthetical notes after the name
    minor_tokens = [re.sub(r'\s\(\S+\)', r'', token) for token in minor_tokens]

    # combine lists and remove duplicates
    tokens = list(set(minor_tokens + major_tokens))

    return tokens

One notable difference between this and the first script is how it uses Beautiful Soup. In clean_wikilist(), SoupStrainer selects the desired elements while the page is being parsed, and prettify() turns the matching HTML tags into a single unicode string. autos_ge(), on the other hand, creates a Beautiful Soup object out of the entire page; the desired elements are selected with CSS selectors and then filtered by walking up the DOM tree.

The number of brand names from this limited dataset? Just 178.

Analyzing the data

The Rap Lyrics Database contains lyrics for all of Billboard Music's rap songs from 1989 through 2009. As far as I know, it's the only searchable database devoted exclusively to hip-hop lyrics.

The result pages share the same URL, with the search term appended to the end: http://research.blackyouthproject.com/raplyrics/results/?all/1989-2009/. This made it much easier to automate the search and saving process.

import urllib2  # Python 2; in Python 3 this module is urllib.request


def rap_search(auto_list):
    # search for each brand name
    for brand in auto_list:
        # drop stray newlines so they don't end up in the saved filenames
        brand = brand.strip()
        url = 'http://research.blackyouthproject.com/raplyrics/results/?all/1989-2009/' + brand

        # download the search results page
        results_html = urllib2.urlopen(url).read()

        # save it as a file named after the brand
        results = brand + '.html'

        with open(results, 'w') as results_file:
            results_file.write(results_html)

This saved 178 HTML pages, each named after the appropriate brand search term, onto my computer. I also searched for known nicknames of the brands (e.g., "Bimmer/Beemer/Beamer" for BMW and "Chevy" for Chevrolet).

I used Beautiful Soup again to count the number of results on each page:

import os

from bs4 import BeautifulSoup


def count_rap_results():
    # for all html files in current directory
    for filename in os.listdir('.'):
        if filename.endswith('.html'):
            # select song titles
            with open(filename) as html_file:
                soup = BeautifulSoup(html_file.read(), 'lxml')
            songs = soup.select('.title')

            # count number of song titles
            count = len(songs)

            # write brand names and number of songs into a text file
            with open('count_rap_autos.txt', 'a') as counter_file:
                counter_file.write('%s%15d\n' % (filename[:-5], count))

This got me a pretty messy-looking text file of each brand and the number of songs in which it was mentioned:

Lea-Francis              1
Ewing              0
Efini              0
Scion              0
Tommy Kaira             15
...
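The messiness comes from the format string. '%s%15d' writes the brand name with no padding and then right-aligns the count in a 15-character field, so each count ends at a different column depending on how long the name is. Padding the name to a fixed width lines the counts up instead. Below is a minimal sketch. It assumes 20 characters is wide enough for the longest brand name, and format_count_line is a hypothetical helper:

```python
import sys


def format_count_line(brand, count, name_width=20):
    """Pad the brand name to a fixed width so the counts line up in one column."""
    return '%s%5d\n' % (brand.ljust(name_width), count)


# Brand names and counts from the sample output above
for brand, count in [('Lea-Francis', 1), ('Ewing', 0), ('Tommy Kaira', 15)]:
    sys.stdout.write(format_count_line(brand, count))
```

ljust() doesn't truncate, so a name longer than name_width would still push its count out of line. The width has to be chosen from the longest brand in the list.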

It turned out that the Rap Lyrics Database doesn't recognize spaces. So, a search for "Art and Tech" became a search for "Art"--which of course is a popular word that is often used in a non-automobile context. I removed ambiguous names from the list and combined the results from brands and their nicknames. LibreOffice Calc was helpful in fixing the columns and sorting the results.

The final number of usable brands came out to 153.

Who won?

Mercedes-Benz, with 93 song mentions--and remember, that's only counting a small segment of rap songs between 1989 and 2009.

Jeep came in second, at 34 songs. Then came Cadillac with 25 songs, and finally Chevy at 24.

So, Chevy isn't as popular as I expected. But the biggest shocker is Jeep. I can't recall a single song that mentions Jeep.

To display the results, I made a graph with Google Charts. (I'd add it here, but I've yet to learn how to embed JavaScript into reStructuredText.)

Pain points

I'm new to programming. While I enjoy it very much, I spend about 90% of my time immersed in pain. The text analysis took me four nights (Monday through Thursday) to complete. On the actual hack day, I made the graph and web page, with a lot of help from Bootstrap. Along the way, I encountered many problems:

  • Selecting specific subsets of HTML tags with no classes or ids.

  • Accidentally passing list items with the newline character to the rap_search() script, resulting in 178 filenames split off from their extensions. Fortunately there was an easy fix:

    for filename in os.listdir("."):
            if '\n' in filename:
            os.rename(filename, re.sub(r'\n', r'', filename))
  • Reformatting the final list of brands into an HTML table.

  • Reformatting the final list of brands into a list of dictionaries to create a graph with JavaScript InfoVis Toolkit.

  • Not knowing how to build a non-stacked bar graph with the InfoVis Toolkit.

  • Switching to Google Charts and reformatting the final list of brands into a list of lists to create a Google Charts graph.
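Google Charts' arrayToDataTable() expects a list of rows with a header row first, which is why the final list had to become a list of lists. Here's a sketch of that conversion using the top four results from this post. The column labels are my own choice. The output is a JSON string that could be pasted into the chart's JavaScript.

```python
import json

# Song counts from the results above (top four brands only)
counts = {'Mercedes-Benz': 93, 'Jeep': 34, 'Cadillac': 25, 'Chevy': 24}


def to_chart_rows(brand_counts):
    """Build a header row plus one [brand, count] row per brand, most-mentioned first."""
    rows = sorted(brand_counts.items(), key=lambda item: item[1], reverse=True)
    return [['Brand', 'Songs']] + [[brand, count] for brand, count in rows]


print(json.dumps(to_chart_rows(counts)))
# [["Brand", "Songs"], ["Mercedes-Benz", 93], ["Jeep", 34], ["Cadillac", 25], ["Chevy", 24]]
```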

Notes for the future

More complete data

I slashed the set of brand names to less than 7% of its original size and conflated "car manufacturers" with "car brands." Wikipedia also has a lengthy list of automobile marques, which I didn't even touch.

I'd like to go deeper than brands, into the actual names of cars, and match them against an even bigger database of lyrics. RapGenius and the Last.fm API are possible alternatives to the Rap Lyrics Database. RapGenius has an excellent database, but it contains a significant amount of lyrics from non-hip-hop artists as well.

Semantic orientation

I equated "beloved" with "number of song mentions." This is obviously not always the case, as rappers name-drop plenty of things they dislike. It's true that rappers generally mention cars in a positive manner, but a more accurate experiment would take into account not just how many times the brand was used, but in what way the brand was used--i.e., the semantic orientation of the brand. A sentiment analysis might be the way to go.

Multi-word brands

The Rap Lyrics Database turns up blank if you search for, say, "Aston Martin" (with the quotes and the space), even though Aston Martin is mentioned in a few songs. So multi-word brands came up short. (Mercedes-Benz doesn't have this issue because it has a hyphen, not a space.)

If I were to use the Rap Lyrics Database again, I'd have to search for "Aston" and "Martin" separately and compare the songs on each results page. Otherwise, RapGenius seems to do spaces nicely.
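Comparing the two results pages amounts to a set intersection. A song counts as a possible "Aston Martin" mention only if it shows up in the results for both words. This can still overcount, since the two words might appear separately in the same song. Below is a minimal sketch. It assumes the song titles from each page have already been pulled out, for example with the same soup.select('.title') call used in count_rap_results(). The titles are placeholders, not real search results.

```python
def songs_with_all_words(results_by_word):
    """Return the titles that appear in every word's search results."""
    title_sets = [set(titles) for titles in results_by_word.values()]
    return set.intersection(*title_sets) if title_sets else set()


# Placeholder titles standing in for the scraped results pages
results = {
    'Aston': ['Song A', 'Song B', 'Song C'],
    'Martin': ['Song B', 'Song C', 'Song D'],
}
print(sorted(songs_with_all_words(results)))  # ['Song B', 'Song C']
```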

Nicknames and duplicates

I searched for "Mercedes-Benz" as well as just "Mercedes" and "Benz." However, again, because I didn't compare song names, I ended up nixing the counts for the nicknames to minimize the possibility of duplicates. I also missed some nicknames--for instance, I completely neglected "Caddy" and "Lac" (sorry, Cadillac), "Lex," etc.
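Comparing song names would have let me keep the nickname counts. The idea is to take the union of the titles returned for each name, so a song that says both "Benz" and "Mercedes" is counted once. Here's a sketch with placeholder titles rather than real results:

```python
def unique_song_count(results_by_name):
    """Count distinct song titles across every name used for one brand."""
    all_titles = set()
    for titles in results_by_name.values():
        all_titles.update(titles)
    return len(all_titles)


# Placeholder titles standing in for the scraped results pages
mercedes_results = {
    'Mercedes-Benz': ['Song A', 'Song B'],
    'Mercedes': ['Song B', 'Song C'],
    'Benz': ['Song A', 'Song C', 'Song D'],
}
print(unique_song_count(mercedes_results))  # 4
```

Titles alone can collide, since two different artists may have songs with the same name. Keying on (artist, title) pairs would be safer if the results pages list the artist.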

Misspellings, plurals, etc.

I did search for "Beamer" and "Beemer," but there's also "Bima" and probably countless other misspellings of "Bimmer" and other car brands. I ignored plurals, whether spelled correctly ("Bimmers") or not ("Bimaz").
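If the lyrics themselves were available as plain text, a single case-insensitive regular expression could catch the spelling variants and plurals mentioned above in one pass. Here's a sketch covering only the "Bimmer" family. BIMMER_PATTERN and count_bimmer_mentions are hypothetical names, and the spellings come straight from this paragraph.

```python
import re

# Spellings mentioned above for "Bimmer", each optionally ending in s or z
BIMMER_PATTERN = re.compile(r"\b(?:bimmer|beemer|beamer|bima)[sz]?\b", re.IGNORECASE)


def count_bimmer_mentions(text):
    """Count every spelling or plural variant of "Bimmer" in a block of lyrics."""
    return len(BIMMER_PATTERN.findall(text))


sample = "Bimmers, a Beemer, a Bimaz, and a beamer"  # made-up test string
print(count_bimmer_mentions(sample))  # 4
```

A pattern like this still runs into the disambiguation problem below. "Beamer," for example, can also mean a projector.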

Disambiguation

Many car brands double as common words or unrelated proper names, e.g., Prince, Radical, Ram, MINI, Oakland. I discarded these instead of determining whether or not they were referring to the car brand.

Last thoughts

A few months ago, when I first got the idea for this project, I thought it would be ridiculously hard. I envisioned building a large corpus of hip-hop lyrics and determining the classification, meaning, and orientation of each word to uncover the truth.

After going about this somewhat backwards, I think my initial impression remains correct. I'm happy I finally took a stab at the project, and I'm excited to continue working on it. This is just the first of a long series of experiments (and blog posts)!

Source files on GitHub


GitHub for Linux Mint

Version control terrifies me.

But you know what else terrifies me? Losing files. I've lost enough of them and endured enough filename headaches (story-1.doc, story-2.doc, story-final, story-new.doc, story-newEST.doc, etc.) in my lifetime to realize that version control is extremely important, and I need to develop better version control habits.

So, I added the files for this site to my GitHub account. The following guide details the steps I took, minus the legion of errors I made. I've named it "GitHub for Linux Mint," in honor of my operating system and the fact that GitHub offers GitHub for Mac and GitHub for Windows, but no GitHub for GNU/Linux (even though, hey, some of us GNU/Linux users could use a little point-and-click help over here).

Note: Afterward, I realized I could have had a much easier time cloning the repository. This command is identical to the "Clone in Windows" and "Clone in Mac" buttons in the corresponding GitHub GUI applications:

git clone <URL>

Connect your computer to GitHub

First, you need a secure method of connecting your computer to GitHub. You can do this through SSH (generate a new public/private SSH key pair and add it to your GitHub account) or through HTTPS (configure the Git credential helper).

Prepare the branches

Create a new repository on GitHub. This is the remote branch.

Fire up your command line. Move to the directory containing the files you want to put under version control (if you're not there already), e.g., cd sah.

Initialize a Git repository in this directory. This is the local branch.

git init

Check the status of your new repository.

git status

(or git st if you've set up aliases). This is optional, but I like checking the status to see which branch I'm working on and which files will be included in the next commit.

# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       .doit.db
#       1.txt
#       README.txt
#       cache/
#       conf.py
#       conf.pyc
#       conf.py~
#       files/
#       galleries/
#       listings/
#       new_site/
#       output/
#       posts/
#       stephanieahiga.wordpress.2013-05-05.xml
#       stories/
nothing added to commit but untracked files present (use "git add" to track)

Right now there's "nothing added to commit," meaning nothing has been staged and the repository has no commits yet. This is because you've just initialized it.

Start tracking your files

Next, start tracking your files. To add individual files, write git add followed by the filename. To stage everything in the directory for your next commit:

git add .

Now it's time to commit the change. The basic commit command is git commit, which will launch your preferred text editor. The -m flag allows you to skip the text editor and type your commit message inline; the -a flag tells Git to automatically stage your tracked files, skipping the staging area.

git commit -a -m "Added Nikola site files"

See Git Basics - Recording Changes to the Repository for more information on committing changes.

Merge and sync your branches

Merge the remote branch (the GitHub copy of the repository) with your local branch (the local copy of the repository). This consolidates the two branches and ensures that they are identical.

git pull https://github.com/sahiga/sah.git

You should see a message like this:

From https://github.com/sahiga/sah
 * branch            HEAD       -> FETCH_HEAD
Merge made by the 'recursive' strategy.
 README.md |    4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 README.md

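Note: git pull automatically merges commits from the "pulled" branch into your current branch. That's harmless here, where the GitHub repository contains only a README, but when both copies have changed the same files it can cause merge conflicts. (Git 2.9 and later also refuse to merge two unrelated histories like these unless you add the --allow-unrelated-histories flag.)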

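Push your commit on the local branch to the remote branch. Because this repository was created with git init rather than cloned, it has no remote named origin yet, so register the GitHub repository under that name before pushing:

git remote add origin https://github.com/sahiga/sah.git
git push origin master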

If you're using HTTPS, GitHub will prompt you for credentials. To switch to SSH, follow these instructions.

Note: I originally attempted git push origin master before git pull, resulting in a "non-fast-forward" error. This error occurs when the remote branch has commits that the local branch doesn't, so Git can't push the local commits without discarding the commits on the remote branch. GitHub has an article on non-fast-forward errors that I wish I'd found when I made this mistake.

The two branches are now in sync! Check the commit history in your local branch with git log. The most recent commit should have a "Merge" attribute, identical to what you'll find in the "Commits" tab in your GitHub account:

    commit e9bcc38554dd930b4bd1f557e45c92f8f65e0a98
    Merge: 3e653b4 96d590b
    Author: sahiga
    Date:   Sat May 4 23:51:39 2013 -0700

        Merge https://github.com/sahiga/sah

Tutorials for Git and GitHub:


Goodbye, WordPress! Hello, Nikola!

It's official: I've finally migrated my personal website from WordPress to Nikola, a static site generator. In my 53432099809th attempt to return to writing, I'll use this bit of nerd news to jumpstart what I hope will become a regular blogging habit.

The quest for static

WordPress gets a lot of hate in the development community. Let me say upfront that this is absolutely not the reason I decided to leave it (I have the softest of spots for WordPress). It's just that one day I woke up and realized I didn't need a database and a gazillion server queries to post text on my website.

So I started looking into static site generators. A static site generator does exactly what's written on the box: it creates a static site. It combines the best of Web 1.0 (fast, portable, secure HTML pages) with the best of Web 2.0 (templates). I considered:

  • Jekyll + Octopress: Hooks directly into GitHub Pages and is apparently very good. But I don't know Ruby, so I kept looking.

  • Hyde: A Python port of Jekyll. Ultimately, I passed on it because of the lack of documentation. When it comes to technology, this Luddite needs a lot of hand-holding.

  • Pelican: Also Python-based, with complete, well-written documentation. I installed it and got lost while trying to generate files. Pelican looks like a great choice, but I think it's meant for people who are much more tech-savvy than I am.

Shortly after I installed Pelican, someone posted a link to Nikola on Hacker News. I'm not sure why it didn't come up in all my searches for Python-based static site generators, because so far it's proven to be a dream come true. The documentation is amazing. It's easy to use and in active development. And it was named after Nikola Tesla. (Huge Nikola Tesla fan here.)

Goodbye, WordPress

Nikola has an import_wordpress command to ease the transition from WordPress. I grabbed an XML dump of my site (Tools > Export > All Content in the WP admin panel) and saved it in my Nikola site folder.

I installed the Python requests package (import_wordpress depends on requests):

sudo pip install requests

Then I ran import_wordpress on the file:

nikola import_wordpress stephanieahiga.wordpress.2013-05-05.xml

This created a new folder, new_site, within the Nikola folder. I had no posts and only a handful of pages in my WordPress database, so I recreated the pages manually with nikola new_post -p, which creates a page instead of a post. As in WordPress, Nikola differentiates between posts and pages: posts are in the posts folder and pages are in the stories folder.

Finally, I uninstalled WordPress from my domain via SimpleScripts on my cPanel.

import_wordpress saves all posts and pages with the .wp extension (which my computer reads as WordPerfect files). The XML dump, and therefore the folder created by import_wordpress, contains the contents of the WordPress database but not the theme files, so it's important to back up any custom themes before uninstalling WordPress.

Tutorials on importing WordPress posts into Nikola:

Hello, Nikola

I assume that the next step is to upload the Nikola site folder to my domain server. If you're reading this post, you can assume that my assumption turned out to be true.