Steam Gauge - A Technical Overview

A Quick Note: This is the second of a two-part post. This post covers the technical details of Steam Gauge. The other post covers my personal experience and observations building it and can be found over here.

Occasionally I get emails asking how Steam Gauge is put together, so I've written this (non-exhaustive) overview/retrospective on how it was originally built and how it has grown and changed over the years.

The Host and Server

Steam Gauge runs on a shared hosting plan with DreamHost. The benefit of this is a fixed hosting cost - my bill is the same every year (no matter how many apps I spin up), so I know what to plan for and expect. The downside to this is that the server resources are capped, which sometimes results in inconsistent performance. Since I do not currently generate revenue from Steam Gauge, I consider this an acceptable trade-off. The shared server runs Apache with various useful software packages preinstalled (but not generally changeable by a given user). Steam Gauge runs as a dedicated user on its server; initially, I did not realize that DreamHost allocated shared server resources on a per-user basis, so when one of my sites would go down, all of my other sites would go down as well (fortunately, this was easily remediated, and all of my domains are now isolated from each other in that regard).

By default, the Python version available to your server is locked to whatever happens to be installed (typically 2.5 or 2.7). Additionally, your user permissions are, of course, limited to your directory on the server. This means that things like the Python package manager/installer, pip, may not always function correctly. This limitation can be almost entirely mitigated by the use of virtualenv, which allows you to install whatever software you may need in an isolated virtual environment (I cover this in a tutorial here).
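As a rough illustration of the isolated-environment idea, here is a minimal sketch. It uses the standard-library venv module (the Python 3 counterpart to virtualenv) rather than virtualenv itself, and the function name and directory name are made up for this example.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(target_dir):
    """Create a virtual environment at target_dir and return its path."""
    target = Path(target_dir)
    # with_pip=False keeps this sketch fast and offline; pass True to bootstrap pip.
    venv.create(target, with_pip=False)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "steamgauge-env" is a hypothetical directory name.
        env_path = create_isolated_env(Path(tmp) / "steamgauge-env")
        print((env_path / "pyvenv.cfg").exists())
```

Everything installed into such an environment lives under your own directory, which is why it sidesteps the permission limits of a shared host.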

Python

Steam Gauge now runs on Python 3, but at launch was running 2.7.5. Because Python 2 handled Unicode strings and non-Unicode strings as distinct datatypes, many early Steam Gauge bugs/headaches arose from improper handling of special Unicode characters that sometimes appeared in app titles and developer/publisher names. I originally attempted to coerce non-Unicode strings to Unicode, but that approach resulted in many malformed data entries. The complexity of handling strings in Python 2 (as well as the abandonment of support for many Python 2 packages) was one of the primary motivations for rebuilding the app in Python 3. Migrating to Python 3 (where all strings are Unicode) did solve these issues, but many of the supporting packages used in the previous iteration of the Steam Gauge app did not support Python 3, had no equivalent packages, or changed their functionality significantly. While it required a bit more work, navigating these dependencies in Python 3 actually allowed me to shrink the footprint of Steam Gauge's dependencies significantly by replacing them with hand-written code or removing them entirely.
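To make the string-handling point concrete, here is a small sketch of one way to treat incoming text in Python 3: decode bytes explicitly and fail loudly rather than store malformed entries. The helper name to_text, the UTF-8 default, and the NFC normalization step are assumptions for illustration, not the code Steam Gauge actually uses.

```python
import unicodedata


def to_text(value, encoding="utf-8"):
    """Return value as a normalized str, decoding bytes explicitly."""
    if isinstance(value, bytes):
        # errors="strict" raises UnicodeDecodeError instead of quietly
        # producing a malformed title or publisher name.
        value = value.decode(encoding, errors="strict")
    # NFC normalization makes visually identical strings compare equal.
    return unicodedata.normalize("NFC", value)


if __name__ == "__main__":
    # "Café Simulator" is a made-up app title.
    print(to_text("Café Simulator".encode("utf-8")))
```

Raising on undecodable input is a deliberate choice here: a rejected row is easier to notice and fix than a silently corrupted one.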

Flask and Bootstrap

Steam Gauge makes use of (the often under-appreciated) Flask. Flask is a Python-based web app framework similar to Django or Rails (if Ruby is more your thing), but much more minimal. I like it for its ease of use and lack of pre-solving problems you may not even have. Flask makes use of the Jinja templating language, which is not terribly dissimilar to Embedded Ruby and so is pleasantly unsurprising and easy to use without sacrificing utility.

The original design of Steam Gauge was also fairly minimal and bare, but did attempt to mimic the feel of Steam's own site. Eventually I scrapped the old design, which had become a bit inconsistent and cumbersome, and implemented Bootstrap as a base style for the site. This simplified maintaining support for mobile devices and allowed me to focus on further backend improvements while giving me a good starting point when I'm ready to tackle design of the site again.

Structure

As I've mentioned before, Steam Gauge is probably more accurately described as three apps. The first two are presently entangled in the same codebase and serve to 1) route traffic and render the website and 2) fetch data (from APIs and the database) and deliver it to the site templates. One of the downsides of this configuration is that if one of the data requests fails for some reason, the whole Steam Gauge request fails and the user is returned a (typically generic, vague, and unhelpful) error page. This is because all API requests are handled as one job - if one fails, the whole request fails. Moreover, if the template logic fails to handle the data correctly, the whole request can fail. If your account library has a large number of games in it, a Steam Gauge request can work for quite some time before failing, which can be incredibly frustrating. My fix for this is to separate the API calls that currently run as a batch in the backend into client-facing endpoints. Once the account ID is resolved, I have enough information to render an account page, at which point subsequent calls can be made to update page information as needed. This would reduce the wait (to some degree) for the user before a page load happens, as well as make it easier to return more specific and meaningful error messages.
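The core of that fix is isolating failures so that one bad data request no longer sinks the rest. The sketch below shows the idea in a single server-side loop rather than as separate client-facing endpoints; load_sections and the fetcher functions are hypothetical names, not part of the Steam Gauge codebase.

```python
def load_sections(account_id, fetchers):
    """Run each fetcher independently; return (results, errors) keyed by section."""
    results, errors = {}, {}
    for name, fetch in fetchers.items():
        try:
            results[name] = fetch(account_id)
        except Exception as exc:
            # Broad on purpose: a failure in one section is recorded with a
            # specific message instead of failing the whole request.
            errors[name] = f"{type(exc).__name__}: {exc}"
    return results, errors


def fetch_profile(account_id):
    # Hypothetical stand-in for a profile API call.
    return {"id": account_id}


def fetch_library(account_id):
    # Hypothetical stand-in for a slow or failing library API call.
    raise TimeoutError("library request timed out")


if __name__ == "__main__":
    results, errors = load_sections(
        "12345", {"profile": fetch_profile, "library": fetch_library}
    )
    print(results)
    print(errors)
```

With per-section results and errors, the page can render whatever succeeded and show a targeted message for the part that did not.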

The third "app" scrapes data from the Steam store pages to close the information gaps in Valve's Steam APIs. This scraper (which is not open-source) initially utilized the urllib2/mechanize packages to load store pages, and BeautifulSoup to parse the page HTML, but now uses Selenium and PhantomJS to achieve the same goal more consistently (and with better support). Because the scraper represents actual traffic on the Steam website, I've attempted to minimize the number of page loads with various optimizations including only retrieving data from apps that are "new" (i.e. apps that have been added to Steam since the last scraper update), although it can also update individual apps on-demand as needed. I've begun working on adding columns to the database tables to keep track of the "freshness" of the retrieved data so the scraper can also ensure older titles that have been updated in the store get checked as well (a task which is currently done manually when reported by users).
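The "only new apps, plus stale ones" selection described above could look something like the following sketch. The function name, the last_checked mapping, and the 90-day threshold are all assumptions for illustration; the real scraper is not public and may work quite differently.

```python
from datetime import datetime, timedelta


def select_apps_to_scrape(store_app_ids, last_checked, now, max_age=timedelta(days=90)):
    """Return sorted app IDs that are new or whose data is older than max_age."""
    to_scrape = []
    for app_id in store_app_ids:
        checked_at = last_checked.get(app_id)
        # No timestamp means the app has never been scraped, so it is "new".
        if checked_at is None or now - checked_at > max_age:
            to_scrape.append(app_id)
    return sorted(to_scrape)


if __name__ == "__main__":
    now = datetime(2020, 1, 1)
    # Hypothetical app IDs and timestamps.
    last_checked = {10: now - timedelta(days=1), 20: now - timedelta(days=200)}
    print(select_apps_to_scrape({10, 20, 30}, last_checked, now))
```

Keeping the selection separate from the page loading makes it easy to cap how many store pages a single run will request.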

The Database

Steam Gauge originally did not use a database - all the app entries were stored in a plaintext static document as comma-separated values (cue horror). I suspect this minimized app overhead with regard to memory, but it was really done that way because I had no expectation of much attention at the time, and naively thought it would be easier to throw up a static file than to build everything out in a proper database. This approach predictably had many issues, not least of which was poor maintainability (never mind a complete disregard for ACID guarantees). The static file was quickly scrapped in favor of MySQL, which happened to be the relational database system I was most familiar with at the time.
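For a sense of what moving from a static CSV file to a relational table buys you, here is a small sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL so it runs anywhere, and the apps table and its columns are invented for the example rather than taken from the real schema.

```python
import csv
import io
import sqlite3


def load_apps(conn, csv_text):
    """Insert app rows from CSV text into an apps table in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS apps (app_id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    rows = [(int(r["app_id"]), r["title"]) for r in csv.DictReader(io.StringIO(csv_text))]
    # The connection context manager commits on success and rolls back on
    # error, so a bad row cannot leave the table half-updated.
    with conn:
        conn.executemany("INSERT INTO apps (app_id, title) VALUES (?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up rows in the shape of the old comma-separated file.
    print(load_apps(conn, "app_id,title\n1,First Game\n2,Second Game\n"))
```

The all-or-nothing insert is the kind of guarantee a hand-edited static file cannot give you.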

What's Next?

With a Python migration, refactors, and redesigns in the rearview, Steam Gauge's next improvements will likely be Quality of Service updates. I'd like to get the database updates (via the scraper app) to a place where I feel comfortable automating them with a cron job, and maybe get some much-needed unit testing up and running so I can utilize some test-driven development methodologies. Of course, what people have asked for is another thing entirely. In addition to cleaning up the account data summaries, I'd like to do some data visualizations to enhance the #DataAreBeautiful vibe of the app; I've worked with d3.js to create some fun, interactive visualizations for my various employers over the years and would love to do some of that work in an open-source environment.

All said, however, despite having a lot of ambition for features I'd like to add to Steam Gauge, I don't have as much time as I used to, so I'll probably not have the opportunity to do everything (unless some investors out there want to fund development? </crazy_grin>). Even so, I plan to continue to make updates to Steam Gauge when and where I can for as long as people continue to use it.