Monday, January 30, 2012

RAID redo

I spent some time this weekend rebuilding the family file server, which is a Linux box. The last time I rebuilt it was several years ago, and at that time I figured I should set it up with (software) RAID 5 to avoid the hassle of having to recover from backup if a disk failed. This worked great. A few years ago a disk did fail; I bought a new one, plugged it in, the array rebuilt, and everything was good.

Similarly, a few years back at work I configured our new VMware ESXi box with an eight disk RAID 5 array (hardware RAID). Last year a disk failed, and on that machine I didn't even have to power it down. I yanked out the old disk, hot-plugged the new, and the machine didn't miss a beat.

So, RAID 5 is wonderful, right? Well, the time between the disk failure and disk replacement was somewhat stressful. In both cases, the disk couldn't be replaced immediately. The disk in my home server failed the night before I left on a trip, so I couldn't replace it for two weeks. And the new disk for the work machine had to be ordered and took some time to arrive. There was this gap where there was no redundancy. In both cases there were backups, but restoring from backup takes a lot more time than just plugging in a disk, and I realized that I really, really didn't want to waste my time setting up machines when simply providing a little more redundancy would have removed the need. “You can ask me for anything you like, except time.”

The home server needed a bit of maintenance anyway (for example, the root volume was low on space), so I figured that while I was at it I would reorganize the server and take some extra time to fix the redundancy problem, moving to RAID 6 on the four disks. RAID 6 allows any two disks to fail without loss of data. I'd lose some space, but the extra redundancy would be worth it. Why RAID 6 over RAID 10? RAID 6 survives any two disk failures (RAID 10 only survives a second failure if it hits the other mirror pair), at the expense of some speed.

This is what I did to prepare:

  1. Took an LVM snapshot of the root partition and copied that snapshot as an image to an external drive. Why an image? The permissions and ownership of files on the root partition matter, and an image preserves all of that metadata.
  2. Copied the truly critical data on the root partition to another machine for extra redundancy. The existing backup process copies the data offsite, which is good for safety but not so good for quick recovery, so I wanted to make sure I didn't have to use the offsite backup.
  3. Copied the contents of the other partitions to the external drive. The other partitions don't contain anything particularly critical so I didn't feel the need for redundancy there.
  4. Zeroed out all the drives with dd if=/dev/zero of=/dev/sdX. Some sites suggested this was important, that the Linux software RAID drivers expected the disks to be zeroed. That seems unlikely, but it didn't cost me anything to do. There was an interesting result here, though: the first two drives zeroed at 9.1MB/s, while the other two ran at 7.7MB/s. If I recall correctly, three of the drives are identical and the one I replaced is a different brand, so the two-and-two speed split doesn't match the brands; it isn't a drive issue but rather a controller issue: the secondary controller is slower.
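Step 1 can be sketched in shell. The volume group and LV names here ("vg0", "root") and the external mount point are my assumptions, not necessarily the actual ones:

```shell
# Snapshot the root LV atomically, then copy it out as a raw image.
# Assumed names: volume group "vg0", root LV "root", external drive
# mounted at /mnt/external -- adjust for your layout.

# 1G of copy-on-write space absorbs writes while the copy runs.
lvcreate --snapshot --size 1G --name root-snap /dev/vg0/root

# Copy the frozen snapshot to an image file, preserving everything:
# permissions, ownership, and all other filesystem metadata.
dd if=/dev/vg0/root-snap of=/mnt/external/root.img bs=4M

# Drop the snapshot once the copy is done.
lvremove -f /dev/vg0/root-snap
```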

Now that the machine was a blank slate, I set it up from scratch:

  1. Start the Debian 6.0.3 installer from a USB key.
  2. In the installer, partition each of the four disks with two partitions: one small 500M partition and one big partition with the rest of the space (~500G).
  3. Set up RAID 1 across the small partitions (four-way mirroring).
  4. Set up RAID 6 across the large partitions.
  5. Format the RAID 1 volume as ext3, to be mounted as /boot.
  6. Create an LVM volume group called “main” and add the RAID 6 volume to it.
  7. Create a 5G logical volume for /.
  8. Create a 5G logical volume for /home.
  9. Create a 10G logical volume for swap.
  10. Create a 20G logical volume for /tmp.
  11. Create a 200G logical volume for /important.
  12. Create a 200G logical volume for /ephemeral.
  13. Tell the installer that this machine should be a DNS, file, and ssh server and let the installer run to completion.
  14. Copy the important files to /important and the ephemeral files to /ephemeral.
  15. Configure Samba and NFS.
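Outside the installer, steps 3 through 12 could be done by hand roughly like this. This is a sketch only: the /dev/sdX partition names and md device numbers are assumptions, and these commands destroy whatever is on the partitions:

```shell
# Four-way RAID 1 mirror across the small partitions, for /boot.
mdadm --create /dev/md0 --level=1 --raid-devices=4 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

# RAID 6 across the large partitions.
mdadm --create /dev/md1 --level=6 --raid-devices=4 \
    /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2

# LVM volume group "main" on top of the RAID 6 array.
pvcreate /dev/md1
vgcreate main /dev/md1

# The logical volumes from steps 7-12.
lvcreate -L 5G   -n root      main
lvcreate -L 5G   -n home      main
lvcreate -L 10G  -n swap      main
lvcreate -L 20G  -n tmp       main
lvcreate -L 200G -n important main
lvcreate -L 200G -n ephemeral main

# ext3 for /boot, ext4 elsewhere; swap lives inside the array too.
mkfs.ext3 /dev/md0
mkfs.ext4 /dev/main/root
mkswap /dev/main/swap
```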

So why this particular structure? Linux can't boot from a software RAID 6 partition, so I needed to put /boot on something Linux could boot from, hence the RAID 1 partition. The separate logical volumes are primarily about different backup policies. The 5G size for / and /home is to limit growth (these volumes will be backed up as filesystem images), and 5G fits on a DVD in case I ever want to back up that way. Swap of course needs to be inside the RAID array if you don't want the machine to crash when a disk fails: yes, Linux knows how to efficiently stripe swap across multiple disks, but with plain striped swap a disk failure will cause corruption or a crash. The 20G volume for /tmp is so that there's lots of temp space, and it's on a separate volume so backup processes can ignore it. The /important volume contains the user files that are the important data and can be backed up on a file-by-file basis (as opposed to /, which is backed up as a filesystem image). The /ephemeral volume contains files that don't need to be backed up at all. All filesystems have the noatime mount flag set, and they're all ext4 except for /boot, which is ext3.

If you're counting you'll note that there is still a lot of empty space in that LVM volume group. There are several reasons for this:

  • Some empty space is required if I want to make an LVM snapshot, so I never want to use up all the space.
  • I frequently make additional temporary volumes for a variety of purposes.
  • If I need to expand any particular logical volume, there is room to do so.
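That last point is a single LVM command whenever it comes up. The names follow the layout described earlier, and the +50G is just an example amount:

```shell
# Grow /important by 50G out of the volume group's free space, then
# grow the ext4 filesystem to match; both work while it's mounted.
lvextend --size +50G /dev/main/important
resize2fs /dev/main/important
```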

Monday, January 16, 2012

Safe Facebooking with Chrome

I don't like the idea that every page on the web with a Like button will tell Facebook that I've browsed to that page. But at least that information is anonymous... so long as I'm not logged into Facebook.

I used to just not stay logged in, only logging into Facebook for dedicated “Facebook sessions” in incognito mode, or in a separate browser whose history I'd clear afterwards, or similar such mechanisms. But then I found this post about Chrome's certificate pinning where Chris describes how to “Twitter Like A Boss”. This inspired me to run a new Chrome process with a separate profile just for Facebook, using this command line (via an alias):

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
  --user-data-dir=$HOME/.mb/chrome-safe-browsing/facebook
  --disable-plugins
  --proxy-server=localhost:1
  --proxy-bypass-list='https://facebook.com,https://*.facebook.com,https://*.fbcdn.net,https://*.akamaihd.net'
  https://facebook.com/

With no line breaks, of course.
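For reference, here's one way the alias could look as a shell function (the name fb is my choice, not anything official); backslash continuations keep it readable:

```shell
# Launch an isolated Chrome instance for Facebook in the background.
fb() {
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
    --user-data-dir="$HOME/.mb/chrome-safe-browsing/facebook" \
    --disable-plugins \
    --proxy-server=localhost:1 \
    --proxy-bypass-list='https://facebook.com,https://*.facebook.com,https://*.fbcdn.net,https://*.akamaihd.net' \
    https://facebook.com/ &
}
```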

This isolates Facebook's cookies into a separate profile, which prevents my general web browsing in another Chrome instance from being tracked under my Facebook login. It also disables browsing any other sites (following a link will fail; you need to copy/paste the URL into another browser), forces all Facebook connections to use SSL, and disables plugins.

Note that this doesn't use incognito mode (I want to stay logged in now that my everyday browsing isn't affected) and it doesn't use certificate pinning. The main point was to stop leaking information to Facebook. I may get around to figuring out the certificates to pin at some point, but really I'm hoping that a better solution will arise to the problem that certificate pinning is addressing (which is not to say that certificate pinning hasn't already proved effective in critical scenarios).

It's harder to do the same thing with Google because I use so many Google services. The core principle is to use a separate browser instance for each login for corporations like Facebook and Google that have code on so many third-party web pages; this is one way of doing that which happens to have additional security benefits (forcing SSL etc).

Tuesday, January 10, 2012

Adventures in taxation: Apple's App Store versus Google's Android Market

I recently went through the process of registering to distribute paid applications in both Apple's App Store and Google's Android Market. I could talk about those processes, but that's not what this post is about. Wait, I couldn't talk about Apple's process anyway, as that agreement explicitly forbids discussing said agreement.

Anyway, this post is about taxes, not about how fast Google's approval process was (pretty fast) or how slow Apple's was (longer but not really that long), or about how Google did the bank account verification but Apple didn't, or about how Apple actually walked me through the banking information whereas Google just said "enter all the numbers"... ok, really, I'm not talking about those items.

I am not a tax lawyer or accountant, so I may well be completely wrong here, but my research so far indicates that if I put an app in an app store (or sell a subscription to a web-based service) I am required to charge GST/HST to customers based in Canada. If the customer is not in Canada, I do not need to charge any tax. So, how do I arrange to do that?

With Apple, the path is to fill out the appropriate paperwork such that Apple can collect the taxes and submit them appropriately. Once they have processed those forms, Apple will charge the GST/HST on any paid apps I sell and submit that tax to the Canada Revenue Agency. I no longer have to worry about it.

With Google, it isn't so simple. I need to tell Google the rules about how much tax to charge depending on the location of the buyer and I'll have to deal with submitting the tax to the Canada Revenue Agency. I suspect this works better for larger companies (where Apple's one-size-fits-all mechanism doesn't work) but as a simple one-location seller Apple's approach is simpler for me. I wonder how Apple deals with organizations that have multiple locations and therefore don't have simple rules for which tax to charge where?

Now I need to figure out whether I can just charge GST for Google or whether I actually need a different per-province rate. Since Google seems to support per-province tax rates, I'm guessing it's the latter. I mean, if you look at the rules, they're quite simple. Here's an excerpt (from GST/HST Technical Information Bulletin B-103 [53 pages]):

If the Canadian rights in respect of a supply of intangible personal property (other than a supply of intangible personal property that relates to real property or tangible personal property or a supply of intangible personal property that relates to services that is deemed to be made in a province based on the place of supply rule that is explained in Part IV of this section), can only be used primarily (more than 50%) in the participating provinces, the supply is proposed to be made in a participating province if an equal or greater proportion of the Canadian rights cannot be used in another participating province.

I applaud the Canada Revenue Agency for the crystal clarity of their online documentation. But I think I'll seek professional advice, just to be safe.

Saturday, January 07, 2012

Storage Spaces: Linux LVM for Windows?

When I first read about the Windows 8 Storage Spaces feature, I thought, awesome, Linux LVM capabilities for Windows! More specifically, a combination of the multiple device (MD) management in the kernel and the logical volume manager (LVM).

But then I realized that I hadn't read anything about atomic snapshot capabilities, which is my favorite feature of LVM2 on Linux. Atomic snapshots make consistent hot backups trivial. Snapshots are the top reason I set up LVM on every Linux box I create. Without snapshots, Storage Spaces seems significantly less interesting.
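To illustrate what LVM makes trivial, here's roughly what a consistent hot backup looks like (the volume names are assumptions; the snapshot is a frozen view of the filesystem while the live volume stays in use):

```shell
# Take an atomic snapshot of the live volume; 2G of copy-on-write
# space covers changes made while the backup runs.
lvcreate --snapshot --size 2G --name important-snap /dev/main/important

# Mount the frozen view read-only and back it up at leisure.
mkdir -p /mnt/snap
mount -o ro /dev/main/important-snap /mnt/snap
tar -C /mnt/snap -czf /backup/important.tar.gz .

# Clean up; the live volume was never taken offline.
umount /mnt/snap
lvremove -f /dev/main/important-snap
```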

Hopefully Storage Spaces will have atomic snapshot capability and they just aren't talking about it yet.

Friday, January 06, 2012

Google gives itself a red card

So to prove that they are a fair referee for search ranking, Google has given the Chrome team a red card and effectively banned the main Chrome page from the first page of rankings for 60 days.

I approve. If this were some non-Google site the punishment would seem out of proportion to the offense — there was only one link that passed PageRank and it seems pretty clear that Google didn't intend the campaign to create such links — but since this is Google itself, they do need to hold themselves to a higher standard. And hey, they deserve to be given a penalty just for the incredibly poor quality of the campaign: articles that say nothing about Chrome or indeed anything at all.

Will this hurt Chrome in the short term? I believe so. As a computer geek, I already use Chrome most of the time and I'd never need to do a search to find a browser. But I'm not most people. Most people reading this blog aren't “most people”. “Most people” don't know a lot about browsers and may well discover Chrome through a search.

Will this do any real damage to Google or Chrome in the long term? I don't think so. I think Google has responded well overall. As for Chrome, Apple and Microsoft aren't particularly interested in the web as such, so Safari and Internet Explorer aren't keeping up with Chrome. Mozilla has lost its way, instituting policies such as rapid automatic updates (like Chrome) without accepting the corollary that the updates have got to be transparent (unlike Chrome). I think there is a culture issue at Mozilla. And Google understands security better than the rest, with silent automatic silent updates for Chrome from the beginning and extra security measures such as public key pinning. Chrome has growing mind share due to all this and I think Chrome's market share will continue to grow for some time, or at least until a competitor changes path significantly.