Wednesday, January 25, 2012

Oh dear oh dear O2

Today hasn't been a good day, all in all. No sooner had I posted on the depressing state that is Google than O2 disclosed that they are sending my mobile number through to any website that I care to navigate to via my iPhone's data connection.

The fact that it was discovered today led to a veritable twitstorm on the O2 Twitter feed, and the eventual blog post seemed fairly contrite as to the reasons behind it. They put it down to a configuration error which accidentally led to the phone number being sent to more sites than they had intended.

This doesn't excuse the fact that there is no reason for O2 to be sending my phone number to anyone, whether they intended it or not. I don't see why my phone number is necessary – as they put it – to verify my age for certain sites. And it's not as if such a header couldn't trivially be faked, nor is a phone number any kind of guarantee of age in any case (unless they're also sending a lot more data to those allegedly trusted providers as well).

They claim that this is "standard practice" in the industry, which makes you wonder how many other internet data providers are doing much the same thing, except they haven't cocked it up as badly as O2 have yet. It was also instructive to find out that O2 are mangling data that goes through their network; for example, this blog post highlights how O2 are in-lining style information, causing semantic errors in the page itself. You can check for yourself by going to http://mnementh.csi.cam.ac.uk/atimport/ – if you see a red background with "Test" on it then it works as expected; however, if you browse it from a phone on O2's data network then you'll see that they have in-lined the referenced CSS, and it will read "If this text renders in a browser then something is very broken".

Perhaps the best thing to come out of this is that it highlights O2's practices and encourages the use of a VPN connection to a server which is not going to cause any further problems. O2, it seems, cannot be trusted with my business, and therefore I will have to take matters into my own hands.

You get what you pay for

There has been a disturbing trend in the force recently, of companies and their applications taking a significant turn for the worse without proper design and consideration. First, Google pulled a GMail and decided that White was the new Black, and in order to not spoil an otherwise minimalist white canvas threw in a few lines of text, before the screw up squad took out Reader and Blogger too.

Last week's design-for-idiots trend was Twitter moving away from the highly acclaimed (and successful) AteBits Twitter née Tweetie application, onto a design which “Optimises for consistency” across different websites and devices.

Yes, that's right – they have managed to snatch defeat from the jaws of victory. Instead of a well designed and fluid iOS interface, it's now the same piece of junk you see on an Android phone.

Let's put aside for the moment flamewars on which device is best, and focus on the reality; they are different devices, with different styles and behaviours. Tweetie even introduced the concept of pull-to-refresh, which became such a signature gesture that it's now used on many iOS based applications. The thing is, the toolbar was optimised for the things the user does frequently; like tweets, direct messages, and the standard icon used to denote more options available.

Now, we have an application whose logo sits in the top ~10% of the screen and is immobile. That's right, it stays there like a wart on the bottom of humanity, refusing to budge. What good does it do there? You know you are using the official Twitter app, because it now looks like cack, and it looks like cack on other devices as well. There's consistency for you.

To seal the deal, the new “Discover” tab has affectionately been renamed the dickbar tab in homage to last time.

Fortunately, there are alternatives. Whilst Twitter is attempting to generate a revenue stream from waving its dickbar in everyone's face, other clients have quietly but politely focussed on the right kind of in-your-face design. Tweetie may be no more, but Tweetbot is on sale right now for $1, and has some nice user interface aspects to it; or rather, it doesn't have a dickbar. Other clients also exist, and they're all better than the current Twitter app.

However, the point I wanted to make was that in this age of the internet, you get what you pay for. Google is going out of its way to push Plus in front of everyone's face (PLUS!) whether they like it or not (PLUS!). They've also made it glaringly obvious (PLUS!) just so that your eye is drawn away from the main purpose (the article) and towards the share button (which, let's face it, is not what you're there for).

The problem with Google is that, as the applications are hosted in the cloud, Google has the final call over which version of the software I am using. This is good in many ways – any security updates or upgrades will happen automatically, without my involvement.

The downside is that Google can also remove functionality which I use regularly, such as the ability to read posts without a glaring PLUS on the screen in the low-key icons bar.

However, the recent turn of Google ramming Google+ down everyone's throats is the real concern. Thanks to an upcoming policy change, the ever-present Google will tie all of your accounts and services together, and allow you to leak data. (Remember the whole Buzz fiasco a few years back, when they introduced the cheesy like symbol and then subscribed all of your private messaging contacts to your publicly visible profile?) Only, not content with world social domination, they are now tying all the videos you have ever watched on YouTube to the documents you have written on Docs and to the blog posts you have commented on in Reader.

The new policy comes into effect on the 1st of March, and by that time, my intention is to wean myself off any Google services that I may be using. I've already stopped using their search – after all, Focus on the User shows that Google results on their own are no longer to be trusted – and DuckDuckGo is my go-to search engine (remember when Google just did search? Like that.)

The other key service I need to replace is Reader. It has – until the UIdiots moved in – been the best news feed aggregator on the web, mainly because it allowed me to keep a track of what I'd read from across multiple computers. However, these days I mostly consume it from the iPhone (it's the only one they've not let the UIdiots near yet) so being able to track from multiple devices is no longer the need it once was.

Anyway, as the title suggests, you get what you pay for. And, by a happy coincidence, Google doesn't pay me to use their services. So Google – thanks for the memories. We had a great time together. But things have changed, and we've drifted apart. I still respect some of the good things you're trying to do for the web, and as an advertising behemoth you are not likely to falter any time soon (unless Apple finds a bit of loose change behind the sofa and buys you out). But I can no longer continue to support you by providing you with all my data whilst you data-rape me and splash it across externally visible profiles. Instead of gaining a social network participant, you've lost a customer. And you know what? If ever I need to find something again – I'll just ask my friends.



Tuesday, December 20, 2011

Git Tip of the Week: Finale

Over the past nine months, I've been writing the Git Tip of the Week series, where I write a Git-related article every week. Part of this has been a desire to learn more about the way the Git internals work; part is to provide a reference for others as well.

However, all good things must come to an end, and writing a weekly post, whilst keeping it fresh, is a non-trivial task. In addition, finding something new (or even vaguely interesting) to write about is increasingly difficult once you've covered the standard cases and a number of esoteric ones.

So, for my final post in the series, rather than writing about something new, I thought I'd link back to the ones that I've written before, in order. Thus, if you want to share the series with others, you can refer back to this index page as a means of finding out what I wrote when. The search list is all well and good, but it only shows the top 20 most recent posts. In addition, I've added some of my other Git related articles that may be of interest, even though they weren't part of the Git Tip of the Week series.

I'd like to thank you for your time and interest in reading this series, and wish you a happy Christmas and a prosperous New Year.

Tuesday, December 13, 2011

Git Tip of the Week: Forking and Pulling vs Pushing

This week's Git Tip of the Week is about the GitHub generation. You can subscribe to the feed if you want to receive new instalments automatically.


Distributed Version Control Systems have really taken off in the last few years, though they've been around for over a decade. Probably the biggest growth spurt happened because of the controversy that launched Git back in April 2005, providing a rock solid distributed version control system, modelled on a filesystem. In only a few months, Git began hosting the 2.6.12 Linux kernel source.

However, as popular as Git may have been then, it wasn't until the birth of GitHub that Git really took off. Founded in February 2008, GitHub brought Git to a much wider audience and provided a free hosting site for public Git repositories (as well as commercial plans for private repositories). It has been argued that GitHub is one of the reasons why Git has taken off faster than others, like Hg and Bzr.

What GitHub brought was a focus on a new model; instead of creating patches (as covered last time), GitHub encouraged universal forking of the repository. So, if you want to add a change to an existing repository, you can fork it (and create your own clone), make the changes, and then send a pull request.

There's nothing significant about pull requests in the Git workflow as compared with other DVCS tools. Pushing and pulling are two key primitives in a DVCS workflow, after all. But what was novel about GitHub's approach was the way that pull requests could be sent, as an out-of-band message to the upstream repository owner suggesting the idea.

Not only that, but the upstream owner would then get a notification and be able to view the request in situ, and with the diffs as appropriate (outside of a mail client, via the web interface). Subsequent advances, such as the ability to fork-to-fix-typos, meant that anyone could suggest changes via the web without even needing to compile the code locally.

Pushing, Pulling, Patching or Proposing

As a result, Git repositories can end up with different workflows depending on the type of project and hosting environment you are using. They are:

  • Pushing: You have access to directly write into the repository, so you just push your changes
  • Pulling: Someone has changes locally and asks you to pull the change from their repository
  • Patching: You send the diffs/patches by a transport mechanism (bugzilla, email) for consideration
  • Proposing: You use a tool like Gerrit to propose changes which can then subsequently be merged
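
To make the Pulling case concrete, here's a minimal sketch that simulates it on one machine. The upstream and fork directories, the fix-typo branch and the author details are all made up for illustration; on GitHub the fork would live on the server and the request would arrive as a pull request rather than as a path on disk.

```shell
set -e
work=$(mktemp -d)

# Hypothetical identity, so the sketch runs without any global git config
export GIT_AUTHOR_NAME="Example Author"
export GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author"
export GIT_COMMITTER_EMAIL="author@example.com"

# The maintainer's repository, with a single commit
git init -q "$work/upstream"
cd "$work/upstream"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# The contributor's "fork" is just their own clone
git clone -q "$work/upstream" "$work/fork"
cd "$work/fork"
git checkout -q -b fix-typo
echo "hello world" > README
git commit -q -am "Fix greeting"

# Pulling: the maintainer fetches the contributor's branch and merges it
cd "$work/upstream"
git pull -q --ff-only "$work/fork" fix-typo
merged_subject=$(git log -1 --format=%s)
echo "$merged_subject"
```

The Pushing case would be the contributor running git push against the upstream directly, which only works if they have write access.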

Each project has potentially a different style of operation, and there isn't a "right" way to use a Git repository. GitHub, for example, strongly favours the Pulling model when consuming changes from others (though of course, the repository owner can do pushing directly). The Linux Kernel, both for historic reasons and also for transparency and open discussions, chooses the patching (by e-mail) model.

The final one – proposing – is a combination of the pull, push and patch models. It's similar to GitHub's pull mechanism, in that the project's owners can see a list of all incoming changes and decide which ones to use; but the push-based upload means that the original repository doesn't have to be forked on the remote server. And finally, tools like Gerrit (which I've mentioned before) can be used to generate patches, host in-situ discussions, and even act as a Git repository for consumption via the standard git fetch protocols.

GitHub's pull-based approach has certainly had a wide impact on the number of users willing to try that method. They have a note on collaborative development models on the subject:

  1. The Fork + Pull Model lets anyone fork an existing repository and push changes to their personal fork. This model reduces the amount of friction for new contributors and is popular with open source projects because it allows people to work independently without upfront coordination.
  2. The Shared Repository Model is more prevalent with small teams and organizations collaborating on private projects. Everyone is granted push access to a single shared repository and topic branches are used to isolate changes.
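
The second model can be sketched locally too. This is only an illustration under assumed names: shared.git stands in for the team's server-side repository, alice is a hypothetical team member, and topic/feature is an arbitrary topic branch name.

```shell
set -e
work=$(mktemp -d)

# Hypothetical identity, so the sketch runs without any global git config
export GIT_AUTHOR_NAME="Example Author"
export GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author"
export GIT_COMMITTER_EMAIL="author@example.com"

# One shared bare repository that everyone can push to
git init -q --bare "$work/shared.git"

# A team member clones it and publishes the first commit
git clone -q "$work/shared.git" "$work/alice"
cd "$work/alice"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git push -q origin HEAD

# Work in progress is isolated on a topic branch in the same repository
git checkout -q -b topic/feature
echo "v2" >> notes.txt
git commit -q -am "Add feature"
git push -q origin topic/feature

topic_ref=$(git ls-remote --heads origin topic/feature)
echo "$topic_ref"
```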

Certainly, if there are minor changes (like a typo in documentation) the fork-and-pull model, when combined with a web-based interface, can make things dramatically easier for contributors. Instead of needing to create accounts on bug tracking systems (or tools like Gerrit), the repository can be forked, fixed, and a pull request fired off to the repository maintainers. With the merge button in GitHub, the fix can often be merged without the maintainer having to check the code out at all, if it's sufficiently simple. Reducing the barrier to accepting changes helps keep an active open-source project alive and open to all.

The only problem with the Fork + Pull model is being able to attribute changes by user. For example, some open-source foundations want to ensure that any changes are granted under an existing open source license (Apache or Eclipse, for example). Other projects tend not to be as strict and will happily accept contributions from anyone, with the assumption that any contributors have agreed to the license. One additional service that patches-to-Bugzilla or a Gerrit push provides is the acceptance of a contributor agreement, which normally states that the individual is entitled to grant the code under the specific licence. Creating an account often implies (explicitly or implicitly) agreement to follow that foundation's licensing rules.

So, there's no "right" way to do Git; different teams, foundations and projects will have their own preference for working with a particular strategy, and that preference may evolve over time. Instead, it's useful to know which flows are available so that the right choice can be made for each project.


Come back next week for another instalment in the Git Tip of the Week series.

Tuesday, December 06, 2011

Git Tip of the Week: Patches by Email

This week's Git Tip of the Week is about how git handles patches by email. You can subscribe to the feed if you want to receive new instalments automatically.


One of the main benefits of a distributed version control system is that code changes can be pushed from one repository to another clone, and all dependent changes are pushed as well. However, that only works when you have write access to the remote repository, which in many cases you do not. One way of getting changes is by providing a patch, or a set of changes which can be applied to a remote repository at the other end.

Git started life as a distributed version control system for the Linux project, which actively uses mail lists both as a discussion mechanism and also as a distribution mechanism for patches (changes) for an existing codebase. (New features are just a special case of patching nothing to add the new code.)

To speed the processing of patches by mail, git developed tight integration with both (command-line) mail clients and the generic Unix mbox format. Patches can be generated in the form of mail messages, and the remote end can process them with a specific command to reconstitute the changes in the git repository.

Whilst the majority of projects don't use patches by mail as a change distribution mechanism, it is useful on occasion where either a patch needs to be generated and attached to a bug tracking system, or where changes need to be sent to a remote developer who doesn't have direct access (such as through a firewall).

The convention adopted by the git developers is to format one patch per e-mail message. The subject of the message is then the first line of the git commit message, preceded by a prefix which can be overridden on the command line but which defaults to [PATCH x/y] as a means of threading the messages together. (Amongst other reasons, this is why the initial line of a Git commit message is suggested to be relatively short, so that it fits with a mail client's view of the subject and suggested prefix.)

Generating and sending patches

How do we generate these patches? The git format-patch command will generate one patch file per commit in the range required, formatted ready to go as mail messages in mbox format. The --to option specifies the mail address the patches should be sent to – but the sending is done separately.


(master) $ git format-patch --to cdt-dev@eclipse.org HEAD~..HEAD
0001-bug-333001-Description-Scanner-Info-doesn-t-release.patch
From 9c9c692df50e5a9eb91b41cc86f57212afd78ef9 Mon Sep 17 00:00:00 2001
From: Andrew Gvozdev …
Date: Sat, 16 Jul 2011 15:16:21 -0400
Subject: [PATCH] bug 333001: Description Scanner Info doesn't release
 ICProjectDescription
To: cdt-dev@eclipse.org

---
 .../cdt/internal/core/model/CModelManager.java     |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
⋮

If the commit message has more detail than a single line, the detail will be included in the body of the message, below the mail headers. It's possible to add additional commentary after the --- line and before the patch itself; git am ignores anything there when it applies the patch. Going the other way, a 'scissors line' (a line made up of >8 or 8< marks and dashes) lets the sender put discussion above the commit message, and git am --scissors will discard everything before that line.
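
To see the [PATCH x/y] numbering in action, here's a small sketch that builds a throwaway repository with two commits and formats both as patches. The repository, the commit messages and the dev-list@example.com address are invented for the example.

```shell
set -e
work=$(mktemp -d)

# Hypothetical identity, so the sketch runs without any global git config
export GIT_AUTHOR_NAME="Example Author"
export GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author"
export GIT_COMMITTER_EMAIL="author@example.com"

git init -q "$work/repo"
cd "$work/repo"
echo "one" > file.txt
git add file.txt
git commit -q -m "Initial commit"
echo "two" >> file.txt
git commit -q -am "Add second line"
echo "three" >> file.txt
git commit -q -am "Add third line"

# One patch file per commit in the range, written to a separate directory
git format-patch -q -o "$work/patches" --to=dev-list@example.com HEAD~2..HEAD

# Each subject carries its position in the series
subjects=$(grep -h '^Subject:' "$work/patches"/*.patch)
echo "$subjects"
```

With more than one commit in the range, the default prefix gains the position in the series, which is what lets a mail client thread the messages in order.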

These patch files can then be transmitted via mail using the git send-email command. This connects to the given SMTP server (either the one from your global ~/.gitconfig or the repository's .git/config, or the one specified on the command line) and then sends each patch file as a separate e-mail:


(master) $ git send-email --smtp-server=smtp.gmail.com *.patch

As well as using format-patch in a separate stage, it's possible to use send-email to generate the patches and then send them immediately. (You can also configure send-email to open an editor so that you can customise the messages before they are sent.)
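
If you'd rather not repeat the server on every invocation, the settings can live in the repository's configuration instead. The following is a sketch only: smtp.example.com, the port and the list address are placeholders, and the send itself is left as a comment since it depends on git send-email being installed and a real server being reachable.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Placeholder values; substitute your own provider's details
git config sendemail.smtpserver smtp.example.com
git config sendemail.smtpserverport 587
git config sendemail.smtpencryption tls
git config sendemail.to dev-list@example.com

configured_server=$(git config --get sendemail.smtpserver)
echo "$configured_server"

# With those in place, no --smtp-server flag is needed. A dry run shows
# what would be sent without contacting the server:
#   git send-email --dry-run *.patch
# Adding --annotate opens each message in an editor before sending.
```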

Applying patches

Once the patches have been created, how do you apply them into a local clone? If you have a patch file, you can apply it with git apply:


(master) $ git apply 0001-bug-333001-Description-Scanner-Info-doesn-t-release.patch

Note, however, that this approach does not recreate the state of the world as it was on the sender's repository. The patch is applied, but it only makes local changes to the repository's content; it does not recreate the commit (and more specifically, the hash of that commit). You can specify git apply --index or git apply --cached to get the changes put into the staging area, but this still does not recreate the commit.

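To recreate the commit requires the use of git am, which stands for apply mailbox. This runs through a mailbox (which may have one or more patches in it) and creates a commit for each one of those patches, preserving the original author, date and message. (The committer details are those of whoever runs git am, so the resulting hash will usually differ from the sender's.)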

Fortunately, the output generated by git format-patch is already in mbox format; that's the purpose of the otherwise dummy From 9c9c692df50e5a9eb91b41cc86f57212afd78ef9 Mon Sep 17 00:00:00 2001 line at the top of the patch file. As a result, each patch file can be treated as a mailbox containing one message, and a batch of them can be applied in one go.

In fact, since mbox elements can be concatenated together, this permits patch files to be concatenated together to form a larger patch file, which can be sent as a single unit via another transfer mechanism and then applied on the remote side.
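
Here's a sketch of that round trip, using throwaway sender and receiver repositories with invented commit messages. Rather than concatenating the patch files by hand, it uses format-patch --stdout, which writes the whole series as a single mbox stream.

```shell
set -e
work=$(mktemp -d)

# Hypothetical identity, so the sketch runs without any global git config
export GIT_AUTHOR_NAME="Example Author"
export GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author"
export GIT_COMMITTER_EMAIL="author@example.com"

git init -q "$work/sender"
cd "$work/sender"
echo "one" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# The receiver starts from the same base commit
git clone -q "$work/sender" "$work/receiver"

echo "two" >> file.txt
git commit -q -am "Add second line"
echo "three" >> file.txt
git commit -q -am "Add third line"

# Write both commits into one mbox file
git format-patch --stdout HEAD~2..HEAD > "$work/series.mbox"

# On the receiving side, git am creates one commit per message
cd "$work/receiver"
git am -q "$work/series.mbox"
applied=$(git rev-list --count HEAD)
last_subject=$(git log -1 --format=%s)
echo "$applied commits, last: $last_subject"
```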

Bundles

Patches provide a way of reconstituting a repository over a mechanism that isn't directly connected, but the purpose of patches is as much to enable humans to investigate the set of changes as to get the change there. If, however, the desire is to move commits from one machine to another without direct connectivity, a better alternative is to use git bundle.


(master) $ git bundle create changes.bundle HEAD~..HEAD
Counting objects: 23, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (12/12), 935 bytes, done.
Total 12 (delta 5), reused 6 (delta 0)

The bundle uses the same format that Git uses for network transmission when cloning. As a result, the references contained are only those listed in the reference list.

Typically, a tag will be used to mark where the last known point was for the remote source; then, the difference between HEAD and that tag is used to build up the bundle for the remote end. Alternatively, branches can be used to simulate the branch on the remote end.
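
The tag-based approach might look something like this. It's a sketch under assumptions: lastsync is a tag name I've picked for the example, and the repository and commits are throwaway ones.

```shell
set -e
work=$(mktemp -d)

# Hypothetical identity, so the sketch runs without any global git config
export GIT_AUTHOR_NAME="Example Author"
export GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author"
export GIT_COMMITTER_EMAIL="author@example.com"

git init -q "$work/sender"
cd "$work/sender"
echo "one" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Mark the point that the remote end is known to have
git tag lastsync

echo "two" >> file.txt
git commit -q -am "Add second line"

# Bundle only the commits the other side is missing
git bundle create "$work/changes.bundle" lastsync..HEAD

# Once the bundle has been delivered, move the marker forward
git tag -f lastsync HEAD

bundled_refs=$(git bundle list-heads "$work/changes.bundle")
echo "$bundled_refs"
```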

Once the bundle file has been generated, it can be sent over any transport to the remote host for reconstitution. This might involve burning to a CD, via a USB stick or some other network protocol.

On the client side, the client can run git bundle verify to determine if all required parent commits are present in the local repository. This must be run from the client git repository that you want to fetch into.

The client views the bundle as a remote that it can pull from, much like a path to a directory can be used to pull from a local file-based repository. You can add it as a remote (e.g. git remote add changes /tmp/changes.bundle) or you can fetch from the path to the bundle itself:


(master) $ git bundle verify /tmp/changes.bundle
The bundle contains 1 ref
b707c559636bf8e6dffb3145bd44b03de18868b3 HEAD
The bundle requires these 1 ref
3580c1087c2860fbe6ca4c1a7a6d6e1eb1669aa3 Bug 333599 - [C++0x] Initializer lists & return without type
/tmp/changes.bundle is okay
(master) $ git fetch /tmp/changes.bundle
From /tmp/changes.bundle
 * branch            HEAD       -> FETCH_HEAD

Once the references have been fetched into the repository (which can be referred to as FETCH_HEAD) you can then inspect the changes, fetch/merge them into the local branches or reset your master branch to that of FETCH_HEAD.
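
Putting the receiving side together, here's a sketch of the whole hand-over between two throwaway repositories (the sender and receiver names and the commits are invented). Unlike patches applied with git am, the commits that arrive via a bundle are the same objects, so the hashes match on both sides.

```shell
set -e
work=$(mktemp -d)

# Hypothetical identity, so the sketch runs without any global git config
export GIT_AUTHOR_NAME="Example Author"
export GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author"
export GIT_COMMITTER_EMAIL="author@example.com"

git init -q "$work/sender"
cd "$work/sender"
echo "one" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# The receiver already has the base commit
git clone -q "$work/sender" "$work/receiver"

echo "two" >> file.txt
git commit -q -am "Add second line"
git bundle create "$work/changes.bundle" HEAD~..HEAD
sender_head=$(git rev-parse HEAD)

# On the receiving side: check the prerequisites, fetch, then fast-forward
cd "$work/receiver"
git bundle verify "$work/changes.bundle"
git fetch -q "$work/changes.bundle" HEAD
git merge -q --ff-only FETCH_HEAD
receiver_head=$(git rev-parse HEAD)
echo "$receiver_head"
```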

Summary

It's not always possible to have write access to the repository you want to send changes to. In those cases you can send changes out of band, either via mail (if you want human reviews) or as a bundle (if you just want to send the commits).


Come back next week for another instalment in the Git Tip of the Week series.