Where I End and ADHD Begins

There’s a gap in between
There’s a gap where we meet
Where I end and you begin

And I’m sorry for us
The dinosaurs roam the earth
The sky turns green
Where I end and you begin

Where I End and You Begin – Radiohead

ants on the ham sandwich of the inner self

The ADHD brain goes where it pleases. I have very little control over the meanderings of my stable-state brain, and as a result my actions can sometimes surprise even me. I always assumed that this was just ‘me’; after all, what are we other than how we behave? Then I was diagnosed with ADHD and all that got called into question. When you get diagnosed with something like ADHD, a tonne of questions come flooding in like ants into a picnic basket. They crawl around the crevices of your consciousness, demanding to be swatted lest they eat the ham sandwich of your inner self. If ADHD suppresses my impulse control, what does that say about my free will? What the hell even is free will (I may tackle this at another point)? Which of my personality attributes are ‘me’ and which are symptoms of my chronic neurological disorder? What does ‘me’ even mean (another one for the back burner)? Do I get to feel less shame about stupid stuff that I’ve done? Can I still take credit for the awesome stuff that I’ve done when some of it was obviously the result of ADHD? What the hell is going on?!

There are some weighty questions there, and they concern every human being (if only notionally for many people) and, one presumes, other sentient beings. These questions are too weighty to cover in anything less than a treatise the weight of a car, so I’ll try and tackle a slightly leaner question – where do I end and, consequently, where does ADHD begin?

The thing is, ADHD affords me a few superpowers – a very broad knowledge base created by my ever-wandering attention, the ability to make obscure connections, outspokenness (even when not necessarily welcome), a rabid, insatiable appetite to learn, boundless energy, and many others. There are, of course, flipsides to all of these that are far from ‘super’, and various other residual difficulties that I won’t burden you with. I could take either case to make my point, but let’s stay positive, eh? My superpowers have led me to do lots of interesting and wonderful things, but some of these are clearly a direct result of classic ADHD behaviours. So if you discount all of that, what is left that I can feel proud of? What exactly is left of me? If you could treat all the ADHD away, what would I be left with?

It’s a theme I’ve tackled alone, and with others, on many occasions since my diagnosis. It’s not often that you get told that the features many people know me by (and by which I know myself), the ones friends and family would most strongly associate with Alex-ness, are symptoms of a neurological misconfiguration. It’s also an (unfounded) worry when you first start to take medication – will I suddenly stop being me and metamorphose into some dull automaton? I don’t think I’ve quite got to the bottom of it. Indeed, I don’t think you can without tackling the “what is ‘me’?” question. Still, I think I’m in a reasonably settled state of mind on it. It goes a little something like this.

There are all kinds of brains, and every degree of paisley-patterned, technicoloured gradation between them, but various notable edge cases emerge within the spectrum of broadly normal brains (by which I mean excluding brain damage, profound brain disability, etc.). ADHD, Autism Spectrum Disorder, Dyslexia and Bipolar Disorder are a few of these. Broadly speaking, people with these conditions (at least the lucky ones who aren’t at the extreme edges) can cope reasonably well in society and rarely get singled out as in any way different. You could describe the differences between the ADHD, neurotypical, ASD and dyslexic brains as like the differences between computer operating systems such as Windows, Mac and Linux. They all do roughly the same thing but have different strengths, e.g. Windows is better for productivity, Mac is creative and Linux is for going deep. All the similarities are there, though – windowed applications, mouse/trackpad control, random rebooting and missing files – but experiencing one tells you only a certain amount about experiencing any of the others. If you’ve only ever used Windows, jumping on to a Mac will feel reasonably familiar for about 10 seconds. Then, the moment you try to do anything meaningful, you will feel very much like you are losing your mind. The situation is worse with brains, since you can only ever get to try out your own, or some version of it (you might try to hack the ‘OS’ with some drugs, or do an upgrade via a midlife crisis). From an objective perspective, ADHD can only ever be observed and never experienced. The neurotypical brain is just as big an experiential enigma to me as my brain is to a neurotypical person. Likewise, an autistic, dyslexic or colour-blind brain is an enigma to both me and Madame Neurotypical. Yet we find ourselves, as humans, philosophising and pontificating about the bat brain and what it must be like to ‘see’ in sound as if we’ve uncovered something profound.
The experiential disparities between two identical twins must seem pretty profound if you’re one of those twins.

Then there’s the question of how I got this way in the first place. There is strong evidence that ADHD is hereditary (anyone who knew my dad could attest to this), but like all hereditary attributes, there is likely an environmental factor. So perhaps something happened to me at the age of 6 or 7 that took what was thus far destined to be a normal brain and made it get all mixed up. It’s plausible and there’s some evidence to back it up. But the possibility that my brain was normal at 6 and then went through some change is, from a subjective viewpoint, neither here nor there. The meds may stabilise my moment-to-moment experience, but my brain is still structured how it is. They’re not turning back the clock, just adjusting the time a little. No amount of drugs and conditioning and browbeating and guilt and shame is going to change that. If I did manage to change my brain (for better or worse), it is my current baseline that I start from, not that 6-year-old brain. The person I could have become, if ADHD were purely environmental, never existed and never can exist. So why concern myself with him?

It’s further arguable that I didn’t ‘get this way’ at all. It’s unlikely that neolithic man had a specific label for people like me. In fact, this is a 20th-century disorder. Some would argue it is only as widespread as it is, from a diagnostic perspective, because the hyper-connected, always-on 21st-century world draws it out and exacerbates it. People with my wiring probably had a great time of it roaming the plains as part of a hunting party, only to be stifled by the advent of agricultural society with its relentless monotony and structure. My brain struggles with the constant noise-dressed-up-as-signal that permeates every corner and crevice of contemporary existence. That says more about the external world I inhabit than about my internal state and any perceived problem with it. But even before we all got plugged into the matrix, society was torturing the ADHD brain. Schools are a classic example of this. ADHD people thrive on participatory, practical, visual and direct methods of teaching. “Over there, that’s a gazelle, here’s a spear and a knife, watch this and follow!” The ideological obsession with teaching via structure, theory and rote learning is not particularly great for kids with normal wiring, but it is a special form of torture for the ADHD personality. These kids are climbing the walls because society has baselined on a teaching paradigm that is the antithesis of what ADHD kids need. To add insult to injury, lack of conformity, outspokenness, high levels of energy and impulsiveness, all very ADHD behaviours, are treated as nothing less than aberrant misbehaviour. They are punished with an iron fist or, worse, exclusion (let me give you a clue here: ADHD kids thrive on human interaction, so what exactly are you trying to achieve?). These behaviours are the bedrock of creativity and invention. ADHD kids frequently provide the energy that is the antidote to the grey, monotonous reality of the modern classroom.
These behaviours are actively (if often unknowingly) suppressed in the modern educational system. The system is stacked against people like me. I failed GCSE English twice. I would produce intricate works of creativity and maturity, and be cut down because of poor grammar, handwriting and spelling. These are classic tripping-up points for ADHD kids, and I still struggle with them. Do I write like someone who is poor with the English language? (Yes, there are probably typos and errors in this very piece; it comes with the territory. I assume you can find it in your heart to forgive me?) For years I assumed I just couldn’t do ‘writing’. It took me years longer to realise the inverse was true. In that sense, it is not me but the system that has the problem.

And this is where things bottom out. I am not ADHD, in the same sense that ADHD is not me. I do not define this condition and it does not define me. It is just a collection of attributes that a section of the population shares to a highly consistent degree. These attributes can be highly beneficial in some situations, and profoundly detrimental in others. Unfortunately, the situations in which they are detrimental are extremely common in the modern world. Behaviours attributable to ADHD can’t be compared to a bad mood brought on by a hangover. This disorder, for better or for worse, pervades every part of my internal and external persona. There is no “it and me”, there is just me. I have achieved the things that I have both despite and because of my allegedly faulty wiring. Put another way, despite modern society’s tendency to frustrate and impede me, I still stand tall. I still manage. I still succeed. And I do so even though it takes me harder work, greater perseverance and more emotional energy than it does most people. That I can be proud of. I have a superpower that most people don’t; it’s just hard to use sometimes. However, I am one of the lucky ones. Most people with ADHD are considerably less fortunate.

Since I always assumed, in good faith, that I took various actions because I meant to, and still maintain the self-delusion of free will, all the ADHD diagnosis does is give me a framework for explanation. Many behaviours that seemed out of the ordinary now make more sense. It wasn’t that I just wasn’t trying hard enough to fit in, or to succeed, or to pay attention, or to follow through; rather, I am wired differently from most other people. This is me, all of it, for all the good, bad and bat-shit crazy.

So where do I end and where does ADHD begin? ADHD starts where I start and ends where I end. ADHD is dead, long live ADHD.

Do Small Things

You have an aunt, or a sibling, or a bezzie, don’t you, who you love to pieces but who is basically an anal douchebag? You know, the one who, when they walk into your house, tries to conceal palpable distaste at the general disorder and disarray. Like they’re suppressing the mental gag reflex. You kinda don’t want them to come round, and actively avoid the situation if possible, but they keep inviting themselves. Apparently they delight in bringing to your attention the physical manifestation of your chaotic ADHD mind – stuff in places where it has no right to be; evidence of daily/weekly chores procrastinated and postponed into festering piles and dusty sheens; the general sense that, while there is clearly a place for everything, that place is EVERYWHERE. This person thinks that they hide their disdain while making cheerfully wistful comments about the “shabby chic” of your “charmingly lived-in” house, perching on the edge of the sofa like it might swallow them up, and surreptitiously tidying things away to places from which they may never emerge.

You don’t want to live like this, but there is so much to do. You have that assignment to do for the qualification you decided to take on a whim, and you need to buy some cheese-making equipment because, you know, cheese making. And then there’s that short story to finish writing, all those magazines you bought at the weekend that won’t read themselves and, and, and, and…


You know all these boring chores need doing. You don’t need Lord/Lady Meticulous to project this down their nose at you. It’s like telling a homeless person to get off their lazy arse and get a job – patronising, ignorant and superfluous. Now, no one knows better than you that sorting out the kitchen would make your life so much easier (you have to wash a bowl and spoon EVERY MORNING just to be able to eat breakfast), but it is a BIG JOB. Clearing and thinning down that bookshelf really needs doing (books keep falling off onto your head), but it will TAKE AGES and risks making even more mess when you inevitably fail to finish the job. And then there’s the housework. Don’t mention the housework. It never ends. Every time you get some done, more turns up! It’s best to just leave it and do it all as one BIG JOB once a week, and then it’s out of the way. But that’s such a bloody CHORE.

I hate chores (even the word is tiresome, boring and a bit grotty, like verruca or nasal). They suck. But I also don’t like living in squalor, regardless of how it might have appeared for most of my life. But how do you get round to doing all these BIG JOBS? The answer is actually quite simple – don’t. Don’t do the BIG things, just do small things.

You just thought “ah I see, you’re just another one of those patronising smart-arses like my mum/sister/cat”, didn’t you? But bear with me, I’ll rejoin you in the seething resentment in a short while. But first, here’s something you need to understand.

Stop lumping all the vaguely related tasks into giant unwieldy categories like CHORES, or SH*T THAT I DON’T WANT TO DO. By bundling all the small things into BIG CATEGORIES you conflate them and increase their collective intensity. Wasps are pretty much just annoying on their own, but if you’ve got a swarm of them, THEY WILL STING YOU TO DEATH. It may seem sensible to batch things up into tidy categories, and less stuff is always tidier, right? That may be so from the comfortable perspective of observing these categories from the outside. But once you delve into any one of them, all you’ll find is an assemblage of vaguely related junk that’s gaining entropy and somehow breeding. It’s a bit like that drawer in the kitchen that’s used to store “stuff”; there are some things in there you’re certain you never owned in the first place. From whence came they?

Take a simple task – cleaning a kitchen surface. Cleaning a kitchen surface is just that: purging a worktop of debris and grime. It is not the same as “make all the kitchen clean”. You may claim that “if I just clean that surface, then it’ll look weird and I JUST MUST clean the rest, so best not to start at all.” This is a valid objection, especially for an ADHDer. Not a lot of people know this, but ADHDers are perfectionists; it’s why they never get stuff finished. They set their sights too high. But in this scenario you need to take control of your inner obsessive and calm the voice that screams “I MUST CLEANING ALL THE KITCHEN WITH UNHINGED INTENSITY!” and instead, paradoxically, think about all the other stuff you’d rather be doing. The key here is that cleaning the kitchen surface is easy. It’s small. You can handle small things, right? Don’t conflate it with other small things unless you have reason to. Ask yourself instead, “why must I clean the kitchen surface?” and the answer you will find is “because I didn’t clear it down when I made that BIG SANDWICH earlier”. The small “kitchen surface” task is not related to the big MUST CLEANING KITCHEN task; it’s related to the MUST EATING BIG SANDWICH task from earlier. If you’d cleaned up after yourself, you wouldn’t have this task getting in the way of whatever wacky adventure you’re on right now (probably just making dinner). “Ah!” you’re now screaming at your tablet/laptop/phone/cucumber, “you’re telling me to stay on top of stuff, no sh*t Sherlock, but I still MUST CLEANING ALL THE KITCHEN before I can get myself into the position of staying on top of that task.” This is indeed an astute observation, and so we need some defining principles to get past this apparent impasse. Here’s what you need to do, and do habitually for the rest of your life (seriously, as long as you live; it’s not that big a deal though, keep reading, pleeeease):

  1. Do exactly one more action than you need to achieve any given task, every time you do a task
  2. Break down BIG JOBS into small tasks and only concern yourself with these tiddlers
  3. Make a specific time that is free to do stuff you don’t want to do, and work through your small tasks at that time
  4. Make a specific time that is free to do the stuff you do want to do, and use it frivolously, with impunity and without guilt

Continued after this short digression from my brain

Chronicles of an ADHD Brain Part 1

Let’s break those down a little shall we?

Item 1: Do something extra

Using the scenario above, you would:

  1. Clear/clean the surface
  2. Make the tasty treat
  3. Clear up after yourself

See what happened there? You got a task for FREE! Where you would usually only do two tasks (clean and make food), you now did three. So what happens next time you need a BIG SANDWICH? Behold:

  1. Make the tasty treat (since the surface is already clear)
  2. Clear up after yourself
  3. Put the dirty item you just used in the dishwasher

There, another job done – filling the dishwasher! Here are some other examples of “buy one get one free” productivity magic.

  • Take the rubbish out to the bin when you take the dog for a walk
  • When you read an email, file it or delete it
  • Clear your desk while you’re on a boring conference call
  • When you make dinner, fill a sink of hot soapy water, and wash up as you finish using stuff

Now, you may be tempted to do a cheeky few extra tasks for each main task, and that’s cool, but be careful you don’t accidentally slip into “MUST DO BIG JOB WITH UNHINGED INTENSITY” mode. There will be plenty of other opportunities for claiming your free extra things, no need to gobble them all at once. Patience is needed. Stay calm. Mind like water. Etc.

Once you get into the swing of things, you’ll find that everything is done all the time and you’ll be free to explore the vagaries of crochet. Except, hold on…

Item 2: Make small numbers of monolithic BIG JOBS into a proliferation of small jobs

But what about those jobs that don’t sit snugly around your daily routine, like clearing out the garage or, god forbid, THE GARDEN (shivers cascade down spine)? This is what item #2 is for. You’re never going to get around to the BIG JOBS, at least not until forced to. Perhaps, in a moment of weak will and unbridled enthusiasm, you invited most of the office round for a dinner party. Now you have to purge the dining table of last year’s Warhammer obsession, not to mention that your attempt at a Banksy-style mural on the adjacent wall looks like the faecal smears of a deranged, captive chimpanzee. Assuming this isn’t the case (you don’t really own a chimpanzee, do you?), you’re better off not attempting the doomed project all at once. Instead, break it down into lots and lots of quick little tasks, of which you can do a couple a day. It’s like breaking down a big immovable iceberg into cute little ice cubes that you can pop into your vodka and Diet Coke. Do this for ALL of your big projects. Let’s use the dining room situation as an example. Your breakdown might look something like this (do one a day):

  • Buy some sort of storage for your Plague Orcs and Blurgg Marines
  • Put half the little models in said storage (the ones you got around to painting)
  • Put the other half of the little models in said storage (the ones you will NEVER get around to painting)
  • Clear away all the manuals, boards and 19 sided dice and stuff
  • Buy a poster depicting an actual Banksy mural
  • Put said poster on top of chimpanzee scrawl

Write all this down as a list before you attempt doing any of it. Make a plan for doing each item, and then do them sequentially. Merge this list in with the equivalent ones for all your BIG JOBS. Make some time every day to do a few of these tasks. Which brings me swiftly onto items 3 and 4, which I’ll tackle together, since they’re intrinsically related.

Items 3 & 4: Make special times to be productive and frivolous

Here’s the thing: as an ADHDer, you’re actually good at making time for stuff, and you’re frequently weirdly effective at planning and using your time to get obscene amounts of stuff achieved. It’s just that that version of you turns up unpredictably, and only if you’re immersed in one of your focus fits. But here’s the good news: you actually possess those magical delivery skills ALL OF THE TIME. Seriously, you do. You just have to accept the fact that you can only engage them for short periods in situations where you’re not interested in the task at hand. That’s cool, because you only really need to engage them for short periods, but you need to do it consistently, habitually, quasi-religiously. Every. Single. Bloody. Day. Find thirty to sixty minutes a day to do the snoresville tasks. What I don’t mean by that is ‘allocate’ thirty minutes at 9 pm, when you usually ‘waste’ your time watching Stranger Things and thus will probably continue to do exactly that. I mean carve out that time at a point in the day when you’re likely to be available to do some boring stuff, in the location where it needs to be done, and when nothing else is expected of you. This is not necessarily an easy task in itself, but it’s important. There will be trade-offs and compromises, but believe me, it will be worth it. Find the time, make sure you will not be distracted (thrust some Taylor Swift or Morbid Angel or Kenny G through your earphones) and get cranking through the stuff that needs doing. Start with the routine stuff, then chip off a couple of the ice cubes you carved out of those big icebergs.

And here’s the reward: go through the same exercise to carve out some time for doing ALL THE OTHER FUN STUFF. I know what you’re thinking right now. I do. You’re thinking “but as it is right now, I can use ALL my time to work on all that tasty shiny fun sh*t”. This may well be the case, but how is that working for you, eh? Do you really feel relaxed and guilt-free? Are you sure you’re not feeling a little torn, guilty, shameful, dare I say it, inadequate, at not having done the stuff that you think you’re actually supposed to be doing? The stuff that needs doing? If you do what I say, you can get on with building that aquarium complex GUILT-FREE, knowing that you’ve done exactly enough of ALL OF THAT OTHER BORING CRAP to relieve yourself of the nagging burden of inadequacy. Make the time to do both the fun and the frustrating. Make more time to do the fun. You can do that, it’s OK. It really is. Set yourself free. I DARE YOU.

Here’s what it boils down to: if you’re forcing yourself to constantly trade off BIG JOBS, you’re having to make BIG DECISIONS, which is stressful and tiring. You’re then unlikely to do the “overhaul the kitchen” project, and will instead do the much shinier “learn how to make cheese” project (which is actually going to make the kitchen project even more arduous). Do both projects, do all your projects, but for the kitchen project (and its never-ending multitude of interbreeding siblings), break them up, divide and conquer, habitualise them into submission.

Back to the unencumbered spite and contempt I promised earlier. Unfortunately, all those annoying adages proffered by those annoying, self-satisfied, meticulous douchebags gain a little credence at this point – “a stitch in time” and “if you look after the pennies, the pounds will look after themselves”, blah, fecking, blah. As with most metaphorical memes, despite the fact that they’re trite, over-worn and generally lame, there’s almost always a grain of reality in there, no matter how irritatingly phrased and asserted. Consequently, your sister/vicar/gimp was at least in part correct. You may find her/his/its whiny voice echoing around your skull, reciting these pithy one-liners and saying “I told you so!”, daring you to tell her/him/it to p*ss off. If so, simply do that. Go on, DO IT. Tell those voices where they can stick their condescending drivel. It will feel good. Then get on with what you need to do. Quite frankly, actually getting your sh*t done is a much bigger smite-to-the-cobblers to people like that than continuing the disarray they derive so much perverse titillation from deriding you about. Your success is their pain, remember that. INFLICT THE PAIN.

So in summary. Break stuff apart, make time to gratify yourself, inflict pain. Goddit? “When you put it like that,” you’re saying out loud, “what’s not to like?!” And you are correct.

Chinese Whispers Graph Clustering in Python

I needed a simple and efficient unsupervised graph clustering algorithm. MCL is a bit heavy for my needs, and I was after something available in pure Python (because of environment access and compatibility issues) pretty much immediately. There isn’t exactly a lot of choice! I stumbled across Chinese Whispers, an elegant and simple solution. I couldn’t find a simple implementation in Python, so I created one myself using the formulas in the original paper. It uses NetworkX (for convenience – you could easily implement it without this) and is incredibly fast.

import random

import networkx as nx

# build nodes and edge lists
# (placeholder data for illustration only: substitute your own nodes and weighted edges)
nodes = [1, 2]
edges = [
    (1, 2, {'weight': 0.732}),
]

# initialize the graph
G = nx.Graph()

# Add nodes
# CW needs an arbitrary, unique class for each node before initialisation
# Here I use the ID of the node since I know it's unique
# You could use a random number or a counter or anything really
for v in nodes:
    G.add_node(v)
    G.nodes[v]['class'] = v

# add edges
G.add_edges_from(edges)

# run Chinese Whispers
# I default to 10 iterations. This is usually plenty:
# after a certain number (individual to the data set) no further clustering occurs
iterations = 10
for z in range(iterations):
    gn = list(G.nodes())
    # I randomize the nodes to give me an arbitrary start point
    random.shuffle(gn)
    for node in gn:
        neighs = G[node]
        classes = {}
        # do an inventory of the given node's neighbours and edge weights
        for ne in neighs:
            ne_class = G.nodes[ne]['class']
            classes[ne_class] = classes.get(ne_class, 0) + G[node][ne]['weight']
        # an isolated node has no neighbours to vote, so it keeps its own class
        if not classes:
            continue
        # find the class with the highest edge weight sum
        maxclass = max(classes, key=classes.get)
        # set the class of target node to the winning local class
        G.nodes[node]['class'] = maxclass
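When the loop finishes, each node’s cluster label sits in its 'class' attribute. As a sketch (not the exact code I ran), the same update rule can be wrapped in a reusable function that also gathers nodes into clusters. The function name, the seed parameter, the default weight of 1.0 for unweighted edges and the toy graph (two triangles joined by one weak edge) are all illustrative assumptions.

```python
import random
from collections import defaultdict

import networkx as nx


def chinese_whispers(G, iterations=10, weight="weight", seed=None):
    """Label the nodes of G in place with Chinese Whispers; return {label: set_of_nodes}."""
    rng = random.Random(seed)
    # Every node starts in its own class (its own ID, as in the loop above)
    for n in G.nodes():
        G.nodes[n]["class"] = n
    for _ in range(iterations):
        order = list(G.nodes())
        rng.shuffle(order)  # visit the nodes in a random order on each pass
        for node in order:
            # Sum the edge weights per neighbouring class
            scores = defaultdict(float)
            for ne, data in G[node].items():
                scores[G.nodes[ne]["class"]] += data.get(weight, 1.0)
            if scores:  # isolated nodes keep their own class
                G.nodes[node]["class"] = max(scores, key=scores.get)
    # Group nodes by their final class label
    clusters = defaultdict(set)
    for n, label in G.nodes(data="class"):
        clusters[label].add(n)
    return dict(clusters)


# Toy example: two tight triangles joined by a single weak edge
G = nx.Graph()
G.add_edges_from([
    (1, 2, {"weight": 1.0}), (1, 3, {"weight": 1.0}), (2, 3, {"weight": 1.0}),
    (4, 5, {"weight": 1.0}), (4, 6, {"weight": 1.0}), (5, 6, {"weight": 1.0}),
    (3, 4, {"weight": 0.1}),  # weak bridge between the triangles
])
clusters = chinese_whispers(G, seed=42)
print(sorted(sorted(c) for c in clusters.values()))  # [[1, 2, 3], [4, 5, 6]]
```

Fixing the seed makes the visiting order reproducible, which helps when comparing runs. On real data, different seeds can give somewhat different clusterings, because CW’s outcome depends on the order in which nodes are visited.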

Given its simplicity it’s a remarkably effective algorithm. The image below shows a Gephi visualisation using the ForceAtlas2 algorithm. The node colours show the clusters identified by CW.


As you can see, the two algorithms broadly agree. CW took seconds to run, whereas Gephi taxed my CPU to the max for many minutes (actually, I think it was tens of minutes).

If you can fit your data into graph form, this is a very viable alternative to K-means-style clustering. It is made even more attractive (for certain tasks) by the fact that it’s parameter-free, so you don’t need to pre-define the number of clusters (the bane of many a data scientist). It just finds the clusters that are there. This is, of course, also a drawback in some circumstances. For example, if your data is heavily interlinked (a high degree-to-cardinality ratio) with no natural subgraphs, CW may just find a single cluster, whereas you can demand that K-means go and find some. You can get around this to some extent by applying an edge weight threshold (i.e. induce a subgraph containing only edges with weights greater than a threshold, then cluster that). This approach is prone to graph fragmentation, which may or may not be desirable. CW is also prone to finding micro-clusters, which in many cases could be construed as noise.
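The thresholding workaround can be sketched as follows. The helper name threshold_subgraph and the five-node example graph are hypothetical. The design choice here is to keep every node, even those left with no strong edges, so that fragmentation shows up as isolated nodes that CW would treat as singleton clusters.

```python
import networkx as nx


def threshold_subgraph(G, threshold, weight="weight"):
    """Copy every node of G, but only the edges whose weight exceeds `threshold`."""
    H = nx.Graph()
    H.add_nodes_from(G.nodes(data=True))  # keep all nodes, even ones left isolated
    H.add_edges_from(
        (u, v, d) for u, v, d in G.edges(data=True) if d.get(weight, 0.0) > threshold
    )
    return H


# Hypothetical example: a single connected cycle with a mix of strong and weak links
G = nx.Graph()
G.add_edges_from([
    ("a", "b", {"weight": 0.9}),
    ("b", "c", {"weight": 0.8}),
    ("c", "d", {"weight": 0.2}),
    ("d", "e", {"weight": 0.85}),
    ("e", "a", {"weight": 0.3}),
])
H = threshold_subgraph(G, 0.5)
print(H.number_of_edges(), nx.number_connected_components(H))  # 3 2
```

Raising the threshold splits the graph into more components, which makes the fragmentation trade-off easy to inspect before running CW on the result.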

For my purposes it works incredibly well, and I assume it scales well. So let’s all turn to the left and tell the next person all about it.

Does Word2Vec Dream of Semantic Sheep?

I played with Google’s magical word2vec neural network some time ago. I found it interesting but had no immediate use for it, so I filed it in the ‘must remember to check this out further’ section of my disorganised brain. More recently I found myself wrestling with topic modelling, which ended in a near-terminal headlock. I have a data set of very short (say, 2–50 word) documents that I want to group thematically. LDA and its various cousins were struggling with this task, for a few reasons. Firstly, even though LDA, LSI etc. perform a sort of vector dimensionality reduction, this only works to a point. A three-word sentence contains barely enough information to derive a single topic from, let alone a distribution, and consequently my topic distributions were too sparse to do much with. Secondly, and exacerbating the first point, the topic space across the corpus is pretty limited and somewhat homogeneous. Thirdly, and exacerbating the other two points, there is a marked lack of variety in the language used across the corpus, and from one topic to the next. Humans were struggling to distinguish one theme from another, so what chance did a computer have? I was just about to give it up as a lost cause when I remembered two things. Word2vec has some similarity measures, and I had a vague recollection of someone suggesting it could be used for topic modelling. My basic theory here is that if I can compare sentences for similarity, I should be able to group them via that similarity (I’m using a graph clustering model to do this). So as a last-ditch effort I ran word2vec over my corpus and started playing around to see if it could make sense of my data. The results were phenomenal! The similarity graph created from a simple, untuned word2vec model outperformed the other models at unsupervised classification by at least tenfold. Where before I saw only loose semantic groupings with many mis-grouped items, I now saw empirically cohesive and accurate groups.
As pleased as I was by this turn of events, I didn’t understand why Google’s simple neural network worked at all for my purposes, let alone outperformed everything else. So I bathed myself in the warm, welcoming, buoyant sea of word2vec’s vector space. As I did so, I started to appreciate word2vec’s spooky action.
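Comparing sentences for similarity and turning that into a graph can be sketched as below. This post doesn’t say exactly how sentences were compared, so averaging the word vectors and taking the cosine of the averages is my assumption here. The tiny three-dimension VECTORS table, the 0.5 threshold and the function names are hypothetical stand-ins for a trained model. The output edges use the same (u, v, {'weight': w}) shape as the Chinese Whispers code, so they could be passed to G.add_edges_from.

```python
import math
from itertools import combinations

# Hypothetical hand-made 3-d "word vectors" (a real word2vec model has 100+ dimensions)
VECTORS = {
    "site": [1.0, 0.1, 0.0],
    "website": [0.9, 0.2, 0.0],
    "slow": [0.0, 1.0, 0.1],
    "sluggish": [0.1, 0.9, 0.0],
    "cheese": [0.0, 0.0, 1.0],
}


def sentence_vector(text, vectors):
    """Mean of the vectors of the in-vocabulary words in `text`, or None if there are none."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_edges(docs, vectors, threshold=0.5):
    """(i, j, {'weight': sim}) tuples for every document pair whose similarity exceeds `threshold`."""
    vecs = [sentence_vector(d, vectors) for d in docs]
    edges = []
    for i, j in combinations(range(len(docs)), 2):
        if vecs[i] is None or vecs[j] is None:
            continue  # skip documents with no known words
        sim = cosine(vecs[i], vecs[j])
        if sim > threshold:
            edges.append((i, j, {"weight": sim}))
    return edges


docs = ["site slow", "website sluggish", "cheese"]
edges = similarity_edges(docs, VECTORS)
print([(i, j) for i, j, _ in edges])  # [(0, 1)]
```

The two complaints share no words at all, yet they end up linked because their words sit close together in the vector space. This is the effect described above.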

It’s not my intention to repeat what’s already been said about word2vec, but merely to state my own findings. I’ll start with the crux of my main confusion about the model. I’m using it to interpret and group customer feedback. My understanding of word2vec was that it groups words that are semantically similar, or at least proximal. You can easily explore the most similar words to any given word or words in word2vec. My model was trained on a million or so items of customer feedback from a website survey. I should be swift here to mention that they have an excellent site that works well for the vast majority of customers. But, like every website, it doesn’t perform well for everyone all the time. So two words that occur in close proximity a fair amount are “site” and “slow”. (It’s also worth pointing out that “site” actually co-occurs with “fast” more often in the same corpus; however, we’re looking for problems to solve, not praise to lap up.) However, when I looked at the 50 words closest to “site”, “slow” was nowhere to be found. I got loads of synonyms of the word “site” (e.g. website), and all the words most closely related to “slow” were other words that loosely mean slow (e.g. sluggish). At that point it became obvious to me what word2vec was actually doing in this instance, and that my expectation was not aligned with how it works. But this led me to a bit of an epiphany – holy shit, word2vec understands actual semantic relationships between words without any formal teaching; purely by inference! To put it another way, it groups together words it rarely if ever sees together, even in the same document. That’s pretty clever for a simple neural network with only a single hidden layer. I also picked out similarities between words and their misspellings.
This is deceptively helpful since one of my core frustrations with the data sets (at least for the purposes of supervised learning and approach I’m also using) was that people don’t tend to put much effort into spell checking what they enter into an online survey. That’s some pretty spooky action!

So, on the surface at least, it seemed that the reason my similarity scores were so on the mark was that word2vec was cleverly able to pair off similar words in a sentence and thus create robust similarity scores. However, this explanation is a bit Newton to Einstein – a good explanation, but not the whole story. Word2vec’s spooky action is a lot more abstract and, dare I say it, mysterious. This deserves a little more probing. The model that word2vec produces is actually the weights of the single hidden layer referenced previously. It consists of an n × m matrix, where n is the number of words in your vocabulary (optionally pruned to get rid of infrequent or too-frequent cruft) and m is an arbitrary number of floating-point dimensions, usually somewhere between 100 and 700. These dimensions are largely opaque from a human perspective. They are quanta of a continuous, abstract vector space that maps a territory of words as a sort of semantic relief map. Words of similar meaning exist in the same general area of the map. Tribes of words exist in a single area, just as tribes of people do in the real world. However, this extends beyond synonyms to words of the same type. I loaded the pre-trained vectors from the word2vec home page (the 300-dimension Google News ones) and did some exploring. I discovered various bits of spooky action:

  • Names of musicians (Alice Cooper, Ozzy Osbourne and David Bowie) coexist with bands (Metallica, Motorhead), all of which bear no relation to, say, “cheese”
  • Names of US presidents occupy the same general space, with small offsets that seem to suggest political affiliations (more research needed here), as do scientists, with a little evidence that they cluster with their respective disciplines
  • Parts of the brain (and indeed neurotransmitters) all occupy the same space, and a cursory appraisal suggests that closer proximity exists for those parts that are anatomically closer to each other (hypothalamus, nucleus accumbens and midbrain all cluster very closely). One assumes that this is the same for all anatomical parts
  • It has no sense of opposites – fast and slow cluster very closely together, black and white even more so
  • Words cluster together when they are notionally similar rather than when they are the same type of word, so “black” and “blackest” cluster close together. There seems to be no definable continuous space for, say, nouns, proper nouns, adjectives etc.
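To make the n × m table and the “most similar words” lookup a little more concrete, here is a minimal sketch in plain Python. It assumes a tiny, hand-made table: the six words and every number in it are invented for illustration (a trained model learns these values and uses far more dimensions), and `EMBEDDINGS`, `cosine` and `most_similar` are hypothetical names. The lookup loosely mimics what Gensim’s `KeyedVectors.most_similar` does, but it is not that API.

```python
import math

# Hypothetical toy embedding table: n = 6 words (rows), m = 3 dimensions (columns).
# The values are made up; a real word2vec model learns them and uses ~100-700 dimensions.
EMBEDDINGS = {
    "site":     [0.90, 0.10, 0.00],
    "website":  [0.85, 0.15, 0.05],
    "slow":     [0.10, 0.90, 0.10],
    "sluggish": [0.05, 0.90, 0.15],
    "fast":     [0.15, 0.80, 0.00],
    "cheese":   [0.00, 0.05, 0.95],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, topn=3):
    """Scan every row of the table and rank the other words by cosine similarity to `word`."""
    target = EMBEDDINGS[word]
    scored = [(other, cosine(target, vec))
              for other, vec in EMBEDDINGS.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    for query in ("site", "slow"):
        print(query, [w for w, _ in most_similar(query, topn=2)])
```

In this made-up table the nearest neighbours of “slow” are “sluggish” and then “fast”, a toy echo of the “no sense of opposites” observation: words that can stand in for one another end up pointing in similar directions.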

There’s a sense that it clusters words when they seem interchangeable to a greater or lesser degree. The mathematical offset described in the famous “king – man + woman = queen” example seems to reinforce this. The spatial significance comes further to light when you consider the two main ways to interrogate the vector space. The standard way (à la Gensim and others) is to scan the entire vector space for the vectors with the highest cosine similarity to the vector of the target word (that is, the vectors pointing in roughly the same direction). When you’re dealing with multiple words (e.g. n-grams, sentences or even whole documents), the approach is simply to look up the exact vector for each word, then take a column-wise average across those words to create a new vector, which then goes through the same cosine similarity malarkey. When comparing one sentence or document to another, we take the same averages and get the cosine between the two. The second approach, as per Kusner et al. (their Word Mover’s Distance), measures how far the words of one document have to travel through the vector space to reach the words of another. In combination, these two approaches make the topological aspect of the model more salient still, suggesting that the word embeddings exist in some intangible semantic space-time of numerous dimensions and geometry, which, in a very real sense, is exactly what it is. A synonym generator may be a convenient way to describe it, but the reality is much more complex and elegant.

So, back to my original idea that word2vec was pairing off words in a sentence. This is not the case at all. Using the cosine similarity approach as I was, what was actually happening is that I was generating a point in the vector space from the average of a collection of words, then finding the other word or words that are proximally (spatially) close. Words are never compared with one another; we actually just go and find the tribe that has the most DNA in common with my sentence, as it were.
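The averaging approach can be sketched in the same toy setting. As before, the vectors are invented for illustration, and tokenisation is assumed to be a naive lowercase whitespace split (real pipelines do more); the function names are hypothetical. The point to notice is that the two sentences are never compared word by word: each one collapses into a single averaged point, and only those two points are compared.

```python
import math

# Hypothetical toy vectors (made-up values); a trained model would supply these.
EMBEDDINGS = {
    "site":     [0.90, 0.10, 0.00],
    "website":  [0.85, 0.15, 0.05],
    "is":       [0.30, 0.30, 0.30],
    "very":     [0.20, 0.40, 0.20],
    "slow":     [0.10, 0.90, 0.10],
    "sluggish": [0.05, 0.90, 0.15],
    "cheese":   [0.00, 0.05, 0.95],
    "tasty":    [0.05, 0.10, 0.90],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_vector(words):
    """Column-wise mean of the known words' vectors; out-of-vocabulary words are skipped."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("none of the words are in the vocabulary")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sentence_similarity(a, b):
    """Collapse each sentence to one averaged point, then take the cosine between the two points."""
    return cosine(average_vector(a.lower().split()),
                  average_vector(b.lower().split()))


if __name__ == "__main__":
    complaint = "site is very slow"
    print(sentence_similarity(complaint, "website is sluggish"))
    print(sentence_similarity(complaint, "cheese is tasty"))
```

With these made-up numbers, “site is very slow” scores far closer to “website is sluggish” than to “cheese is tasty”, even though the first pair shares only the word “is”. Averaging is the simple option; Word Mover’s Distance keeps the individual word vectors instead, at a much higher computational cost.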

So how does it figure this stuff out? Well, some much better mathematicians than me have already concluded that they “don’t really know”, so what hope have I got? I can only try to make some sense of what I observe. Much has been made of the difference between CBOW and Skip-gram as ways to process the text; however, there seems little to suggest that either contributes to the overall spooky action, rather than building upon the mysterious workings of a simple neural network. The information is in there, in all the written text, and nothing is inferred that isn’t visibly available – there’s no extrapolation going on here, no logic. Word2vec doesn’t read or understand the text; it just picks up patterns. Interpretation as an adjunct to semantic awareness is a job for a future, much more sophisticated algorithm or model or AI. The best analogy I can think of for word2vec is the very mechanism that neural networks try to emulate: the human brain, and in particular long-term memory. It’s easy to imagine that, as information flows in through our senses, it is brokered into similar abstract representations in the cerebral cortex, then either reinforced (learned) or forgotten. We know that the best way to remember something is to relate it to something we already have a good sense of. Then, when you recall that thing, you also recall a sense of the other stuff that you squirrelled it away with. Thus when you recall Alice Cooper from memory, Ozzy Osbourne sometimes emerges with him, along with a bat or a chicken maybe, but never a block of Stilton.
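For the curious, much of the difference between the two architectures lies in how a sliding window over the text is turned into training examples. The sketch below only builds those examples (no network, no training), and the function names and default window size of 2 are illustrative assumptions. Real word2vec also samples the window size and subsamples very frequent words; those refinements are left out. Both functions draw on exactly the same co-occurrences, which fits the point that the information is already sitting in the text.

```python
def _window(tokens, i, window):
    """Indices of the neighbours of position i, up to `window` words either side."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [j for j in range(lo, hi) if j != i]


def skipgram_pairs(tokens, window=2):
    """Skip-gram: each word is used to predict each of its neighbours, one pair at a time."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in _window(tokens, i, window)]


def cbow_examples(tokens, window=2):
    """CBOW: the bag of neighbouring words is used to predict the word in the middle."""
    return [([tokens[j] for j in _window(tokens, i, window)], tokens[i])
            for i in range(len(tokens))]


if __name__ == "__main__":
    tokens = "the site is very slow".split()
    print(skipgram_pairs(tokens, window=1)[:4])
    print(cbow_examples(tokens, window=1)[:2])
```

Skip-gram yields many (word, neighbour) pairs, whereas CBOW yields one (neighbours, word) example per position; either way, the network only ever sees which words keep turning up near which.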

Internet Advertising Ethics – the gorilla in the room

Much has been said of late about the rise of the adblocker and what it means for the future of the advertising industry and, more worryingly, the internet.  I’ve largely kept my gob shut on the subject of advertising ethics up until now since it’s not very fashionable to stand up for the evil advertising community. But, after reading this call to arms for ethics academics, my resolve has been shattered. Bear with me caller, I shall explain.

Now let me start by saying I have no reason to besmirch Williams’s character, or single him out for my wrath – his article is well written and well argued and is clearly placed in the public domain for debating, and that’s exactly what I’m doing. Why respond to this article in particular? Simply, it belongs to the extreme edge of a movement whose point of view I have some (soon to be explained) objections to. Also, it arrived in my world at the point where my silence on the subject was already faltering.

Let me quote the final statement of the article as a flavour of the overall thrust of the debate:


Given all this, the question should not be whether ad blocking is ethical, but whether it is a moral obligation. The burden of proof falls squarely on advertising to justify its intrusions into users’ attentional spaces—not on users to justify exercising their freedom of attention.


Lofty ideals there. However, what Williams entirely ignores (as do the comments that I have read) is that most websites and the adverts on them exist to offer some sort of product or service that people actually want (either directly or by later fulfilment). What advertising does is draw people’s attention to those things, while also forming an essential part of the business model of the site displaying the ad. And you cannot blame any given company for wanting any given consumer to get the aforementioned product/service from them rather than from someone else; after all, most businesses earnestly believe their product/service is superior, whether it is or not, as otherwise, why bother? If people find themselves on mediums that consume their attention, or assailed by ads that distract them from doing the things that they supposedly desire (I think many people would admit to wanting to spend time playing Xbox games as much as, if not more than, spending time with family – hey, why not combine the two!), then maybe they want/need to be distracted.

Secondly, the cognitive bias/behavioural economics argument is a red herring. As the legendary Harvard Gorilla Experiment demonstrated, people are spectacularly good at missing bleeding obvious stuff when they are engaged in a task, and that’s assuming you can get them engaged in the first place: if they’re not interested in something, they simply won’t engage. If a person is so unengaged in the task of absorbing some web content that they get distracted by an advert, it suggests that the content isn’t much cop anyway and probably doesn’t deserve the attention it was getting. If a site places an ad that’s so invasive that it makes it hard for the consumer to consume their content, then they can’t have much confidence in that content, and the consumer should certainly consider clearing off. But even this misses the key flaw in the argument – the biases alluded to evolved precisely so that we, as humans/mammals/animals, can focus our attention on what matters while also remaining alert to potential threats or, dare I say it, more interesting stuff. Saying that it is somehow unethical to appeal to these so-called “biases” (I prefer the term heuristics) is like saying that “blue cars should not be manufactured since people are drawn to blue and that would distract them from the car’s overall ‘carness’” or that “people shouldn’t dress nice lest people might fancy them” (that last one is exploited quite a lot in certain religions). We’re built to desire stuff (food, sex) – if we didn’t, we’d (literally) die as individuals and as a species. If people spend too much time on Facebook, or are distracted by ads, or get obsessed by Candy Crush and forget to collect their kid from school, it says more about the psychology and evolution of human nature than the medium itself. We were designed to do what natural selection designed us to do. Facebook, Twitter, Daily Mail Online, the internet are symptoms of that, not the cause.

Further to this, these mediums for supposed attention corruption (the sites that house the adverts) are pretty damn good at keeping our attention. Williams states “A product or service does not magically redesign itself around your goals just because you block it from reaching its own”. But that’s precisely what they do. Facebook (for example) is AMAZING at holding attention, ads or otherwise. This is the case because they collect usage data about billions of people and their site optimises itself, in real time, around what people respond well to. Everyone has their own goal when using Facebook (frequently to spend “time” with absent family); Facebook’s “product” is that goal. Facebook spends a hell of a lot of money on making their product as good as it can be, and they know that they are successful when people spend lots of time using it! That cost is paid for by your advertising eyeballs. So by cutting off Facebook’s revenue stream, Williams is undermining their ability to do the very thing he’s (paradoxically) getting antsy about them not doing (building a customer-centric experience). No doubt sites like Facebook could be better, but starving them of cash ain’t gonna help them in this endeavour!

So when the legions of ethics academics rise up and block the sorry arse out of internet advertising, which subsequently results in the news sites where they get their celebrity gossip going out of business, leaving them only with the Murdoch-funded, reactionary corporate propaganda-media (with all their ethics and stuff), they will only have themselves to blame. Perhaps then they will offer a better alternative to just “blocking” the problem out of view!

I feel a little like I’m defending the devil here, but if the free internet is to be maintained there is a balance to be struck. Advertisers need to work harder to build better online experiences, and consumers need to continue to put up with their attention being corralled a bit. It will be a rocky road to the equilibrium where both advertiser and consumer are happy, but the forces of reciprocal value exchange demand that that day must come.

Now, were Williams to make the broader argument about how those exploitations of attention lead to unhealthy lifestyles by tempting us with what we innately desire – e.g. fat and sugar and sex, and lots of all of it – which some people are largely powerless to resist, then I would fully support it as an ethical debate. If we’re here to debate the ethics of rampant non-consented data collection and abuse, I’m all ears. But the ethics of trying to get people’s attention? Get the gorilla out of here!

17 seconds

I don’t know exactly how long it takes to read the 50 or so words attributed to me in a recent Guardian article, but I doubt that it equates to 5 minutes, probably more like 17 seconds, meaning that I still have the larger portion of my 5 minutes of fame to come. What wonders await is anyone’s guess, but in the meantime I will juxtapose those 17 seconds of written text with a note of clarification.

On enthusiastically posting a snippet of said article on Twitter (and while I sat back and basked in the adulation), an old friend, colleague and data guru, @hankyjohn, responded to one of my points with a contradiction. Specifically, I said:

But Loveless said he associated the idea of having a single customer view with “big, monolithic, old school, relational databases, which are horribly hard to manage and incredibly expensive”. Just collecting data on customers for its own sake is useless unless you can do something useful with it, he said: “You don’t need to understand everything about the customer, you don’t need to collect and structure everything about the customer, you just need to have a sense about them.” He said the new data management platforms do not promise a single customer view, just a general view of what that person likes and does.

To which @hankyjohn responded (quite correctly):

@alexmloveless good work. Can I disagree though? False dichotomy for me. Traditional data warehousing can coexist nicely with other stores.

There followed a brief exchange in which I heroically clarified my point. Rather than subject you to those stilted 140-character info-barks, I’ll summarise the crux of my points here.

Although I completely stand by the point illustrated in that article, it sits removed from a broader context that would have been apparent were you in the room at the time. The wider point is this: since the days when advertising was first invented (by the people on Mad Men), marketers and the like have endeavoured to understand their customers. For the vast majority of the intervening period, such understanding was derived from two things: whatever details we could collect about them (name, address, demographics etc.) and performance data (what works and what doesn’t). The former probably existed on bits of card in filing cabinets for a long time before eventually being diligently transmogrified into its digital equivalent when computers became a thing. These digital equivalents eventually required a structured form so that they could be easily accessed and queried for the purposes of selling us stuff that we don’t need. The medium for this structure was the humble database, of which for a long time there was really only one form worth talking about: the RDBMS, or relational database. Relational databases are marvellous. They impose structure on unruly data and make it easy to access, analyse and aggregate. Thus, modern marketing became used to storing its customer data in these things, and that data needed to be kept clean and tidy. This was how you knew who your customers were – you kept records of them in a big old RDBMS called “Customer DB” or “CRM Store” or something equally enticing. Problem is, since there were many different sources of data, companies frequently ended up with multiple stores, often holding overlapping data sets. Quite rightly, at some point marketers and IT people alike started saying things like “wouldn’t it be great if all this data was deduplicated and stored in one place”, and thus was born the dreaded Single Customer View.

Roll on a decade and SCV projects that were started on the back of wishes from marketers are still incomplete and running up legacy costs of tens of millions. Meanwhile, as those projects have failed to deliver on the meagre requirements of the time, we have acquired all these bloody channels and social networks and mobile devices and internets-of-things and Bigness of Data. Asking IT to justbloodywell get me a dataset I can trust is trouble enough, let alone incorporating Twitter handles and cross-device awareness. Yet marketers are still asking such things of an SCV, thinking that this once-so-called magic data bullet is actually the right place for them.

The belief is still widely held that customer data can really only live in a big ole monolithic relational data store. This comes from a failure, on the part of both marketers and IT people, to make a distinction. The distinction is between Master Data Management (MDM) and, well, all the other types of data. It’s a distinction between hard, indelible customer data for hard, lofty uses, and the sort of fuzzy profiling that proliferates across the web and haunts you with depressing display adverts for TVs you had briefly considered buying before that whopping council tax bill came in.

Modern marketing data is not about coherent customer information; it’s about cookies and inferred data. When marketing to (or at) someone, it’s more useful to know their gender than their name. A mobile geolocation is better than a postcode. A constantly evolving stream of inferred preference data is better than a mosaic classification. This is all achieved by a web of data collection technologies and services that use the humble cookie as their primary currency and couldn’t give a hoot what your name is. You could try and mash all this lovely data into your SCV, but you’d end up changing your schema every two weeks and probably hit performance/scale issues pretty quickly. Plus it’ll take 6 years and countless more million quid, when you could have invested in one of those mystical unicorn DMP thingies. In such a circumstance, your beloved SCV data would mostly be flowing in the other direction, making the anonymised cookie store the most complete view of your customer data. God forbid!

Now don’t go rushing off to Adobe or Oracle while instructing your IT team to delete that pesky SCV. You probably need it. Email comms would not be possible without it. And if you have a more tangible relationship with your customers (like, you sell stuff to them), you need a master record with accurate, non-volatile information about them that’s nicely structured, secure and private. This is Master Data Management, and it only relates secondarily to marketing. And as the learned @hankyjohn correctly points out, it sits happily and harmoniously in a mature data ecosystem alongside anarchic Johnny-come-latelies like DMPs (and a bunch of other sinister data entities).

This was the thrust of my grumpy diatribe at the Guardian offices, which perhaps doesn’t come through too well in the article. I wasn’t misquoted as such, just underquoted. The moral of this story? Write more about me.