Designing for Mobile Pt. 3
This morning I sorted out the viewport, and the little prototype for my mobile site's landing page is ready. I've put a screenshot in this post, but if you'd like to see the real thing, check it out here (or, if you're on a mobile, here).

Mobile Prototype V1
Next: the other pages!
Designing for Mobile Pt. 2
It has occurred to me that perhaps I should name this series "Designing for Touch", but I've begun, so I shall continue with the current name.
Anyway, I posted some designs to my deviantArt account in the hope of getting some feedback, and I asked some workmates for critique. It started with this, but someone pointed out that the original landing page was too wordy. They were right, so I moved on to this, which can be seen below.
I've made a few changes since, and I'm happy with the result, so I'm building a prototype. I'm working on the final touches, but I'm sleepy now, so I'm off to bed. Hopefully there'll be an update tomorrow.
EDIT: Critique credit goes to Jake, Mo and Pete (who has yet to finish his site; I'll post it when he's done).
Designing for Mobile Pt. 1
So here's what I decided today: I'm going to design a mobile version of my site. It's going to be rad, and it's going to be usable. I've been thinking about it for a while and was finally pushed to do it when I saw a tweet by Jeffrey Zeldman from An Event Apart earlier today:
"Your mobile is with you all the time, so designing for mobile means designing something that can be used all the time."
And I think that pretty much sums up one of the main things we should be thinking about when designing for the mobile web. That, and trimming it down to the bare essentials.
So in order to decide what the bare essentials are, I need to establish what I'm trying to achieve and what I'm trying to get across with the mobile site. It's a personal/business site, so what do I want it to do?
- Briefly summarise who I am and what I do
- Allow anyone to get in touch easily
- Be usable on touch devices
So what I've decided to do is try to view it as a web business card, because those three things are essentially what a business card does (well, not quite the third one).
I'm going to post progress here - mockups, thoughts, ideas, etc. For now I'll say that I'm going to prototype it using this brilliant frame to make my life easier and maybe include some of these elements.
Woop!
NASA is on your side
I simply can't get enough of this song; it's brilliant.
So I've been listening to Everything Everything quite a lot in the last few days. They are my geeky indie-pop falsetto obsession at the moment. In fact, I'm excited about an upcoming slice of awesome this Wednesday. For the last two months, some serious (relentlessly ongoing and kinda annoying) building work has been going on below where I work at Thin Martian, in preparation for opening a venue called Xoyo, which will be hosting EvEv for its opening event. Can't friggin' wait; I'm basically just going to roll down the stairs and into the bar. So be there, it should be a lot of fun!
If you haven't listened to Everything Everything yet, do it now.
Illustrations
Before I went on my travels to Korea and then Europe (which I will finish writing about soon, I promise!), I made a couple of little illustrations as part of improving my skills in Illustrator, and just for a bit of fun, I guess. They're kind of potentially part of a series, but I'm not sure yet about the details of that.
The first one I called Fancy a Fly? (click for the full version)
And the second is called, quite simply and unimaginatively, Para-Whale.
I hope you like 'em!
I'm working on some more stuff at the moment - one or two things in the same vein and something else totally unrelated.
Soju, Maekju, Galbi and Kimchi (S. Korea Day One)
From now on, I'll be updating this blog with brief descriptions and photos of what I've been up to on my travels this summer - finally something more interesting than my dissertation!
My sister, Natalie, and her boyfriend, Nick, have been staying in Dangjin, Chungcheongnam-do in South Korea, working as English language teaching assistants at a Korean school. After finishing all my Uni work, I took a couple of flights over to see them for a couple of weeks, so that's what I'll be blogging about for a while. I'll write a post for each day of the trip, as if from the end of that day, and post-date them to catch up.
You know in films when they play stereotypical Asian music to set the scene when our protagonist arrives in an Asian country? That's what Incheon airport is like. It looks, feels and sounds like a film, and probably even smells like one.
Outside the airport isn't much different, though it's not as beautiful today as I'd hoped; the weather isn't great. But the few people I've spoken to so far have been extremely helpful and friendly: the lady at the information desk, the bus driver, and the person on the bus who stopped the bus driver from driving further, preventing me from going too far!
So on my first night, I met N&N's friends, Steph, Danny and Dave. They're all teachers, from the US and South Africa. All lovely people, very welcoming and very good at breaking the ice with a bit of Soju (a vodka-like Korean spirit) in my Maekju (beer). Very effective. We hit a restaurant called Don S Top (or Don's Top? or Don Stop? who knows) and ate Baechu Kimchi (spicy fermented cabbage) and Galbi (marinated beef you cook on a grill at your own table!). Pretty awesome, tasted amazing and a really fun way to eat: just stuff it all into your mouth at once.
One of the many odd things about Korea (at least to a Korea newbie like me) is its obsession with things like "wellbeing". To symbolize this, bars have a tree - a plastic tree - ironically. A plastic tree, symbolizing wellbeing, in a bar full of booze. Oh, and a lady with a baby! Go figure.
Oh, and by the way, I saw a StarCraft game on TV. What.
This Blog Has a Video In It.
Onda: Tutorial & Demo from Daniel Hough on Vimeo.
It's beginning to feel pretty complete
The Website
So for the past few weeks the website has been up. I'm not going to rant and rave about the URL yet because I'm not entirely convinced it's ready.
The clustering is reasonably accurate, and the interface is coming along very well. There are a few efficiency issues, though, particularly on pages that involve deeply joined queries (for example, finding all the clusters used by a given source requires the sources table linked to the articles table linked to the clusters table). I have come up with better ways to do it, and I've implemented a little "be patient, loading!" screen for particular pages using JavaScript.
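To make that three-table join concrete, here's a minimal sketch using Python's built-in sqlite3 module. The schema (table and column names, and the assumption that each article has exactly one source_id and one cluster_id) is hypothetical rather than the project's actual database, and the indexes on the join columns are just one common way of speeding this kind of query up, not necessarily the approach used here.

```python
import sqlite3

# Hypothetical schema: each article belongs to one source and, once
# clustered, to one cluster. Names are illustrative assumptions.
SCHEMA = """
CREATE TABLE sources  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE clusters (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE articles (
    id         INTEGER PRIMARY KEY,
    source_id  INTEGER REFERENCES sources(id),
    cluster_id INTEGER REFERENCES clusters(id),
    title      TEXT
);
-- Indexes on the join columns let SQLite avoid full scans of articles.
CREATE INDEX idx_articles_source  ON articles(source_id);
CREATE INDEX idx_articles_cluster ON articles(cluster_id);
"""

# sources -> articles -> clusters, de-duplicated per cluster.
CLUSTERS_FOR_SOURCE = """
SELECT DISTINCT c.id, c.label
FROM sources s
JOIN articles a ON a.source_id = s.id
JOIN clusters c ON c.id = a.cluster_id
WHERE s.name = ?
ORDER BY c.id
"""


def clusters_for_source(conn, source_name):
    """Return (cluster_id, label) pairs for every cluster a source appears in."""
    return conn.execute(CLUSTERS_FOR_SOURCE, (source_name,)).fetchall()


def demo_connection():
    """In-memory database with a few made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sources VALUES (?, ?)",
                     [(1, "Guardian"), (2, "Telegraph"), (3, "Express")])
    conn.executemany("INSERT INTO clusters VALUES (?, ?)",
                     [(10, "Budget"), (11, "Weather")])
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)",
                     [(1, 1, 10, "Budget row"),
                      (2, 1, 11, "Snow chaos"),
                      (3, 2, 10, "Chancellor's plans"),
                      (4, 1, 10, "Budget reaction")])
    return conn


conn = demo_connection()
print(clusters_for_source(conn, "Guardian"))  # [(10, 'Budget'), (11, 'Weather')]
```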
All of the compulsory requirements for the project have been fulfilled to a decent extent, though there are a few more insights I wish to develop for looking at diversity. However, the optional requirement (a bit of an oxymoron, you'd think) of sentiment detection is unfortunately not yet fulfilled. It probably wouldn't be particularly difficult to do, but since there are more pressing issues, I thought I'd leave it until the report is further along.
There is one persistent problem that bothers me: sometimes (at least twice a week) there seem to be duplicate clusters. An article falls just below the similarity threshold for an existing cluster, so it creates a new cluster instead of joining that one.
As I see it there are at least two solutions to this problem:
- 'Candidate clusters': new clusters are created by the normal method, and any new cluster that is particularly similar to an existing cluster (using a higher threshold than the standard one) can be merged with that cluster.
- A supervised method, making use of collective intelligence. Users can specify when two clusters are, as they see it, on the same topic. The clusters will then either be merged after a number of votes or be flagged for an administrator (me) to merge.
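The first option could look something like the sketch below: a single-pass flat clusterer where each incoming article joins its most similar cluster if the similarity clears the threshold (otherwise it starts a new cluster, which is exactly how duplicates appear), followed by a separate merge pass with its own threshold. Articles are assumed to be sparse term-to-weight dictionaries compared by cosine similarity against each cluster's average vector; the function names, the representation and the merge rule are illustrative assumptions, not the system's actual code.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse term -> weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Average vector of a cluster's members."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def assign(clusters, article, threshold):
    """Flat, single-pass step: join the most similar cluster if it clears
    the threshold, otherwise start a new cluster. Returns the cluster index."""
    sims = [cosine(article, centroid(members)) for members in clusters]
    if sims and max(sims) >= threshold:
        best = sims.index(max(sims))
        clusters[best].append(article)
        return best
    clusters.append([article])  # just below threshold -> possible duplicate
    return len(clusters) - 1


def merge_candidates(clusters, merge_threshold):
    """Merge each cluster into the first earlier cluster whose centroid is at
    least merge_threshold similar. Returns a new list of clusters."""
    merged = []
    for members in clusters:
        for target in merged:
            if cosine(centroid(members), centroid(target)) >= merge_threshold:
                target.extend(members)
                break
        else:
            merged.append(list(members))
    return merged


clusters = []
for article in ({"budget": 1.0, "tax": 1.0},
                {"budget": 1.0, "tax": 1.0, "cuts": 1.0},
                {"snow": 1.0, "roads": 1.0}):
    assign(clusters, article, threshold=0.6)
print(len(clusters))  # 2
```

The merge pass works on cluster centroids rather than single articles, so a candidate cluster that has picked up a few more members can end up close enough to an older cluster to be folded into it.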
However, I'm not entirely sure I'll get any of this done before the report is written. If not, it'll make good stuff to discuss in the report.
Success!
So the training corpus has been counted and the clustering functions are in place. There are a few options:
- Cluster method: single link, group average or complete link
- Cluster type: agglomerative hierarchical or flat
- Threshold: between 0 and 1, the minimum similarity required for two vector models to be considered potentially on the same topic
- Normalize? Should term frequencies be normalized or not?
- Title weight: the weight given to terms in the title when computing the counts and normalized term frequencies
- Leading section percentage: the percentage of the start of the article which is considered the "leading section"
- Leading section weight: the same as title weight, but for terms in the leading section. This hasn't proved to have much of an effect, actually; in fact, at some levels it totally screws up classification
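As a rough illustration of how the title and leading-section options might feed into an article's vector, here is a hypothetical sketch. The tokenizer, the exact weighting rule (title terms counted title_weight times, terms in the first leading_percent of the body counted leading_weight times, everything else once) and normalizing by the total weight are all assumptions rather than the system's actual code.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def article_vector(title, body, title_weight=1.0, leading_percent=0.0,
                   leading_weight=1.0, normalize=True):
    """Build a term -> weight vector for one article (hypothetical scheme).

    Title terms count title_weight times; terms in the first
    leading_percent (0-100) of the body count leading_weight times;
    all other body terms count once. With normalize=True the weights
    are divided by their total, so long and short articles compare fairly.
    """
    counts = Counter()
    for term in tokenize(title):
        counts[term] += title_weight
    words = tokenize(body)
    cut = int(len(words) * leading_percent / 100)
    for i, term in enumerate(words):
        counts[term] += leading_weight if i < cut else 1.0
    if normalize:
        total = sum(counts.values())
        if total:
            return {t: w / total for t, w in counts.items()}
    return dict(counts)


print(article_vector("Budget", "Budget cuts announced",
                     title_weight=19, normalize=False))
# {'budget': 20.0, 'cuts': 1.0, 'announced': 1.0}
```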
Group average clustering works well and is more efficient than both single link and complete link. Forming an 'average' model for each cluster will be used regardless of the method eventually chosen, since it determines how a cluster is represented in human-readable terms: the article in the cluster that is closest to the average will be used to describe the cluster.
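For comparison, the three linkage options and the "closest to the average" representative can be sketched as below, again assuming sparse term-to-weight vectors and cosine similarity. These functions score a single article against an existing cluster, which suits flat clustering; in agglomerative clustering the same ideas are applied between pairs of clusters. The function names are illustrative.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse term -> weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Average vector of a cluster's members."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def single_link(article, members):
    """Similarity to the most similar member of the cluster."""
    return max(cosine(article, m) for m in members)


def complete_link(article, members):
    """Similarity to the least similar member of the cluster."""
    return min(cosine(article, m) for m in members)


def group_average(article, members):
    """Mean similarity to all members of the cluster."""
    return sum(cosine(article, m) for m in members) / len(members)


def representative(members):
    """The member closest to the cluster's average vector, used to describe it."""
    avg = centroid(members)
    return max(members, key=lambda m: cosine(m, avg))


cluster = [{"x": 1.0}, {"x": 1.0, "y": 1.0}, {"y": 1.0}]
print(representative(cluster))  # {'x': 1.0, 'y': 1.0}
```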
So I ran 144 tests, all using flat clustering. At first there was no variation in the leading section weight or percentage; once I'd determined the best settings to use, I varied them slightly, but unfortunately it didn't particularly help the results. Thus, I doubt I'll be putting any weight on the leading terms.
Conversely, putting a weight on the title is very helpful, and works especially well with a high threshold of around 0.60. There were 415 articles, evaluated in a number of ways described briefly below. Anyway, the full results are here.
The main things to look at are purity and F-measure. Purity measures the overlap between the classes (training clusters) and the generated clusters: each generated cluster is assigned its most common class, and purity is the fraction of articles that belong to that class. F-measure combines precision and recall over pairs of articles into a single score, with a parameter that sets how much importance each one gets; the higher the score, the better. I placed the importance on precision, i.e., I wanted the system to generate as many true positives and as few false positives as possible. I did this because I figure it's less detrimental to the accuracy of the system if articles are accidentally given their own cluster (a false negative) than if they are accidentally clustered with unrelated articles (a false positive). Rand is the Rand index, the proportion of pairwise decisions in the confusion matrix (true and false positives and negatives) that are correct: true positives plus true negatives, divided by all pairs. NMI is the normalized mutual information. It tells us how much our information about the pre-determined classes improves when we are told what the clusters are, so a higher NMI is better. Across all the tests, NMI tends to stay around the 0.27 mark. For more information on all of these measures, see this page.
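The measures above can be computed roughly as follows. This is a minimal sketch assuming each article has a generated cluster label and a true class label; F-measure, Rand and the TP/FP/FN/TN counts are computed over all pairs of articles, F-beta with beta below 1 favours precision (matching the preference described above), and NMI is normalized by the mean of the two entropies, which is one common choice among several. The toy data is made up for illustration, not taken from the project's results.

```python
from collections import Counter
from itertools import combinations
from math import log


def purity(clusters, classes):
    """Fraction of items whose class is the majority class of their cluster."""
    majority_total = 0
    for label in set(clusters):
        members = [cls for cl, cls in zip(clusters, classes) if cl == label]
        majority_total += Counter(members).most_common(1)[0][1]
    return majority_total / len(classes)


def pair_counts(clusters, classes):
    """Confusion matrix over all pairs of items: (TP, FP, FN, TN)."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(classes)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1  # clustered together, but different topics
        elif same_class:
            fn += 1  # same topic, but split apart
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(clusters, classes):
    tp, fp, fn, tn = pair_counts(clusters, classes)
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(clusters, classes, beta=1.0):
    """Pairwise F-beta: beta < 1 favours precision, beta > 1 favours recall."""
    tp, fp, fn, _ = pair_counts(clusters, classes)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


def entropy(counts, n):
    return -sum((k / n) * log(k / n) for k in counts.values())


def nmi(clusters, classes):
    """Mutual information divided by the mean of the two entropies."""
    n = len(classes)
    cluster_counts, class_counts = Counter(clusters), Counter(classes)
    mutual_info = sum(
        (nwc / n) * log(n * nwc / (cluster_counts[w] * class_counts[c]))
        for (w, c), nwc in Counter(zip(clusters, classes)).items()
    )
    mean_entropy = (entropy(cluster_counts, n) + entropy(class_counts, n)) / 2
    return mutual_info / mean_entropy if mean_entropy else 1.0


# Toy example: 17 items in three generated clusters, three true classes.
clusters = [1] * 6 + [2] * 6 + [3] * 5
classes = list("xxxxxo") + list("xooood") + list("xxddd")
print(purity(clusters, classes))      # 12/17, about 0.706
print(rand_index(clusters, classes))  # 92/136, about 0.676
print(f_measure(clusters, classes, beta=0.5))
print(nmi(clusters, classes))
```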
My 'favourite' configuration is number 108: flat, complete-link clustering with a 0.6 similarity threshold, a title weight of x19 and no weight on the leading text. This gives only 6 false positives and has an excellent purity score, and its NMI of around 0.28 is one of the higher values.
You may notice that none of the configurations use agglomerative clustering. This is because in the time it took to run 16 tests using flat clustering, one full agglomerative test over the 415 articles hadn't even finished. In other words, it's incredibly slow; so slow that I certainly won't be using it in the final system, since a requirement is to update frequently. If this is how long it takes over 415 articles, what about 1200 (the local corpus) or 7000 (the corpus on the server)?
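To put rough numbers on that worry: a naive agglomerative implementation has to consider every pair of articles and keeps redoing that work as it merges, so its running time grows roughly with the cube of the number of articles. The quick calculation below assumes such a naive implementation and ignores constant factors and smarter variants; it simply scales the 415-article run up to the local and server corpora.

```python
from math import comb


def pair_count(n):
    """Number of distinct article pairs whose similarity must be known."""
    return comb(n, 2)


def naive_hac_ratio(n, baseline=415):
    """Estimated slowdown of a naive (roughly cubic) agglomerative run over
    n articles, relative to a run over `baseline` articles."""
    return (n / baseline) ** 3


for n in (415, 1200, 7000):
    print(f"{n:>5} articles: {pair_count(n):>10,} pairs, "
          f"~{naive_hac_ratio(n):,.0f}x the 415-article run")
```

By this estimate, the 1200-article local corpus would take roughly 24 times as long as the 415-article run, and the 7000-article server corpus nearly 5,000 times as long.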
Later on I will probably run some agglomerative tests for measurable proof, but only if there's time.
In the meantime, I've run the clusterer over the 1200 locally-stored articles and will soon work on some pretty graphs and measures for them. Until then, adios!
Monitoring the Feeds
I've finally finished the RSS parsing and HTML parsing section of the project. Since about 0:00 this morning (26/01/2010), the system has collected 180 unique articles from the Daily Mail, the Guardian, the Telegraph and the Express.
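A minimal sketch of the RSS side, assuming RSS 2.0 feeds and treating an article as "unique" by its guid element (falling back to its link when there is no guid). It uses Python's standard xml.etree.ElementTree on a sample feed string with example.com URLs; fetching the real feeds over HTTP and the HTML parsing step are left out, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up feed with one repeated item and one item lacking a <guid>.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Budget cuts</title><link>http://example.com/a</link><guid>a</guid></item>
<item><title>Budget cuts</title><link>http://example.com/a</link><guid>a</guid></item>
<item><title>Snow</title><link>http://example.com/b</link></item>
</channel></rss>"""


def parse_rss(xml_text):
    """Yield (guid, title, link) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        guid = (item.findtext("guid") or link).strip()
        yield guid, title, link


def collect_unique(feeds, seen=None):
    """Return articles from several feeds, skipping guids already in `seen`."""
    seen = set() if seen is None else seen
    fresh = []
    for xml_text in feeds:
        for guid, title, link in parse_rss(xml_text):
            if guid and guid not in seen:
                seen.add(guid)
                fresh.append({"guid": guid, "title": title, "link": link})
    return fresh


for article in collect_unique([SAMPLE_FEED]):
    print(article["guid"], "-", article["title"])
```

Passing the same `seen` set on every polling run means an article that stays in a feed for several hours is only stored once.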
I'm going to self-cluster these articles as they come in, and soon I'll begin developing the modular system that represents articles as vectors (just vectors, for the time being), along with the methods needed to compare and cluster them. Then accuracy can be measured, settings tweaked and algorithms debugged until I find the best configuration.
After that, the mammoth task of just letting it run for ages begins, while in the meantime I a) begin a report about this crazy adventure and b) work on some rad visualisations for the data collected.
That's the plan at least. Wish me luck!




