<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Basically, Dan &#187; Dot plot</title>
	<atom:link href="http://danielhough.co.uk/blog/tag/dot-plot/feed/" rel="self" type="application/rss+xml" />
	<link>http://danielhough.co.uk/blog</link>
	<description>One long adventure.</description>
	<lastBuildDate>Sun, 01 Aug 2010 14:47:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Progress so far</title>
		<link>http://danielhough.co.uk/blog/2009/10/progress-so-far/</link>
		<comments>http://danielhough.co.uk/blog/2009/10/progress-so-far/#comments</comments>
		<pubDate>Mon, 05 Oct 2009 17:06:41 +0000</pubDate>
		<dc:creator>Dan</dc:creator>
				<category><![CDATA[Dissertation]]></category>
		<category><![CDATA[Dot plot]]></category>
		<category><![CDATA[Experiments]]></category>

		<guid isPermaLink="false">http://danielhough.co.uk/blog/?p=5</guid>
		<description><![CDATA[Last week I chose the topic of my dissertation, so immediately I began to do some research into the techniques I'll need to understand in order to complete it. Essentially, I need to design and implement a system which can investigate diversity in a number of ways between and within sources of news, including (but [...]]]></description>
			<content:encoded><![CDATA[<p>Last week I chose the topic of my dissertation, so immediately I began to do some research into the techniques I'll need to understand in order to complete it.</p>
<p>Essentially, I need to design and implement a system which can investigate diversity between and within sources of news in a number of ways, including (but not necessarily limited to) <strong>language</strong>, <strong>topics</strong>, <strong>attention to detail</strong> and <strong>reuse of text</strong>. It will do so by crawling through RSS feeds, and possibly corpora of older news material, detecting topics and comparing articles within topics.</p>
<p>A wide variety of techniques and technologies is required for this, most of which I have not fully investigated yet, but I will be doing so soon. I will be using Python, since it seems very well suited to the project. Python has a comprehensive standard library, including XML modules that can be used to read RSS files and a number of string functions which should be useful, and the language is widely used for text processing.</p>
<p>Furthermore, Python is cross-platform: although most of the development will be done on a Windows PC, the system will actually run on a Linux-based server.</p>
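<p>As a rough illustration of reading RSS with nothing but the standard library, here is a minimal sketch. It assumes Python 3 and the xml.etree.ElementTree module; the helper names build_sample_feed and read_items, and the feed contents, are made up for the example and are not taken from the project itself.</p>

```python
import xml.etree.ElementTree as ET


def build_sample_feed():
    """Build a tiny RSS 2.0 document in memory (made-up data, not a real feed)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example News"
    for title, link in [
        ("First story", "http://example.com/1"),
        ("Second story", "http://example.com/2"),
    ]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


def read_items(rss_text):
    """Return (title, link) pairs for every item in an RSS 2.0 string."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing.
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in read_items(build_sample_feed()):
        print(title, "-", link)
```

<p>A real crawler would fetch the feed text over HTTP instead of building it in memory, and would probably want to read each item's description and publication date as well.</p>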
<h2>Experiments with Python</h2>
<div id="attachment_10" class="wp-caption alignright" style="width: 310px"><a href="http://danielhough.co.uk/blog/wp-content/uploads/2009/10/20091005-dotplotprogress.PNG"><img class="size-medium wp-image-10" title="Dot Plot Progress" src="http://danielhough.co.uk/blog/wp-content/uploads/2009/10/20091005-dotplotprogress-300x202.PNG" alt="A screenshot of my progress with a simple dotplot program" width="300" height="202" /></a><p class="wp-caption-text">A screenshot of my progress with a simple dotplot program</p></div>
<p>I'm quite new to Python, but I've found it simple to pick up, and most of the things I've needed so far have been built in. I've read about reading RSS files with Python and created a very simple RSS reader. I've also covered the <a title="Dot plot on Wikipedia" href="http://en.wikipedia.org/wiki/Dot_plot_%28bioinformatics%29" target="_blank">dot plot</a>, a technique for comparing DNA sequences which Ken Church and Jonathan Helfman have used in the past to compare text. I have begun work on a text-based dot plot program too, which was also very simple. Eventually both of these will make their way into the final product, but for now it's just about learning and getting into the mindset of text processing.</p>
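<p>To give an idea of what a text-based dot plot does, here is a minimal sketch in Python. It is not the program from the screenshot; the function names dotplot and render, and the two example sentences, are assumptions made for illustration. Each row stands for a word in the first text and each column for a word in the second, and a cell is marked wherever the two words are equal.</p>

```python
def dotplot(seq_a, seq_b):
    """Return a grid of booleans: grid[i][j] is True when seq_a[i] == seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(grid, mark="*", blank="."):
    """Render the boolean grid as text, one row per element of the first sequence."""
    return "\n".join(
        "".join(mark if cell else blank for cell in row) for row in grid
    )


if __name__ == "__main__":
    # Two made-up sentences that share the phrase "sat on the".
    words_a = "the cat sat on the mat".split()
    words_b = "the dog sat on the log".split()
    print(render(dotplot(words_a, words_b)))
```

<p>Shared runs of words show up as diagonal lines of marks, which is what makes the technique interesting for spotting reuse of text. In this example the words "sat on the" appear at the same positions in both sentences, so rows three to five carry marks along the main diagonal.</p>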
]]></content:encoded>
			<wfw:commentRss>http://danielhough.co.uk/blog/2009/10/progress-so-far/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
