<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6633921349872006305</id><updated>2012-01-23T22:56:40.167-08:00</updated><category term='postgresql'/><category term='graph'/><category term='traversal'/><title type='text'>jdobbie</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://jdobbie.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6633921349872006305/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://jdobbie.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Jonathan Dobbie</name><uri>http://www.blogger.com/profile/07517816127646855492</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6633921349872006305.post-1448952740400283526</id><published>2011-09-29T18:18:00.000-07:00</published><updated>2011-09-29T19:30:29.731-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='traversal'/><category scheme='http://www.blogger.com/atom/ns#' term='graph'/><category scheme='http://www.blogger.com/atom/ns#' term='postgresql'/><title type='text'>Graphs in Postgres</title><content type='html'>A long time ago, I came across a youtube video describe some of the more interesting features in PostgreSQL. Since I am stuck using MySQL for most of my paid projects, I waited until a side project grabbed my interest enough to devote a small bit of time to it.&lt;br /&gt;&lt;h2&gt;The Problem&lt;/h2&gt;&lt;br /&gt;Given a list of words, use each word as a node. An edge exists between words if the levenshtein distance between them is 1. Find paths between words and count how many words are connected to a root word.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;The "Solution"&lt;/h2&gt;&lt;br /&gt;&lt;tt&gt;grep "^[[:lower:]][[:lower:]]\+$" /usr/share/dict/words | sort -R | head -n 10000 &amp;gt; small_list.txt&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Doing the full word list takes quite some time, but things are fairly fast with just 10000 words. The words file has a lot of proper nouns, and words with punctuation, so I filter all of those out. I also filter out one letter words, as they are only one distance away from a blank character.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;CREATE TABLE base_words&lt;br /&gt;(&lt;br /&gt;word text NOT NULL,&lt;br /&gt;CONSTRAINT "Primary Key" PRIMARY KEY (word)&lt;br /&gt;);&lt;br /&gt;COPY base_words FROM '.../small_list.txt';&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Pretty basic here, just stuffing the words into a simple table.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;CREATE TABLE edges (a text, b text);&lt;br /&gt;CREATE index b_idx ON edges (b);&lt;br /&gt;CREATE index a_idx ON edges (a);&lt;br /&gt;INSERT INTO edges ( SELECT a.word, b.word FROM base_words a INNER JOIN base_words b ON a.word &amp;gt; b.word AND levenshtein(a.word, b.word) = 1 );&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Here we create a list of edges based off of the levenshtein distance. You will need to install contrib/fuzzystrmatch.sql onto your db if you get the error "HINT: No function matches the given name and argument types. You might need to add explicit type casts"&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;CREATE VIEW all_edges (a,b) AS SELECT a,b FROM edges UNION ALL SELECT b,a FROM edges;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;The second method is much cleaner if we have both directions of the edges.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;CREATE VIEW brute AS (&lt;br /&gt;WITH RECURSIVE w(a, b) AS&lt;br /&gt;(&lt;br /&gt;SELECT 'week', ''&lt;br /&gt;UNION&lt;br /&gt;SELECT CASE WHEN ( wi.a &amp;lt; wi.b) THEN wi.a ELSE wi.b END,&lt;br /&gt;CASE WHEN ( wi.a &amp;lt; wi.b) THEN wi.b ELSE wi.a END&lt;br /&gt;FROM w&lt;br /&gt;JOIN edges wi&lt;br /&gt;ON w.a = wi.a OR w.b = wi.a&lt;br /&gt;OR w.a = wi.b OR w.b = wi.b&lt;br /&gt;)&lt;br /&gt;SELECT w.a, w.b FROM w&lt;br /&gt;);&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Here we recursively walk through the tree, starting with the word 'week', and store all of the links that we come across.&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;SELECT COUNT(*) FROM (SELECT a FROM brute UNION SELECT b FROM brute ) AS s3;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;With my words, I have 440 connected words.&lt;br /&gt;&lt;br /&gt;While cool, it would be nice to be able to see the path through the graph&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;CREATE VIEW path AS (&lt;br /&gt;WITH RECURSIVE w(a, b, path) AS&lt;br /&gt;(&lt;br /&gt;SELECT '', 'week', ARRAY['week'] AS path&lt;br /&gt;UNION ALL&lt;br /&gt;SELECT wi.a , wi.b, w.path || wi.b&lt;br /&gt;FROM w&lt;br /&gt;JOIN all_edges wi&lt;br /&gt;ON w.b = wi.a&lt;br /&gt;AND wi.b &amp;lt;&amp;gt; ALL (w.path[2:array_upper(w.path,1)])&lt;br /&gt;)&lt;br /&gt;SELECT w.a, w.b, path FROM w&lt;br /&gt;);&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;This isn't really the efficient way to do things. It covers every possible route away from 'week', and lists them. It won't get caught in a loop, but it will list out every possible path between each word that is connected to 'week'&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6633921349872006305-1448952740400283526?l=jdobbie.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jdobbie.blogspot.com/feeds/1448952740400283526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jdobbie.blogspot.com/2011/09/graphs-in-postgres.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6633921349872006305/posts/default/1448952740400283526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6633921349872006305/posts/default/1448952740400283526'/><link rel='alternate' type='text/html' href='http://jdobbie.blogspot.com/2011/09/graphs-in-postgres.html' title='Graphs in Postgres'/><author><name>Jonathan Dobbie</name><uri>http://www.blogger.com/profile/07517816127646855492</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6633921349872006305.post-134683238153664622</id><published>2009-07-08T21:01:00.000-07:00</published><updated>2009-07-08T22:19:45.478-07:00</updated><title type='text'>Closure Trees</title><content type='html'>The code here is PHP and MySQL. It should be trivial to translate into other languages and databases.&lt;br /&gt;&lt;br /&gt;I needed to store a tree in a database (mysql) the other day, and I stumbled across this: (It starts on slide 68)&lt;br /&gt;&lt;br /&gt;&lt;div style="width: 425px; text-align: left;" id="__ss_1319559"&gt;&lt;a style="margin: 12px 0pt 3px; font-family: Helvetica,Arial,Sans-serif; font-style: normal; font-variant: normal; font-weight: normal; font-size: 14px; line-height: normal; font-size-adjust: none; font-stretch: normal; display: block; text-decoration: underline;" href="http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back" title="Sql Antipatterns Strike Back"&gt;Sql Antipatterns Strike Back&lt;/a&gt;&lt;object style="margin: 0px;" height="355" width="425"&gt;&lt;param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sqlantipatternsstrikeback-090421005946-phpapp01&amp;amp;stripped_title=sql-antipatterns-strike-back"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sqlantipatternsstrikeback-090421005946-phpapp01&amp;amp;stripped_title=sql-antipatterns-strike-back" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="355" width="425"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div style="font-size: 11px; font-family: tahoma,arial; height: 26px; padding-top: 2px;"&gt;View more &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/"&gt;presentations&lt;/a&gt; from &lt;a style="text-decoration: underline;" href="http://www.slideshare.net/billkarwin"&gt;Bill Karwin&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The basic idea is to store paths to all ancestors and all descendants in a separate table.  This makes a lot of actions very fast at the cost of space.  Many of the basic modifications of the tree are covered in the slides, but some of them took me some thought to work out.&lt;br /&gt;&lt;br /&gt;To represent the tree of (1,2(3,4(5))), we'll do:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;tt&gt;&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (1,1);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (2,2);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (1,2);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (3,3);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (2,3);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (1,3);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (4,4);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (2,4);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (1,4);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (5,5);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (4,5);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (2,5);&lt;br /&gt;INSERT INTO closure (ancestor, descendant) VALUES (1,5);&lt;br /&gt;&lt;/tt&gt;&lt;/pre&gt;&lt;tt&gt;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-size:130%;"&gt;Building the Tree in Memory&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If we want to build the tree in memory, first we grab a list of all of each node's ancestors:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;tt&gt;&lt;br /&gt;SELECT&lt;br /&gt; descendant AS id,&lt;br /&gt; GROUP_CONCAT(ancestor) as ancestors&lt;br /&gt;FROM&lt;br /&gt; closure&lt;br /&gt;GROUP BY (descendant);&lt;br /&gt;&lt;/tt&gt;&lt;/pre&gt;&lt;tt&gt;&lt;/tt&gt;&lt;tt&gt;&lt;pre&gt;&lt;br /&gt;+----+-----------+&lt;br /&gt;| id | ancestors |&lt;br /&gt;+----+-----------+&lt;br /&gt;|  1 | 1         |&lt;br /&gt;|  2 | 2,1       |&lt;br /&gt;|  3 | 3,1,2     |&lt;br /&gt;|  4 | 4,1,2     |&lt;br /&gt;|  5 | 5,4,1,2   |&lt;br /&gt;+----+-----------+&lt;br /&gt;&lt;/pre&gt;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;When we bring it into our language of choice, the first thing to do it sort the list by number of ancestors.&lt;br /&gt;Top level nodes will only have themselves as an ancestor, and any node with two ancestors is dependant solely on a node with one ancestor.  After we have sorted our ancestor list, we either create the first node with the first item in the list, or we create an empty node to start as $tree.  I used an empty node, as I have many top level nodes.  If you have only one node without ancestors other than itself, use it for the first node.&lt;br /&gt;&lt;br /&gt;To traverse the tree, the next node will always be the intersection between the list of ancestors, and the available ancestors.&lt;br /&gt;&lt;br /&gt;If there is only one ancestor left, it is to be a new child of whatever node you end up on.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;tt&gt;&lt;br /&gt;function add_node($ancestors, &amp;amp;$tree) {&lt;br /&gt;  if (count($ancestors) == 1) {&lt;br /&gt;    $tree-&gt;add_child($ancestors);&lt;br /&gt;    return;&lt;br /&gt;  }&lt;br /&gt;  $next_node = array_intersect($ancestors, $tree-&gt;get_children_ids());&lt;br /&gt;  $this-&gt;add_node(&lt;br /&gt;      array_diff($ancestors, $next_node) ,&lt;br /&gt;      $tree-&gt;get_child_by_id(array_pop($next_node))&lt;br /&gt;      );&lt;br /&gt;}&lt;br /&gt;&lt;/tt&gt;&lt;/pre&gt;&lt;tt&gt;&lt;/tt&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:180%;" &gt;Moving a Segment&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Moving a segment of the tree is probably the most painful task.  It is divided into two parts.&lt;br /&gt;1) Remove all the links from the node and its descendants to all of its ancestors&lt;br /&gt;2) Insert all the links from the knode and its descendants to the new ancestors&lt;br /&gt;&lt;br /&gt;So, if we want to move node 4 under node 3, first we:&lt;br /&gt;delete these descendants:&lt;br /&gt;  &lt;tt&gt;SELECT descendant FROM closure WHERE ancestor=4;&lt;/tt&gt;&lt;br /&gt;from these ancestors:&lt;br /&gt;  &lt;tt&gt;SELECT ancestor FROM closure WHERE descendant=4 AND ancestor != 4;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;With some help from FurnaceBoy on #MySQL, the query is:&lt;br /&gt;&lt;tt&gt;&lt;pre&gt;DELETE q FROM closure&lt;br /&gt; q NATURAL JOIN (&lt;br /&gt;   SELECT a.ancestor, d.descendant&lt;br /&gt;   FROM closure a&lt;br /&gt;     CROSS JOIN  closure d&lt;br /&gt;   WHERE a.descendant=4 AND a.ancestor != 4 AND d.ancestor=4&lt;br /&gt; ) t;&lt;/pre&gt;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;Now, to insert the new links.&lt;br /&gt;&lt;br /&gt;First we get the ancestors of the new node (3)&lt;br /&gt;&lt;tt&gt;SELECT ancestor FROM closure WHERE descendant=3;&lt;/tt&gt;&lt;br /&gt;Then we get the descendants of the old node (4)&lt;br /&gt;&lt;tt&gt;SELECT descendant FROM closure WHERE ancestor=4;&lt;/tt&gt;&lt;br /&gt;Then we cross join them.&lt;br /&gt;&lt;pre&gt;&lt;tt&gt;&lt;br /&gt;SELECT a.ancestor, d.descendant&lt;br /&gt;FROM closure a&lt;br /&gt; CROSS JOIN closure d&lt;br /&gt;WHERE a.descendant=2 AND d.ancestor=1;&lt;/tt&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;&lt;pre&gt;&lt;br /&gt;INSERT INTO closure (ancestor, descendant)&lt;br /&gt; SELECT a.ancestor, d.descendant&lt;br /&gt; FROM closure a CROSS JOIN closure d&lt;br /&gt; WHERE a.descendant=3 AND d.ancestor=4;&lt;/pre&gt;&lt;/tt&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6633921349872006305-134683238153664622?l=jdobbie.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jdobbie.blogspot.com/feeds/134683238153664622/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jdobbie.blogspot.com/2009/07/closure-trees.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6633921349872006305/posts/default/134683238153664622'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6633921349872006305/posts/default/134683238153664622'/><link rel='alternate' type='text/html' href='http://jdobbie.blogspot.com/2009/07/closure-trees.html' title='Closure Trees'/><author><name>Jonathan Dobbie</name><uri>http://www.blogger.com/profile/07517816127646855492</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry></feed>
