What is GeoDNA?

GeoDNA is a way to represent a latitude/longitude coordinate pair as a string. That sounds simple enough, but it's a special string format: the longer it is, the more accurate it is. More importantly, each string uniquely defines a region of the earth's surface, so in general, GeoDNA codes with similar prefixes are located near each other. This can be used to perform proximity searching using only string comparisons (like the SQL "LIKE" operator).

Example: here is a map showing the area represented by the GeoDNA code "wtatgccagcctattgttt" (the blue box), which happens to be some fields in a Buenos Aires park where I used to play soccer:

Now, if you chop a character off the end of that, you get "wtatgccagcctattgtt". That represents an area that's about 4 times bigger:

Chop another character off the end, and you get "wtatgccagcctattgt", which represents an area that's about 4 times bigger than that:

Each smaller area is contained within the larger one.

What's With The Name?

Other than the first character (which is a 'w' or an 'e', designating Western or Eastern hemisphere), we use the four DNA letters, "G", "T", "A", and "C", to represent each further division of space, so the codes resemble DNA chains. Plus there's something nice about thinking of the Earth's surface as a big inter-related organism.

Where Does it Come From?

The origin of this idea is Gustavo Niemeyer's "Geohash" system, which you can find more info about at http://www.geohash.org. The system that Geohash uses results in a much shorter, tighter "code", but it's not conducive to text-based proximity searching. The "GeoDNA" system is unrolled so that each character is exactly one level of detail further. We developed this system a few years ago at Action Without Borders to allow us to perform fast proximity searching using the Xapian text indexing system.

How Can I Use It?

Once you understand how the geometry works (it's easy!) then you will start to see how you can use it to perform fast proximity searching using only simple text-based means.

For example, let's say you have some address data, complete with latitude and longitude information, stored somewhere. For example:

10 Downing Street, London W1,       UK,     ( 51.50339, -0.1275444 )
221B Baker Street, London,          UK      ( 51.52399, -0.1584434 )
145 Fleet Street,  London EC4A 2BU, UK      ( 51.51437, -0.1072883 )
Getreidegasse 9,   5020 Salzburg,   Austria ( 47.80001, 13.0435609 )
Domgasse 5,        1010 Vienna,     Austria ( 48.20817, 16.3749504 )
...

You can preprocess all of that data and add a GeoDNA code to each record:

10 Downing Street, London W1,       UK,     ( 51.50339, -0.1275444 ) wccttcttcttacaaaccag
221B Baker Street, London,          UK      ( 51.52399, -0.1584434 ) wccttcttctcgggccagcg
145 Fleet Street,  London EC4A 2BU, UK      ( 51.51437, -0.1072883 ) wccttcttctcgttgaacct
Getreidegasse 9,   5020 Salzburg,   Austria ( 47.80001, 13.0435609 ) eaagtggcacaaactacgaa
Domgasse 5,        1010 Vienna,     Austria ( 48.20817, 16.3749504 ) eaagtgcttatgatggtttt
...

And now you can easily see how you can use that data to perform a rudimentary proximity search. Look at the three London GeoDNA numbers: they all share a common prefix of 10 characters. The two addresses in Austria (the house of Mozart's birth in Salzburg, and his residence in Vienna, respectively) only share 6 characters in common, which makes sense as they're in different cities.

But What About Advanced Searching?

The basic text search as described in the section above will generally work OK, but it will fail miserably in certain circumstances. For example, let's add another address to our list:

Boleyn Ground, Green Street, London E19 9AZ, UK ( 51.531887, 0.0393104 ) eaaggaggagaggctcgg

Note the GeoDNA code is radically different than the other London codes. This is because of the way the GeoDNA algorithm divides up the surface of the earth to generate the GeoDNA code. So how can these GeoDNA codes be useful then?

Advanced Searching

You can greatly improve the accuracy and efficacy of your searching by pre-calculating the GeoDNA codes that represent the neighbouring regions to the one containing your target coordinates. You can use the "neighbours" API to get a list of neighbouring GeoDNA codes at any level of precision. Depending on the distance you want to search, you should remove a few levels of precision, derive the neighbours, then use the neighbour GeoDNA codes to locate items. In our example, let's say we want to find records in London, around the "10 Downing Street" address. So we take that location's GeoDNA code, "wccttcttcttacaaaccag" and remove a few levels of precision, leaving us with "wccttcttctt", and then get the neighbours:

        var neighbours = GeoDNA.neighbours( "wccttcttctt" );
        ["wccttctttca", "wccttctttcc", "eaaggagggaa", "wccttcttctg",
         "eaaggaggagg", "wccttcttcta", "wccttcttctc", "eaaggaggaga"]
      

You can now use those eight, plus your original (at the same precision level as the eight), as prefixes to search. Every data record whose GeoDNA code starts with one of those prefixes is in the Greater London area. You can see these features on the Google Maps Demo page.

Can I Improve On That?

Yes. Although the GeoDNA codes themselves represent rectangular-ish regions on the earth's surface, you can build up complex areas using these rectangles of different sizes. For example, check out the radius searching facility.

Who's Doing This?

Contact us at geodna -at- shoffle.com.