As promised (I know, a couple of days late, but I had a brain fart on some of the non-codable data), here is the analysis of the Geocoder.us data vs. the free San Francisco data from Navteq. The data set I used for the comparison was 75 records for the Starbucks stores with a San Francisco address.
The first thing to discuss is the extra time the Geocoder.us data took. When I geocoded the records against the Navteq data loaded inside the Oracle database, it took approximately 1 minute 15 seconds to geocode the 75 records on my laptop. The same data through the Geocoder.us CSV web service took roughly 8x as long: about 10 minutes to code the same 75 rows. OK, definitely a major hit there. The latency came from the UTL_HTTP call I was making out of the database, which was taking 12 seconds per record to return. The Geocoder.us website describes how to set up your own local Geocoder.us server; maybe in the future I will see what difference in time I get using their interface locally.
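A quick back-of-the-envelope check of those timings, using the rounded totals above (75 seconds inside Oracle versus roughly 600 seconds through the web service). This is just arithmetic; the function name is made up for illustration and is not part of either product.

```python
def throughput_summary(records: int, local_seconds: float, remote_seconds: float) -> dict:
    """Compare two geocoding runs over the same batch of records."""
    return {
        "local_per_record_s": local_seconds / records,    # Navteq data inside Oracle
        "remote_per_record_s": remote_seconds / records,  # Geocoder.us CSV service via UTL_HTTP
        "slowdown": remote_seconds / local_seconds,       # how many times slower the web service was
    }

# 1:15 in the database vs. ~10 minutes through the web service, 75 records each
summary = throughput_summary(records=75, local_seconds=75, remote_seconds=600)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

With those rounded totals the web service averages about 8 seconds per record. At the 12 seconds per record quoted above, 75 records would take closer to 15 minutes, so treat both figures as rough.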
On a good note, there were 3 addresses that the Navteq data would not recognize, while the Geocoder.us service recognized all but one row. That one row had ‘ONE’ for the street number instead of the numeral 1. When I replaced the word with the number in the data, the site was coded successfully. Simple enough to fix, but still a limitation.
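Since the only Geocoder.us failure was a spelled-out street number, a small cleanup pass before sending addresses out would catch it. This is a minimal sketch, assuming the word is the first token of the street address; the function name and the lookup table are hypothetical, not part of either geocoder.

```python
import re

# Hypothetical lookup table covering small spelled-out house numbers;
# extend it if your data uses larger ones.
WORD_TO_NUMBER = {
    "ONE": "1", "TWO": "2", "THREE": "3", "FOUR": "4", "FIVE": "5",
    "SIX": "6", "SEVEN": "7", "EIGHT": "8", "NINE": "9", "TEN": "10",
}

def normalize_house_number(address: str) -> str:
    """Replace a spelled-out leading house number with digits.

    Only the first word is checked, so street names such as "Oneida" are left alone.
    Leading whitespace is dropped when a replacement happens.
    """
    match = re.match(r"\s*([A-Za-z]+)\b(.*)", address)
    if match and match.group(1).upper() in WORD_TO_NUMBER:
        return WORD_TO_NUMBER[match.group(1).upper()] + match.group(2)
    return address

# Example address for illustration only
print(normalize_house_number("ONE Market St."))  # prints 1 Market St.
```

Addresses that already start with digits fail the letters-only match and pass through unchanged.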
OK, now on to the data. The Navteq data was coded to 12 decimal places, while the Geocoder.us data only goes to 5 decimal places. This might not seem like a big deal, but it is worth taking into account.
At 5 decimal places we are talking about a difference on the order of a meter (a few feet) at most. That is not a big deal if the address is off by a few feet (I think we'll still be able to find it), but if it were a missile defense system, we might have some issues that need to be discussed. Since normal GPS devices are only accurate to about 15 feet on their best days, this is pretty darn good for free.
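To put the decimal places in ground terms, here is a rough sketch that converts one step in the last decimal place of a longitude/latitude value into meters. It assumes a spherical Earth (mean radius 6,371,008.8 m), so it is only good to a fraction of a percent, and the 37.77° latitude for San Francisco is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation

def resolution_m(decimal_places: int, latitude_deg: float) -> tuple:
    """Ground size of one unit in the last decimal place of a lon/lat value.

    Returns (north-south meters, east-west meters) on a spherical Earth.
    """
    step_deg = 10 ** -decimal_places
    meters_per_deg = math.pi * EARTH_RADIUS_M / 180         # ~111 km per degree of latitude
    north_south = step_deg * meters_per_deg
    east_west = north_south * math.cos(math.radians(latitude_deg))  # meridians converge
    return north_south, east_west

SF_LATITUDE = 37.77  # approximate latitude of San Francisco
for places in (5, 12):
    ns, ew = resolution_m(places, SF_LATITUDE)
    print(f"{places} places: {ns:.3g} m N-S, {ew:.3g} m E-W")
```

At five places one step is about 1.1 m north-south and under 0.9 m east-west in San Francisco, so rounding to five places moves a point by well under a meter. The twelve-place Navteq values resolve far below anything that matters for a street address.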
The first problem we run into is that the geocoded Navteq data is in SRID 8307, aka Longitude / Latitude (WGS 84), while the Geocoder.us data is in SRID 8265, aka Longitude / Latitude (NAD 83). For most people this means nothing, but it's actually very important. Remember back in elementary or middle school when they showed all the different maps of the world: one where Greenland was really big, one where there were rips in the map, and so on? Well, this all comes back to us when we talk about coordinate systems in Oracle Spatial. All the different concepts (Cartesian, geodetic, and projected coordinate systems, geodetic datums, and the authalic sphere) come predefined in the Oracle Spatial database for our use. These definitions follow the standards published by the OpenGIS Consortium ( http://www.opengeospatial.org/specs/?page=specs ).
So I wrote a quick SQL statement that uses the built-in SDO_CS package to transform the Geocoder.us points from SRID 8265 to 8307, and then uses SDO_GEOM.SDO_DISTANCE to calculate the distance between each pair of points in meters.
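select o.name store_name -- column name assumed; the first select line was missing
, o.location navteq_location
, g.location geocoder_location
, sdo_geom.sdo_distance ( o.location, sdo_cs.transform(g.location, 8307), .005, 'unit=METER') distance
from sf_starbacks o
, sf_starbacks_geocoderus g -- hypothetical name for the table holding the Geocoder.us points
where g.store_id = o.store_id -- hypothetical join key
order by distance;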
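If you don't have Oracle Spatial handy, a haversine great-circle distance is a reasonable sanity check on the SDO_DISTANCE numbers. This is a swapped-in technique, not what Oracle does: haversine treats the Earth as a sphere, while Oracle's geodetic distance calculation (as I understand it) accounts for the ellipsoid, so expect small differences. It also assumes both points are already in the same datum, which is the job SDO_CS.TRANSFORM does in the query. The coordinates in the usage lines are hypothetical points near San Francisco.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)

def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical points 0.0001 degrees apart in latitude
d = haversine_m(-122.4, 37.78, -122.4, 37.7801)
print(f"{d:.2f} m")
```

Argument order follows Oracle's longitude-first convention for SRID 8307, which is easy to get backwards when copying coordinates around. Two points 0.0001° apart in latitude come out about 11 m apart.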
Below are the results returned by the query.
| Store Name | Distance in Meters |
|---|---:|
| SF Courtyard Marriott Lobby | 2.71659553129724 |
| Market & Fell – San Francisco | 3.8057058105241 |
| 1750 Divisadero Street | 4.69942861653962 |
| California & Battery – SF | 5.29755965611037 |
| 505 Sansome Street | 6.08548442574047 |
| 3727 Buchanan – San Francisco | 7.21260972749949 |
| Kearny @ Bush | 7.22469004513458 |
| 123 Mission Street | 7.4188927049375 |
| 1231 Market Street | 7.42953327611775 |
| 4094 18th St. | 8.08740578188483 |
| 3rd & Howard | 8.64205159112492 |
| Grant & Bush – San Francisco | 10.0439995350524 |
| 1800 Irving Street | 10.0918780314643 |
| 398 Market St. | 10.0937804181894 |
| 425 Battery – San Francisco | 10.2121900661491 |
| 333 Market St. | 10.3904072176329 |
| 50 California St. | 10.812256443162 |
| Fillmore & O'Farrell (UCO) | 11.235605600453 |
| Masonic @ Fulton – S.F. | 11.2570582622603 |
| 901 Market St. | 11.383558585046 |
| 565 Clay St. | 11.500375430594 |
| 675 Portola – Miraloma | 12.0143457537109 |
| 4th & Brannan – WFB | 12.0215273447908 |
| 199 Fremont @ Howard – SF | 13.2578337955333 |
| 390 Stockton @ Sutter (Union Sq) | 15.419906040731 |
| 36 Second Street | 15.6140523231353 |
| 74 New Montgomery | 15.9682235898887 |
| Safeway-San Francisco #1490 | 16.1190250104597 |
| Safeway-San Francisco #2606 | 16.1745772800324 |
| Mariposa & Bryant | 16.2510104423628 |
| 44 Montgomery @ Market St. | 16.4293094494693 |
| King & 4th Street – San Francisco | 16.5061673580469 |
| Levi's Plaza @ Sansome | 17.097438240954 |
| Kansas & 16th St. – San Francisco | 17.4991251104624 |
| 9th & Howard | 17.6327431473884 |
| Sony Metreon SF (UCO) | 18.213309935106 |
| 120 4th Street | 18.3932339959993 |
| 27 Drumm Street | 19.067431537748 |
| 201 Powell Street – San Francisco | 19.2714989034239 |
| Beach & Hyde – San Francisco | 19.3904276298555 |
| Van Ness & California – WFB | 19.5861907980684 |
| 15 Sutter St. | 19.9180139893338 |
| Grand Central Market – Mollie Stone | 21.2170151879307 |
| Albertsons – San Francisco #7146 | 21.217250792467 |
| Jones @ Jefferson – San Francisco | 21.3270789286301 |
| Cyril Magnin @ O`Farrell – Nikko | 21.4961567925423 |
| 24th & Noe | 22.2641624424109 |
| Geary & Taylor – San Francisco | 24.0074384547142 |
| 555 California St. | 27.5157348868899 |
| Safeway – San Francisco #667 | 28.7814007353958 |
| Bush & Van Ness – S.F. | 28.9662827730563 |
| 100 West Portal/Vicente | 31.8815087423281 |
| 4th & Market – S.F. | 49.9397278381422 |
| Church & Market – S.F. | 71.3778706089687 |
| Safeway-San Franscisco #1507 | 74.3198277064713 |
| Albertsons – San Francisco #7128 | 182.468554131262 |
| 5455 Geary Blvd. – WFB | N/A |
| Albertsons – San Francisco #7137 | N/A |
Out of the 75 Starbucks stores in the table, 3 could not be compared because one of the two data sets did not have a geocoded location for the address. The minimum distance difference was 2.72 meters and the maximum was 182.47 meters. The average difference was 19.44 meters. So on average the difference is acceptable: 20 meters is fine if you're trying to find a store's location on the street, but for anything that requires more precision than driving directions you'd probably want more assurance that your data is accurate. Even in the worst case it is only off by about a tenth of a mile. Which data is more correct? Within the industry the Navteq data is typically accepted as the more accurate. Navteq spends a lot of money every year validating its data to make sure it's the most accurate around.
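The summary numbers are easy to reproduce from the query output. Here is a small sketch that skips stores with no comparison (the N/A rows) and converts the worst case to miles. The function name is made up for illustration, and the sample uses only four rows from the table, so its average is not the 19.44 m figure for the full set.

```python
from statistics import mean
from typing import Dict, Optional

METERS_PER_MILE = 1609.344  # international mile

def summarize(distances: Dict[str, Optional[float]]) -> dict:
    """Min/max/mean of per-store differences, skipping stores with no value (N/A).

    Raises ValueError from min() if every store is N/A.
    """
    values = [d for d in distances.values() if d is not None]
    return {
        "compared": len(values),
        "skipped": len(distances) - len(values),
        "min_m": min(values),
        "max_m": max(values),
        "mean_m": mean(values),
        "max_miles": max(values) / METERS_PER_MILE,
    }

# A few rows copied from the results table; None stands in for N/A.
sample = {
    "SF Courtyard Marriott Lobby": 2.71659553129724,
    "Church & Market – S.F.": 71.3778706089687,
    "Albertsons – San Francisco #7128": 182.468554131262,
    "5455 Geary Blvd. – WFB": None,
}
stats = summarize(sample)
print(f"compared {stats['compared']}, skipped {stats['skipped']}")
print(f"min {stats['min_m']:.2f} m, max {stats['max_m']:.2f} m, mean {stats['mean_m']:.2f} m")
print(f"worst case {stats['max_miles']:.3f} miles")
```

The worst case of 182.47 m works out to about 0.113 miles, which is where the "tenth of a mile" comes from.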
Yahoo uses both Navteq and TeleAtlas data for its geocoder. Maybe I should run the same analysis against their service to see what comes out… anyone interested?