
I needed to geocode tens to hundreds of thousands of US addresses weekly, and I could accept slightly reduced accuracy compared with the parcel-level accuracy of Google Maps.

I rewrote the geocommons geocoder in Java to speed up the loading and geocoding process, and wrapped a REST API around it. I used a minimal perfect hash function to map zips/streets (metaphone3'd and ngramfingerprint'd) to data stored in a key-value structure. The key-value structure is small enough to fit in the memory of a decent-sized EC2 instance, but so far I've only tested throughput reading from a slow disk, which got me about 100-150 results/sec.
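
To make the lookup idea concrete, here is a minimal sketch in Python (not the Java code described above). It normalizes a zip/street pair into a key and looks it up in memory. Everything named here is an assumption for illustration: `ngram_fingerprint`, `make_key`, `INDEX`, and `lookup` are hypothetical, a plain dict stands in for the minimal perfect hash plus key-value structure, and the Metaphone3 step is omitted entirely. The fingerprint follows the usual n-gram fingerprint recipe: lowercase, drop non-alphanumerics, then join the sorted, de-duplicated n-grams.

```python
import re


def ngram_fingerprint(text: str, n: int = 2) -> str:
    """Lowercase, strip non-alphanumerics, then join sorted unique n-grams."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    grams = {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}
    return "".join(sorted(grams))


def make_key(zip_code: str, street: str) -> str:
    """Hypothetical key layout: zip, a separator, then the street fingerprint."""
    return f"{zip_code}|{ngram_fingerprint(street)}"


# Assumption: a dict stands in for the minimal perfect hash + key-value store.
# The record contents are placeholders, not real geocoder output.
INDEX = {
    make_key("94103", "Market St"): {"zip": "94103", "street": "Market St"},
}


def lookup(zip_code: str, street: str):
    """Return the stored record for a zip/street pair, or None if absent."""
    return INDEX.get(make_key(zip_code, street))
```

With this sketch, `lookup("94103", "MARKET  ST.")` finds the same record as `lookup("94103", "Market St")`, because case, spacing, and punctuation are removed before the key is built. One difference from a real minimal perfect hash: it maps keys it has never seen to some arbitrary slot, so an implementation generally has to keep the key (or a checksum of it) with each record to reject misses, which the dict does here for free.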

The results include the parsed address, lat/lng in the WGS84 datum, and associated US census region info (state, county, block group, block, MSA, CBSA/CSA, school district, legislative district, etc.).

I'd considered open sourcing it, and I was trying to architect it so that one could plug in data sources beyond TIGER where higher-accuracy info is available (e.g., SF's address parcels, or Massachusetts, which has lots of E911 parcel data available).
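
One way the plug-in idea could look, sketched in Python under stated assumptions: each data source exposes the same small interface, and the geocoder tries sources in priority order (most accurate first) and falls back to the next on a miss. `AddressSource`, `DictSource`, and `geocode` are hypothetical names, and the toy sources hold placeholder coordinates rather than real TIGER or parcel data.

```python
from typing import Dict, Optional, Protocol, Sequence, Tuple

Point = Tuple[float, float]          # (lat, lng)
AddressKey = Tuple[str, str, int]    # (zip, street, house number)


class AddressSource(Protocol):
    """Hypothetical interface every pluggable data source would implement."""
    name: str

    def locate(self, zip_code: str, street: str, number: int) -> Optional[Point]:
        ...


class DictSource:
    """Toy source backed by a dict; stands in for TIGER or a parcel dataset."""

    def __init__(self, name: str, points: Dict[AddressKey, Point]) -> None:
        self.name = name
        self._points = points

    def locate(self, zip_code: str, street: str, number: int) -> Optional[Point]:
        return self._points.get((zip_code, street, number))


def geocode(sources: Sequence[AddressSource], zip_code: str, street: str,
            number: int) -> Optional[Tuple[str, Point]]:
    """Try each source in order; return (source name, point) for the first hit."""
    for source in sources:
        point = source.locate(zip_code, street, number)
        if point is not None:
            return source.name, point
    return None


# Placeholder data only: parcel-level source first, TIGER-style fallback second.
parcels = DictSource("parcels", {("00000", "EXAMPLE ST", 1): (1.0, 2.0)})
tiger = DictSource("tiger", {("00000", "EXAMPLE ST", 1): (1.5, 2.5),
                             ("00000", "EXAMPLE ST", 3): (3.0, 4.0)})
```

Calling `geocode([parcels, tiger], "00000", "EXAMPLE ST", 1)` returns the parcel hit, while house number 3 falls through to the second source. Returning the source name with the point lets a caller tell how much accuracy to expect from each result.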


