Whereabouts
Whereabouts is a Python geocoding package that implements record linkage algorithms in SQL using DuckDB. The package itself is available at whereabouts and can be installed via
pip install whereabouts
Installation of reference databases
Once the package is installed, you will need to install a geocoding database built from a country's or region's address data. This repo contains a collection of these databases for different countries and regions. Currently it has files for
- Australia (whole of country)
- Victoria, Australia
- New South Wales, Australia
More are being added as I get around to cleaning the data and creating the corresponding databases. The database files are named
<country_abbreviation>_<states>_<size>
where <size> is either sm or lg, depending on whether the inverted index has been created using pairs of consecutive tokens or trigrams. The large models can handle lower quality address data at the expense of speed.
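To make the difference between the two index types more concrete, here is a minimal sketch of how an address string could be broken into pairs of consecutive tokens versus trigrams. It is an illustration only, not the package's actual implementation (which is written in SQL). The function names are hypothetical, and it assumes that "trigrams" means overlapping three-character substrings, which the text above does not specify.

```python
import re


def tokenize(address: str) -> list[str]:
    """Lowercase an address and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", address.lower())


def token_pairs(address: str) -> list[tuple[str, str]]:
    """Return pairs of consecutive tokens, e.g. ('sydney', 'rd')."""
    tokens = tokenize(address)
    return list(zip(tokens, tokens[1:]))


def char_trigrams(address: str) -> list[str]:
    """Return overlapping three-character substrings of the normalized address.

    Assumption: trigrams are taken at the character level.
    """
    text = " ".join(tokenize(address))
    return [text[i:i + 3] for i in range(len(text) - 2)]


if __name__ == "__main__":
    print(token_pairs("504 sydney rd brunswick"))
    # A misspelled suburb produces different token pairs, but still shares
    # some character trigrams with the correct spelling.
    shared = set(char_trigrams("brunswick")) & set(char_trigrams("brsunwick"))
    print(sorted(shared))
```

Under this assumption, a misspelling such as brsunwick changes every token pair that contains it, while some trigrams still overlap with the correct spelling. That is one plausible reason a trigram-based index tolerates lower quality input, at the cost of a larger index to search.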
Example (install the small Australian geocoding database)
python -m whereabouts download au_all_sm
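If you need to work with several databases, the naming convention above can be unpacked programmatically. The following is a small sketch; parse_db_name and DatabaseName are hypothetical helpers, not part of whereabouts, and the code assumes that none of the three components contains an underscore.

```python
from typing import NamedTuple


class DatabaseName(NamedTuple):
    """Components of a name like 'au_all_sm' (hypothetical helper type)."""
    country: str
    states: str
    size: str


def parse_db_name(name: str) -> DatabaseName:
    """Split <country_abbreviation>_<states>_<size> into its three parts.

    Assumption: no component contains an underscore itself.
    """
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected <country>_<states>_<size>, got {name!r}")
    country, states, size = parts
    if size not in {"sm", "lg"}:
        raise ValueError(f"size must be 'sm' or 'lg', got {size!r}")
    return DatabaseName(country, states, size)


if __name__ == "__main__":
    print(parse_db_name("au_all_sm"))
```

For example, parse_db_name("au_all_sm") splits the name into the country au, the states component all, and the size sm.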
Start geocoding
Once you have installed the package and a database you can start geocoding your data.
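from whereabouts.Matcher import Matcher
# Example addresses, including one with a misspelled suburb ('brsunwick')
addresslist = ['122 station st fairfield vic', '643-645 sydney road brsunwick', '504 sydney rd brunswick']
# Load the geocoding database installed in the previous step
matcher = Matcher(db_name='au_all_sm')
# Geocode all addresses in one call
matcher.geocode(addresslist, how='standard')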
References
The algorithm is based on the following paper: https://arxiv.org/abs/1708.01402