Now, geocoding a few addresses by hand may seem fun. But geocoding thousands of addresses by hand every day? You're probably thinking that's crazy. And yes, it is. But that was exactly our task. Thousands of lines of Covid-19 patient data came to us every day, and every address needed to be geocoded. The work was split equally among several people; each of us would go to Google Maps, look up an address, copy the latitude and longitude from the URL, and paste them into a CSV file.
It is in fact highly inefficient: it takes a lot of time, it's tedious, and in some cases wrong values got into the data. So I went looking for a better solution. That's when I started searching for a Google Maps API that can geocode. I googled the Google Geocoding API and found wonderful documentation for it, but the problem was the usage limit: a daily limit of 1500 requests per user. Since we received more than 1500 patient records a day, geocoding that much data with Google's Geocoding API was impossible.
So I turned to the next best option at hand, OpenStreetMap! It's free and open source, and it has an amazing community as well as some amazing APIs. OpenStreetMap provides the Nominatim API, which has fewer restrictions than Google's Geocoding API and doesn't charge you anything.
So I fired up JupyterLab, imported a few libraries, and wrote a simple function that takes an address and retrieves its latitude and longitude from the API. But our data came as a CSV file, and we needed to add two new columns holding the latitude and longitude. So I used pandas to read the CSV, created the two columns, and passed the addresses into my function. Here's how my code looked:
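import requests
import pandas as pd


def geocode(address):
    """Return [latitude, longitude] for an address, or [0, 0] if nothing matches."""
    api = "https://nominatim.openstreetmap.org/search.php"
    # Passing the address via params lets requests URL-encode spaces, commas, '&', etc.
    params = {"q": address, "format": "jsonv2"}
    try:
        response = requests.get(api, params=params).json()[0]
        return [float(response['lat']), float(response['lon'])]
    except IndexError:
        # Nominatim returns an empty list when it finds no match
        return [0, 0]


df = pd.read_csv('file.csv')
# Geocode each address once, then split the [lat, lon] pair into two columns
df['Latitude and Longitude'] = df['address'].apply(geocode)
df['latitude'] = df['Latitude and Longitude'].apply(lambda x: x[0])
df['longitude'] = df['Latitude and Longitude'].apply(lambda x: x[1])
df.drop('Latitude and Longitude', axis=1, inplace=True)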
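One thing the code above skips: the public Nominatim server's usage policy asks for at most one request per second and an identifying User-Agent. Below is a minimal sketch of a batch version that respects both, assuming a few things that are not in the original code: the helper names (`parse_first_hit`, `fetch_nominatim`, `geocode_many`), the `fetch` and `delay` parameters, and the User-Agent string are all hypothetical. It also swaps `requests` for the standard library's `urllib` purely to stay dependency-free.

```python
import json
import time
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
# Hypothetical identifier: replace with your own app name and contact details.
HEADERS = {"User-Agent": "csv-geocoder-example/0.1"}


def parse_first_hit(results):
    """Return (lat, lon) from a Nominatim-style result list, or None if it is empty."""
    if not results:
        return None
    first = results[0]
    return float(first["lat"]), float(first["lon"])


def fetch_nominatim(address):
    """Query the public search endpoint and return the decoded JSON list."""
    query = urllib.parse.urlencode({"q": address, "format": "jsonv2", "limit": 1})
    request = urllib.request.Request(f"{NOMINATIM_URL}?{query}", headers=HEADERS)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def geocode_many(addresses, fetch=fetch_nominatim, delay=1.0):
    """Geocode each distinct address once, pausing `delay` seconds between requests."""
    addresses = list(addresses)
    cache = {}
    for address in addresses:
        if address in cache:
            continue  # duplicate address: reuse the earlier answer
        if cache:
            time.sleep(delay)  # pause before every request except the first
        cache[address] = parse_first_hit(fetch(address))
    return [cache[address] for address in addresses]
```

With pandas this could be used as `df['coords'] = geocode_many(df['address'])`. Failed lookups come back as `None` rather than `[0, 0]`, so they can't be mistaken for a real point at latitude 0, longitude 0.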
To recap what my function does: the Nominatim API responds with JSON when a request is sent to that endpoint. I just used the requests library to make the request, took the first result, parsed out only the latitude and longitude, and put them into two separate columns. Here's the hosted web app:
But the only problem with this was that only I and one senior member of my team knew how to code. So I had to find a way for the rest of the team to use the geocoder too. I built a Streamlit app that gave the geocoder a UI and hosted it on Streamlit share, so nobody needed to install separate software or know how to code. They can just drag and drop the CSV, or select the file from the File Explorer, and start geocoding without hassle. Here's the final outcome of the website. It's easy to use, and I made the UI a little prettier by adding a progress bar so any user can see how far along the job is. Yeah! So that's how I built the geocoder. I even open sourced it, so other people can see it in my repository and perhaps add more functionality.
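For anyone curious what an upload-geocode-download app with a progress bar involves, here is a minimal sketch. It is not the actual app's code: the names `geocode_csv`, `placeholder_geocode` and `main`, the `address` column name, and the output file name are assumptions made for illustration, and it reads the CSV with the standard library's `csv` module instead of pandas to keep the core function free of third-party dependencies.

```python
import csv
import io


def placeholder_geocode(address):
    """Stand-in geocoder (hypothetical); swap in a real lookup that returns (lat, lon)."""
    return (0.0, 0.0)


def geocode_csv(text, geocode, on_progress=None, column="address"):
    """Append latitude/longitude columns to CSV text, reporting progress after each row."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for i, row in enumerate(rows, start=1):
        lat, lon = geocode(row[column])
        row["latitude"], row["longitude"] = lat, lon
        if on_progress:
            on_progress(i / len(rows))  # fraction of rows done, between 0 and 1
    out = io.StringIO()
    if rows:
        writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return out.getvalue()


def main(geocode=placeholder_geocode):
    # Imported here so geocode_csv stays usable without Streamlit installed.
    import streamlit as st

    st.title("CSV Geocoder")
    uploaded = st.file_uploader("Upload a CSV with an 'address' column", type=["csv"])
    if uploaded is None:
        return  # nothing uploaded yet
    bar = st.progress(0.0)  # bar.progress(fraction) moves the bar forward
    result = geocode_csv(uploaded.getvalue().decode("utf-8"), geocode, bar.progress)
    st.download_button("Download geocoded CSV", result,
                       file_name="geocoded.csv", mime="text/csv")
```

To try it, save the block as `app.py`, add a final `main()` line, and start it with `streamlit run app.py`. Keeping the CSV logic in `geocode_csv` and passing the progress bar in as a callback means the same function can be tested or reused without any UI.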