Catalunya's Amenities import¶
This Jupyter notebook (source) contains the script for importing different types of amenities in Catalunya into OSM, together with the documentation of the whole process. Keeping both in a single file makes it easier to review the process, the results, and the decisions taken.
The goal is to manually merge and import all the amenity information provided by the Generalitat de Catalunya, while testing the data preparation scripts.
Data Sources¶
License¶
Data is released under CC0 (Public domain)
Import type¶
This import will be done manually, using JOSM to edit the data. Consider using Task Manager.
Data preparations¶
All data preparations will be made automatically in this notebook.
import numpy as np
import pandas as pd
import geopandas as gpd
import geopy
from osmi_helpers import data_gathering as osmi_dg
# Define Data Sources
DATA_RAW = 'data/raw/Equipaments_de_Catalunya.geojson'
CSV_PARSER = 'fields_mapping.csv'
Data gathering and exploration.¶
Run the code below to load the original data source from the local file into a dataframe and explore its contents.
# Read the raw GeoJSON file into a GeoDataFrame.
gdf_raw = gpd.read_file(DATA_RAW)
gdf_raw
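Before loading the file with geopandas, it can help to check what the raw GeoJSON contains. The sketch below uses only the standard library and a made-up two-feature sample in place of the real file. The property name `NOM` and the category values are assumptions for illustration; only `CATEGORIA` is mentioned elsewhere in this notebook.

```python
import json
from collections import Counter

# Hypothetical two-feature sample standing in for the real GeoJSON file.
SAMPLE = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"NOM": "HOSPITAL A", "CATEGORIA": "Salut|Hospitals||"},
   "geometry": {"type": "Point", "coordinates": [2.17, 41.38]}},
  {"type": "Feature",
   "properties": {"NOM": "MUSEU B", "CATEGORIA": null},
   "geometry": {"type": "Point", "coordinates": [2.82, 41.98]}}
]}
""")


def summarize(collection):
    """Count features, geometry types, and non-null values per property."""
    features = collection["features"]
    geometry_types = Counter(f["geometry"]["type"] for f in features)
    filled = Counter(
        key
        for f in features
        for key, value in f["properties"].items()
        if value is not None
    )
    return len(features), dict(geometry_types), dict(filled)


count, geometry_types, filled = summarize(SAMPLE)
print(count, geometry_types, filled)
```

Properties with few non-null values are candidates for dropping or for manual review during the import.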
Data cleanup¶
Fields' mapping.¶
# Create a copy
gdf = gdf_raw.copy()
The fields' mapping is defined in the CSV file set in the CSV_PARSER variable.
# Read CSV file with fields' mapping and description.
fields_mapping = pd.read_csv(CSV_PARSER)
# Display table.
fields_mapping
# Selects and renames fields according to CSV parser.
gdf = osmi_dg.csv_parser(gdf, CSV_PARSER)
gdf.head(10)
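The implementation of `osmi_dg.csv_parser` and the layout of `fields_mapping.csv` are not shown in this notebook. As a rough sketch of what a CSV-driven select-and-rename step could look like, the example below assumes a mapping table with two hypothetical columns, `source_field` and `osm_tag`, and made-up source field names. It is not the helper's actual code.

```python
import io

import pandas as pd

# Hypothetical mapping table; the real fields_mapping.csv layout is not shown.
MAPPING_CSV = io.StringIO(
    "source_field,osm_tag\n"
    "NOM,name\n"
    "ADRECA,addr:full\n"
    "CATEGORIA,tmp_category\n"
)


def select_and_rename(df, mapping):
    """Keep only mapped columns that exist in df and rename them to OSM tags."""
    rename = dict(zip(mapping["source_field"], mapping["osm_tag"]))
    present = [col for col in rename if col in df.columns]
    return df[present].rename(columns=rename)


raw = pd.DataFrame({
    "NOM": ["HOSPITAL A"],
    "ADRECA": ["C/ MAJOR, 12"],
    "CATEGORIA": ["Salut|Hospitals||"],
    "UNUSED": [1],  # not in the mapping, so it is dropped
})
mapped = select_and_rename(raw, pd.read_csv(MAPPING_CSV))
print(list(mapped.columns))
```

A real helper working on a GeoDataFrame would also need to keep the geometry column, which this sketch leaves out.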
Calculate some fields¶
The following code calculates some fields that are needed in OSM.
# Fix uppercase.
gdf['name'] = gdf['name'].str.title()
# Addresses' cleanup.
gdf['addr:full'] = gdf['addr:full'].str.title()
# Split address.
# Split into at most three parts; reindex guarantees three columns even if no address has a unit.
parts = gdf['addr:full'].str.split(',', n=2, expand=True).reindex(columns=range(3))
gdf['addr:street'], gdf['addr:housenumber'], gdf['addr:unit'] = parts[0], parts[1], parts[2]
# Expand abbreviations; dots are escaped so that 'Pl.' does not match inside 'Plaça'.
gdf['addr:street'] = gdf['addr:street'].replace({r'C/': 'Carrer', r'Ctra\.': 'Carretera', r'Pl\.': 'Plaça'}, regex=True)
gdf['addr:housenumber'] = gdf['addr:housenumber'].replace(regex = 'S/N', value = '')
gdf['addr:housenumber'] = gdf['addr:housenumber'].replace(regex = 'Nº ', value = '')
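A minimal sketch of the address split and abbreviation expansion on three made-up addresses. It escapes the dots in the patterns, since an unescaped `Pl.` in a regular expression would also match the first three letters of `Plaça`. The extra `str.strip()` calls are an addition for the example, not part of the notebook's code.

```python
import pandas as pd

# Made-up addresses, already title-cased as in the steps above.
addresses = pd.Series(["C/ Major, 12, 2N", "Pl. Catalunya, S/N", "Plaça Nova"])

# Split into at most three parts; reindex guarantees three columns.
parts = addresses.str.split(",", n=2, expand=True).reindex(columns=range(3))
street = parts[0].str.strip()
housenumber = parts[1].str.strip()

# Escaped dots match a literal '.', so 'Plaça' is left untouched.
street = street.replace(
    {r"C/": "Carrer", r"Ctra\.": "Carretera", r"Pl\.": "Plaça"}, regex=True
)
# Drop the "no number" marker and the "Nº " prefix.
housenumber = housenumber.replace({r"S/N": "", r"Nº ": ""}, regex=True)

print(street.tolist())
print(housenumber.tolist())
```

Addresses without a comma end up with a missing house number, which is worth checking manually in JOSM.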
# Filter out entries without category
gdf = gdf.dropna(subset=['tmp_category'])
# Remove pharmacies, because they have already been imported
gdf = gdf[gdf.tmp_category != 'Salut|Farmàcies||']
# Create amenity column according to `CATEGORIA`
# Health
gdf.loc[gdf.tmp_category.str.contains("Centres d'atenció primària"), 'amenity' ] = 'clinic'
gdf.loc[gdf.tmp_category.str.contains("Centres amb atenció continuada"), 'amenity' ] = 'clinic'
gdf.loc[gdf.tmp_category.str.contains("Centres amb atenció continuada"), 'emergency' ] = 'yes'
#gdf.loc[gdf.tmp_category.str.contains('Centres de salut mental'), 'amenity' ] = 'social_facility'
#gdf.loc[gdf.tmp_category.str.contains('Centres de salut mental'), 'social_facility:for' ] = 'social_facility'
gdf.loc[gdf.tmp_category.str.contains('Hospital'), 'amenity' ] = 'hospital'
# Other
gdf.loc[gdf.tmp_category.str.contains('Museus'), 'amenity' ] = 'museum'
gdf.loc[gdf.tmp_category.str.contains('Teatres'), 'amenity' ] = 'theatre'
gdf
type(gdf)
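The category-to-tag assignments above can also be written as an ordered rule table, which makes the precedence explicit: a later rule overwrites an earlier one when both match the same row. This is a sketch of that alternative on made-up category strings, not the notebook's own code; the helper name `apply_rules` and the sample values are assumptions.

```python
import pandas as pd

# Ordered rules: later rules overwrite earlier ones when both match.
RULES = [
    ("Centres d'atenció primària", {"amenity": "clinic"}),
    ("Centres amb atenció continuada", {"amenity": "clinic", "emergency": "yes"}),
    ("Hospital", {"amenity": "hospital"}),
    ("Museus", {"amenity": "museum"}),
    ("Teatres", {"amenity": "theatre"}),
]


def apply_rules(df, rules, column="tmp_category"):
    """Return a copy of df with OSM tags set by substring match on `column`."""
    out = df.copy()
    for needle, tags in rules:
        # Plain substring match; rows with a missing category never match.
        mask = out[column].str.contains(needle, regex=False, na=False)
        for key, value in tags.items():
            if key not in out.columns:
                out[key] = None  # create the tag column up front
            out.loc[mask, key] = value
    return out


# Hypothetical category strings in the same pipe-separated style.
sample = pd.DataFrame({"tmp_category": [
    "Salut|Hospitals||",
    "Salut|Centres amb atenció continuada||",
    "Cultura|Museus||",
    "Esports|Pavellons||",
]})
tagged = apply_rules(sample, RULES)
print(tagged)
```

Rows that match no rule keep an empty `amenity` value, so they are excluded by the `isin` filter used for the export.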
Export clean data¶
If the attributes above are correct, we can proceed to export them into CSV and GeoJSON files that can be used in the Task Manager's project.
# Drop unnecessary fields.
gdf = gdf.drop(columns=['tmp_category'])
# Split dataframe into different dataframes
health_amenities = ['clinic', 'hospital']
gdf_health = gdf.loc[gdf['amenity'].isin(health_amenities)]
gdf_health
# Generate a CSV File.
gdf_health.to_csv('data/processed/health.csv', index = False)
# Export to geojson.
#gdf_health.to_file('data/processed/health.geojson', driver='GeoJSON')
Files in the data/processed folder:
data/processed/health.geojson: GeoJSON file containing hospitals and clinics.
data/processed/health.csv: CSV file containing hospitals and clinics.