

Catalunya's Amenities import¶

This Jupyter notebook contains the script for importing different types of amenities in Catalunya into OSM, together with the documentation of the whole process. Keeping both in a single file makes it easier to review the process, the results, and the decisions taken.

The goal is to manually merge and import all the amenity information provided by the Generalitat de Catalunya, while testing the scripts for data preparation.

Data Sources¶

https://analisi.transparenciacatalunya.cat/Urbanisme-infraestructures/Equipaments-de-Catalunya/8gmd-gz7i

License¶

The data is released under CC0 (public domain).

Import type¶

This import will be done manually, using JOSM to edit the data. Consider using Task Manager.

Data preparations¶

All data preparations will be made automatically in this notebook.

import numpy as np
import pandas as pd
import geopandas as gpd
import geopy
from osmi_helpers import data_gathering as osmi_dg

# Define Data Sources
DATA_RAW = 'data/raw/Equipaments_de_Catalunya.geojson'
CSV_PARSER = 'fields_mapping.csv'

Data gathering and exploration.¶

Run the code below to load the original data source into a GeoDataFrame and explore its contents.

# Read the raw GeoJSON file into a GeoDataFrame.
gdf_raw = gpd.read_file(DATA_RAW)

gdf_raw

	sufix_via	comarca	utmx	telefon1	email	poblacio	cpostal	categoria	alies	longitud	...	nom	fax	data_modificacio	propietats	via	telefon2	utmy	tipus_via	idequipament	geometry
0	None	Segrià	0.0	973032744	ot.lleida@gencat.cat	Alguaire	25125	Turisme\|Oficines de Turisme de la Xarxa\|Altres...	OFICINA DE TURISME DE CATALUNYA A LLEIDA-AEROP...	0.0	...	OFICINA DE TURISME DE CATALUNYA A LLEIDA-AEROP...	None	2020-03-20T07:54:39	Marca_Turistica\|TERRES DE LLEIDA	Ctra. N-230 qm. 14,5	None	0.0	None	11443807	None
1	None	Barcelonès	427996.7316643046	93 400 69 00	None	Barcelona	08021	None	Direcció General d'Innovació, Recerca i Cultur...	2.139648279	...	Direcció General d'Innovació, Recerca i Cultur...	None	2020-03-20T07:37:02	None	Via Augusta, 202-226	None	4583094.806658751	None	10242296	POINT (2.13965 41.39798)
2	None	Barcelonès	427996.7316643046	93 400 69 00	None	Barcelona	08021	None	Sub-direcció General de Centres Privats	2.139648279	...	Sub-direcció General de Centres Privats	None	2020-03-20T07:36:10	None	Via Augusta, 202-226	932 415 342	4583094.806658751	None	3041655	POINT (2.13965 41.39798)
3	None	Baix Llobregat	410754.0	93 683 27 38	cultura@vallirana.cat	Vallirana	08759	None	Centre d'Interpretació del Patrimoni Masia Mol...	1.93261908797568	...	Centre d'Interpretació del Patrimoni Masia Mol...	93 683 28 97	2020-03-20T07:15:37	None	C. del Molí, 2 - 4	None	4581937.0	None	28561	POINT (1.93262 41.38401)
4	None	Tarragonès	352259.786	977 24 70 36	None	Tarragona	43005	None	Servei Territorial de l'Agència de l'Habitatge...	1.240220699	...	Servei Territorial de l'Agència de l'Habitatge...	None	2020-03-20T07:36:17	Horari\|<br /><b></b><br />de dilluns a divendr...	Carrer del Cardenal Vidal i Barraquer, 12-14	None	4553291.179	None	3041757	POINT (1.24022 41.11748)
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
33033	None	Baix Camp	340788.637	977331806	pba.imac@reus.cat	Reus	43202	Cultura\|Teatres, auditoris i espais escènics e...	la Palma	1.1023539633548318	...	la Palma	None	2020-03-20T07:27:04	Any de construcció\|1902\|Any del cens\|1998\|Supe...	C/ Ample 75	None	4558298.564	None	10629895	POINT (1.10235 41.16040)
33034	None	Baix Camp	0.0	977834353	lateneu@tinet.cat	Duesaigües	43773	Cultura\|Centres culturals: ateneus, centres cí...	Ateneu de Duesaigües	-1.4887438843851004	...	Ateneu de Duesaigües	None	2020-03-20T07:27:05	Any de construcció\|1974\|Any del cens\|2005\|Supe...	Pl. 15 d'agost 15	None	0.0	None	10630412	POINT (-1.48874 0.00000)
33035	None	Segrià	308327.075	973190117	ajuntament@corbins.cat	Corbins	25137	Cultura\|Centres culturals: ateneus, centres cí...	Patronat Sant Jaume de Corbins	0.696756024266889	...	Patronat Sant Jaume de Corbins	None	2020-03-20T07:26:53	Any de construcció\|1940\|Any del cens\|2000\|Supe...	Pl. Carnisseria 6	None	4618120.126	None	10630164	POINT (0.69676 41.69179)
33036	None	Segrià	301511.775	973266303	None	Lleida	25006	Cultura\|Centres culturals: ateneus, centres cí...	Centre Cultural Vallcalent	0.6175989338249559	...	Centre Cultural Vallcalent	None	2020-03-20T07:26:53	Any de construcció\|1968\|Any del cens\|1996\|Supe...	C/ Vallcalent 28	None	4610071.677	None	10630158	POINT (0.61760 41.61769)
33037	None	Osona	442772.493	938540457	cultura@rodadeter.cat	Roda de Ter	08510	Cultura\|Teatres, auditoris i espais escènics e...	Teatre Eliseu	2.3091974470722105	...	Teatre Eliseu	None	2020-03-20T07:26:58	Any de construcció\|1928\|Any del cens\|2011\|Supe...	C/ Bac de Roda 1 bis	None	4647932.364	None	10630429	POINT (2.30920 41.98132)

33038 rows × 23 columns
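The preview above includes records whose coordinates are zero or missing (row 0 has no geometry, and row 33034 has a latitude of 0). A minimal sketch for flagging such records before import follows. The bounding box is a rough approximation of Catalonia chosen for illustration, and `CATALONIA_BBOX` and `has_plausible_coords` are hypothetical names that do not appear in the notebook.

```python
import math

# Approximate bounding box for Catalonia (an assumption for illustration):
# (min_lon, min_lat, max_lon, max_lat) in WGS84 degrees.
CATALONIA_BBOX = (0.1, 40.5, 3.4, 42.9)


def has_plausible_coords(lon, lat, bbox=CATALONIA_BBOX):
    """Return True when (lon, lat) is a finite point inside the bounding box."""
    if lon is None or lat is None:
        return False
    try:
        lon, lat = float(lon), float(lat)
    except (TypeError, ValueError):
        return False
    if math.isnan(lon) or math.isnan(lat):
        return False
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Values modelled on the preview above.
samples = {
    "point in Barcelona": (2.13965, 41.39798),
    "zero latitude": (-1.48874, 0.0),
    "missing latitude": (0.0, None),
}
for label, (lon, lat) in samples.items():
    print(label, has_plausible_coords(lon, lat))
```

Records that fail the check would need to be geocoded or placed by hand during the manual import rather than uploaded as they are.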

Data cleanup¶

Fields' mapping.¶

# Work on a copy so that gdf_raw stays untouched.
gdf = gdf_raw.copy()

Run the cells below to convert the raw data into an OSM-friendly structure, using the field mappings defined in the CSV file that the CSV_PARSER variable points to.

# Read CSV file with fields' mapping and description.
fields_mapping = pd.read_csv(CSV_PARSER)

# Display table.
fields_mapping

	Original field	Description	OSM tagging	Comments
0	idequipament	Identificador intern de l'equipament a BDE	source:pkey	Not imported.
1	alies	Àlies de l'equipament	NaN	Not imported. Same values as `nom`
2	nom	Nom de l'eqiupament	name	NaN
3	categoria	Categories / subcategories de l'equipament	tmp_category	Not imported. Only used for filtering. Will be...
4	tipus_via	Tipus de via (adreça)	NaN	NaN
5	via	Nom de la via (adreça)	addr:full	The geojson has all the information stored in ...
6	sufix_via	Sufix (adreça)	NaN	Empty. Not imported.
7	num	Número de portal (adreça)	addr:housenumber	NaN
8	cpostal	Codi postal	addr:postcode	NaN
9	poblacio	Població	addr:city	NaN
10	comarca	Comarca	NaN	Not imported
11	telefon1	Telèfon principal	phone	NaN
12	telefon2	Telèfon secundari	NaN	NaN
13	fax	Fax	fax	NaN
14	utmx	Coordenada x (UTM)	NaN	Not imported as tags
15	utmy	Coordenada y (UTM)	NaN	Not imported as tags
16	longitud	Longitud	NaN	Not imported as tags
17	latitud	Latitud	NaN	Not imported as tags
18	email	Adreça de correu electrònic	email	NaN
19	web	Pàgina web	website	NaN
20	data_modificacio	Data de la darrera modificació	source:date	NaN
21	propietats	Propietats addicionals de l'equipament	NaN	Not imported
22	localitzacio	Columna de georeferència	NaN	Not imported as tags

# Select and rename fields according to the CSV mapping.
gdf = osmi_dg.csv_parser(gdf, CSV_PARSER)


gdf.head(10)

	source:pkey	name	tmp_category	addr:full	addr:housenumber	addr:postcode	addr:city	phone	fax	email	website	source:date
0	11443807	OFICINA DE TURISME DE CATALUNYA A LLEIDA-AEROP...	Turisme\|Oficines de Turisme de la Xarxa\|Altres...	Ctra. N-230 qm. 14,5	None	25125	Alguaire	973032744	None	ot.lleida@gencat.cat	http://www.catalunya.com	2020-03-20T07:54:39
1	10242296	Direcció General d'Innovació, Recerca i Cultur...	None	Via Augusta, 202-226	None	08021	Barcelona	93 400 69 00	None	None	None	2020-03-20T07:37:02
2	3041655	Sub-direcció General de Centres Privats	None	Via Augusta, 202-226	None	08021	Barcelona	93 400 69 00	None	None	None	2020-03-20T07:36:10
3	28561	Centre d'Interpretació del Patrimoni Masia Mol...	None	C. del Molí, 2 - 4	None	08759	Vallirana	93 683 27 38	93 683 28 97	cultura@vallirana.cat	http://www.vallirana.cat	2020-03-20T07:15:37
4	3041757	Servei Territorial de l'Agència de l'Habitatge...	None	Carrer del Cardenal Vidal i Barraquer, 12-14	None	43005	Tarragona	977 24 70 36	None	None	http://agenciahabitatge.gencat.cat	2020-03-20T07:36:17
5	3040033	CCMA - Catalunya Ràdio	None	Avinguda Diagonal, 614-616	None	08021	Barcelona	93 306 92 00	93 306 92 01	None	http://www.ccma.cat/catradio	2020-03-20T07:37:15
6	6016896	Sub-direcció General de Serveis	None	Carrer de la Diputació, 355	None	08009	Barcelona	93 567 40 00	93 567 40 02	None	None	2020-03-20T07:36:07
7	3039714	Assessoria Jurídica	None	Via Laietana, 26	None	08003	Barcelona	93 567 17 00	93 567 17 51	None	http://politiquesdigitals.gencat.cat	2020-03-20T07:37:18
8	6015310	Gabinet de Relacions Externes i Protocol	None	Rambla de Catalunya, 19-21	None	08007	Barcelona	93 316 20 00	93 316 21 60	None	None	2020-03-20T07:36:57
9	3040392	Coordinació Territorial de Joventut a Lleida	None	Rambla d'Aragó, 8	None	25002	Lleida	973 27 92 17	973 27 92 01	joventut.lleida.tsf@gencat.cat	None	2020-03-20T07:37:04
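The source of `osmi_dg.csv_parser` is not shown in this notebook, so the following is only a sketch of one plausible way to select and rename fields from a mapping file. It uses plain Python rather than pandas, and `load_mapping`, `apply_mapping`, and the abridged `MAPPING_CSV` text are hypothetical; they are not the helper's actual implementation.

```python
import csv
import io

# A few abridged rows shaped like fields_mapping.csv (headers as displayed above).
MAPPING_CSV = """Original field,Description,OSM tagging,Comments
idequipament,Identificador intern,source:pkey,
alies,Àlies de l'equipament,,Not imported.
nom,Nom de l'equipament,name,
cpostal,Codi postal,addr:postcode,
"""


def load_mapping(handle):
    """Map each original field to its OSM key, skipping fields with no tag."""
    reader = csv.DictReader(handle)
    return {
        row["Original field"]: row["OSM tagging"]
        for row in reader
        if row["OSM tagging"]
    }


def apply_mapping(record, mapping):
    """Keep only the mapped fields of one record and rename them to OSM keys."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


mapping = load_mapping(io.StringIO(MAPPING_CSV))
raw = {
    "idequipament": "10630429",
    "alies": "Teatre Eliseu",
    "nom": "Teatre Eliseu",
    "cpostal": "08510",
}
print(apply_mapping(raw, mapping))
```

Fields with an empty `OSM tagging` cell are dropped, which matches the way the unmapped columns disappear from the table above.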

Calculate some fields¶

The following code calculates some fields that are needed in OSM.

# Fix uppercase.
gdf['name'] = gdf['name'].str.title()

# Addresses' cleanup.
gdf['addr:full'] = gdf['addr:full'].str.title()
# Split address into street, house number and unit (at most two splits).
address_parts = gdf['addr:full'].str.split(',', n=2, expand=True)
gdf['addr:street'] = address_parts[0].str.strip()
gdf['addr:housenumber'] = address_parts[1].str.strip()
gdf['addr:unit'] = address_parts[2].str.strip()
# Expand abbreviations; the dots are escaped so that 'Pl.' cannot match inside 'Plaça'.
gdf['addr:street'] = gdf['addr:street'].replace(
    {r'C/': 'Carrer', r'Ctra\.': 'Carretera', r'Pl\.': 'Plaça'}, regex=True)
# Drop the 'S/N' (no number) marker and the 'Nº ' prefix.
gdf['addr:housenumber'] = gdf['addr:housenumber'].replace({'S/N': '', 'Nº ': ''}, regex=True)

# Filter out entries without category
gdf = gdf.dropna(subset=['tmp_category'])

# Remove pharmacies, because they have already been imported
gdf = gdf[gdf.tmp_category != 'Salut|Farmàcies||']

# Create amenity column according to `tmp_category` (the source field `categoria`)
# Health
gdf.loc[gdf.tmp_category.str.contains("Centres d'atenció primària"), 'amenity' ] = 'clinic'
gdf.loc[gdf.tmp_category.str.contains("Centres amb atenció continuada"), 'amenity' ] = 'clinic'
gdf.loc[gdf.tmp_category.str.contains("Centres amb atenció continuada"), 'emergency' ] = 'yes'
#gdf.loc[gdf.tmp_category.str.contains('Centres de salut mental'), 'amenity' ] = 'social_facility'
#gdf.loc[gdf.tmp_category.str.contains('Centres de salut mental'), 'social_facility:for' ] = 'social_facility'
gdf.loc[gdf.tmp_category.str.contains('Hospital'), 'amenity' ] = 'hospital'

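# Other
# Note: OSM tags museums as tourism=museum, so this value has to be moved to a `tourism` key before upload.
gdf.loc[gdf.tmp_category.str.contains('Museus'), 'amenity' ] = 'museum'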
gdf.loc[gdf.tmp_category.str.contains('Teatres'), 'amenity' ] = 'theatre'

gdf

	source:pkey	name	tmp_category	addr:full	addr:housenumber	addr:postcode	addr:city	phone	fax	email	website	source:date	addr:street	addr:unit	amenity	emergency
0	11443807	Oficina De Turisme De Catalunya A Lleida-Aerop...	Turisme\|Oficines de Turisme de la Xarxa\|Altres...	Ctra. N-230 Qm. 14,5	5	25125	Alguaire	973032744	None	ot.lleida@gencat.cat	http://www.catalunya.com	2020-03-20T07:54:39	Carretera N-230 Qm. 14	NaN	NaN	NaN
24	14174	Deixalleria De Sitges	Medi ambient\|Deixalleries\|\|	-	NaN	08870	Sitges	938109100	None	vilafjs@sitges.cat	http://www.sitges.cat/jsp/directori/detall.jsp...	2020-03-12T16:06:01	-	NaN	NaN	NaN
25	23577	Oficina De Turisme De Peratallada	Turisme\|Oficines de Turisme de la Xarxa\|Altres...	Pl. Del Castell, Nº 3	3	17113	Forallac	972645522	None	turisme@forallac.com	http://www.forallac.cat	2020-03-20T07:54:35	Plaça Del Castell	NaN	NaN	NaN
26	49680	Servei D'Informació I Atenció A Les Dones (Sia...	Societat. Ciutadania. Famílies\|Oficines d'info...	Fanalets De Sant Jaume	NaN	25002	Lleida	973700461	None	politiquesigualtat@paeria.cat	None	2018-03-08T15:46:02	Fanalets De Sant Jaume	NaN	NaN	NaN
27	49701	Servei D'Informació I Atenció A Les Dones (Sia...	Societat. Ciutadania. Famílies\|Oficines d'info...	Muralla Del Carme, 24, Baixos	24	43800	Valls	977608225	None	pad@valls.cat	None	2018-03-08T15:45:58	Muralla Del Carme	Baixos	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
33033	10629895	La Palma	Cultura\|Teatres, auditoris i espais escènics e...	C/ Ample 75	NaN	43202	Reus	977331806	None	pba.imac@reus.cat	None	2020-03-20T07:27:04	Carrer Ample 75	NaN	theatre	NaN
33034	10630412	Ateneu De Duesaigües	Cultura\|Centres culturals: ateneus, centres cí...	Pl. 15 D'Agost 15	NaN	43773	Duesaigües	977834353	None	lateneu@tinet.cat	None	2020-03-20T07:27:05	Plaça 15 D'Agost 15	NaN	NaN	NaN
33035	10630164	Patronat Sant Jaume De Corbins	Cultura\|Centres culturals: ateneus, centres cí...	Pl. Carnisseria 6	NaN	25137	Corbins	973190117	None	ajuntament@corbins.cat	None	2020-03-20T07:26:53	Plaça Carnisseria 6	NaN	NaN	NaN
33036	10630158	Centre Cultural Vallcalent	Cultura\|Centres culturals: ateneus, centres cí...	C/ Vallcalent 28	NaN	25006	Lleida	973266303	None	None	None	2020-03-20T07:26:53	Carrer Vallcalent 28	NaN	NaN	NaN
33037	10630429	Teatre Eliseu	Cultura\|Teatres, auditoris i espais escènics e...	C/ Bac De Roda 1 Bis	NaN	08510	Roda de Ter	938540457	None	cultura@rodadeter.cat	None	2020-03-20T07:26:58	Carrer Bac De Roda 1 Bis	NaN	theatre	NaN

29310 rows × 16 columns
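The address clean-up in the cell above can be illustrated on single strings. The sketch below is plain Python, not the notebook's pandas code: `split_address` and `ABBREVIATIONS` are hypothetical names, and unlike the cell above it only expands an abbreviation at the start of the street name.

```python
import re

# Abbreviation -> expansion, matched literally at the start of the street name.
ABBREVIATIONS = {"C/": "Carrer", "Ctra.": "Carretera", "Pl.": "Plaça"}


def split_address(full):
    """Split 'street, number, unit' into a 3-tuple, padding missing parts with None."""
    parts = [part.strip() for part in full.split(",", 2)]
    parts += [None] * (3 - len(parts))
    street, number, unit = parts
    for abbreviation, expansion in ABBREVIATIONS.items():
        # re.escape keeps '.' literal, so 'Pl.' cannot match inside 'Plaça'.
        street = re.sub("^" + re.escape(abbreviation), expansion, street)
    if number is not None:
        # Drop the 'Nº ' prefix and the 'S/N' (no number) marker.
        number = re.sub(r"Nº\s*|S/N", "", number).strip() or None
    return street, number, unit


for address in (
    "Muralla Del Carme, 24, Baixos",
    "Pl. Del Castell, Nº 3",
    "C/ Ample 75",
    "Ctra. N-230 Qm. 14,5",
):
    print(split_address(address))
```

The last two inputs show the limits of comma-based splitting that are also visible in the table above: "C/ Ample 75" keeps its number inside the street name, and the decimal comma in "Qm. 14,5" is mistaken for a separator. Such records need a manual check in JOSM.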

type(gdf)

pandas.core.frame.DataFrame

Export clean data¶

If the attributes above are correct, export them to CSV and GeoJSON files that can be used in the Task Manager project.

# Drop unnecessary fields.
gdf = gdf.drop(columns=['tmp_category'])

# Split dataframe into different dataframes
health_amenities = ['clinic', 'hospital']
gdf_health = gdf.loc[gdf['amenity'].isin(health_amenities)]


gdf_health

# Generate a CSV file.
gdf_health.to_csv('data/processed/health.csv', index = False)

# Export to GeoJSON (disabled: gdf is a plain DataFrame at this point, so to_file() is not available).
#gdf_health.to_file('data/processed/health.geojson', driver='GeoJSON')

As a result of this script, we get the following files, all of them stored in the data/processed folder:

data/processed/health.geojson: GeoJSON file containing hospitals and clinics. It is only written if the GeoJSON export line above is uncommented.
data/processed/health.csv: CSV file containing hospitals and clinics.
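Because `type(gdf)` reports a plain pandas DataFrame, the commented-out `to_file` call would not work as written. One possible workaround, sketched here with the standard library only, is to build the GeoJSON structure directly from the records. The function `to_feature_collection` and the `lon` and `lat` keys are assumptions for illustration; the cleaned table above does not carry coordinates, so they would have to be kept from the raw data.

```python
import json


def to_feature_collection(records, lon_key="lon", lat_key="lat"):
    """Build a GeoJSON FeatureCollection of points from a list of dicts."""
    features = []
    for record in records:
        lon, lat = record.get(lon_key), record.get(lat_key)
        if lon is None or lat is None:
            continue  # skip records that cannot be placed on the map
        properties = {
            key: value
            for key, value in record.items()
            if key not in (lon_key, lat_key) and value is not None
        }
        features.append(
            {
                "type": "Feature",
                # GeoJSON stores coordinates as [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Illustrative records; the names and values are made up for the example.
records = [
    {"name": "Example Clinic", "amenity": "clinic", "lon": 2.1, "lat": 41.4},
    {"name": "No Coordinates", "amenity": "hospital", "lon": None, "lat": None},
]
collection = to_feature_collection(records)
print(json.dumps(collection, ensure_ascii=False, indent=2))
```

Writing the result with `json.dump` to a path such as data/processed/health.geojson would give a file that JOSM can open. An alternative would be to rebuild a GeoDataFrame from the raw geometry column before exporting.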