pyPRIMA




python PReprocessing of Inputs for Model frAmeworks

Code developers

Kais Siala, Houssame Houmy

Documentation authors

Kais Siala

Maintainers

Kais Siala <kais.siala@tum.de>

Organization

Chair of Renewable and Sustainable Energy Systems, Technical University of Munich

Version

1.0.1

Date

Jul 09, 2021

License

The model code is licensed under the GNU General Public License 3.0. This documentation is licensed under a Creative Commons Attribution 4.0 International license.

Features

  • Aggregation of input data for any user-defined regions provided as a shapefile

  • Automation of the pre-processing to document assumptions and avoid human errors

  • Cleaning of raw input data and creation of model-independent intermediate files

  • Adaptation of the intermediate files to the models urbs and evrys (as of version 1.0.1)

Applications

This code is useful if:

  • You want to create different models using the same input database, but different model regions

  • You want to harmonize the assumptions used in different model frameworks (for comparison and/or linking)

  • You want to generate many models in a short amount of time with fewer human errors

Changes

version 1.0.0

This is the initial version.

Contents

User manual

These documents give a general overview and help you get started, from the installation to your first running model.

User manual

Installation

Note

We assume that you are familiar with git and conda.

First, clone the git repository in a directory of your choice using a Command Prompt window:

$ ~\directory-of-my-choice> git clone https://github.com/tum-ens/pyPRIMA.git

We recommend using conda and installing the environment from the file gen_mod.yml that you can find in the repository. In the Command Prompt window, type:

$ cd pyPRIMA\env\
$ conda env create -f gen_mod.yml

Then activate the environment:

$ conda activate gen_mod

In the folder code, you will find multiple files:

  • config.py: used for configuration, see below.

  • runme.py: main file, which will be run later using python runme.py.

  • lib\initialization.py: used for initialization.

  • lib\input_maps.py: used to generate input maps for the scope.

  • lib\generate_models.py: used to generate the model files from intermediate files.

  • lib\generate_intermediate_files.py: used to generate intermediate files from raw data.

  • lib\spatial_functions.py: contains helping functions related to maps, coordinates and indices.

  • lib\correction_functions.py: contains helping functions for data correction/cleaning.

  • lib\util.py: contains minor helping functions and the necessary python libraries to be imported.

config.py

This file contains the user preferences, the links to the input files, and the paths where the outputs should be saved. The paths are initialized in a way that follows a particular folder hierarchy. However, you can change the hierarchy as you wish.

Main configuration function
config.configuration()

This function is the main configuration function that calls all the other modules in the code.

Return (paths, param)

The dictionary param containing all the user preferences, and the dictionary paths containing all the paths to inputs and outputs.

Return type

tuple(dict, dict)

config.general_settings()

This function creates and initializes the dictionaries param and paths. It also creates global variables for the root folder root and the system-dependent file separator fs.

Return (paths, param)

The empty dictionary paths, and the dictionary param including some general information.

Return type

tuple(dict, dict)

Note

Both param and paths will be updated in the code after running the function config.configuration.

Note

root points to the directory that contains all the inputs and outputs. All the paths will be defined relative to root, which is itself located relative to the current folder.

The code differentiates between the geographic scope and the subregions of interest. You can run the first part of the script runme.py once and save results for the whole scope, and then repeat the second part using different subregions within the scope.

config.scope_paths_and_parameters(paths, param)

This function defines the path of the geographic scope of the output spatial_scope and of the subregions of interest subregions. It also associates two name tags for them, respectively region_name and subregions_name, which define the names of output folders. Both paths should point to shapefiles of polygons or multipolygons.

For spatial_scope, only the bounding box around all the features matters. Example: in the case of Europe, it makes no difference whether the shapefile contains Europe as one multipolygon or as a set of multiple features (countries, states, etc.). Potential maps (theoretical and technical) will later be generated for the whole scope of the bounding box.

For subregions, the shapes of the individual features matter, but not their scope. For each individual feature that lies within the scope, you can later generate a summary report and time series. The shapefile of subregions does not have to have the same bounding box as spatial_scope. In case it is larger, features that lie completely outside the scope will be ignored, whereas those that lie partly inside it will be cropped using the bounding box of spatial_scope. In case it is smaller, all features are used with no modification.
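The cropping rule described above can be sketched with plain bounding boxes. The tool itself works on polygon shapefiles; the function below is purely illustrative, and its name and representation are assumptions:

```python
# Illustrative sketch of the subregion handling rule: features entirely
# outside the scope's bounding box are ignored, overlapping features are
# cropped to the box. Real shapes are (multi)polygons in the tool; plain
# bounding boxes (minx, miny, maxx, maxy) stand in here.

def crop_to_scope(features, scope_bbox):
    """features: dict name -> (minx, miny, maxx, maxy); scope_bbox likewise."""
    sminx, sminy, smaxx, smaxy = scope_bbox
    kept = {}
    for name, (minx, miny, maxx, maxy) in features.items():
        # no overlap at all -> the feature is ignored
        if maxx <= sminx or minx >= smaxx or maxy <= sminy or miny >= smaxy:
            continue
        # overlap -> crop to the scope's bounding box
        kept[name] = (max(minx, sminx), max(miny, sminy),
                      min(maxx, smaxx), min(maxy, smaxy))
    return kept

subregions = {
    "inside":  (5, 5, 10, 10),    # fully within the scope
    "partly":  (38, 5, 50, 10),   # crosses the scope's eastern edge
    "outside": (60, 60, 70, 70),  # completely outside
}
scope = (0, 0, 40, 40)
print(crop_to_scope(subregions, scope))
# "inside" is unchanged, "partly" is cropped at x=40, "outside" is dropped
```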

year defines the year of the weather/input data, and model_year refers to the year to be modeled (could be the same as year, or in the future).

technology is a dictionary of the technologies (Storage, Process) to be used in the model. The names of the technologies should match the names which are used in assumptions_flows.csv, assumptions_processes.csv and assumptions_storage.csv.

Parameters
  • paths (dict) – Dictionary including the paths.

  • param (dict) – Dictionary including the user preferences.

Return (paths, param)

The updated dictionaries paths and param.

Return type

tuple of dict

Note

We recommend using a name tag that describes the scope of the bounding box of the regions of interest. For example, 'Europe' and 'Europe_without_Switzerland' will actually lead to the same output for the first part of the code.

User preferences
config.resolution_parameters(param)

This function defines the resolution of the weather data (low resolution) and the desired resolution of the output rasters (high resolution). Both are numpy arrays with two numbers: the first is the resolution in the vertical dimension (in degrees of latitude), the second in the horizontal dimension (in degrees of longitude).

Parameters

param (dict) – Dictionary including the user preferences.

Return param

The updated dictionary param.

Return type

dict

Note

As of version 1.0.1, these settings should not be changed. Only MERRA-2 data can be used in the tool; its spatial resolution is 0.5° of latitude and 0.625° of longitude. The high resolution is 15 arcsec in both directions.
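For illustration, the two resolutions described in this note can be expressed as numpy arrays (variable names are hypothetical, not necessarily those used in config.py):

```python
import numpy as np

# Values as stated above for version 1.0.1 (MERRA-2 input, 15 arcsec output).
res_weather = np.array([0.5, 0.625])     # lat x lon resolution, in degrees
res_desired = np.array([15 / 3600] * 2)  # 15 arcsec in degrees (~0.00417)

# each low-resolution weather cell covers this many high-resolution pixels
print(res_weather / res_desired)         # [120. 150.]
```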

config.grid_parameters(param)

This function defines parameters related to the grid to be used while cleaning the data.

  • quality is a user assessment of the quality of the data: use 1 if the data is fully trustworthy, 0 if it is not trustworthy at all, and values in between otherwise.

  • default is a collection of default values for voltage, wires, cables, and frequency, to use when these data are missing.

Parameters

param (dict) – Dictionary including the user preferences.

Return param

The updated dictionary param.

Return type

dict

config.load_parameters(param)

This function defines the user preferences which are related to the load/demand. Currently, only one parameter is used, namely default_sec_shares, which sets the reference region to be used in case data for other regions is missing.

Parameters

param (dict) – Dictionary including the user preferences.

Return param

The updated dictionary param.

Return type

dict

config.processes_parameters(param)

This function defines parameters related to the processes in general, and to distributed renewable capacities in particular.

For process, only the parameter cohorts is currently used. It defines how power plants should be grouped according to their construction period. If cohorts is 5, then you will have groups of coal power plants from 1960, then another from 1965, and so on. If you do not wish to group the power plants, use the value 1.
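The grouping by construction period can be sketched as follows (the function name and exact binning rule are assumptions for illustration):

```python
# Sketch of the cohort grouping described above: with cohorts = 5,
# construction years are binned into 5-year groups; with cohorts = 1,
# every year stays its own group.
def assign_cohort(year, cohorts):
    if cohorts == 1:                 # no grouping
        return year
    return (year // cohorts) * cohorts

years = [1960, 1962, 1964, 1965, 1971]
print([assign_cohort(y, 5) for y in years])  # [1960, 1960, 1960, 1965, 1970]
```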

For distributed renewable capacities, dist_ren, the following parameters are needed:

  • units is a dictionary defining the standard power plant size for each distributed renewable technology in MW.

  • randomness is a value between 0 and 1, defining the randomness of the spatial distribution of renewable capacities. The complementary value (1 - randomness) is affected by the values of the potential raster used for the distribution. When using a high resolution map, set randomness at a high level (close to 1), otherwise all the power plants will be located in a small area of high potential, close to each other.

  • default_pa_type and default_pa_availability are two arrays defining the availability for each type of protected land. These arrays are used as default, along with the protected areas raster, in case no potential map is available for a distributed renewable technology.
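A minimal sketch of how randomness could blend a random component with the potential raster; the actual weighting used in the code may differ:

```python
import numpy as np

# Toy placement score mixing a uniform random component with a potential
# raster: `randomness` weights the random part, (1 - randomness) weights
# the potential values. All numbers are illustrative.
rng = np.random.default_rng(0)
potential = np.array([[0.1, 0.9], [0.2, 0.8]])   # toy potential raster
randomness = 0.7

score = randomness * rng.random(potential.shape) + (1 - randomness) * potential
# standard-sized power plant units are then placed preferentially on
# high-score pixels, avoiding clustering in one small high-potential area
print(score.shape)   # (2, 2)
```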

Parameters

param (dict) – Dictionary including the user preferences.

Return param

The updated dictionary param.

Return type

dict

config.renewable_time_series_parameters(param)

This function defines parameters related to the renewable time series to be used in the models. In particular, the user can decide which modes to use from the files of the time series, provided they exist. See the repository tum-ens/renewable-timeseries for more information.

Parameters

param (dict) – Dictionary including the user preferences.

Return param

The updated dictionary param.

Return type

dict

Paths
config.assumption_paths(paths)

This function defines the paths for the assumption files and the dictionaries.

  • assumptions_landuse is a table with land use types as rows and sectors as columns. The table is filled with values between 0 and 1, so that each row has a total of 0 (no sectoral load there) or 1 (if there is a load, it will be distributed according to the shares of each sector).

  • assumptions_flows is a table with the following columns:

    • year: data reference year.

    • Process/Storage: name of the process or storage type.

    • Direction: either In or Out. You can have multiple inputs and outputs, each one in a separate row.

    • Commodity: name of the input or output commodity.

    • ratio: ratio to the throughput, which is an intermediate level between input and output. It could be any positive value. The ratio of the output to the input corresponds to the efficiency.

    • ratio-min: similar to ratio, but at partial load.

  • assumptions_processes is a table with the following columns:

    • year: data reference year.

    • Process: name of the process usually given as the technology type.

    • cap-lo: minimum power capacity.

    • cap-up: maximum power capacity.

    • max-grad: maximum allowed power gradient (1/h) relative to power capacity.

    • min-fraction: minimum load fraction at which the process can run.

    • inv-cost: total investment cost per power capacity (Euro/MW). It will be annualized in the model using an annuity factor derived from the wacc and depreciation period.

    • fix-cost: annual operation-independent (fixed) cost (Euro/MW/a).

    • var-cost: variable cost per throughput energy unit (Euro/MWh), excluding fuel costs.

    • start-cost: startup cost when the process is switched on from the off state.

    • wacc: weighted average cost of capital. Percentage of cost of capital after taxes.

    • depreciation: depreciation period in years.

    • lifetime: lifetime of already installed capacity in years.

    • area-per-cap: area required per power capacity (m²/MW).

    • act-up: maximal load (per unit).

    • act-lo: minimal load (per unit).

    • on-off: binary variable, 1 for controllable power plants, otherwise 0 (must-run).

    • reserve-cost: cost of power reserves (Euro/MW) (to be verified).

    • ru: ramp-up capacity (MW/MWp/min).

    • rd: ramp-down capacity (MW/MWp/min).

    • rumax: maximal ramp-up (MW/MWp/h).

    • rdmax: maximal ramp-down (MW/MWp/h).

    • detail: level of detail for modeling thermal power plants, modes 1-5.

    • lambda: cooling coefficient (to be verified).

    • heatmax: maximal heating capacity (to be verified).

    • maxdeltaT: maximal temperature gradient (to be verified).

    • heatupcost: costs of heating up (Euro/MWh_th) (to be verified).

    • su: ramp-up at start (to be verified).

    • sd: ramp-down at switch-off (to be verified).

    • pdt: (to be verified).

    • hotstart: (to be verified).

    • pot: (to be verified).

    • pretemp: temperature at the initial time step, per unit of the maximum operating temperature.

    • preheat: heat content at the initial time step, per unit of the maximum operating heat content.

    • prestate: operating state at the initial time step (binary).

    • prepow: available power at the initial time step (MW) (to be verified).

    • precaponline: online capacity at the initial time step (MW).

    • year_mu: average construction year for that type of power plants.

    • year_stdev: standard deviation from the average construction year for that type of power plants.

  • assumptions_storage is a table with the following columns:

    • year: data reference year.

    • Storage: name of the storage usually given as the technology type.

    • ep-ratio: fixed energy to power ratio (hours).

    • cap-up-c: maximum allowed energy capacity (MWh).

    • cap-up-p: maximum allowed power capacity (MW).

    • inv-cost-p: total investment cost per power capacity (Euro/MW). It will be annualized in the model using an annuity factor derived from the wacc and depreciation period.

    • inv-cost-c: total investment cost per energy capacity (Euro/MWh). It will be annualized in the model using an annuity factor derived from the wacc and depreciation period.

    • fix-cost-p: annual operation-independent (fixed) cost per power capacity (Euro/MW/a).

    • fix-cost-c: annual operation-independent (fixed) cost per energy capacity (Euro/MWh/a).

    • var-cost-p: operation-dependent costs for input and output of energy per MWh stored or retrieved (Euro/MWh).

    • var-cost-c: operation-dependent costs per MWh stored. This value can be used to model technologies that have increased wear and tear proportional to the amount of stored energy.

    • lifetime: lifetime of already installed capacity in years.

    • depreciation: depreciation period in years.

    • wacc: weighted average cost of capital. Percentage of cost of capital after taxes.

    • init: initial storage content. Fraction of storage capacity that is full at the simulation start. This level has to be reached in the final timestep.

    • var-cost-pi: variable costs for charging (Euro/MW).

    • var-cost-po: variable costs for discharging (Euro/MW).

    • act-lo-pi: minimal share of active capacity for charging (per unit).

    • act-up-pi: maximal share of active capacity for charging (per unit).

    • act-lo-po: minimal share of active capacity for discharging (per unit).

    • act-up-po: maximal share of active capacity for discharging (per unit).

    • act-lo-c: minimal share of storage capacity (per unit).

    • act-up-c: maximal share of storage capacity (per unit).

    • precont: energy content of the storage unit at the initial time step (MWh) (to be verified).

    • prepowin: energy stored at the initial time step (MW).

    • prepowout: energy discharged at the initial time step (MW).

    • ru: ramp-up capacity (MW/MWp/min).

    • rd: ramp-down capacity (MW/MWp/min).

    • rumax: maximal ramp-up (MW/MWp/h).

    • rdmax: maximal ramp-down (MW/MWp/h).

    • seasonal: binary variable, 1 for seasonal storage.

    • ctr: binary variable, 1 if can be used for secondary reserve.

    • discharge: energy losses due to self-discharge per hour as a percentage of the energy capacity.

    • year_mu: average construction year for that type of storage.

    • year_stdev: standard deviation from the average construction year for that type of storage.

  • assumptions_commodities is a table with the following columns:

    • year: data reference year.

    • Commodity: name of the commodity.

    • Type_urbs: type of the commodity according to urbs’ terminology.

    • Type_evrys: type of the commodity according to evrys’ terminology.

    • price: commodity price (euro/MWh).

    • max: maximum annual commodity use (MWh).

    • maxperhour: maximum commodity use per hour (MW).

    • annual: total value per year (MWh).

    • losses: losses (to be verified).

  • assumptions_transmission is a table with the following columns:

    • Type: type of transmission.

    • length_limit_km: maximum length of the transmission line in km, for which the assumptions are valid.

    • year: data reference year.

    • Commodity: name of the commodity to be transported along the transmission line.

    • eff_per_1000km: transmission efficiency after 1000km in percent.

    • inv-cost-fix: length independent investment cost (euro).

    • inv-cost-length: length dependent investment cost (euro/km).

    • fix-cost-length: fixed annual cost dependent on the length of the line (euro/km/a).

    • var-cost: variable costs per energy unit transmitted (euro/MWh)

    • cap-lo: minimum required power capacity (MW).

    • cap-up: maximum allowed power capacity (MW).

    • wacc: weighted average cost of capital. Percentage of cost of capital after taxes.

    • depreciation: depreciation period in years.

    • act-lo: minimum capacity (MW/MWp).

    • act-up: maximum capacity (MW/MWp).

    • angle-up: maximum phase angle ramp-up (to be verified).

    • PSTmax: maximum phase angle difference.

  • dict_season_north is a table with the following columns:

    • Month: number of the month (1-12).

    • Season: corresponding season.

  • dict_daytype is a table with the following columns:

    • Week day: name of the weekday (Monday-Sunday).

    • Type: either Working day, Saturday, or Sunday.

  • dict_sectors is a table with the following columns:

    • EUROSTAT: name of the entry in the EUROSTAT table.

    • Model_sectors: corresponding sector (leave empty if irrelevant).

  • dict_countries is a table with the following columns:

    • IRENA: names of countries in the IRENA database.

    • Countries shapefile: names of countries in the countries shapefile.

    • NAME_SHORT: code names for the countries as used by the code.

    • ENTSO-E: names of countries in the ENTSO-E dataset.

    • EUROSTAT: names of countries in the EUROSTAT table.

  • dict_line_voltage is a table with the following columns:

    • voltage_kV: sorted values of possible line voltages.

    • specific_impedance_Ohm_per_km: specific impedance (leave empty if unknown).

    • loadability: loadability factor according to the St Clair’s curve (leave empty if unknown).

    • SIL_MWh: corresponding surge impedance load (leave empty if unknown).

  • dict_technologies is a table with the following columns:

    • IRENA: names of technologies in the IRENA database.

    • FRESNA: names of technologies in the FRESNA database.

    • Model names: names of technologies as used in the model.

Parameters

paths (dict) – Dictionary including the paths.

Returns

The updated dictionary paths.

Return type

dict
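The ratio convention of assumptions_flows described above can be illustrated with a toy example (process and commodity names as well as the numbers are made up):

```python
# Illustration of the `ratio` convention in assumptions_flows: every flow
# is defined relative to a throughput of 1, and the ratio of output to
# input corresponds to the efficiency.
flows = [
    {"Process": "Gas plant", "Direction": "In",  "Commodity": "Gas",  "ratio": 2.0},
    {"Process": "Gas plant", "Direction": "Out", "Commodity": "Elec", "ratio": 1.0},
]
ratio_in = sum(f["ratio"] for f in flows if f["Direction"] == "In")
ratio_out = sum(f["ratio"] for f in flows if f["Direction"] == "Out")
print(ratio_out / ratio_in)   # 0.5, i.e. 50 % efficiency
```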

config.grid_input_paths(paths)

This function defines the paths where the transmission lines (inputs) are located.

Parameters

paths (dict) – Dictionary including the paths.

Return paths

The updated dictionary paths.

Return type

dict

config.load_input_paths(paths)

This function defines the paths where the load related inputs are saved:

  • sector_shares for the sectoral shares in the annual electricity demand.

  • load_ts for the load time series.

  • profiles for the sectoral load profiles.

Parameters

paths (dict) – Dictionary including the paths.

Return paths

The updated dictionary paths.

Return type

dict

config.local_maps_paths(paths, param)

This function defines the paths where the local maps will be saved:

  • LAND for the raster of land areas within the scope

  • EEZ for the raster of sea areas within the scope

  • LU for the land use raster within the scope

  • PA for the raster of protected areas within the scope

  • POP for the population raster within the scope

Parameters
  • paths (dict) – Dictionary including the paths.

  • param (dict) – Dictionary including the user preferences.

Return paths

The updated dictionary paths.

Return type

dict

config.output_folders(paths, param)

This function defines the paths to multiple output folders:

  • region is the main output folder.

  • local_maps is the output folder for the local maps of the spatial scope.

  • sites is the output folder for the files related to the modeled sites.

  • load is the output folder for the subregions-independent, load-related intermediate files.

  • load_sub is the output folder for the subregions-dependent, load-related intermediate files.

  • grid is the output folder for the subregions-independent, grid-related intermediate files.

  • grid_sub is the output folder for the subregions-dependent, grid-related intermediate files.

  • regional_analysis is the output folder for the regional analysis of renewable energy.

  • proc is the output folder for the subregions-independent, process-related intermediate files.

  • proc_sub is the output folder for the subregions-dependent, process-related intermediate files.

  • urbs is the output folder for the urbs model input file.

  • evrys is the output folder for the evrys model input files.

All the folders are created at the beginning of the calculation, if they do not already exist.

Parameters
  • paths (dict) – Dictionary including the paths.

  • param (dict) – Dictionary including the user preferences region_name and subregions_name.

Returns

The updated dictionary paths.

Return type

dict

config.output_paths(paths, param)

This function defines the paths to multiple output files.

Sites:
  • sites_sub is the CSV output file listing the modeled sites and their attributes.

Load:
  • stats_countries is the CSV output file listing some load statistics on a country level.

  • load_ts_clean is the CSV output file with cleaned load time series on a country level.

  • cleaned_profiles is a dictionary of paths to the CSV file with cleaned load profiles for each sector.

  • df_sector is the CSV output file with load time series for each sector on a country level.

  • load_sector is the CSV output file with yearly electricity demand for each sector and country.

  • load_landuse is the CSV output file with load time series for each land use type on a country level.

  • intersection_subregions_countries is a shapefile where the polygons are the outcome of the intersection between the countries and the subregions.

  • stats_country_parts is the CSV output file listing some load statistics on the level of country parts.

  • load_ts_clean is the CSV output file with load time series on the level of subregions.

Grid:
  • grid_expanded is a CSV file including a reformatted table of transmission lines.

  • grid_filtered is a CSV file obtained after filtering out erroneous/useless data points.

  • grid_corrected is a CSV file obtained after correcting erroneous data points.

  • grid_filled is a CSV file obtained after filling missing data with default values.

  • grid_cleaned is a CSV file obtained after cleaning the data and reformatting the table.

  • grid_shp is a shapefile of the transmission lines.

  • grid_completed is a CSV file containing the aggregated transmission lines between the subregions and their attributes.

Renewable processes:
  • IRENA_summary is a CSV file with a summary of renewable energy statistics for the countries within the scope.

  • locations_ren is a dictionary of paths pointing to shapefiles of possible spatial distributions of renewable power plants.

  • potential_ren is a CSV file with renewable potentials.

Other processes and storage:
  • process_raw is a CSV file including aggregated information about the power plants before processing.

  • process_filtered is a CSV file obtained after filtering out erroneous/useless data points.

  • process_joined is a CSV file obtained after joining the table with default attribute assumptions (like costs).

  • process_completed is a CSV file obtained after filling missing data with default values.

  • process_cleaned is a CSV file obtained after cleaning the data and reformatting the table.

  • process_regions is a CSV file containing the power plants for each subregion.

  • storage_regions is a CSV file containing the storage devices for each subregion.

  • commodities_regions is a CSV file containing the commodities for each subregion.

Framework models:
  • urbs_model is the urbs model input file.

  • evrys_model is the evrys model input file.

Parameters
  • paths (dict) – Dictionary including the paths.

  • param (dict) – Dictionary including the user preferences region_name, subregions_name, and year.

Returns

The updated dictionary paths.

Return type

dict

config.processes_input_paths(paths, param)

This function defines the paths where the process-related inputs are located:

  • IRENA: IRENA electricity statistics (useful to derive installed capacities of renewable energy technologies).

  • dist_ren: dictionary of paths to rasters defining how the potential for the renewable energy is spatially distributed. The rasters have to be the same size as the spatial scope.

  • FRESNA: path to the locally saved FRESNA database.

Parameters
  • paths (dict) – Dictionary including the paths.

  • param (dict) – Dictionary including the parameter year.

Return paths

The updated dictionary paths.

Return type

dict

config.renewable_time_series_paths(paths, param)

This function defines the paths where the renewable time series (inputs) are located. TS_ren is itself a dictionary with the keys WindOn, WindOff, PV, CSP pointing to the individual files for each technology.

Parameters
  • paths (dict) – Dictionary including the paths.

  • param (dict) – Dictionary including the parameters region_name, subregions_name, and year.

Return paths

The updated dictionary paths.

Return type

dict

runme.py

runme.py calls the main functions of the code:

from lib.initialization import initialization
from lib.generate_intermediate_files import *
from lib.correction_functions import *
from lib.generate_models import *

if __name__ == "__main__":
    paths, param = initialization()

    ## Clean raw data
    clean_residential_load_profile(paths, param)
    clean_commercial_load_profile(paths, param)
    clean_industry_load_profile(paths, param)
    clean_agriculture_load_profile(paths, param)
    clean_streetlight_load_profile(paths, param)
    clean_GridKit_Europe(paths, param)
    clean_sector_shares_Eurostat(paths, param)
    clean_load_data_ENTSOE(paths, param)
    distribute_renewable_capacities_IRENA(paths, param)
    clean_processes_and_storage_FRESNA(paths, param)

    ## Generate intermediate files
    generate_sites_from_shapefile(paths, param)
    generate_load_timeseries(paths, param)
    generate_transmission(paths, param)
    generate_intermittent_supply_timeseries(paths, param)
    generate_processes(paths, param)
    generate_storage(paths, param)
    generate_commodities(paths, param)

    ## Generate model files
    generate_urbs_model(paths, param)
    generate_evrys_model(paths, param)

Theoretical description

Continue here if you want to understand the theoretical conception behind the generate_load_timeseries function.

Theory

This chapter explains how the load time series are disaggregated spatially and by sector, then re-aggregated according to the desired model regions.

Purpose

Load time series are widely available, but the published datasets are usually restricted to predefined spatial regions such as countries and their administrative subdivisions. The generate_load_timeseries() function takes the datasets which are available for these regions and disaggregates them according to a set of parameters, before re-aggregating them at a different spatial level. It is then possible to obtain time series for any region.

Description of lib.generate_intermediate_files.generate_load_timeseries


Inputs

The main inputs of this script are:

  • Load time series of the countries or regions to be disaggregated (hourly).

  • Shapefiles of the countries or regions to be disaggregated

  • Shapefiles of the regions of interest (subregions)

  • Assumptions (land use and sector correspondence)

  • Load profiles of the different sectors

  • Rasters of population and land use corresponding to the country or region

Sectoral disaggregation

The load is assumed to be perfectly divided into four distinct sectors (load sources):

  • Commercial

  • Industrial

  • Residential

  • Agricultural

Sectoral load profiles


Sectoral load shares:

Region      Industry    Commerce    Residential    Agriculture
A           0.41        0.28        0.29           0.02
B           0.31        0.30        0.38           0.01
C           0.44        0.30        0.25           0.01
An hourly load profile for each sector is predefined for one week, and the same profile is assumed to repeat over the year. These load profiles are scaled and normalized based on the sectoral load shares of each region (assumed constant throughout the spatial scope): each profile is multiplied by its corresponding share, and the profiles are then normalized so that their hourly sum equals 1.

Scaled sectoral load profiles

Normalized sectoral load profiles


Once the load profiles are normalized, they can be multiplied with the actual load time series to obtain the load time series for each sector.
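The scaling, normalization, and multiplication steps can be sketched with toy numbers (sector names, profile values, and the load series are illustrative):

```python
import numpy as np

# Toy example of the three steps described above: scale the weekly profiles
# by sectoral shares, normalize so the profiles sum to 1 in every hour,
# then multiply by the total load time series.
profiles = {                       # predefined hourly profiles per sector
    "Residential": np.array([0.2, 0.4, 0.8, 0.6]),
    "Industrial":  np.array([1.0, 1.0, 0.9, 0.9]),
}
shares = {"Residential": 0.4, "Industrial": 0.6}

scaled = {s: profiles[s] * shares[s] for s in profiles}
total = sum(scaled.values())                         # hourly sum over sectors
normalized = {s: scaled[s] / total for s in scaled}  # sums to 1 each hour

load = np.array([100.0, 120.0, 150.0, 130.0])        # actual load time series
sectoral_load = {s: normalized[s] * load for s in normalized}

# the sectoral series add up to the original load again
print(np.allclose(sum(sectoral_load.values()), load))   # True
```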

Load time series


Sectoral load time series

Spatial disaggregation

The next step is the spatial disaggregation, based on the land use and population concentration rasters. First, each land-use type is assigned a sectoral load percentage corresponding to the load components of the land use category. Then, the population concentration raster is used to calculate the population of each pixel.

Example - Commerce sector spatial disaggregation

By counting the pixels and land use occurrences inside the region for which the sectoral load time series has been calculated, the industrial, commercial, and agricultural loads can be attributed to each pixel of that region. Similarly, the residential load time series is assigned to each pixel based on the population contained in that pixel. As a result of the spatial disaggregation, every pixel inside a given region carries a specific sectoral load time series.

Example - Residential sector spatial disaggregation

Re-aggregation

The results of the sectoral and spatial disaggregation performed in the first two steps can be used to retrieve the sectoral load time series, and therefore the general load time series, of any desired region by summing up the loads of every pixel contained within it. If a subregion spans more than one region or country, it is divided into parts restricted to each of those countries.
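A minimal sketch of the re-aggregation step, assuming each pixel already carries its disaggregated load series (all names and values are illustrative):

```python
import numpy as np

# Every pixel carries a load time series from the disaggregation steps;
# a region's series is the sum over the pixels it contains.
pixel_load = {                      # pixel id -> hourly load (toy values)
    (0, 0): np.array([1.0, 2.0]),
    (0, 1): np.array([0.5, 0.5]),
    (1, 0): np.array([2.0, 1.0]),
}
region_pixels = {"SubregionA": [(0, 0), (0, 1)], "SubregionB": [(1, 0)]}

region_load = {
    region: sum(pixel_load[p] for p in pixels)
    for region, pixels in region_pixels.items()
}
print(region_load["SubregionA"])    # [1.5 2.5]
```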

Technical documentation

Continue here if you want to understand in detail the model implementation.

Implementation

Start with the configuration:

You can run the code by typing:

$ python runme.py

runme.py calls the main functions of the code, which are explained in the following sections.

initialization.py

Helping functions for the models are included in generate_intermediate_files.py, correction_functions.py, spatial_functions.py, and input_maps.py.

generate_intermediate_files.py
correction_functions.py
spatial_functions.py
input_maps.py

Utility functions as well as imported libraries are included in util.py.

util.py
lib.util.assign_values_based_on_series(series, dict)

This function fills a series based on the values of another series and a dictionary. The dictionary does not have to be sorted; it is sorted before the values are assigned. However, it must contain a key that is greater than any value in the series. The function is equivalent to mapping ranges to discrete values.

Parameters
  • series (pandas series) – Series with input values that will be mapped.

  • dict (dictionary) – Dictionary defining the limits of the ranges that will be mapped.

Return result

Series with the mapped discrete values.

Return type

pandas series
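The range-to-discrete mapping can be illustrated with a simplified reimplementation. The function name, the boundary convention (upper limits are inclusive), and the sample data are assumptions for illustration only:

```python
import pandas as pd

def assign_by_ranges(series, limits):
    """Map each value to the output of the first range whose upper limit
    is >= the value (simplified sketch of the documented behavior)."""
    sorted_keys = sorted(limits.keys())  # the dictionary need not be sorted
    def lookup(x):
        for key in sorted_keys:
            if x <= key:
                return limits[key]
        raise ValueError("the dictionary must contain a key >= every value")
    return series.map(lookup)

s = pd.Series([5, 42, 250])
# Keys are the upper limits of the ranges; values are the mapped outputs
result = assign_by_ranges(s, {10: "low", 100: "mid", 1000: "high"})
```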

lib.util.changem(A, newval, oldval)

This function replaces existing values oldval in a data array A by new values newval.

oldval and newval must have the same size.

Parameters
  • A (numpy array) – Input matrix.

  • newval (numpy array) – Vector of new values to be set.

  • oldval (numpy array) – Vector of old values to be replaced.

Return Out

The updated array.

Return type

numpy array
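A minimal sketch of the documented behavior, replacing each occurrence of oldval[i] with newval[i] (the helper name is hypothetical):

```python
import numpy as np

def changem_sketch(A, newval, oldval):
    """Replace occurrences of oldval[i] in A with newval[i].
    oldval and newval must have the same size."""
    out = A.copy()
    # Masks are built against the original array so earlier replacements
    # cannot be overwritten by later ones
    for old, new in zip(oldval, newval):
        out[A == old] = new
    return out

A = np.array([[1, 2], [2, 3]])
out = changem_sketch(A, newval=np.array([10, 30]), oldval=np.array([2, 3]))
```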

lib.util.create_json(filepath, param, param_keys, paths, paths_keys)

This function creates a metadata JSON file containing information about the file in filepath by storing the relevant keys from both the param and path dictionaries.

Parameters
  • filepath (string) – Path to the file for which the JSON file will be created.

  • param (dict) – Dictionary of dictionaries containing the user input parameters and intermediate outputs.

  • param_keys (list of strings) – Keys of the parameters to be extracted from the param dictionary and saved into the JSON file.

  • paths (dict) – Dictionary of dictionaries containing the paths for all files.

  • paths_keys (list of strings) – Keys of the paths to be extracted from the paths dictionary and saved into the JSON file.

Returns

The JSON file will be saved in the desired path filepath.

Return type

None
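The key-extraction logic can be sketched as below. The helper name, the `.json` naming convention, and the metadata layout are assumptions, not the actual implementation:

```python
import json

def create_json_sketch(filepath, param, param_keys, paths, paths_keys):
    """Write the selected keys of param and paths into a metadata JSON
    file stored next to the file in filepath (hypothetical helper)."""
    metadata = {
        "param": {k: param[k] for k in param_keys},
        "paths": {k: paths[k] for k in paths_keys},
    }
    # Replace the file extension with .json for the metadata file
    json_path = filepath.rsplit(".", 1)[0] + ".json"
    with open(json_path, "w") as f:
        json.dump(metadata, f, indent=2)
    return json_path
```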

lib.util.display_progress(message, progress_stat)

This function displays a progress bar for long computations. To be used as part of a loop or with multiprocessing.

Parameters
  • message (string) – Message to be displayed with the progress bar.

  • progress_stat (tuple(int, int)) – Tuple containing the total length of the calculation and the current status or progress.

Returns

The status bar is printed.

Return type

None
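The general idea of a text progress bar driven by a (total, current) tuple can be sketched as follows; the function name and bar width are illustrative choices, not the actual implementation:

```python
def progress_bar(message, progress_stat, width=50):
    """Build a one-line text progress bar; progress_stat is (total, current)."""
    total, current = progress_stat
    filled = int(width * current / total)
    bar = "#" * filled + "-" * (width - filled)
    # \r returns the cursor to the start of the line so the bar overwrites itself
    return f"\r{message}: [{bar}] {current}/{total}"

# Typical use inside a loop:
# for i in range(total):
#     print(progress_bar("Cleaning load data", (total, i + 1)), end="")
```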

lib.util.expand_dataframe(df, column_names)

This function reads a dataframe where columns with known column_names have multiple values separated by a semicolon in each entry. It expands the dataframe by creating a row for each value in each of these columns.

Parameters
  • df (pandas dataframe) – The original dataframe, with multiple values in some entries.

  • column_names (list) – Names of columns where multiple values have to be separated.

Return df_final

The expanded dataframe, where each row contains only one value per column.

Return type

pandas dataframe
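The expansion can be sketched with pandas `explode`, which has been available since pandas 0.25 (the version pinned in the environment file). The helper name and sample data are hypothetical:

```python
import pandas as pd

def expand_dataframe_sketch(df, column_names):
    """Split semicolon-separated entries and create one row per value."""
    out = df.copy()
    for col in column_names:
        out[col] = out[col].str.split(";")
        out = out.explode(col)  # one row per list element, other columns repeated
    return out.reset_index(drop=True)

df = pd.DataFrame({"Region": ["A;B", "C"], "Value": [1, 2]})
expanded = expand_dataframe_sketch(df, ["Region"])
```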

lib.util.field_exists(field_name, shp_path)

This function returns whether the specified field exists in the shapefile at the given path.

Parameters
  • field_name (str) – Name of the field to be checked for.

  • shp_path (str) – Path to the shapefile.

Returns

True if it exists or False if it doesn’t exist.

Return type

bool

lib.util.get_sectoral_profiles(paths, param)

This function reads the raw standard load profiles, repeats them to obtain a full year, normalizes them so that the sum is equal to 1, and stores the obtained load profile for each sector in the dataframe profiles.

Parameters
  • paths (dict) – Dictionary containing the paths to dict_daytype, dict_season, and to the raw standard load profiles.

  • param (dict) – Dictionary containing the year and load-related assumptions.

Return profiles

The normalized load profiles for the sectors.

Return type

pandas dataframe

lib.util.resizem(A_in, row_new, col_new)

This function resizes a regular data grid by copying and pasting parts of the original array.

Parameters
  • A_in (numpy array) – Input matrix.

  • row_new (integer) – New number of rows.

  • col_new (integer) – New number of columns.

Return A_out

Resized matrix.

Return type

numpy array
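Resizing by copying parts of the original array amounts to nearest-neighbor index repetition, which can be sketched as below (the helper name and index formula are illustrative assumptions):

```python
import numpy as np

def resizem_sketch(A_in, row_new, col_new):
    """Resize a regular grid by repeating (or sampling) entries of A_in."""
    rows_old, cols_old = A_in.shape
    # Map each new index back to an old index (nearest-neighbor style)
    row_idx = np.arange(row_new) * rows_old // row_new
    col_idx = np.arange(col_new) * cols_old // col_new
    return A_in[np.ix_(row_idx, col_idx)]

A = np.array([[1, 2], [3, 4]])
B = resizem_sketch(A, 4, 4)  # each original entry becomes a 2x2 block
```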

lib.util.reverse_lines(df)

This function reverses the line direction if the starting point is alphabetically after the end point.

Parameters

df (pandas dataframe) – Dataframe with columns ‘Region_start’ and ‘Region_end’.

Return df_final

The same dataframe after the line direction has been reversed.

Return type

pandas dataframe
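A minimal sketch of the swap on rows whose start region sorts alphabetically after the end region (helper name and sample regions are hypothetical):

```python
import pandas as pd

def reverse_lines_sketch(df):
    """Swap 'Region_start' and 'Region_end' where the start sorts after the end."""
    out = df.copy()
    mask = out["Region_start"] > out["Region_end"]
    # Assign the swapped values back via .values to avoid column alignment
    out.loc[mask, ["Region_start", "Region_end"]] = out.loc[
        mask, ["Region_end", "Region_start"]
    ].values
    return out

df = pd.DataFrame({"Region_start": ["DE", "AT"], "Region_end": ["AT", "DE"]})
fixed = reverse_lines_sketch(df)
```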

lib.util.timecheck(*args)

This function prints information about the progress of the script by displaying the function currently running, and optionally an input message, with a corresponding timestamp. If more than one argument is passed to the function, it will raise an exception.

Parameters

args (string) – Message to be displayed with the function name and the timestamp (optional).

Returns

The time stamp is printed.

Return type

None

Raise

Raised if more than one argument is passed to the function; the maximum is one string.
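The described behavior can be sketched with the standard library's `inspect` and `datetime` modules; the helper name and output format are assumptions:

```python
import inspect
from datetime import datetime

def timecheck_sketch(*args):
    """Print the calling function's name, an optional message, and a timestamp."""
    if len(args) > 1:
        raise Exception("timecheck takes at most one message string")
    message = f" - {args[0]}" if args else ""
    caller = inspect.stack()[1].function  # name of the function that called us
    print(f"{caller}{message}: {datetime.now():%H:%M:%S}")
```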

Finally, the module generate_models.py contains formatting functions that create the input files for the urbs and evrys models.

generate_models.py
lib.generate_models.generate_evrys_model(paths, param)

This function reads all the intermediate CSV files, adapts the formatting to the structure of the evrys Excel input file, and combines the datasets into one dataframe. It writes the dataframe into an evrys input Excel file. The function still runs even if some files have not been generated; they are simply skipped.

Parameters
  • paths (dict) – Dictionary including the paths to the intermediate files sites_sub, commodities_regions, process_regions, grid_completed, storage_regions, load_regions, potential_ren, and to the output evrys_model.

  • param (dict) – Dictionary of user preferences, including model_year and technology.

Returns

The XLSX model input file is saved directly in the desired path.

Return type

None

lib.generate_models.generate_urbs_model(paths, param)

This function reads all the intermediate CSV files, adapts the formatting to the structure of the urbs Excel input file, and combines the datasets into one dataframe. It writes the dataframe into an urbs input Excel file. The function still runs even if some files have not been generated; they are simply skipped.

Parameters
  • paths (dict) – Dictionary including the paths to the intermediate files sites_sub, commodities_regions, process_regions, assumptions_flows, grid_completed, storage_regions, load_regions, potential_ren, and to the output urbs_model.

  • param (dict) – Dictionary of user preferences, including model_year and technology.

Returns

The XLSX model input file is saved directly in the desired path.

Return type

None

Dependencies

A list of the used libraries is available in the environment file:

name: gen_mod
channels:
  - defaults
  - conda-forge
dependencies:
  - pip=19.3.1
  - pip:
    - pyproj
  - numpy=1.16.4
  - pandas=0.25.1
  - gdal=2.4.2
  - geopandas=0.5.1
  - geopy=1.20.0
  - openpyxl=3.0.0
  - pysal=2.0.0
  - dill=0.3.1.1
  - pyshp=2.1.0
  - python=3.7.5
  - rasterio=1.0.25
  - scipy=1.3.1
  - shapely=1.6.4
  - xlrd=1.2.0
prefix: D:\Miniconda3\envs\gen_mod

Bibliography

Indices and tables