memilio.epidata.getPopulationData

getPopulationData.py

Downloads population statistics data.

Functions

assign_population_data(df_pop_raw, counties, ...)

Assigns the population data of all counties from the old DataFrame to a newly created DataFrame.

export_population_dataframe(df_pop, ...)

Writes population dataframe into directory with new column names and age groups

fetch_population_data([read_data, ...])

Downloads or reads the population data.

get_population_data([read_data, ...])

Download age-stratified population data for the German counties.

main()

Main program entry.

preprocess_population_data(df_pop_raw[, ...])

Processes the downloaded data.

read_population_data(ref_year)

Reads Population data from regionalstatistik.de

test_total_population(df_pop, age_cols)

Tests if total population matches expectation

write_population_data(df_pop[, out_folder, ...])

Write the population data into json files. Three kinds of structuring of the data are done.

memilio.epidata.getPopulationData.assign_population_data(
df_pop_raw,
counties,
age_cols,
idCounty_idx,
)

Assigns the population data of all counties from the old DataFrame to a newly created DataFrame.

df_pop_raw may contain additional information, such as federal states or governing regions, which is not needed in the new DataFrame. The function also checks for incomplete data.

Parameters:
  • df_pop_raw – Raw Population DataFrame read from regionalstatistik.de

  • counties – List of counties to be assigned in new DataFrame

  • age_cols – Age groups in old DataFrame

  • idCounty_idx – Indexes in the old DataFrame where the data of the corresponding county starts

Returns:

new DataFrame

memilio.epidata.getPopulationData.export_population_dataframe(
df_pop: pandas.DataFrame,
directory: str,
file_format: str,
merge_eisenach: bool,
ref_year,
)

Writes population dataframe into directory with new column names and age groups

Parameters:
  • df_pop – pd.DataFrame. Population data to be exported.

  • directory – str. Directory where data is written to.

  • file_format – str. File format which is used for writing the data.

  • merge_eisenach – bool. Defines whether the counties ‘Wartburgkreis’ and ‘Eisenach’ are listed separately or combined as one entity ‘Wartburgkreis’.

  • ref_year – None or year (yyyy) convertible to str. Reference year.

Returns:

exported DataFrame

memilio.epidata.getPopulationData.fetch_population_data(
read_data: bool = False,
out_folder: str = '/home/docs/checkouts/readthedocs.org/user_builds/memilio/data/',
ref_year=None,
**kwargs,
) pandas.DataFrame

Downloads or reads the population data. If it does not already exist, the folder Germany is created in the given out_folder. If read_data == True and the file “FullData_population.json” exists, the data is read from this file and stored in a pandas DataFrame. If read_data == True and the file does not exist, the program is stopped. The downloaded DataFrame is written to the file “FullData_population”.

Parameters:
  • read_data – False or True. Defines if data is read from file or downloaded. Default defined in defaultDict. (Default value = dd.defaultDict[‘read_data’])

  • out_folder – Path to folder where data is written in folder out_folder/Germany. Default defined in defaultDict. (Default value = dd.defaultDict[‘out_folder’])

  • ref_year – None or year (yyyy) convertible to str. Reference year. (Default value = None)

  • **kwargs

Returns:

DataFrame with adjusted population data for all ages to current level.
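The read-or-download behaviour described for fetch_population_data can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: the helper name fetch_or_read, the download callable and the record layout are hypothetical, and plain lists of dicts stand in for the pandas DataFrame. Only the folder name Germany and the file name “FullData_population.json” are taken from the description above.

```python
import json
import os
import tempfile


def fetch_or_read(read_data, out_folder, download):
    """Return population records from a cached file or via `download`.

    `download` is a hypothetical zero-argument callable standing in for
    the request to regionalstatistik.de.
    """
    directory = os.path.join(out_folder, "Germany")
    os.makedirs(directory, exist_ok=True)  # create the folder if missing
    path = os.path.join(directory, "FullData_population.json")

    if read_data:
        if not os.path.exists(path):
            # The documentation says the program stops in this case;
            # raising an exception is this sketch's way of stopping.
            raise FileNotFoundError(f"{path} does not exist")
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    records = download()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)  # keep a copy for later read_data=True calls
    return records


# Usage: download once, then read the cached copy.
with tempfile.TemporaryDirectory() as tmp:
    def fake_download():
        return [{"ID_County": 1001, "Population": 100}]

    downloaded = fetch_or_read(False, tmp, fake_download)
    cached = fetch_or_read(True, tmp, fake_download)
    print(downloaded == cached)
```

Passing the download step in as a callable keeps the sketch testable without network access or a Genesis Online account.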

memilio.epidata.getPopulationData.get_population_data(
read_data: bool = False,
file_format: str = 'json_timeasstring',
out_folder: str = '/home/docs/checkouts/readthedocs.org/user_builds/memilio/data/',
merge_eisenach: bool = True,
ref_year=None,
**kwargs,
)

Download age-stratified population data for the German counties.

The data we use is: Official ‘Bevölkerungsfortschreibung’ 12411-02-03-4: ‘Bevölkerung nach Geschlecht und Altersgruppen (17)’ of regionalstatistik.de. ATTENTION: The raw file cannot be downloaded automatically by our scripts without a Genesis Online account. In order to work on this dataset, please enter your username and password or manually download it from:

https://www.regionalstatistik.de/genesis/online -> “1: Gebiet, Bevölkerung, Arbeitsmarkt, Wahlen” -> “12: Bevölkerung” -> “12411 Fortschreibung des Bevölkerungsstandes” -> “12411-02-03-4: Bevölkerung nach Geschlecht und Altersgruppen (17) - Stichtag 31.12. - regionale Tiefe: Kreise und krfr. Städte”.

Download the xlsx or csv file and put it under dd.defaultDict[‘out_folder’], this normally is Memilio/data/pydata/Germany. The folders ‘pydata/Germany’ have to be created if they do not exist yet. Then this script can be run.

Parameters:
  • read_data – False or True. Defines if data is read from file or downloaded. Default defined in defaultDict. (Default value = dd.defaultDict[‘read_data’])

  • file_format – File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict[‘file_format’])

  • out_folder – Path to folder where data is written in folder out_folder/Germany. Default defined in defaultDict. (Default value = dd.defaultDict[‘out_folder’])

  • merge_eisenach – bool, Default: True. Defines whether the counties ‘Wartburgkreis’ and ‘Eisenach’ are listed separately or combined as one entity ‘Wartburgkreis’.

  • ref_year – Default: None or year (yyyy) convertible to str. Reference year.

  • username – str. Username to sign in at regionalstatistik.de.

  • password – str. Password to sign in at regionalstatistik.de.

  • **kwargs

Returns:

DataFrame with adjusted population data for all ages to current level.

memilio.epidata.getPopulationData.main()

Main program entry.

memilio.epidata.getPopulationData.preprocess_population_data(
df_pop_raw: pandas.DataFrame,
merge_eisenach: bool = True,
) pandas.DataFrame

Processes the downloaded data: the columns are renamed to English and the state and county names are added.

Parameters:
  • df_pop_raw – pd.DataFrame. A Dataframe containing input population data

  • merge_eisenach – True or False. Defines whether the counties ‘Wartburgkreis’ and ‘Eisenach’ are listed separately or combined as one entity ‘Wartburgkreis’. (Default value = True)

Returns:

pd.DataFrame. Processed population data
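To illustrate what combining ‘Eisenach’ and ‘Wartburgkreis’ into one entity means for the age-group counts, here is a minimal sketch in plain Python. It is not the library's code: the function name, the column name ID_County, the age-group labels and the two county keys are assumptions made for the example, and rows are plain dicts rather than DataFrame rows.

```python
EISENACH_ID = 16056       # assumed county key for Eisenach
WARTBURGKREIS_ID = 16063  # assumed county key for Wartburgkreis


def merge_eisenach_rows(rows, age_cols):
    """Fold the Eisenach row into Wartburgkreis and return new rows.

    `rows` is a list of dicts with a hypothetical "ID_County" key and
    one integer entry per age column. The input is not modified.
    """
    by_id = {row["ID_County"]: dict(row) for row in rows}
    eisenach = by_id.pop(EISENACH_ID, None)
    if eisenach is not None and WARTBURGKREIS_ID in by_id:
        target = by_id[WARTBURGKREIS_ID]
        for col in age_cols:
            target[col] += eisenach[col]  # add Eisenach's count per age group
    elif eisenach is not None:
        # No Wartburgkreis row to merge into: keep Eisenach unchanged.
        by_id[EISENACH_ID] = eisenach
    return list(by_id.values())


age_cols = ["<3 years", "3-5 years"]  # illustrative labels only
rows = [
    {"ID_County": 16056, "<3 years": 10, "3-5 years": 5},
    {"ID_County": 16063, "<3 years": 20, "3-5 years": 7},
]
merged = merge_eisenach_rows(rows, age_cols)
print(merged)
```

With merging enabled, the result contains a single Wartburgkreis row whose age-group counts are the sums of both counties.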

memilio.epidata.getPopulationData.read_population_data(ref_year)

Reads Population data from regionalstatistik.de

A request is made to regionalstatistik.de and the response is read via StringIO as CSV into a DataFrame.

Parameters:

ref_year – Default: None or year (yyyy) convertible to str. Reference year.

Returns:

DataFrame

memilio.epidata.getPopulationData.test_total_population(df_pop, age_cols)

Tests if total population matches expectation

Parameters:
  • df_pop – Population Dataframe with all counties

  • age_cols – All age groups in DataFrame
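A plausibility check of this kind can be sketched as follows. The documentation does not state which expected value or tolerance the function uses, so expected_total and rel_tol below are assumptions, as are the function name and the use of plain dicts instead of a DataFrame.

```python
def check_total_population(rows, age_cols, expected_total, rel_tol=0.01):
    """Sum all age columns over all counties and compare to an expectation.

    `expected_total` and `rel_tol` are hypothetical parameters; the real
    function's reference value and tolerance are not documented here.
    """
    total = sum(row[col] for row in rows for col in age_cols)
    if abs(total - expected_total) > rel_tol * expected_total:
        raise ValueError(
            f"total population {total} deviates from expected {expected_total}"
        )
    return total


rows = [
    {"ID_County": 1001, "<3 years": 40, "3-5 years": 60},
    {"ID_County": 1002, "<3 years": 30, "3-5 years": 70},
]
total = check_total_population(rows, ["<3 years", "3-5 years"], expected_total=200)
print(total)
```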

memilio.epidata.getPopulationData.write_population_data(
df_pop: pandas.DataFrame,
out_folder: str = '/home/docs/checkouts/readthedocs.org/user_builds/memilio/data/',
file_format: str = 'json_timeasstring',
merge_eisenach: bool = True,
ref_year=None,
) pandas.DataFrame

Write the population data into json files. Three kinds of structuring of the data are done: the population data is stored in the files “county_population.json”, “state_population.json” and “germany_population.json” for counties, states and whole Germany, respectively.

Parameters:
  • df_pop – pd.DataFrame. A Dataframe containing processed population data

  • file_format – str. File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict[‘file_format’])

  • out_folder – str. Folder where data is written to. Default defined in defaultDict. (Default value = dd.defaultDict[‘out_folder’])

  • merge_eisenach – bool, Default: True. Defines whether the counties ‘Wartburgkreis’ and ‘Eisenach’ are listed separately or combined as one entity ‘Wartburgkreis’.

  • ref_year – Default: None or year (yyyy) convertible to str. Reference year.

Returns:

None
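The three levels of structuring (counties, states, whole Germany) can be illustrated by aggregating county rows upward. This is a sketch under assumptions, not the library's implementation: the function name and the column names ID_County and ID_State are hypothetical, rows are plain dicts, and the state is derived from the leading digits of the five-digit county key.

```python
from collections import defaultdict


def aggregate_population(county_rows, age_cols):
    """Aggregate county-level rows to state level and to a national total."""
    states = defaultdict(lambda: dict.fromkeys(age_cols, 0))
    germany = dict.fromkeys(age_cols, 0)
    for row in county_rows:
        # County keys such as 16063 carry the state key in the leading digits.
        state_id = row["ID_County"] // 1000
        for col in age_cols:
            states[state_id][col] += row[col]
            germany[col] += row[col]
    state_rows = [
        {"ID_State": state_id, **values}
        for state_id, values in sorted(states.items())
    ]
    return state_rows, germany


county_rows = [
    {"ID_County": 1001, "<3 years": 1},
    {"ID_County": 1002, "<3 years": 2},
    {"ID_County": 16063, "<3 years": 5},
]
state_rows, germany = aggregate_population(county_rows, ["<3 years"])
print(state_rows)
print(germany)
```

Each of the three results (the county rows, state_rows and germany) corresponds to one of the output files described above.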