memilio.epidata.getPopulationData
getPopulationData.py
Downloads data about population statistics.
Functions
- assign_population_data – Assigns the population data of all counties from the old DataFrame to a newly created DataFrame.
- export_population_dataframe – Writes the population DataFrame into a directory with new column names and age groups.
- fetch_population_data – Downloads or reads the population data.
- get_population_data – Downloads age-stratified population data for the German counties.
- main – Main program entry.
- preprocess_population_data – Processes the downloaded data.
- read_population_data – Reads population data from regionalstatistik.de.
- test_total_population – Tests whether the total population matches the expected value.
- write_population_data – Writes the population data into JSON files. Three kinds of structuring of the data are done.
- memilio.epidata.getPopulationData.assign_population_data(df_pop_raw, counties, age_cols, idCounty_idx)
Assigns the population data of all counties from the old DataFrame to a newly created DataFrame.
df_pop_raw may contain additional rows, e.g. for federal states or governing regions, which are not needed in the new DataFrame. The function also checks for incomplete data.
- Parameters:
df_pop_raw – Raw population DataFrame read from regionalstatistik.de
counties – List of counties to be assigned in the new DataFrame
age_cols – Age groups in the old DataFrame
idCounty_idx – Indexes in the old DataFrame where the data of the corresponding county starts
- Returns:
new DataFrame
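The row selection described above can be pictured with the following minimal sketch. It is not memilio's implementation: it assumes, for illustration only, that the raw table is a list of string rows of the form [region_id, count_age_group_1, ...], that each county's data sits in the row at its start index, and that placeholders such as "-" mark incomplete data. The function name and the toy numbers are hypothetical.

```python
# Hypothetical sketch, not memilio's code. Assumed layout: each raw row is
# [region_id, count_age_group_1, count_age_group_2, ...] as strings.


def assign_population_sketch(raw_rows, counties, county_start_idx, n_age_cols):
    """Return {county_id: [counts per age group]} for the requested counties.

    Rows not referenced by county_start_idx (e.g. federal states or
    governing regions) are skipped because they are never read.
    """
    result = {}
    for county, start in zip(counties, county_start_idx):
        values = raw_rows[start][1:1 + n_age_cols]
        # Missing cells or non-numeric placeholders ('-', '.') count as incomplete.
        if len(values) < n_age_cols or not all(v.isdigit() for v in values):
            raise ValueError(f"Incomplete data for county {county}")
        result[county] = [int(v) for v in values]
    return result


raw = [
    ["16", "300", "500"],   # federal state row, not needed
    ["16056", "10", "20"],  # county rows (toy numbers)
    ["16063", "30", "40"],
]
print(assign_population_sketch(raw, ["16056", "16063"], [1, 2], 2))
# {'16056': [10, 20], '16063': [30, 40]}
```

Raising on incomplete rows is one possible design; the actual function may handle them differently.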
- memilio.epidata.getPopulationData.export_population_dataframe(df_pop, directory, file_format, merge_eisenach, ref_year)
Writes the population DataFrame into a directory with new column names and age groups.
- Parameters:
df_pop – pd.DataFrame. Population data DataFrame to be exported.
directory – str. Directory where the data is written to.
file_format – str. File format which is used for writing the data.
merge_eisenach – bool. Defines whether the counties ‘Wartburgkreis’ and ‘Eisenach’ are listed separately or combined as one entity ‘Wartburgkreis’.
ref_year – None or a year (yyyy) convertible to str. Reference year.
- Returns:
exported DataFrame
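A minimal sketch of the export step, assuming the data is held as a list of per-county dicts and written as JSON. The header mapping below (German raw headers to English names and age-group labels) is hypothetical; the column names memilio actually uses may differ, and the real function writes a pandas DataFrame in the requested file_format rather than plain dicts.

```python
import json
import os

# Hypothetical header mapping; memilio's real column names may differ.
COLUMN_MAP = {
    "Kreis": "ID_County",
    "unter 3 Jahre": "<3 years",
    "3 bis unter 6 Jahre": "3-5 years",
}


def export_population_sketch(records, directory, filename):
    """Rename the columns of each record and write the records to a JSON file.

    records: list of dicts (one per county). Returns the renamed records.
    """
    renamed = [{COLUMN_MAP.get(k, k): v for k, v in rec.items()} for rec in records]
    os.makedirs(directory, exist_ok=True)  # create the target directory if needed
    path = os.path.join(directory, filename + ".json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(renamed, f, ensure_ascii=False)  # keep umlauts readable
    return renamed
```

Columns without an entry in the mapping are kept unchanged, so unknown headers pass through instead of being dropped.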
- memilio.epidata.getPopulationData.fetch_population_data(read_data: bool = False, out_folder: str = dd.defaultDict['out_folder'], ref_year=None, **kwargs)
Downloads or reads the population data. If it does not already exist, the folder Germany is created in the given out_folder. If read_data == True and the file “FullData_population.json” exists, the data is read from this file and stored in a pandas DataFrame. If read_data == True and the file does not exist, the program is stopped. The downloaded DataFrame is written to the file “FullData_population”.
- Parameters:
read_data – False or True. Defines if data is read from file or downloaded. Default defined in defaultDict. (Default value = dd.defaultDict[‘read_data’])
out_folder – Path to folder where data is written in folder out_folder/Germany. Default defined in defaultDict. (Default value = dd.defaultDict[‘out_folder’])
ref_year – None (default) or a year (yyyy) convertible to str. Reference year.
**kwargs – Additional keyword arguments.
- Returns:
DataFrame with the population data for all age groups, adjusted to the current level.
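The read-or-download logic described above can be sketched as follows. This is an illustration under assumptions, not the library code: the download is represented by a hypothetical callable, the data is stored as plain JSON, and "the program is stopped" is modelled as a FileNotFoundError.

```python
import json
import os


def fetch_population_sketch(read_data, out_folder, download):
    """Read cached population data or download and cache it.

    download: hypothetical zero-argument callable returning JSON-serialisable data.
    """
    directory = os.path.join(out_folder, "Germany")
    os.makedirs(directory, exist_ok=True)  # the folder 'Germany' is created if missing
    path = os.path.join(directory, "FullData_population.json")
    if read_data:
        if not os.path.exists(path):
            # The documented function stops the program; here this is an exception.
            raise FileNotFoundError(path)
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    data = download()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)  # cache the download for later read_data=True calls
    return data
```

Raising an exception instead of exiting lets a caller decide whether a missing cache file is fatal.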
- memilio.epidata.getPopulationData.get_population_data(read_data: bool = False, file_format: str = 'json_timeasstring', out_folder: str = dd.defaultDict['out_folder'], merge_eisenach: bool = True, ref_year=None, **kwargs)
Download age-stratified population data for the German counties.
The data used is the official ‘Bevölkerungsfortschreibung’ (population update) table 12411-02-03-4: ‘Bevölkerung nach Geschlecht und Altersgruppen (17)’ (population by sex and age groups (17)) from regionalstatistik.de. ATTENTION: Our scripts cannot download the raw file automatically without a Genesis Online account. To work with this dataset, either enter your username and password or download the file manually from:
https://www.regionalstatistik.de/genesis/online -> “1: Gebiet, Bevölkerung, Arbeitsmarkt, Wahlen” -> “12: Bevölkerung” -> “12411 Fortschreibung des Bevölkerungsstandes” -> “12411-02-03-4: Bevölkerung nach Geschlecht und Altersgruppen (17) - Stichtag 31.12. - regionale Tiefe: Kreise und krfr. Städte”.
Download the xlsx or csv file and place it in dd.defaultDict[‘out_folder’], which is normally Memilio/data/pydata/Germany. Create the folders ‘pydata/Germany’ if they do not exist yet. Then this script can be run.
- Parameters:
read_data – False or True. Defines if data is read from file or downloaded. Default defined in defaultDict. (Default value = dd.defaultDict[‘read_data’])
file_format – File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict[‘file_format’])
out_folder – Path to folder where data is written in folder out_folder/Germany. Default defined in defaultDict. (Default value = dd.defaultDict[‘out_folder’])
merge_eisenach – bool, default: True. Defines whether the counties ‘Wartburgkreis’ and ‘Eisenach’ are listed separately or combined as one entity ‘Wartburgkreis’.
ref_year – None (default) or a year (yyyy) convertible to str. Reference year.
username – str. Username to sign in at regionalstatistik.de.
password – str. Password to sign in at regionalstatistik.de.
**kwargs – Additional keyword arguments, e.g. username and password.
- Returns:
DataFrame with the population data for all age groups, adjusted to the current level.
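The effect of merge_eisenach=True can be illustrated with a small sketch that adds Eisenach's age-group counts to Wartburgkreis and drops the Eisenach entry. The county keys 16056 (Eisenach) and 16063 (Wartburgkreis) are an assumption of this example, as is the dict-based data layout; the counts are toy numbers.

```python
# Assumption: official county keys 16056 (Eisenach) and 16063 (Wartburgkreis).
EISENACH = 16056
WARTBURGKREIS = 16063


def merge_eisenach_sketch(pop_by_county):
    """Combine Eisenach into Wartburgkreis (merge_eisenach=True behaviour).

    pop_by_county: dict mapping county key -> list of counts per age group.
    The input dict is not modified.
    """
    merged = dict(pop_by_county)
    eisenach = merged.pop(EISENACH, None)
    if eisenach is not None:
        wartburg = merged.get(WARTBURGKREIS, [0] * len(eisenach))
        # Add the counts age group by age group.
        merged[WARTBURGKREIS] = [a + b for a, b in zip(wartburg, eisenach)]
    return merged


print(merge_eisenach_sketch({16056: [1, 2], 16063: [10, 20], 1001: [5, 5]}))
# {16063: [11, 22], 1001: [5, 5]}
```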
- memilio.epidata.getPopulationData.main()
Main program entry.
- memilio.epidata.getPopulationData.preprocess_population_data(df_pop_raw: pandas.DataFrame, merge_eisenach: bool = True)
Processes the downloaded data: the columns are renamed to English and the state and county names are added.
- Parameters:
df_pop_raw – pd.DataFrame. A DataFrame containing the input population data.
merge_eisenach – True (default) or False. Defines whether the counties ‘Wartburgkreis’ and ‘Eisenach’ are listed separately or combined as one entity ‘Wartburgkreis’. (Default value = True)
- Returns:
df – pd.DataFrame. Processed population data.
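A minimal sketch of the preprocessing idea: German column names are mapped to English ones, and state and county names are looked up from the county key. It relies on the assumption that a German county key has five digits whose first two digits identify the state; the lookup tables are hypothetical and deliberately tiny, and memilio's own column names and name tables may differ.

```python
# Hypothetical lookup tables; only a few entries are shown.
COLUMNS_DE_EN = {"Kreis": "ID_County", "Insgesamt": "Population"}
STATE_NAMES = {1: "Schleswig-Holstein", 16: "Thüringen"}
COUNTY_NAMES = {1001: "Flensburg", 16063: "Wartburgkreis"}


def preprocess_sketch(rows):
    """Rename German columns to English and attach state and county names."""
    out = []
    for row in rows:
        new = {COLUMNS_DE_EN.get(k, k): v for k, v in row.items()}
        county_id = int(new["ID_County"])
        state_id = county_id // 1000  # leading two digits of a five-digit county key
        new["County"] = COUNTY_NAMES.get(county_id, "unknown")
        new["ID_State"] = state_id
        new["State"] = STATE_NAMES.get(state_id, "unknown")
        out.append(new)
    return out
```

For example, the key "01001" yields state key 1 because the leading zero disappears when the key is converted to an integer.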
- memilio.epidata.getPopulationData.read_population_data(ref_year)
Reads population data from regionalstatistik.de.
A request is sent to regionalstatistik.de, and the response, wrapped in a StringIO object, is read as CSV into a DataFrame.
- Parameters:
ref_year – None (default) or a year (yyyy) convertible to str. Reference year.
- Returns:
DataFrame
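The "read the StringIO as CSV" step can be sketched with the standard library alone. The HTTP request is not modelled here, the semicolon delimiter is an assumption about the downloaded file, and the function name and sample text are hypothetical; the real function produces a pandas DataFrame rather than a list of dicts.

```python
import csv
import io


def read_population_csv_sketch(text, delimiter=";"):
    """Parse CSV text (e.g. an HTTP response body) into a list of row dicts.

    The ';' delimiter is an assumption about the downloaded file.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)  # first line holds the column names
    return [dict(zip(header, row)) for row in reader if row]  # skip blank lines


sample = "Kreis;Insgesamt\n16063;100\n01001;90\n"
print(read_population_csv_sketch(sample))
# [{'Kreis': '16063', 'Insgesamt': '100'}, {'Kreis': '01001', 'Insgesamt': '90'}]
```

All values stay strings, which keeps leading zeros in county keys such as "01001" intact until they are converted deliberately.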
- memilio.epidata.getPopulationData.test_total_population(df_pop, age_cols)
Tests whether the total population matches the expected value.
- Parameters:
df_pop – Population DataFrame with all counties
age_cols – All age groups in the DataFrame
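Conceptually, such a check sums every age group over every county and compares the result with a reference value. In the sketch below the expected total and the relative tolerance are hypothetical inputs, since the reference value used by memilio is not stated here; the dict layout is also an assumption.

```python
import math


def check_total_population_sketch(pop_by_county, expected_total, rel_tol=0.01):
    """Sum all age groups of all counties and compare with an expected total.

    expected_total and rel_tol are hypothetical inputs for illustration.
    """
    total = sum(sum(counts) for counts in pop_by_county.values())
    if not math.isclose(total, expected_total, rel_tol=rel_tol):
        raise ValueError(f"Total population {total} differs from expected {expected_total}")
    return total
```

A relative tolerance is used because official population figures are revised slightly between reference years.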
- memilio.epidata.getPopulationData.write_population_data(df_pop: pandas.DataFrame, out_folder: str = dd.defaultDict['out_folder'], file_format: str = 'json_timeasstring', merge_eisenach: bool = True, ref_year=None)
Writes the population data into JSON files. Three kinds of structuring of the data are done: the population data is stored in the files “county_population.json”, “state_population.json” and “germany_population.json” for the counties, the states and the whole of Germany, respectively.
- Parameters:
df_pop – pd.DataFrame. A DataFrame containing processed population data.
file_format – str. File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict[‘file_format’])
out_folder – str. Folder where data is written to. Default defined in defaultDict. (Default value = dd.defaultDict[‘out_folder’])
merge_eisenach – bool, default: True. Defines whether the counties ‘Wartburgkreis’ and ‘Eisenach’ are listed separately or combined as one entity ‘Wartburgkreis’.
ref_year – None (default) or a year (yyyy) convertible to str. Reference year.
- Returns:
None
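The three levels of structuring can be sketched as an aggregation from counties to states to Germany, followed by one JSON file per level. This sketch assumes that the state key is the county key divided by 1000, that only total populations per county are stored, and that files go to out_folder/Germany; all of these are illustrative assumptions, not the documented file layout.

```python
import json
import os
from collections import defaultdict


def write_population_sketch(pop_by_county, out_folder):
    """Write county, state and Germany totals to three JSON files.

    pop_by_county: dict mapping integer county key -> total population.
    Files are written to out_folder/Germany (an assumption).
    """
    by_state = defaultdict(int)
    for county_id, pop in pop_by_county.items():
        by_state[county_id // 1000] += pop  # state key = leading digits of county key
    tables = {
        "county_population": {str(k): v for k, v in pop_by_county.items()},
        "state_population": {str(k): v for k, v in by_state.items()},
        "germany_population": {"Germany": sum(by_state.values())},
    }
    directory = os.path.join(out_folder, "Germany")
    os.makedirs(directory, exist_ok=True)
    for name, data in tables.items():
        with open(os.path.join(directory, name + ".json"), "w", encoding="utf-8") as f:
            json.dump(data, f)
    return tables
```

Deriving the state and national tables from the county table guarantees that the three files are consistent with each other.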