memilio.epidata.getVaccinationData
Functions
| compute_vaccination_ratios | Computes vaccination ratios based on the number of vaccinations and the corresponding population data. |
| download_vaccination_data | |
| extrapolate_age_groups_vaccinations | Replaces the original age groups (05-11, 12-17, 18-59, 60+) by the infection data age groups (0-4, 5-14, 15-34, 35-59, 60-79, 80+). |
| fetch_vaccination_data | Downloads or reads the vaccination data and writes the RKIVaccFull dataset. |
| get_vaccination_data | Downloads the RKI vaccination data and provides different kinds of structured data. |
| main | Main program entry. |
| process_vaccination_data | Processes the downloaded raw data. |
| sanitizing_average_regions | Distributes the vaccinations in all regions among their counties according to population. |
| sanitizing_extrapolation_mobility | ATTENTION: DO NOT USE! ONLY FOR BACKWARD STABILITY AND DEVELOPMENT PURPOSES. |
| sanity_checks | |
| write_vaccination_data | Writes the vaccination data. |
- memilio.epidata.getVaccinationData.compute_vaccination_ratios(
    age_group_list,
    vaccinations_table,
    vacc_column,
    region_column,
    population,
    merge_2022=True,
  )
Computes vaccination ratios based on the number of vaccinations and the corresponding population data.
- Parameters:
age_group_list – List of age groups considered.
vaccinations_table – Table of vaccinations (possibly multiple columns for different numbers of doses).
vacc_column – Column name of vaccinations_table to be considered.
region_column – Column of regions in vaccinations table, e.g., ID_County or ID_State.
population – Table of population data for the given regions and considered age groups.
merge_2022 – [Default: True] Defines whether population data has to be merged to counties as of 2022.
- Returns:
All vaccination ratios per region and age group.
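The following is a minimal pandas sketch of the ratio computation. It assumes the vaccinations are in long format (one row per region and age group) and the population is in wide format (one column per age group). The helper name `vaccination_ratios` and the column name `Age_RKI` are hypothetical. The sketch ignores the `merge_2022` county merging and is not memilio's implementation.

```python
import pandas as pd


def vaccination_ratios(vacc: pd.DataFrame, population: pd.DataFrame,
                       vacc_column: str, region_column: str,
                       age_groups: list) -> pd.DataFrame:
    """Return vaccinations divided by population, one row per region.

    vacc:       long format, columns [region_column, 'Age_RKI', vacc_column]
    population: wide format, columns [region_column, *age_groups]
    """
    # Bring population into long format so it can be joined on region and age group.
    pop_long = population.melt(id_vars=region_column, value_vars=age_groups,
                               var_name="Age_RKI", value_name="Population")
    merged = vacc.merge(pop_long, on=[region_column, "Age_RKI"], how="left")
    merged["Ratio"] = merged[vacc_column] / merged["Population"]
    # Back to wide format: one row per region, one column per age group.
    ratios = merged.pivot(index=region_column, columns="Age_RKI", values="Ratio")
    return ratios[age_groups]


vacc = pd.DataFrame({"ID_County": [1001, 1001, 1002, 1002],
                     "Age_RKI": ["18-59", "60+", "18-59", "60+"],
                     "Vacc_completed": [50, 30, 20, 40]})
population = pd.DataFrame({"ID_County": [1001, 1002],
                           "18-59": [100, 200], "60+": [60, 80]})
print(vaccination_ratios(vacc, population, "Vacc_completed", "ID_County",
                         ["18-59", "60+"]))
```

Here county 1001 gets a ratio of 0.5 in both age groups (50/100 and 30/60).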
- memilio.epidata.getVaccinationData.download_vaccination_data(
    read_data,
    filename,
    directory,
    interactive,
  )
- Parameters:
read_data –
filename –
directory –
interactive –
- memilio.epidata.getVaccinationData.extrapolate_age_groups_vaccinations(
    df_data,
    population_all_ages,
    unique_age_groups_old,
    unique_age_groups_new,
    column_names,
    age_old_to_all_ages_indices,
    min_all_ages,
    all_ages_to_age_new_share,
  )
Original age groups (05-11, 12-17, 18-59, 60+) are replaced by infection data age groups (0-4, 5-14, 15-34, 35-59, 60-79, 80+). For every county, the vaccinations of the old age groups are split into the infection data age groups according to their population ratios. A new DataFrame is created for every age group and county. After the extrapolation, all subframes are merged.
- Parameters:
df_data – DataFrame with the data to compute.
population_all_ages – DataFrame with the population of every age group and county.
unique_age_groups_old – List of original age groups.
unique_age_groups_new – List of infection data age groups.
column_names – List of columns to compute.
age_old_to_all_ages_indices – List. List of original ages.
age_old_to_age_new_indices – Defines in which new age group the data of an old age group is stored.
min_all_ages – List of the lower age bounds of all age groups.
all_ages_to_age_new_share – Age group indices of all age groups in every new age group.
- Returns:
New DataFrame with new age groups.
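The splitting step can be illustrated for a single county in plain Python. This sketch makes two assumptions. First, the population is available at a finer age resolution (`pop_fine`) that nests inside both the original and the new age groups. Second, each fine group belongs entirely to one new group. The function and argument names are hypothetical and do not correspond to the parameters above.

```python
def split_by_population(vacc_old: dict, pop_fine: dict,
                        old_to_fine: dict, fine_to_new: dict) -> dict:
    """Redistribute one county's vaccinations from old onto new age groups.

    vacc_old:    old age group -> number of vaccinations
    pop_fine:    fine age group -> population (resolution nesting in both groupings)
    old_to_fine: old age group -> list of fine age groups it covers
    fine_to_new: fine age group -> new age group that contains it
    """
    result = {}
    for old_group, count in vacc_old.items():
        fine_groups = old_to_fine[old_group]
        total_pop = sum(pop_fine[f] for f in fine_groups)
        for f in fine_groups:
            # Each fine group receives a share proportional to its population.
            share = count * pop_fine[f] / total_pop
            new_group = fine_to_new[f]
            result[new_group] = result.get(new_group, 0.0) + share
    return result


new = split_by_population(
    vacc_old={"12-17": 60, "18-59": 420},
    pop_fine={"12-14": 300, "15-17": 300, "18-34": 1700, "35-59": 2500},
    old_to_fine={"12-17": ["12-14", "15-17"], "18-59": ["18-34", "35-59"]},
    fine_to_new={"12-14": "5-14", "15-17": "15-34", "18-34": "15-34", "35-59": "35-59"},
)
print(new)  # {'5-14': 30.0, '15-34': 200.0, '35-59': 250.0}
```

The total number of vaccinations (480) is preserved. Only its distribution across age groups changes.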
- memilio.epidata.getVaccinationData.fetch_vaccination_data( ) → pandas.DataFrame
Downloads or reads the vaccination data and writes the RKIVaccFull dataset.
- Parameters:
directory – str. Path to the output directory.
filename – str. Name of the full dataset file.
conf_obj – Configuration object.
read_data – bool. True or False. Defines if data is read from file or downloaded. Default defined in defaultDict. (Default value = dd.defaultDict[‘read_data’])
- Returns:
pd.DataFrame. The fetched vaccination data.
- memilio.epidata.getVaccinationData.get_vaccination_data(
    read_data: str = False,
    file_format: str = 'json_timeasstring',
    out_folder: str = '/home/docs/checkouts/readthedocs.org/user_builds/memilio/data/',
    start_date: date = datetime.date(2020, 1, 1),
    end_date: date = datetime.date(2026, 6, 15),
    moving_average: int = 0,
    sanitize_data: int = 1,
    impute_dates: bool = True,
    **kwargs,
  )
Downloads the RKI vaccination data and provides different kinds of structured data.
The data is downloaded from the internet. The files are read from or stored in the folder “out_folder”/Germany/pydata. pandas is used to store and modify the data.
While working with the data:
- the column names are changed to English depending on defaultDict;
- the column “Date” provides information on the date of each data point given in the corresponding columns.
The data is exported in three different ways:
- all_county_vacc: Resolved per county by grouping all original age groups (05-11, 12-17, 18-59, 60+).
- all_county_agevacc_vacc: Resolved per county and original age group (05-11, 12-17, 18-59, 60+).
- all_county_ageinf_vacc: Resolved per county and infection data age group (0-4, 5-14, 15-34, 35-59, 60-79, 80+). To do so, getPopulationData is used and the age-group-specific data from the original source is extrapolated onto the new age groups at county level.
Missing dates are imputed for all data frames (‘fillDates’ is not optional but always executed).
A central moving average of N days is optional.
Start and end dates can be provided to define the length of the returned data frames.
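Date imputation and the optional central moving average can be sketched with pandas `reindex` and `rolling(center=True)`. The sketch assumes a single cumulative time series with the hypothetical columns `Date` and `Vacc_completed`. Forward filling is only appropriate for cumulative counts. This is not memilio's implementation.

```python
import pandas as pd


def impute_and_smooth(df: pd.DataFrame, start: str, end: str,
                      window: int) -> pd.Series:
    """Fill missing dates of a cumulative series and optionally smooth it.

    df: columns 'Date' (datetime) and 'Vacc_completed' (cumulative count).
    """
    series = df.set_index("Date")["Vacc_completed"]
    full_range = pd.date_range(start, end, freq="D")
    # Cumulative counts do not change on days without reports, so carry the
    # last value forward; days before the first report are set to 0.
    series = series.reindex(full_range).ffill().fillna(0)
    if window > 0:
        # Central moving average; min_periods=1 keeps values at the edges.
        series = series.rolling(window=window, center=True, min_periods=1).mean()
    return series


df_cum = pd.DataFrame({"Date": pd.to_datetime(["2021-01-02", "2021-01-04"]),
                       "Vacc_completed": [10, 30]})
print(impute_and_smooth(df_cum, "2021-01-01", "2021-01-05", window=0).tolist())
# [0.0, 10.0, 10.0, 30.0, 30.0]
```

Using `min_periods=1` at the edges is one possible design choice. Alternatively, incomplete windows at the start and end of the series could be dropped.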
- Parameters:
read_data – [Currently not used] True or False. Defines if data is read from file or downloaded. Data is always downloaded from the internet. Default defined in defaultDict. (Default value = dd.defaultDict[‘read_data’])
file_format – File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict[‘file_format’])
out_folder – Folder where data is written to. Default defined in defaultDict. (Default value = dd.defaultDict[‘out_folder’])
start_date – Date of first date in dataframe. Default defined in defaultDict. (Default value = dd.defaultDict[‘start_date’])
end_date – Date of last date in dataframe. Default defined in defaultDict. (Default value = dd.defaultDict[‘end_date’])
moving_average – Integers >=0. Applies a ‘moving_average’-day moving average on all time series to smooth out effects of irregular reporting. Default defined in defaultDict. (Default value = dd.defaultDict[‘moving_average’])
sanitize_data – Value in {0,1,2,3}; Default: 1. For many counties, vaccination data is not correctly attributed to the home locations of vaccinated persons. If ‘sanitize_data’ is set to a value larger than 0, this is corrected. 0: No sanitizing applied. 1: Averaging ratios over federal states. 2: Averaging ratios over intermediate regions. 3: All counties with vaccination quotas of more than ‘sanitizing_threshold’ are adjusted to the average of their federal state, and the remaining vaccinations are distributed to closely connected neighboring regions using commuter mobility networks. The sanitizing threshold is defined by the age-group-specific average of the corresponding vaccination ratios at county and federal state level. Default defined in defaultDict. (Default value = dd.defaultDict[‘sanitize_data’])
impute_dates – bool True or False. Defines if values for dates without new information are imputed. (Default value = True)
**kwargs –
- Returns:
None
- memilio.epidata.getVaccinationData.main()
Main program entry.
- memilio.epidata.getVaccinationData.process_vaccination_data(
    df_data: pandas.DataFrame,
    conf_obj,
    directory: str,
    file_format: str = 'json_timeasstring',
    start_date: date = datetime.date(2020, 1, 1),
    end_date: date = datetime.date(2026, 6, 15),
    moving_average: int = 0,
    sanitize_data: int = 1,
  )
Processes the downloaded raw data.
While working with the data:
- the column names are changed to English depending on defaultDict;
- the column “Date” provides information on the date of each data point given in the corresponding columns.
- Parameters:
df_data – pd.DataFrame. A DataFrame containing the vaccination data to be processed.
directory – str Path to the output directory
conf_obj – configuration object
file_format – str. File format which is used for writing the data. Default defined in defaultDict.
start_date – Date of first date in dataframe. Default defined in defaultDict. (Default value = dd.defaultDict[‘start_date’])
end_date – Date of last date in dataframe. Default defined in defaultDict. (Default value = dd.defaultDict[‘end_date’])
moving_average – int. Integers >=0. Applies a ‘moving_average’-day moving average on all time series to smooth out effects of irregular reporting. Default defined in defaultDict. (Default value = dd.defaultDict[‘moving_average’])
sanitize_data – int. Value in {0,1,2,3}; Default: 1. For many counties, vaccination data is not correctly attributed to the home locations of vaccinated persons. If ‘sanitize_data’ is set to a value larger than 0, this is corrected. 0: No sanitizing applied. 1: Averaging ratios over federal states. 2: Averaging ratios over intermediate regions. 3: All counties with vaccination quotas of more than ‘sanitizing_threshold’ are adjusted to the average of their federal state, and the remaining vaccinations are distributed to closely connected neighboring regions using commuter mobility networks. The sanitizing threshold is defined by the age-group-specific average of the corresponding vaccination ratios at county and federal state level. Default defined in defaultDict. (Default value = dd.defaultDict[‘sanitize_data’])
- Returns:
tuple and DataFrame
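The translation of column names and the split into one column per vaccination dose can be sketched with pandas. The German column names below follow the public RKI vaccination CSV as far as known. However, they, the `GER_TO_ENG` mapping, and the output names (`Vacc_1`, `Vacc_2`, ...) are assumptions standing in for defaultDict. They are not memilio's actual identifiers.

```python
import pandas as pd

# Hypothetical stand-in for the defaultDict translation table.
GER_TO_ENG = {
    "Impfdatum": "Date",
    "LandkreisId_Impfort": "ID_County",
    "Altersgruppe": "Age_RKI",
    "Impfschutz": "Vacc_number",
    "Anzahl": "Count",
}


def to_english_wide(df_raw: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to English and create one count column per dose number."""
    df = df_raw.rename(columns=GER_TO_ENG)
    df["Date"] = pd.to_datetime(df["Date"])
    # Sum duplicate rows and spread dose numbers into columns; missing combinations get 0.
    wide = df.pivot_table(index=["Date", "ID_County", "Age_RKI"],
                          columns="Vacc_number", values="Count",
                          aggfunc="sum", fill_value=0)
    wide.columns = [f"Vacc_{dose}" for dose in wide.columns]
    return wide.reset_index()


raw = pd.DataFrame({
    "Impfdatum": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"],
    "LandkreisId_Impfort": ["01001"] * 4,
    "Altersgruppe": ["18-59"] * 4,
    "Impfschutz": [1, 2, 1, 1],
    "Anzahl": [5, 1, 3, 2],
})
print(to_english_wide(raw))
```

The two first-dose rows on 2021-01-02 are summed to 5, and the missing second-dose entry on that day is filled with 0.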
- memilio.epidata.getVaccinationData.sanitizing_average_regions(
    df,
    to_county_map,
    age_groups,
    column_names,
    age_population,
  )
Vaccinations in all regions are distributed among the counties of each region according to their population. To do so, all vaccinations in a region are summed up and split according to the population ratios of its counties. This is done separately for every age group and every vaccination column. The results are stored in a newly created DataFrame.
- Parameters:
df – DataFrame with the data to compute.
to_county_map – dict with regions as keys and countyIDs as values.
age_groups – list of all age groups as in df.
column_names – list of columns to compute.
age_population – DataFrame with the population per age group and county.
- Returns:
New DataFrame with sanitized data.
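For a single date, age group, and vaccination column, the averaging described above can be sketched as follows. The population frame layout and the helper name `average_over_regions` are hypothetical. Presumably, `to_county_map` maps federal states (sanitize_data = 1) or intermediate regions (sanitize_data = 2) to their county IDs.

```python
import pandas as pd


def average_over_regions(df: pd.DataFrame, to_county_map: dict,
                         population: pd.DataFrame, column: str) -> pd.DataFrame:
    """Give all counties of a region the same vaccination ratio.

    df:         columns ['ID_County', column] for one date and age group
    population: columns ['ID_County', 'Population'] for the same age group
    """
    merged = df.merge(population, on="ID_County")
    out = merged.copy()
    out[column] = out[column].astype(float)
    for counties in to_county_map.values():
        mask = merged["ID_County"].isin(counties)
        total_vacc = merged.loc[mask, column].sum()
        total_pop = merged.loc[mask, "Population"].sum()
        # Distribute the regional total in proportion to each county's population.
        out.loc[mask, column] = total_vacc * merged.loc[mask, "Population"] / total_pop
    return out[["ID_County", column]]


df_day = pd.DataFrame({"ID_County": [1001, 1002], "Vacc_completed": [90, 10]})
pop = pd.DataFrame({"ID_County": [1001, 1002], "Population": [100, 300]})
print(average_over_regions(df_day, {1: [1001, 1002]}, pop, "Vacc_completed"))
```

Both counties end up with a ratio of 0.25 (25/100 and 75/300), and the regional total of 100 vaccinations is preserved.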
- memilio.epidata.getVaccinationData.sanitizing_extrapolation_mobility(
    df,
    age_groups,
    column_names,
    age_population,
    neighbors_mobility,
  )
ATTENTION: DO NOT USE! ONLY FOR BACKWARD STABILITY AND DEVELOPMENT PURPOSES.
Distributes the vaccinations of a county to connected counties if many more vaccinations than the federal state average were reported at the newest date. Consequently, the data for a specific date can differ depending on the maximum date in the data set. The average vaccination ratio per age group is only computed for completed vaccinations. The average vaccination ratios for partial and refreshed vaccinations are not computed; those vaccinations are also distributed according to the ratios of completed vaccinations. Since the distribution is done for one county after another, a different order of counties may result in different data.
- Parameters:
df – DataFrame with the data to compute.
age_groups – list of all age groups as in df.
column_names – list of columns to compute.
age_population – DataFrame with the population per age group and county.
neighbors_mobility – dict with counties as keys and commuter mobility to other counties as values.
- Returns:
New DataFrame with sanitized data.
- memilio.epidata.getVaccinationData.sanity_checks(df)
- Parameters:
df –
- memilio.epidata.getVaccinationData.write_vaccination_data(
    dict_data: dict,
    conf_obj,
    directory: str,
    file_format: str = 'json_timeasstring',
    impute_dates: bool = True,
    moving_average: int = 0,
  )
Writes the vaccination data.
The data is exported in three different ways:
- all_county_vacc: Resolved per county by grouping all original age groups (05-11, 12-17, 18-59, 60+).
- all_county_agevacc_vacc: Resolved per county and original age group (05-11, 12-17, 18-59, 60+).
- all_county_ageinf_vacc: Resolved per county and infection data age group (0-4, 5-14, 15-34, 35-59, 60-79, 80+). To do so, getPopulationData is used and the age-group-specific data from the original source is extrapolated onto the new age groups at county level.
Missing dates are imputed for all data frames (‘fillDates’ is not optional but always executed).
A central moving average of N days is optional.
Start and end dates can be provided to define the length of the returned data frames.
- Parameters:
dict_data – dict. Contains various datasets or values, including:
  df_data_agevacc_county_cs: pd.DataFrame. A DataFrame containing processed vaccination data
  vacc_column_names
  unique_age_groups_old
  population_old_ages
  extrapolate_agegroups
  population_all_ages
  unique_age_groups_new
  age_old_to_all_ages_indices
  min_all_ages
  all_ages_to_age_new_share
  population_new_ages
directory – str. Path to the output directory.
conf_obj – Configuration object.
file_format – str. File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict[‘file_format’])
impute_dates – bool. True or False. Defines if values for dates without new information are imputed. (Default value = True)
moving_average – int. Integers >=0. Applies a ‘moving_average’-day moving average on all time series to smooth out effects of irregular reporting. Default defined in defaultDict. (Default value = dd.defaultDict[‘moving_average’])
- Returns:
None
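The difference between all_county_agevacc_vacc and all_county_vacc amounts to summing over the original age groups for each date and county. The following pandas sketch assumes the hypothetical column names `Date`, `ID_County`, and `Vacc_*`. The helper `group_all_ages` is illustrative only.

```python
import pandas as pd


def group_all_ages(df_agevacc: pd.DataFrame) -> pd.DataFrame:
    """Sum all vaccination columns over age groups for each date and county."""
    vacc_cols = [c for c in df_agevacc.columns if c.startswith("Vacc_")]
    return df_agevacc.groupby(["Date", "ID_County"], as_index=False)[vacc_cols].sum()


df_agevacc = pd.DataFrame({
    "Date": ["2021-06-01"] * 3,
    "ID_County": [1001, 1001, 1002],
    "Age_RKI": ["18-59", "60+", "18-59"],
    "Vacc_completed": [5, 7, 4],
})
print(group_all_ages(df_agevacc))
```

The age group column is dropped, because it is not part of the grouping keys and is not selected as a value column.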