memilio.epidata.getVaccinationData

Functions

compute_vaccination_ratios(age_group_list, ...)

Computes vaccination ratios based on the number of vaccinations and the corresponding population data.

download_vaccination_data(read_data, ...)


extrapolate_age_groups_vaccinations(df_data, ...)

Original age groups (05-11, 12-17, 18-59, 60+) are replaced by infection data age groups (0-4, 5-14, 15-34, 35-59, 60-79, 80+).

fetch_vaccination_data(conf_obj, filename, ...)

Downloads or reads the vaccination data and writes the RKIVaccFull dataset.

get_vaccination_data([read_data, ...])

Downloads the RKI vaccination data and provides different kinds of structured data.

main()

Main program entry.

process_vaccination_data(df_data, conf_obj, ...)

Processes the downloaded raw data. While working with the data, the column names are changed to English depending on defaultDict, and the column "Date" provides information on the date of each data point given in the corresponding columns.

sanitizing_average_regions(df, ...)

Vaccinations in all regions are split up according to the population of their counties.

sanitizing_extrapolation_mobility(df, ...)

ATTENTION: DO NOT USE! ONLY FOR BACKWARD STABILITY AND DEVELOPMENT PURPOSES.

sanity_checks(df)


write_vaccination_data(dict_data, conf_obj, ...)

Writes the vaccination data. The data is exported in three different ways: all_county_vacc, resolved per county by grouping all original age groups (05-11, 12-17, 18-59, 60+); all_county_agevacc_vacc, resolved per county and original age group (05-11, 12-17, 18-59, 60+); and all_county_ageinf_vacc, resolved per county and infection data age group (0-4, 5-14, 15-34, 35-59, 60-79, 80+). For the last one, getPopulationData is used and the age group specific data from the original source is extrapolated to the new age groups on county level.

memilio.epidata.getVaccinationData.compute_vaccination_ratios(
age_group_list,
vaccinations_table,
vacc_column,
region_column,
population,
merge_2022=True,
)

Computes vaccination ratios based on the number of vaccinations and the corresponding population data.

Parameters:
  • age_group_list – List of age groups considered.

  • vaccinations_table – Table of vaccinations (possibly multiple columns for different numbers of doses).

  • vacc_column – Column name of vaccinations_table to be considered.

  • region_column – Column of regions in vaccinations table, e.g., ID_County or ID_State.

  • population – Table of population data for the given regions and considered age groups.

  • merge_2022 – [Default: True] Defines whether population data has to be merged to counties as of 2022.

Returns:

All vaccination ratios per region and age group.
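
The idea behind the ratio computation can be illustrated with a minimal pandas sketch. This is not the MEmilio implementation. The function name `vaccination_ratios_sketch`, the column names `Age_Group` and `Vacc_completed`, and the layout of the population table (one column per age group) are assumptions made for illustration only.

```python
import pandas as pd


def vaccination_ratios_sketch(age_group_list, vaccinations_table, vacc_column,
                              region_column, population):
    """Illustrative only: ratio = vaccinations / population per region and age group."""
    # Sum vaccinations per region and age group, then move age groups to columns.
    # "Age_Group" is a hypothetical column name.
    vacc = (vaccinations_table
            .groupby([region_column, "Age_Group"])[vacc_column].sum()
            .unstack("Age_Group")
            .reindex(columns=age_group_list, fill_value=0))
    # Bring the population table into the same region-by-age-group layout.
    pop = population.set_index(region_column)[age_group_list]
    # Element-wise division, aligned on region index and age group columns.
    return vacc.div(pop)


# Made-up example data for two counties and two age groups.
vaccinations = pd.DataFrame({
    "ID_County": [1001, 1001, 1002, 1002],
    "Age_Group": ["18-59", "60+", "18-59", "60+"],
    "Vacc_completed": [500, 300, 200, 100],
})
population = pd.DataFrame({
    "ID_County": [1001, 1002],
    "18-59": [1000, 800],
    "60+": [400, 200],
})
ratios = vaccination_ratios_sketch(
    ["18-59", "60+"], vaccinations, "Vacc_completed", "ID_County", population)
```

The result is a table with one row per region and one column per age group, so a single ratio can be looked up with `ratios.loc[region, age_group]`.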

memilio.epidata.getVaccinationData.download_vaccination_data(
read_data,
filename,
directory,
interactive,
)
Parameters:
  • read_data

  • filename

  • directory

  • interactive

memilio.epidata.getVaccinationData.extrapolate_age_groups_vaccinations(
df_data,
population_all_ages,
unique_age_groups_old,
unique_age_groups_new,
column_names,
age_old_to_all_ages_indices,
min_all_ages,
all_ages_to_age_new_share,
)

Original age groups (05-11, 12-17, 18-59, 60+) are replaced by infection data age groups (0-4, 5-14, 15-34, 35-59, 60-79, 80+). For every county, the vaccinations of the old age groups are split to the infection data age groups by their population ratios. For every age group and county, a new dataframe is created. After the extrapolation, all subframes are merged together.

Parameters:
  • age_old_to_all_ages_indices – List. List of original ages

  • df_data – DataFrame with Data to compute.

  • population_all_ages – Dataframe with number of population for every age group and county.

  • unique_age_groups_old – List of original age groups.

  • unique_age_groups_new – List of infection data age groups.

  • column_names – List of columns to compute.

  • age_old_to_age_new_indices – Defines in which new age group data from old age group is stored.

  • min_all_ages – List of the lower age bounds of all age groups.

  • all_ages_to_age_new_share – Age group indices of all age groups in every new age group.

Returns:

New DataFrame with new age groups.
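
The population-weighted split can be sketched for a single county in plain Python. This is an illustration of the principle under simplifying assumptions, not the MEmilio code: the helper name, the fine-grained population bins, the two mapping dictionaries, and all numbers are hypothetical, and the fine bins are chosen so that each one lies entirely inside one old and one new age group.

```python
def extrapolate_age_groups_sketch(vacc_old, pop_fine, old_to_fine, fine_to_new):
    """Illustrative only: split old age group counts by population share."""
    vacc_new = {}
    for old_group, fine_bins in old_to_fine.items():
        # Population of the old age group = sum over its fine-grained bins.
        pop_old = sum(pop_fine[b] for b in fine_bins)
        for b in fine_bins:
            # Vaccinations attributed to this bin by its population share.
            part = vacc_old[old_group] * pop_fine[b] / pop_old
            # Accumulate the part in the new age group that contains the bin.
            new_group = fine_to_new[b]
            vacc_new[new_group] = vacc_new.get(new_group, 0.0) + part
    return vacc_new


# Made-up data for one county.
vacc_old = {"12-17": 120, "18-59": 1000, "60+": 800}
pop_fine = {"12-14": 300, "15-17": 300, "18-34": 2000,
            "35-59": 3000, "60-79": 1500, "80+": 500}
old_to_fine = {"12-17": ["12-14", "15-17"],
               "18-59": ["18-34", "35-59"],
               "60+": ["60-79", "80+"]}
fine_to_new = {"12-14": "5-14", "15-17": "15-34", "18-34": "15-34",
               "35-59": "35-59", "60-79": "60-79", "80+": "80+"}

vacc_new = extrapolate_age_groups_sketch(
    vacc_old, pop_fine, old_to_fine, fine_to_new)
```

Because every vaccination is assigned to exactly one new age group, the total number of vaccinations in the county is preserved by the split.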

memilio.epidata.getVaccinationData.fetch_vaccination_data(
conf_obj,
filename: str,
directory: str,
read_data: str = False,
) → pandas.DataFrame

Downloads or reads the vaccination data and writes the RKIVaccFull dataset.

Parameters:
  • directory – str Path to the output directory

  • filename – str Name of the full dataset filename

  • conf_obj – configuration object

  • read_data – bool True or False. Defines if data is read from file or downloaded. Default defined in defaultDict.

Returns:

pd.DataFrame fetched vaccination data

memilio.epidata.getVaccinationData.get_vaccination_data(
read_data: str = False,
file_format: str = 'json_timeasstring',
out_folder: str = '/home/docs/checkouts/readthedocs.org/user_builds/memilio/data/',
start_date: date = datetime.date(2020, 1, 1),
end_date: date = datetime.date(2026, 6, 15),
moving_average: int = 0,
sanitize_data: int = 1,
impute_dates: bool = True,
**kwargs,
)

Downloads the RKI vaccination data and provides different kinds of structured data.

The data is read from the internet. The file is read in or stored at the folder “out_folder”/Germany/pydata. To store and change the data we use pandas.

While working with the data, the column names are changed to English depending on defaultDict, and the column “Date” provides information on the date of each data point given in the corresponding columns.

  • The data is exported in three different ways:
    • all_county_vacc: Resolved per county by grouping all original age groups (05-11, 12-17, 18-59, 60+)

    • all_county_agevacc_vacc: Resolved per county and original age group (05-11, 12-17, 18-59, 60+)

    • all_county_ageinf_vacc: Resolved per county and infection data age group (0-4, 5-14, 15-34, 35-59, 60-79, 80+)
      • To do so, getPopulationData is used and the age group specific data from the original source is extrapolated to the new age groups on county level.

  • Missing dates are imputed for all data frames (‘fillDates’ is not optional but always executed).

  • A central moving average of N days is optional.

  • Start and end dates can be provided to define the length of the returned data frames.

Parameters:
  • read_data – [Currently not used] True or False. Defines if data is read from file or downloaded. Here, data is always downloaded from the internet. Default defined in defaultDict. (Default value = dd.defaultDict[‘read_data’])

  • file_format – File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict[‘file_format’])

  • out_folder – Folder where data is written to. Default defined in defaultDict. (Default value = dd.defaultDict[‘out_folder’])

  • start_date – First date in the dataframe. Default defined in defaultDict. (Default value = dd.defaultDict[‘start_date’])

  • end_date – Last date in the dataframe. Default defined in defaultDict. (Default value = dd.defaultDict[‘end_date’])

  • moving_average – Integer >=0. Applies a ‘moving_average’-days moving average on all time series to smooth out effects of irregular reporting. Default defined in defaultDict. (Default value = dd.defaultDict[‘moving_average’])

  • sanitize_data – Value in {0,1,2,3}; Default: 1. For many counties, vaccination data is not correctly attributed to the home locations of vaccinated persons. If ‘sanitize_data’ is set to a value larger than 0, this is corrected. 0: No sanitizing applied. 1: Averaging ratios over federal states. 2: Averaging ratios over intermediate regions. 3: All counties with vaccination quotas of more than ‘sanitizing_threshold’ will be adjusted to the average of their federal state, and the remaining vaccinations will be distributed to closely connected neighboring regions using commuter mobility networks. The sanitizing threshold is defined by the age group-specific average of the corresponding vaccination ratios on county and federal state level. Default defined in defaultDict. (Default value = dd.defaultDict[‘sanitize_data’])

  • impute_dates – bool True or False. Defines if values for dates without new information are imputed. (Default value = True)

  • **kwargs

Returns:

None
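
The date imputation and the optional central moving average described above can be sketched with pandas. This is a hedged illustration, not the MEmilio code path: the helper names, the column name `Vacc_completed`, and the assumption that the series holds cumulative counts (so that forward-filling is a sensible imputation) are all hypothetical.

```python
import pandas as pd


def impute_dates_sketch(df, value_column):
    """Illustrative only: add missing days and carry the last known value forward."""
    series = df.set_index("Date")[value_column].sort_index()
    # Build the complete daily range between the first and last reported date.
    full_range = pd.date_range(series.index.min(), series.index.max(), freq="D")
    # Assumes cumulative counts: a day without a report keeps the previous value.
    return series.reindex(full_range).ffill()


def central_moving_average_sketch(series, moving_average):
    """Illustrative only: centered rolling mean over 'moving_average' days."""
    if moving_average <= 0:
        return series  # 0 means no smoothing
    return series.rolling(window=moving_average, center=True,
                          min_periods=1).mean()


# Made-up cumulative data with two missing days (2021-01-03 and 2021-01-04).
raw = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05"]),
    "Vacc_completed": [10, 20, 50],
})
imputed = impute_dates_sketch(raw, "Vacc_completed")
smoothed = central_moving_average_sketch(imputed, moving_average=3)
```

With `min_periods=1`, the first and last days are averaged over a shortened window instead of becoming missing values; whether MEmilio treats the boundaries this way is not stated in this reference.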

memilio.epidata.getVaccinationData.main()

Main program entry.

memilio.epidata.getVaccinationData.process_vaccination_data(
df_data: pandas.DataFrame,
conf_obj,
directory: str,
file_format: str = 'json_timeasstring',
start_date: date = datetime.date(2020, 1, 1),
end_date: date = datetime.date(2026, 6, 15),
moving_average: int = 0,
sanitize_data: int = 1,
) → dict

Processes the downloaded raw data. While working with the data, the column names are changed to English depending on defaultDict, and the column “Date” provides information on the date of each data point given in the corresponding columns.

Parameters:
  • df_data – pd.DataFrame a Dataframe containing processed vaccination data

  • directory – str Path to the output directory

  • conf_obj – configuration object

  • file_format – str. File format which is used for writing the data. Default defined in defaultDict.

  • start_date – First date in the dataframe. Default defined in defaultDict. (Default value = dd.defaultDict[‘start_date’])

  • end_date – Last date in the dataframe. Default defined in defaultDict. (Default value = dd.defaultDict[‘end_date’])

  • moving_average – int. Integer >=0. Applies a ‘moving_average’-days moving average on all time series to smooth out effects of irregular reporting. Default defined in defaultDict. (Default value = dd.defaultDict[‘moving_average’])

  • sanitize_data – int. Value in {0,1,2,3}; Default: 1. For many counties, vaccination data is not correctly attributed to the home locations of vaccinated persons. If ‘sanitize_data’ is set to a value larger than 0, this is corrected. 0: No sanitizing applied. 1: Averaging ratios over federal states. 2: Averaging ratios over intermediate regions. 3: All counties with vaccination quotas of more than ‘sanitizing_threshold’ will be adjusted to the average of their federal state, and the remaining vaccinations will be distributed to closely connected neighboring regions using commuter mobility networks. The sanitizing threshold is defined by the age group-specific average of the corresponding vaccination ratios on county and federal state level. Default defined in defaultDict. (Default value = dd.defaultDict[‘sanitize_data’])

Returns:

tuple and DataFrame

memilio.epidata.getVaccinationData.sanitizing_average_regions(
df,
to_county_map,
age_groups,
column_names,
age_population,
)

Vaccinations in all regions are split up according to the population of their counties. This is done by summing up all vaccinations in a region and distributing the sum according to the population ratios of its counties. This is done for every age group and number of vaccinations separately. A new dataframe is created where the new data is stored.

Parameters:
  • df – DataFrame with Data to compute.

  • to_county_map – dict with regions as keys and countyIDs as values.

  • age_groups – list of all age groups as in df.

  • column_names – list of columns to compute.

  • age_population – Dataframe with number of population per age group and county.

Returns:

New DataFrame with sanitized data.
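
The redistribution within a region can be sketched in plain Python for a single age group and a single vaccination column. This is a simplified illustration, not the MEmilio implementation: the helper name, the dictionary-based inputs, and the numbers are hypothetical, while `to_county_map` follows the description above (regions as keys, county IDs as values).

```python
def average_over_regions_sketch(vacc_by_county, pop_by_county, to_county_map):
    """Illustrative only: redistribute regional vaccinations by county population."""
    sanitized = {}
    for region, counties in to_county_map.items():
        # Sum vaccinations and population over all counties of the region.
        total_vacc = sum(vacc_by_county[c] for c in counties)
        total_pop = sum(pop_by_county[c] for c in counties)
        for c in counties:
            # Each county receives the share matching its population ratio.
            sanitized[c] = total_vacc * pop_by_county[c] / total_pop
    return sanitized


# Made-up data: region 1 has two counties, region 2 has one.
vacc_by_county = {1001: 900, 1002: 100, 2000: 500}
pop_by_county = {1001: 1000, 1002: 3000, 2000: 2000}
to_county_map = {1: [1001, 1002], 2: [2000]}

sanitized = average_over_regions_sketch(
    vacc_by_county, pop_by_county, to_county_map)
```

After the redistribution, all counties of a region share the same vaccination ratio (here 0.25 in both regions), while the regional totals remain unchanged.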

memilio.epidata.getVaccinationData.sanitizing_extrapolation_mobility(
df,
age_groups,
column_names,
age_population,
neighbors_mobility,
)

ATTENTION: DO NOT USE! ONLY FOR BACKWARD STABILITY AND DEVELOPMENT PURPOSES. Distributes vaccinations of a county to connected counties if far more vaccinations than the federal state average were reported at the newest date. As a consequence, the data for a specific date can differ depending on the newest date in the dataset. The average vaccination ratio per age group is only computed for completed vaccinations. The average vaccination ratios for partial and refreshed vaccinations are not computed; those vaccinations are also distributed by the ratios of completed vaccinations. Since the distribution is done one county after another, a different order of counties may result in different data.

Parameters:
  • df – DataFrame with Data to compute.

  • age_groups – list of all age groups as in df.

  • column_names – list of columns to compute.

  • age_population – Dataframe with number of population per age group and county.

  • neighbors_mobility – dict with counties as keys and commuter mobility to other counties as values.

Returns:

New DataFrame with sanitized data.

memilio.epidata.getVaccinationData.sanity_checks(df)
Parameters:

df

memilio.epidata.getVaccinationData.write_vaccination_data(
dict_data: dict,
conf_obj,
directory: str,
file_format: str = 'json_timeasstring',
impute_dates: bool = True,
moving_average: int = 0,
) → tuple

Writes the vaccination data. The data is exported in three different ways:

  • all_county_vacc: Resolved per county by grouping all original age groups (05-11, 12-17, 18-59, 60+)

  • all_county_agevacc_vacc: Resolved per county and original age group (05-11, 12-17, 18-59, 60+)

  • all_county_ageinf_vacc: Resolved per county and infection data age group (0-4, 5-14, 15-34, 35-59, 60-79, 80+)
    • To do so, getPopulationData is used and the age group specific data from the original source is extrapolated to the new age groups on county level.

  • Missing dates are imputed for all data frames (‘fillDates’ is not optional but always executed).

  • A central moving average of N days is optional.

  • Start and end dates can be provided to define the length of the returned data frames.

Parameters:
  • dict_data – dict. Contains various datasets or values:
    • df_data_agevacc_county_cs: pd.DataFrame. A DataFrame containing processed vaccination data

    • vacc_column_names

    • unique_age_groups_old

    • population_old_ages

    • extrapolate_agegroups

    • population_all_ages

    • unique_age_groups_new

    • age_old_to_all_ages_indices

    • min_all_ages

    • all_ages_to_age_new_share

    • population_new_ages

  • directory – str. Path to the output directory.

  • conf_obj – configuration object

  • file_format – str. File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict[‘file_format’])

  • impute_dates – bool. True or False. Defines if values for dates without new information are imputed. (Default value = True)

  • moving_average – int. Integer >=0. Applies a ‘moving_average’-days moving average on all time series to smooth out effects of irregular reporting. Default defined in defaultDict. (Default value = dd.defaultDict[‘moving_average’])

Returns:

None
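
The relation between the per-county-and-age-group resolution and the per-county resolution can be illustrated with a short pandas sketch. It only shows the grouping step under assumed column names (`Date`, `ID_County`, `Age_Group`, `Vacc_completed`) and made-up numbers; it is not the code that writes the files.

```python
import pandas as pd

# Made-up data resolved per county and original age group.
df_agevacc = pd.DataFrame({
    "Date": ["2021-06-01"] * 4,
    "ID_County": [1001, 1001, 1002, 1002],
    "Age_Group": ["18-59", "60+", "18-59", "60+"],
    "Vacc_completed": [500, 300, 200, 100],
})

# Resolved per county only: sum over all age groups for each date and county.
df_county = (df_agevacc
             .groupby(["Date", "ID_County"], as_index=False)["Vacc_completed"]
             .sum())
```

The grouped frame has one row per date and county, which corresponds to the coarsest of the three resolutions listed above.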