memilio.epidata.getDIVIData

getDIVIData.py

Downloads the DIVI data on SARS-CoV-2. The data contains the number of COVID-19 patients in intensive care and the number of those who are additionally ventilated.

DIVI - Deutsche Interdisziplinäre Vereinigung für Intensiv- und Notfallmedizin (German Interdisciplinary Association for Intensive Care and Emergency Medicine)

data explanation:

reporting_hospitals is the number of reporting hospitals
ICU is the number of COVID-19 patients in reporting hospitals
ICU_ventilated is the number of ventilated COVID-19 patients in reporting hospitals
free_ICU is the number of free ICU beds in reporting hospitals
occupied_ICU is the number of occupied ICU beds in reporting hospitals
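To make the column meanings concrete, the following sketch builds two invented records with the column names listed above and derives two ratios from them. The numbers are made up, the helper functions are hypothetical and not part of memilio, and the occupancy calculation assumes that capacity equals free plus occupied.

```python
# Two invented records using the column names described above.
records = [
    {"reporting_hospitals": 3, "ICU": 12, "ICU_ventilated": 6,
     "free_ICU": 10, "occupied_ICU": 30},
    {"reporting_hospitals": 1, "ICU": 0, "ICU_ventilated": 0,
     "free_ICU": 4, "occupied_ICU": 4},
]


def ventilated_share(row):
    """Fraction of COVID-19 ICU patients that are ventilated (0.0 if none)."""
    return row["ICU_ventilated"] / row["ICU"] if row["ICU"] else 0.0


def occupancy_rate(row):
    """Fraction of capacity in use, assuming capacity = free + occupied."""
    total = row["free_ICU"] + row["occupied_ICU"]
    return row["occupied_ICU"] / total if total else 0.0


for row in records:
    print(ventilated_share(row), occupancy_rate(row))
```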

Functions

`divi_data_sanity_checks`(df)	Checks the sanity of the divi_data dataframe
`fetch_divi_data`(directory, filename, conf_obj)	Downloads or reads the DIVI ICU data and writes them in different files.
`get_divi_data`([read_data, file_format, ...])	Downloads or reads the DIVI ICU data and writes them in different files.
`main`()	Main program entry.
`preprocess_divi_data`(df_raw, conf_obj[, ...])	Processing of the downloaded data
`write_divi_data`(df, directory, conf_obj[, ...])	Write the divi data into json files

memilio.epidata.getDIVIData.divi_data_sanity_checks( df: pandas.DataFrame, ) → None

Checks the sanity of the divi_data dataframe

Checks that the given data is a dataframe, that the headers of the dataframe are the ones needed, and that the size of the dataframe is not unusual.

Parameters: df – pd.DataFrame. The dataframe to be checked.
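The three checks (type, headers, size) can be illustrated with a plain-Python stand-in. This is only a sketch: a list of dictionaries replaces the pandas dataframe, and the required column names and the minimum row count are assumptions chosen for the example, not the values memilio uses.

```python
# Assumed for illustration: required headers and a plausible minimum row count.
REQUIRED_COLUMNS = {"date", "ICU", "ICU_ventilated"}
MIN_ROWS = 2


def sanity_check_sketch(rows):
    """Run three checks analogous to the documented ones; raise on failure."""
    # 1. Type check: the real function expects a pandas DataFrame instead.
    if not isinstance(rows, list) or not all(isinstance(r, dict) for r in rows):
        raise TypeError("expected a list of dictionaries")
    # 2. Header check: every required column must be present in every row.
    for r in rows:
        missing = REQUIRED_COLUMNS - r.keys()
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
    # 3. Size check: an unusually small table suggests a broken download.
    if len(rows) < MIN_ROWS:
        raise ValueError(f"only {len(rows)} rows, expected at least {MIN_ROWS}")


good = [
    {"date": "2020-04-24", "ICU": 5, "ICU_ventilated": 2},
    {"date": "2020-04-25", "ICU": 6, "ICU_ventilated": 3},
]
sanity_check_sketch(good)  # passes silently and returns None
```

Raising an exception on failure is a design choice of this sketch; it keeps a malformed download from silently propagating into later processing steps.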

memilio.epidata.getDIVIData.fetch_divi_data( directory: str, filename: str, conf_obj, read_data: bool = False, file_format: str = 'json_timeasstring', ) → pandas.DataFrame

Downloads or reads the DIVI ICU data and writes them in different files.

If it does not already exist, the folder Germany is generated in the given out_folder. If read_data == True and the file “FullData_DIVI.json” exists, the data is read from this file and stored in a pandas dataframe. If read_data == True and the file does not exist, the program is stopped. The downloaded dataframe is written to the file “FullData_DIVI”.

Parameters:

directory – str Path to the output directory
conf_obj – configuration object
filename – str Name of the file the data is read from or written to.
read_data – bool. True or False. Defines if data is read from file or downloaded. Default defined in defaultDict. (Default value = dd.defaultDict['read_data'])
file_format – str. File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict['file_format'])

Returns:

df_raw – pd.DataFrame. The fetched data.
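The read-or-download behaviour described for this function can be sketched without network access. Everything below is illustrative: `fetch_sketch` and `fake_download` are hypothetical names, plain JSON lists stand in for dataframes, and stopping the program is modelled as raising `FileNotFoundError`, which is an assumption about how the real code halts.

```python
import json
import os
import tempfile


def fetch_sketch(directory, filename, read_data, download):
    """Read `filename`.json from `directory`, or download and store it there."""
    path = os.path.join(directory, filename + ".json")
    if read_data:
        # Reading was requested: stop if the file is missing.
        if not os.path.exists(path):
            raise FileNotFoundError(path)
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    # Otherwise obtain fresh data and write it to disk for later reads.
    os.makedirs(directory, exist_ok=True)
    data = download()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data


def fake_download():
    """Hypothetical stand-in for the real download; returns invented rows."""
    return [{"date": "2020-04-24", "ICU": 5, "ICU_ventilated": 2}]


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "Germany")
    first = fetch_sketch(out_dir, "FullData_DIVI", False, fake_download)
    second = fetch_sketch(out_dir, "FullData_DIVI", True, fake_download)
    print(first == second)  # True: the second call reads what the first wrote
```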

memilio.epidata.getDIVIData.get_divi_data( read_data: bool = False, file_format: str = 'json_timeasstring', out_folder: str = '/home/docs/checkouts/readthedocs.org/user_builds/memilio/data/', start_date: date = datetime.date(2020, 4, 24), end_date: date = datetime.date(2026, 7, 31), impute_dates: bool = False, moving_average: int = 0, **kwargs, )

Downloads or reads the DIVI ICU data and writes them in different files.

Available data starts from 2020-04-24. If the given start_date is earlier, it is changed to this date and a warning is printed. It has been announced that the dataset will no longer be updated from 2024-07-21 (CW29). If end_date is later, a warning is displayed. If it does not already exist, the folder Germany is generated in the given out_folder. If read_data == True and the file “FullData_DIVI.json” exists, the data is read from this file and stored in a pandas dataframe. If read_data == True and the file does not exist, the program is stopped.

The downloaded dataframe is written to the file “FullData_DIVI”. After that, the columns are renamed to English and the state and county names are added. Afterwards, the data is structured in three ways: the chronological sequence of ICU and ICU_ventilated is stored in the files “county_divi.json”, “state_divi.json” and “germany_divi.json” for counties, states and the whole of Germany, respectively.

Parameters:

read_data – True or False. Defines if data is read from file or downloaded. Default defined in defaultDict. (Default value = dd.defaultDict['read_data'])
file_format – File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict['file_format'])
out_folder – Folder where data is written to. Default defined in defaultDict. (Default value = dd.defaultDict['out_folder'])
start_date – First date in the dataframe. Default value = date(2020, 4, 24).
end_date – Last date in the dataframe. Default defined in defaultDict. (Default value = dd.defaultDict['end_date'])
impute_dates – True or False. Defines if values for dates without new information are imputed. Default defined in defaultDict. (Default value = dd.defaultDict['impute_dates'])
moving_average – Integer >= 0. Applies a 'moving_average'-day moving average to all time series to smooth out effects of irregular reporting. Default defined in defaultDict. (Default value = dd.defaultDict['moving_average'])
**kwargs –
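The date handling described for this function (clamping an early start_date and warning about a late end_date) can be sketched as follows. The helper `adjust_dates` is hypothetical, and using the `warnings` module is an assumption; the documentation only states that a warning is printed or displayed.

```python
import warnings
from datetime import date

FIRST_AVAILABLE = date(2020, 4, 24)  # first date with data, per the description
LAST_UPDATE = date(2024, 7, 21)      # announced end of updates, per the description


def adjust_dates(start_date, end_date):
    """Clamp start_date to the first available date; warn about a late end_date."""
    if start_date < FIRST_AVAILABLE:
        warnings.warn(f"start_date changed to {FIRST_AVAILABLE}")
        start_date = FIRST_AVAILABLE
    if end_date > LAST_UPDATE:
        # end_date is left unchanged in this sketch; only a warning is issued.
        warnings.warn(f"no data is expected after {LAST_UPDATE}")
    return start_date, end_date


print(adjust_dates(date(2020, 1, 1), date(2023, 1, 1)))
```

With the default end_date of 2026-07-31 shown in the signature, a sketch like this would emit the second warning on every call that does not override end_date.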

memilio.epidata.getDIVIData.main(): Main program entry.

memilio.epidata.getDIVIData.preprocess_divi_data( df_raw: pandas.DataFrame, conf_obj, start_date: date = datetime.date(2020, 4, 24), end_date: date = datetime.date(2026, 7, 31), impute_dates: bool = False, moving_average: int = 0, ) → tuple[pandas.DataFrame, pandas.DataFrame]

Processing of the downloaded data

The columns are renamed to English, and the state and county names are added.

Parameters:

df_raw – pd.DataFrame
conf_obj – configuration object
start_date – date The first date in the dataframe. Default value = date(2020, 4, 24).
end_date – date The last date in the dataframe. Default defined in defaultDict. (Default value = dd.defaultDict['end_date'])
impute_dates – bool Defines if values for dates without new information are imputed. Default defined in defaultDict. (Default value = dd.defaultDict['impute_dates'])
moving_average – int Integer >= 0. Applies a 'moving_average'-day moving average to all time series to smooth out effects of irregular reporting. Default defined in defaultDict. (Default value = dd.defaultDict['moving_average'])

Returns:

df – pd.DataFrame. Processed DIVI data.
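To illustrate what impute_dates and moving_average could mean in practice, here is a minimal plain-Python sketch. It is one possible interpretation only: it assumes that missing dates are filled by carrying the last known value forward and that the average is a trailing window, whereas memilio's actual imputation rule and window placement may differ. Both helper names are hypothetical.

```python
from datetime import date, timedelta


def impute_missing_dates(series):
    """Fill gaps in a {date: value} mapping by carrying the last value forward."""
    days = sorted(series)
    filled, current, last = {}, days[0], series[days[0]]
    while current <= days[-1]:
        last = series.get(current, last)
        filled[current] = last
        current += timedelta(days=1)
    return filled


def trailing_moving_average(values, window):
    """Average each value with up to window-1 predecessors; window <= 1 is a no-op."""
    if window <= 1:
        return list(values)
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Invented ICU counts with no report on 2020-04-25.
raw = {date(2020, 4, 24): 4, date(2020, 4, 26): 8, date(2020, 4, 27): 6}
filled = impute_missing_dates(raw)
smoothed = trailing_moving_average(list(filled.values()), 3)
print(smoothed)
```

Imputing before smoothing matters here: without the filled-in day, the window would span an irregular number of calendar days.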

memilio.epidata.getDIVIData.write_divi_data( df: pandas.DataFrame, directory: str, conf_obj, file_format: str = 'json_timeasstring', impute_dates: bool = False, moving_average: int = 0, ) → dict

Write the divi data into json files

The data is structured in three ways: the chronological sequence of ICU and ICU_ventilated is stored in the files “county_divi.json”, “state_divi.json” and “germany_divi.json” for counties, states and the whole of Germany, respectively.

Parameters:

df – pd.DataFrame. Dataframe containing processed divi data
directory – str Path to the output directory
conf_obj – configuration object
file_format – str. File format which is used for writing the data. Default defined in defaultDict. (Default value = dd.defaultDict['file_format'])
impute_dates – bool True or False. Defines if values for dates without new information are imputed. Default defined in defaultDict. (Default value = dd.defaultDict['impute_dates'])
moving_average – int Integer >= 0. Applies a 'moving_average'-day moving average to all time series to smooth out effects of irregular reporting. Default defined in defaultDict. (Default value = dd.defaultDict['moving_average'])

Returns:

data_dict – dict. Dictionary containing datasets at county, state and national level.
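The three levels of aggregation and the three output files can be sketched with the standard library alone. The rows are invented, the column names "ID_County", "ID_State" and "Date" as well as the dictionary keys are assumptions made for this example, and `aggregate` is a hypothetical helper rather than memilio's implementation.

```python
import json
import os
import tempfile
from collections import defaultdict


def aggregate(rows, key_fn):
    """Sum ICU and ICU_ventilated per (key, date); key_fn maps a row to its key."""
    totals = defaultdict(lambda: {"ICU": 0, "ICU_ventilated": 0})
    for r in rows:
        bucket = totals[(key_fn(r), r["Date"])]
        bucket["ICU"] += r["ICU"]
        bucket["ICU_ventilated"] += r["ICU_ventilated"]
    ordered = sorted(totals.items(), key=lambda item: item[0])
    return [{"ID": k, "Date": d, **v} for (k, d), v in ordered]


# Invented county-level rows; the column names are assumed for illustration.
county = [
    {"ID_County": 1001, "ID_State": 1, "Date": "2020-04-24", "ICU": 2, "ICU_ventilated": 1},
    {"ID_County": 1002, "ID_State": 1, "Date": "2020-04-24", "ICU": 3, "ICU_ventilated": 2},
    {"ID_County": 2000, "ID_State": 2, "Date": "2020-04-24", "ICU": 5, "ICU_ventilated": 0},
]

data_dict = {
    "county_divi": county,
    "state_divi": aggregate(county, lambda r: r["ID_State"]),
    "germany_divi": aggregate(county, lambda r: 0),  # single national bucket
}

# Write one JSON file per aggregation level into a temporary directory.
with tempfile.TemporaryDirectory() as directory:
    for name, rows in data_dict.items():
        path = os.path.join(directory, name + ".json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(rows, f)
    written = sorted(os.listdir(directory))
print(written)
```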