memilio.epidata.getDataIntoPandasDataFrame

getDataIntoPandasDataFrame.py: Tools to convert data to pandas dataframes

This module provides tools to

load data in Excel format
load data in CSV format
organize the command line interface
check whether a directory exists and create it if it does not
write a pandas dataframe to a file in one of several formats

Functions

`append_filename`([filename, impute_dates, ...])	Creates consistent file names for all output.
`check_dir`(directory)	Checks existence and creates folder
`cli`(what)	Defines command line interface
`default_print`(verbosity_level, message)
`download_file`(url[, chunk_size, timeout, ...])	Download a file using GET over HTTP.
`extract_zip`(file, **param_dict)	Reads a zip file and returns a list of dataframes, one for every file in the zip folder.
`get_file`([filepath, url, read_data, ...])	Loads data from filepath and stores it in a pandas dataframe.
`user_choice`(message[, default])
`write_dataframe`(df, directory, file_prefix, ...)	Writes pandas dataframe to file

Classes

`Conf`(out_folder, **kwargs)	Configures all relevant download outputs etc.
`VerbosityLevel`(value)

Exceptions

DataError

Error for handling incomplete or unexpected Data

class memilio.epidata.getDataIntoPandasDataFrame.Conf(out_folder, **kwargs)

Configures all relevant download outputs etc.

excel_engine = 'openpyxl'

show_progr = False

v_level = 'Info'

exception memilio.epidata.getDataIntoPandasDataFrame.DataError: Error for handling incomplete or unexpected Data

class memilio.epidata.getDataIntoPandasDataFrame.VerbosityLevel(value)

Critical = 1

Debug = 5

Error = 2

Info = 4

Off = 0

Trace = 6

Warning = 3
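The numeric values order the levels from least verbose (Off) to most verbose (Trace), which suggests a simple threshold comparison. The following is a minimal sketch, assuming (this page does not confirm it) that a message is shown when its level is at or below the configured level. `SketchLevel` and `should_print` are hypothetical names and not part of memilio.

```python
from enum import Enum


class SketchLevel(Enum):
    """Stand-in that mirrors the values listed for VerbosityLevel."""
    Off = 0
    Critical = 1
    Error = 2
    Warning = 3
    Info = 4
    Debug = 5
    Trace = 6


def should_print(message_level, configured="Info"):
    """Return True if a message at message_level passes the threshold.

    Assumption: levels are compared by numeric value, and the configured
    level is given by name, like the 'Info' default of Conf.v_level.
    """
    threshold = SketchLevel[configured].value
    # A message tagged "Off" is never printed in this sketch.
    return 0 < message_level.value <= threshold
```

With the default threshold of 'Info', warnings pass and debug messages are suppressed.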

memilio.epidata.getDataIntoPandasDataFrame.append_filename(filename='', impute_dates=False, moving_average=0, split_berlin=False, rep_date=False)

Creates consistent file names for all output.

Parameters:

filename – (Default value = ‘’)
impute_dates – (Default value = False)
moving_average – (Default value = 0)
split_berlin – (Default value = False)
rep_date – (Default value = False)
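This page does not document how the flags change the file name. The sketch below shows one plausible scheme in which each active option appends a suffix. The function name `sketch_append_filename`, the suffix strings, and the precedence of `moving_average` over `impute_dates` are all assumptions made for illustration and may differ from the library's behaviour.

```python
def sketch_append_filename(filename="", impute_dates=False, moving_average=0,
                           split_berlin=False, rep_date=False):
    """Hypothetical illustration; every suffix string is an assumption."""
    if moving_average > 0:
        # Encode the window length in the name.
        filename += "_ma" + str(moving_average)
    elif impute_dates:
        filename += "_all_dates"
    if split_berlin:
        filename += "_split_berlin"
    if rep_date:
        filename += "_repdate"
    return filename


name = sketch_append_filename("cases", moving_average=7, rep_date=True)
print(name)  # cases_ma7_repdate
```

Deriving the name from the options in one place keeps outputs of different runs distinguishable without the caller building strings by hand.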

memilio.epidata.getDataIntoPandasDataFrame.check_dir(directory)

Checks existence and creates folder

Checks whether the folder given in the parameter “directory” exists and creates it if it does not.

Parameters:

directory – Directory that should exist.
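The same check-then-create behaviour can be sketched with the standard library. `ensure_dir` is a hypothetical name, and creating missing parent folders is an assumption; this is not necessarily how `check_dir` is implemented.

```python
import os
import tempfile


def ensure_dir(directory):
    """Create directory (and missing parents) if it does not exist yet."""
    os.makedirs(directory, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data", "pydata")
    ensure_dir(target)  # creates the folder
    ensure_dir(target)  # second call finds it and does nothing
    created = os.path.isdir(target)
```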

memilio.epidata.getDataIntoPandasDataFrame.cli(what)

Defines command line interface

The function parameter “what” is used as a dictionary key. The dictionary value is a list consisting of a string and a list of keywords. The string is the message that is printed when working on the specific package. The list of keywords contains all special command line arguments needed for this package.

If the key is not part of the dictionary the program is stopped.

The following default arguments are added to the parser:

- read-file
- file-format, choices = [‘json’, ‘hdf5’, ‘json_timeasstring’]
- out_path

The default values are defined in default dict.

Depending on “what”, the following arguments can be added to the parser:

- start_date
- end_date
- impute_dates
- moving_average
- make_plot
- split_berlin
- rep_date
- sanitize_data
- no_progress_indicator
- interactive
- verbose
- skip_checks
- no_raw
- to_dataset

Parameters:

what – Defines which package calls the function and thus which command line arguments should be defined.
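The lookup-then-build pattern described above can be sketched with `argparse`. The registry `CLI_DICT`, its single entry, the function name `build_parser`, the exact flag spellings, and all default values are hypothetical; only the argument names and the file-format choices are taken from this page.

```python
import argparse

# Hypothetical registry: key -> [message, names of extra arguments].
CLI_DICT = {
    "cases": ["Downloads case data", ["start_date", "impute_dates"]],
}


def build_parser(what):
    """Build a parser for the package identified by `what`."""
    if what not in CLI_DICT:
        # Mirrors "the program is stopped" for unknown keys.
        raise SystemExit(f"Unknown key: {what!r}")
    message, extra_args = CLI_DICT[what]

    parser = argparse.ArgumentParser(description=message)
    # Default arguments; the default values here are placeholders.
    parser.add_argument("--read-file", action="store_true")
    parser.add_argument("--file-format", default="json_timeasstring",
                        choices=["json", "hdf5", "json_timeasstring"])
    parser.add_argument("--out_path", default="data")

    # Optional arguments, added only when the package asks for them.
    if "start_date" in extra_args:
        parser.add_argument("--start_date", default=None)
    if "impute_dates" in extra_args:
        parser.add_argument("--impute_dates", action="store_true")
    return parser


args = build_parser("cases").parse_args(
    ["--file-format", "json", "--impute_dates"])
```

Because arguments are only registered when listed for the key, a package never accepts options it does not support.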

memilio.epidata.getDataIntoPandasDataFrame.default_print(verbosity_level, message)

Parameters:

verbosity_level –
message –

memilio.epidata.getDataIntoPandasDataFrame.download_file(url, chunk_size=1024, timeout=None, progress_function=None, verify=True)

Download a file using GET over HTTP.

Parameters:

url – Full url of the file to download.
chunk_size – Number of Bytes downloaded at once. Only used when a progress_function is specified. For a good display of progress, this size should be about the speed of your internet connection in Bytes/s. Can be set to None to let the server decide the chunk size (may be equal to the file size). (Default value = 1024)
timeout – Timeout in seconds for the GET request. (Default value = None)
progress_function – Function called regularly, with the current download progress in [0,1] as a float argument. (Default value = None)
verify – bool or “interactive”. If False, ignores the connection’s security. If True, only starts downloads from secure connections, and insecure connections raise a FileNotFoundError. If “interactive”, prompts the user whether or not to allow insecure connections. (Default value = True)

Returns:

File as BytesIO
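The interplay of `chunk_size` and `progress_function` can be illustrated without a network connection. In this sketch an in-memory stream stands in for the HTTP response body; `read_in_chunks` and its `total_size` parameter are hypothetical and do not reflect the actual implementation of `download_file`.

```python
from io import BytesIO


def read_in_chunks(stream, total_size, chunk_size=1024,
                   progress_function=None):
    """Copy `stream` into a BytesIO, reporting progress in [0, 1]."""
    buffer = BytesIO()
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        buffer.write(chunk)
        received += len(chunk)
        if progress_function is not None and total_size > 0:
            progress_function(received / total_size)
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


payload = b"x" * 2500
progress = []
result = read_in_chunks(BytesIO(payload), len(payload), 1024,
                        progress.append)
```

A smaller `chunk_size` yields more frequent progress callbacks at the cost of more read calls, which is why the parameter description ties it to the connection speed.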

memilio.epidata.getDataIntoPandasDataFrame.extract_zip(file, **param_dict)

Reads a zip file and returns a list of dataframes, one for every file in the zip folder. If only one file is readable for func_to_use, a single dataframe is returned instead of a list with one entry.

Parameters:

file – String. Path to Zipfile to read.
param_dict – Dict. Additional information for download functions (e.g. engine, sheet_name, header…)

Returns:

List of all dataframes (one for each file), or a single dataframe if only one file is readable.
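The list-or-single-dataframe return convention can be sketched as follows. This is an illustration restricted to CSV members; `read_zip_csvs` is a hypothetical name, and forwarding `param_dict` to `pandas.read_csv` is an assumption about how the extra parameters are used.

```python
import io
import zipfile

import pandas as pd


def read_zip_csvs(file, **param_dict):
    """Read every CSV member of a zip archive into a DataFrame."""
    frames = []
    with zipfile.ZipFile(file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip members this reader cannot handle
            with archive.open(name) as member:
                frames.append(pd.read_csv(member, **param_dict))
    # Mirror the documented behaviour: unwrap a single result.
    return frames[0] if len(frames) == 1 else frames


# Build a small archive in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.csv", "x;y\n1;2\n")
    archive.writestr("b.csv", "x;y\n3;4\n")
    archive.writestr("readme.txt", "not a table")
buffer.seek(0)
frames = read_zip_csvs(buffer, sep=";")
```

Callers need to handle both return shapes, for example by checking `isinstance(result, list)`.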

memilio.epidata.getDataIntoPandasDataFrame.get_file(filepath='', url='', read_data=False, param_dict={}, interactive=False)

Loads data from filepath and stores it in a pandas dataframe. If the data cannot be read from the given filepath, the user is asked whether the file should be downloaded from the given url. Uses the progress indicator to give feedback.

Parameters:

filepath – String. Filepath from where the data is read. (Default value = ‘’)
url – String. URL to download the dataset. (Default value = ‘’)
read_data – True or False. Defines if item is opened from directory (True) or downloaded (False). (Default value = dd.defaultDict[‘read_data’])
param_dict – Dict. Additional information for download functions (e.g. engine, sheet_name, header…) (Default value = {})
interactive – bool. Whether to ask for user input. If False, raises Errors instead. (Default value = False)

Returns:

pandas dataframe
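The control flow described for `read_data` and `interactive` can be sketched with injected callables in place of real file and network access. `load_with_fallback`, `read_local`, `download`, and `ask` are hypothetical names, and treating `FileNotFoundError` as the failure signal is an assumption.

```python
def load_with_fallback(read_local, download, read_data=False,
                       interactive=False, ask=input):
    """Read locally if requested; optionally fall back to a download."""
    if not read_data:
        return download()
    try:
        return read_local()
    except FileNotFoundError:
        if not interactive:
            raise  # non-interactive mode: surface the error
        answer = ask("Could not read local file. Download instead? [y/n] ")
        if answer.strip().lower() in ("y", "yes"):
            return download()
        raise


def missing():
    """Simulates a local file that cannot be read."""
    raise FileNotFoundError("no such file")


# Interactive mode with a scripted "yes" answer falls back to the download.
value = load_with_fallback(missing, lambda: "remote", read_data=True,
                           interactive=True, ask=lambda _prompt: "y")
```

Passing the prompt as a callable keeps the function usable in scripts and tests where no terminal is attached.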

memilio.epidata.getDataIntoPandasDataFrame.user_choice(message, default=False)

Parameters:

message –
default – (Default value = False)

memilio.epidata.getDataIntoPandasDataFrame.write_dataframe(df, directory, file_prefix, file_type, param_dict={})

Writes pandas dataframe to file

This routine writes a pandas dataframe to a file in a given format. The filename is given without ending. The file_type can be

- json
- json_timeasstring [Default]
- hdf5
- csv
- txt

The file_type defines the file format and thus also the file ending. The file format can be json, hdf5, csv or txt. For the option json_timeasstring, the column Date is converted from datetime to string.

Parameters:

df – pandas dataframe (pandas DataFrame)
directory – directory in which to save the file (string)
file_prefix – filename without ending (string)
file_type – defines ending (string)
param_dict – defines parameters for to_csv/txt (dictionary) (Default value = {})
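The date conversion mentioned for json_timeasstring can be sketched with plain pandas calls. `write_json_timeasstring` is a hypothetical helper; the date format `%Y-%m-%d` and the `orient="records"` layout are assumptions and may not match what `write_dataframe` produces.

```python
import json
import os
import tempfile

import pandas as pd


def write_json_timeasstring(df, directory, file_prefix):
    """Write df as JSON after converting the Date column to strings."""
    out = df.copy()  # leave the caller's dataframe untouched
    if "Date" in out.columns:
        out["Date"] = out["Date"].dt.strftime("%Y-%m-%d")
    path = os.path.join(directory, file_prefix + ".json")
    out.to_json(path, orient="records")
    return path


df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02"]),
    "Confirmed": [1, 3],
})
with tempfile.TemporaryDirectory() as tmp:
    path = write_json_timeasstring(df, tmp, "cases")
    with open(path) as handle:
        records = json.load(handle)
```

Converting dates to strings before writing avoids pandas' default JSON encoding of datetimes as epoch milliseconds, which is harder to read in the output file.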