ServiceXPythonFunction
- class servicex.servicex_python_function.ServiceXPythonFunction(dataset: ~typing.Union[str, ~typing.Iterable[str]], backend_name: ~typing.Optional[str] = None, image: ~typing.Optional[str] = None, max_workers: int = 20, result_destination: str = 'object-store', servicex_adaptor: ~typing.Optional[~servicex.servicex_adaptor.ServiceXAdaptor] = None, minio_adaptor: ~typing.Optional[~typing.Union[~servicex.minio_adaptor.MinioAdaptor, ~servicex.minio_adaptor.MinioAdaptorFactory]] = None, cache_adaptor: ~typing.Optional[~servicex.cache.Cache] = None, status_callback_factory: ~typing.Optional[~typing.Callable[[~typing.Union[str, ~typing.Iterable[str]], ~typing.Optional[str], bool], ~typing.Callable[[~typing.Optional[int], int, int, int], None]]] = <function _run_default_wrapper>, local_log: ~typing.Optional[~servicex.utils.log_adaptor] = None, session_generator: ~typing.Optional[~typing.Callable[[], ~typing.Awaitable[~aiohttp.client.ClientSession]]] = None, config_adaptor: ~typing.Optional[~servicex.servicex_config.ServiceXConfigAdaptor] = None, data_convert_adaptor: ~typing.Optional[~servicex.data_conversions.DataConverterAdaptor] = None, ignore_cache: bool = False)
Bases: ServiceXDataset
- __init__(dataset: ~typing.Union[str, ~typing.Iterable[str]], backend_name: ~typing.Optional[str] = None, image: ~typing.Optional[str] = None, max_workers: int = 20, result_destination: str = 'object-store', servicex_adaptor: ~typing.Optional[~servicex.servicex_adaptor.ServiceXAdaptor] = None, minio_adaptor: ~typing.Optional[~typing.Union[~servicex.minio_adaptor.MinioAdaptor, ~servicex.minio_adaptor.MinioAdaptorFactory]] = None, cache_adaptor: ~typing.Optional[~servicex.cache.Cache] = None, status_callback_factory: ~typing.Optional[~typing.Callable[[~typing.Union[str, ~typing.Iterable[str]], ~typing.Optional[str], bool], ~typing.Callable[[~typing.Optional[int], int, int, int], None]]] = <function _run_default_wrapper>, local_log: ~typing.Optional[~servicex.utils.log_adaptor] = None, session_generator: ~typing.Optional[~typing.Callable[[], ~typing.Awaitable[~aiohttp.client.ClientSession]]] = None, config_adaptor: ~typing.Optional[~servicex.servicex_config.ServiceXConfigAdaptor] = None, data_convert_adaptor: ~typing.Optional[~servicex.data_conversions.DataConverterAdaptor] = None, ignore_cache: bool = False)
Create and configure a ServiceX object for a dataset.
Arguments
- dataset Name of the dataset from which queries will be selected.
- backend_name The type of backend. Used only if an end-point needs to be found. If no servicex_adaptor is given, this defaults to xaod, unless an endpoint is listed in your servicex configuration file. In that case the best match there is used, or the call fails if a name was given and no matching endpoint is listed.
- image Name of the transformer image used to transform the data. If left as the default, None, the default image for the ServiceX backend is used.
- max_workers Maximum number of transformers to run simultaneously on ServiceX.
- result_destination Where the transformers should write the results. Defaults to object-store, but can also be used to save results to a POSIX volume.
- servicex_adaptor Object that controls communication with the ServiceX instance at a particular IP address with certain login credentials. Configured via the config_adaptor by default.
- minio_adaptor Object that controls communication with the ServiceX MinIO instance.
- cache_adaptor Handles caching of the queries sent to ServiceX and of the data returned.
- status_callback_factory Factory that creates a status notification callback for each query. One callback is created per query.
- local_log Log adaptor for logging.
- session_generator Supply this to control the ClientSession object used for callbacks. Otherwise, a single session is used for all servicex queries.
- config_adaptor Controls how configuration options are read from a configuration file (e.g. servicex.yaml).
- data_convert_adaptor Manages conversions between ROOT, Parquet, pandas and awkward, including default settings for the datatypes expected from the backend.
- ignore_cache Always ignore the cache on any query for this dataset. This is only meaningful if no cache adaptor is provided. Defaults to False: the cache is used if possible.
Notes:
By default, the status_callback_factory argument uses the tqdm library to render progress bars in a terminal window, or a graphic in a Jupyter notebook (with the proper Jupyter extensions installed). If status_callback_factory is set to None, no updates are rendered. A custom factory can also be supplied; the callback it creates takes (total_files, transformed, downloaded, skipped) as arguments. The total_files parameter may be None until the system knows how many files need to be processed (and some files can even be completed before that is known).
The full description of calling parameters is recorded in the local cache, including things like image name and tag.
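A custom factory only has to match the type annotation of status_callback_factory in the signature above: it receives three arguments (typed Union[str, Iterable[str]], Optional[str] and bool) and returns a callback taking (total_files, transformed, downloaded, skipped). The sketch below assumes the first two arguments are the dataset and the query title; the meaning of the bool is not documented here, so it is accepted but ignored. The name make_text_status_callback is hypothetical and not part of servicex.

```python
from typing import Callable, Iterable, List, Optional, Tuple, Union

# Per-query callback type, copied from the documented signature.
StatusCallback = Callable[[Optional[int], int, int, int], None]


def make_text_status_callback(
    dataset: Union[str, Iterable[str]],
    title: Optional[str],
    flag: bool,
) -> StatusCallback:
    """Hypothetical factory that returns a plain-text progress callback.

    Assumption: `dataset` and `title` identify the query; `flag` is a bool
    whose meaning is not documented here, so it is ignored.
    """
    label = title if title is not None else "query"
    history: List[Tuple[Optional[int], int, int, int]] = []

    def callback(total_files: Optional[int], transformed: int,
                 downloaded: int, skipped: int) -> None:
        history.append((total_files, transformed, downloaded, skipped))
        # total_files may be None until the number of files is known.
        total = "?" if total_files is None else str(total_files)
        print(f"{label}: {transformed}/{total} transformed, "
              f"{downloaded} downloaded, {skipped} skipped")

    # Keep the recorded updates reachable for inspection (e.g. in tests).
    callback.history = history  # type: ignore[attr-defined]
    return callback
```

Such a factory would be passed as status_callback_factory=make_text_status_callback when constructing the object; one callback is then created per query.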
Attributes
- dataset_as_name
Return the dataset name as a string for “human” consumption.
Note that this can be very long!
- Returns:
str: The dataset name formatted as a string
Methods
- _abc_impl()
Internal state held by ABC machinery.
- async get_data_awkward_async(selection_function: Callable, title: Optional[str] = None)
Fetch query data from ServiceX matching selection_function and return it as a dictionary of awkward arrays, with one entry for each column. The data is uniquely ordered (the same query will always return the same order).
- Arguments:
- selection_function The Python function that specifies the data to be queried.
- title Title reported to the ServiceX backend for status reporting.
- Returns:
- A dictionary of jagged arrays (as needed), one for each column. The dictionary keys are bytes to support possible unicode characters.
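Because the returned dictionary is keyed by bytes, columns must be looked up with byte strings (b"jet_pt") or the keys decoded first. The sketch below awaits any object that provides get_data_awkward_async with the documented signature and decodes the keys, assuming they are UTF-8 encoded (str keys are passed through unchanged). AwkwardSource, fetch_columns, the stub _FakeDataset and the column names are hypothetical; the stub returns plain lists in place of awkward arrays so the example runs without a ServiceX endpoint.

```python
import asyncio
from typing import Any, Callable, Dict, Optional, Protocol


class AwkwardSource(Protocol):
    """Anything with the documented get_data_awkward_async signature."""

    async def get_data_awkward_async(
        self, selection_function: Callable, title: Optional[str] = None
    ) -> Dict[bytes, Any]:
        ...


async def fetch_columns(ds: AwkwardSource, selection_function: Callable,
                        title: Optional[str] = None) -> Dict[str, Any]:
    """Await the query and turn the bytes keys into str column names."""
    raw = await ds.get_data_awkward_async(selection_function, title=title)
    # Assumption: keys are UTF-8 encoded bytes; str keys are kept as-is.
    return {(k.decode("utf-8") if isinstance(k, bytes) else k): v
            for k, v in raw.items()}


class _FakeDataset:
    """Hypothetical stand-in for a ServiceXPythonFunction (no network access)."""

    async def get_data_awkward_async(self, selection_function: Callable,
                                     title: Optional[str] = None) -> Dict[bytes, Any]:
        # Plain lists stand in for (possibly jagged) awkward arrays.
        return {b"jet_pt": [[50.1, 20.3], [], [31.0]], b"met": [12.5, 8.0, 40.2]}


def my_selection():  # hypothetical selection function, unused by the stub
    return None


columns = asyncio.run(fetch_columns(_FakeDataset(), my_selection, title="jets"))
print(sorted(columns))  # ['jet_pt', 'met']
```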
- async get_data_rootfiles_async(selection_function: Callable, title: Optional[str] = None) → List[Path]
Fetch query data from ServiceX matching selection_function and return it as a list of ROOT files. The files are uniquely ordered (the same query will always return the same order).
- Arguments:
- selection_function The Python function that specifies the data to be queried.
- title Title reported to the ServiceX backend for status reporting.
- Returns:
root_files The list of root files
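Since both fetch methods are coroutines, several queries can be awaited concurrently, for example with asyncio.gather. The sketch below relies only on the documented signature of get_data_rootfiles_async; RootFileSource, fetch_rootfiles_for_all, the selection functions and the stub _FakeRootDataset (which returns made-up paths instead of contacting ServiceX) are hypothetical.

```python
import asyncio
from pathlib import Path
from typing import Callable, Dict, List, Optional, Protocol


class RootFileSource(Protocol):
    """Anything with the documented get_data_rootfiles_async signature."""

    async def get_data_rootfiles_async(
        self, selection_function: Callable, title: Optional[str] = None
    ) -> List[Path]:
        ...


async def fetch_rootfiles_for_all(ds: RootFileSource,
                                  queries: Dict[str, Callable]) -> Dict[str, List[Path]]:
    """Run one query per (title, selection function) pair concurrently."""
    titles = list(queries)
    results = await asyncio.gather(
        *(ds.get_data_rootfiles_async(queries[t], title=t) for t in titles)
    )
    # gather returns results in argument order, so zip pairs titles with files.
    return dict(zip(titles, results))


class _FakeRootDataset:
    """Hypothetical stub: returns made-up paths instead of contacting ServiceX."""

    async def get_data_rootfiles_async(self, selection_function: Callable,
                                       title: Optional[str] = None) -> List[Path]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [Path(f"/tmp/{title}_{i}.root") for i in range(2)]


def select_jets():  # hypothetical selection functions, unused by the stub
    return None


def select_muons():
    return None


files = asyncio.run(
    fetch_rootfiles_for_all(_FakeRootDataset(),
                            {"jets": select_jets, "muons": select_muons})
)
```

Keying the queries by title keeps each returned file list attached to the query that produced it, and the same title is reported to the ServiceX backend for status reporting.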