ServiceXABC

class servicex.servicexabc.ServiceXABC(dataset: Union[str, Iterable[str]], image: Optional[str] = None, max_workers: int = 20, result_destination: str = 'object-store', status_callback_factory: Optional[Callable[[Union[str, Iterable[str]], Optional[str], bool], Callable[[Optional[int], int, int, int], None]]] = <function _run_default_wrapper>)

Bases: ABC

Abstract base class for accessing the ServiceX front-end for a particular dataset. It provides some method implementations, but not a full set: the methods marked abstract below must be supplied by a subclass.

A lightweight, mostly immutable base class that holds basic configuration information for use with ServiceX file access, including the dataset name. Subclasses implement the various access methods. Note that not every method may be available in every subclass.

__init__(dataset: Union[str, Iterable[str]], image: Optional[str] = None, max_workers: int = 20, result_destination: str = 'object-store', status_callback_factory: Optional[Callable[[Union[str, Iterable[str]], Optional[str], bool], Callable[[Optional[int], int, int, int], None]]] = <function _run_default_wrapper>)

Create and configure a ServiceX object for a dataset.

Arguments

dataset Name of a dataset from which queries will be selected.

image Name of transformer image to use to transform the data. If None, the default implementation is used.

cache_adaptor Runs the caching for data and queries that are sent up and down.

max_workers Maximum number of transformers to run simultaneously on ServiceX.

result_destination Where the transformers should write the results. Defaults to object-store, but could be used to save results to a POSIX volume.

cache_path Path to the cache.

status_callback_factory Factory to create a status notification callback for each query. One is created per query.

Notes:

  • The status_callback_factory argument, by default, produces callbacks that use the tqdm library to render progress bars in a terminal window or a graphic in a Jupyter notebook (with the proper Jupyter extensions installed). If status_callback_factory is None, no updates are rendered. A custom factory can also be supplied; the callback it returns takes (total_files, transformed, downloaded, skipped) as arguments. The total_files parameter may be None until the system knows how many files need to be processed (and some files can even be completed before that is known).
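A custom factory along the lines described in the note might look like the sketch below. It is only an illustration: the name make_logging_factory is hypothetical, and reading the three factory arguments as (dataset, title, downloading) is an assumption based on the type signature above, not something the documentation states.

```python
from typing import Callable, Iterable, List, Optional, Union

# Shape of the per-query callback described in the note above.
StatusCallback = Callable[[Optional[int], int, int, int], None]


def make_logging_factory(log: List[str]):
    """Hypothetical helper: builds a factory whose callbacks append one
    progress line to `log` on every status update."""

    def factory(dataset: Union[str, Iterable[str]],
                title: Optional[str],
                downloading: bool) -> StatusCallback:
        # Assumption: the arguments are (dataset, title, downloading).
        label = title if title is not None else str(dataset)

        def callback(total_files: Optional[int], transformed: int,
                     downloaded: int, skipped: int) -> None:
            # total_files may be None until the file count is known.
            total = "?" if total_files is None else str(total_files)
            log.append(f"{label}: {transformed}/{total} transformed, "
                       f"{downloaded} downloaded, {skipped} skipped")

        return callback

    return factory


messages: List[str] = []
factory = make_logging_factory(messages)
notify = factory("my_dataset", "demo query", True)
notify(None, 1, 0, 0)   # file count not yet known
notify(10, 4, 3, 0)
```

Such a factory would presumably be passed as status_callback_factory=factory when constructing a concrete subclass.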

Attributes

dataset_as_name

Return the dataset name as a string for “human” consumption.

Note that this can be very long!

Returns:

str: The dataset name formatted as a string

Methods

_abc_impl()

Internal state held by ABC machinery.

_create_notifier(title: Optional[str], downloading: bool) -> _status_update_wrapper

Internal method to create an updater from the status callback.

get_data_awkward(selection_query: str, title: Optional[str] = None) -> Dict[bytes, Array]

Fetch query data from ServiceX matching selection_query and return it as a dictionary of awkward arrays, with one entry for each column. The data is deterministically ordered (the same query will always return the same order).

Arguments:

selection_query The qastle string specifying the data to be queried

title Title reported to the ServiceX backend for status reporting

Returns:

A dictionary of jagged arrays (as needed), one for each column. The dictionary keys are bytes to support possible Unicode characters.
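Because the keys are bytes, code that looks columns up by string name needs a decoding step. A minimal sketch follows, assuming the keys are UTF-8 encoded (the documentation does not name the encoding); decode_column_names is a hypothetical helper, and plain lists stand in for awkward arrays so the snippet runs without ServiceX.

```python
from typing import Any, Dict


def decode_column_names(columns: Dict[bytes, Any],
                        encoding: str = "utf-8") -> Dict[str, Any]:
    """Return a copy of the mapping with bytes keys decoded to str.

    The UTF-8 default is an assumption, not documented behaviour.
    """
    return {name.decode(encoding): values for name, values in columns.items()}


# Stand-in for a get_data_awkward result (lists replace awkward arrays).
raw = {b"JetPt": [[50.1, 32.0], [], [71.4]], b"EventNumber": [1, 2, 3]}
decoded = decode_column_names(raw)
```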

abstract async get_data_awkward_async(selection_query: str, title: Optional[str] = None) -> Dict[bytes, Array]

Fetch query data from ServiceX matching selection_query and return it as a dictionary of awkward arrays, with one entry for each column. The data is deterministically ordered (the same query will always return the same order).

Arguments:

selection_query The qastle string specifying the data to be queried

title Title reported to the ServiceX backend for status reporting

Returns:

A dictionary of jagged arrays (as needed), one for each column. The dictionary keys are bytes to support possible Unicode characters.
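The pairing of a concrete synchronous method with an abstract async one can be pictured with the sketch below. This is not the actual ServiceXABC implementation: QuerySourceSketch and FakeSource are hypothetical stand-ins, and driving the coroutine with asyncio.run is an assumption chosen for brevity.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class QuerySourceSketch(ABC):
    """Hypothetical stand-in for the sync/async pairing; not ServiceXABC."""

    @abstractmethod
    async def get_data_awkward_async(
            self, selection_query: str,
            title: Optional[str] = None) -> Dict[bytes, Any]:
        """Subclasses supply the actual fetch."""

    def get_data_awkward(self, selection_query: str,
                         title: Optional[str] = None) -> Dict[bytes, Any]:
        # Assumption: run the coroutine to completion on a fresh event loop.
        return asyncio.run(self.get_data_awkward_async(selection_query, title))


class FakeSource(QuerySourceSketch):
    """Returns canned data instead of contacting a backend."""

    async def get_data_awkward_async(self, selection_query, title=None):
        await asyncio.sleep(0)  # yield once, as a real fetch would await I/O
        return {b"query": [selection_query]}


result = FakeSource().get_data_awkward("(placeholder query)")
```

Note that asyncio.run raises an error when called from inside an already running event loop (as in a Jupyter notebook), so a real implementation would need a different strategy there.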

get_data_pandas_df(selection_query: str, title: Optional[str] = None) -> DataFrame

Fetch query data from ServiceX matching selection_query and return it as a pandas DataFrame. The data is deterministically ordered (the same query will always return the same order).

Arguments:

selection_query The qastle string specifying the data to be queried

title Title reported to the ServiceX backend for status reporting

Returns:

df The pandas DataFrame

Exceptions:

xxx Raised if the data does not have the correct shape (it must be a flat, rectangular table).

abstract async get_data_pandas_df_async(selection_query: str, title: Optional[str] = None) -> DataFrame

Fetch query data from ServiceX matching selection_query and return it as a pandas DataFrame. The data is deterministically ordered (the same query will always return the same order).

Arguments:

selection_query The qastle string specifying the data to be queried

title Title reported to the ServiceX backend for status reporting

Returns:

df The pandas DataFrame

Exceptions:

xxx Raised if the data does not have the correct shape (it must be a flat, rectangular table).
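What "flat, rectangular" means can be made concrete with a small check. The sketch below is a hypothetical helper, not part of ServiceX: it treats data as flat when no cell is itself a list, tuple, or dict, and as rectangular when every column has the same length.

```python
from typing import Any, Dict, Sequence


def is_flat_rectangular(columns: Dict[str, Sequence[Any]]) -> bool:
    """Return True if every column holds only scalar cells and all
    columns have the same number of entries."""
    lengths = set()
    for values in columns.values():
        # A nested container in any cell means the data is jagged.
        if any(isinstance(v, (list, tuple, dict)) for v in values):
            return False
        lengths.add(len(values))
    # Zero or one distinct length means the table is rectangular.
    return len(lengths) <= 1


flat = {"pt": [1.0, 2.0], "eta": [0.1, -0.3]}
jagged = {"pt": [[1.0, 2.0], []]}
uneven = {"a": [1], "b": [1, 2]}
```

Here flat passes the check, while jagged (one list of values per event) and uneven (columns of different length) do not; data of the latter kinds is better fetched as awkward arrays.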

get_data_parquet(selection_query: str, title: Optional[str] = None) -> List[Path]

Fetch query data from ServiceX matching selection_query and return it as a list of parquet files. The files are deterministically ordered (the same query will always return the same order).

Arguments:

selection_query The qastle string specifying the data to be queried

title Title reported to the ServiceX backend for status reporting

Returns:

parquet_files The list of parquet files

abstract async get_data_parquet_async(selection_query: str, title: Optional[str] = None) -> List[Path]

Fetch query data from ServiceX matching selection_query and return it as a list of parquet files. The files are deterministically ordered (the same query will always return the same order).

Arguments:

selection_query The qastle string specifying the data to be queried

title Title reported to the ServiceX backend for status reporting

Returns:

parquet_files The list of parquet files

get_data_rootfiles(selection_query: str, title: Optional[str] = None) -> List[Path]

Fetch query data from ServiceX matching selection_query and return it as a list of ROOT files. The files are deterministically ordered (the same query will always return the same order).

Arguments:

selection_query The qastle string specifying the data to be queried

title Title reported to the ServiceX backend for status reporting

Returns:

root_files The list of ROOT files

abstract async get_data_rootfiles_async(selection_query: str, title: Optional[str] = None) -> List[Path]

Fetch query data from ServiceX matching selection_query and return it as a list of ROOT files. The files are deterministically ordered (the same query will always return the same order).

Arguments:

selection_query The qastle string specifying the data to be queried

title Title reported to the ServiceX backend for status reporting

Returns:

root_files The list of ROOT files
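One reason to prefer the async variants is that several queries can be awaited together. The sketch below illustrates that pattern with asyncio.gather; FakeRootFileSource and fetch_all are hypothetical names, the query strings are placeholders rather than real qastle, and the file names are invented for the demonstration.

```python
import asyncio
from pathlib import Path
from typing import Dict, List, Optional


class FakeRootFileSource:
    """Hypothetical stand-in exposing the same async method name."""

    async def get_data_rootfiles_async(
            self, selection_query: str,
            title: Optional[str] = None) -> List[Path]:
        await asyncio.sleep(0.01)  # pretend to wait on the backend
        return [Path(f"{title}_{i}.root") for i in range(2)]


async def fetch_all(source, queries: Dict[str, str]) -> Dict[str, List[Path]]:
    """Run every query concurrently; keys of `queries` serve as titles."""
    tasks = [source.get_data_rootfiles_async(query, title=title)
             for title, query in queries.items()]
    # gather returns results in the same order as the tasks it was given.
    results = await asyncio.gather(*tasks)
    return dict(zip(queries, results))


files = asyncio.run(fetch_all(
    FakeRootFileSource(),
    {"jets": "<placeholder query A>", "muons": "<placeholder query B>"},
))
```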