ServiceXABC
- class servicex.servicexabc.ServiceXABC(dataset: Union[str, Iterable[str]], image: Optional[str] = None, max_workers: int = 20, result_destination: str = 'object-store', status_callback_factory: Optional[Callable[[Union[str, Iterable[str]], Optional[str], bool], Callable[[Optional[int], int, int, int], None]]] = <function _run_default_wrapper>)
Bases: ABC
Abstract base class for accessing the ServiceX front-end for a particular dataset. It implements some methods but not the full set; the remaining abstract methods are left to subclasses.
A lightweight, mostly immutable base class that holds basic configuration information for use with ServiceX file access, including the dataset name. Subclasses implement the various access methods. Note that not all methods may be accessible!
- __init__(dataset: Union[str, Iterable[str]], image: Optional[str] = None, max_workers: int = 20, result_destination: str = 'object-store', status_callback_factory: Optional[Callable[[Union[str, Iterable[str]], Optional[str], bool], Callable[[Optional[int], int, int, int], None]]] = <function _run_default_wrapper>)
Create and configure a ServiceX object for a dataset.
Arguments
- dataset Name of a dataset from which queries will be selected.
- image Name of transformer image to use to transform the data. If None the default implementation is used.
- cache_adaptor Runs the caching for data and queries that are sent up and down.
- max_workers Maximum number of transformers to run simultaneously on ServiceX.
- result_destination Where the transformers should write the results. Defaults to object-store, but could be used to save results to a posix volume.
- cache_path Path to the cache.
- status_callback_factory Factory to create a status notification callback for each query. One is created per query.
Notes:
The status_callback_factory argument, by default, produces callbacks that use the tqdm library to render progress bars in a terminal window or a graphic in a Jupyter notebook (with the proper Jupyter extensions installed). If status_callback_factory is specified as None, no updates will be rendered. A custom factory can also be specified; the callback it returns takes (total_files, transformed, downloaded, skipped) as arguments. The total_files parameter may be None until the system knows how many files need to be processed (and some files can even be completed before that is known).
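As a rough illustration of the callback shape described in the notes above, the sketch below builds a factory that takes (dataset, title, downloading), as the constructor's type annotation suggests, and returns a callback taking (total_files, transformed, downloaded, skipped). The names `make_logging_callback_factory`, `log`, and the sample values are hypothetical, and the calls at the bottom only simulate updates; how and when ServiceX actually invokes the callback is not shown here.

```python
from typing import Callable, Iterable, List, Optional, Union

# Shape of the per-query callback: (total_files, transformed, downloaded, skipped).
StatusCallback = Callable[[Optional[int], int, int, int], None]


def make_logging_callback_factory(log: List[str]):
    """Return a factory shaped like (dataset, title, downloading) -> callback.

    `log` is a hypothetical sink used only for this illustration.
    """

    def factory(dataset: Union[str, Iterable[str]],
                title: Optional[str],
                downloading: bool) -> StatusCallback:
        label = title or "<untitled>"

        def callback(total_files: Optional[int], transformed: int,
                     downloaded: int, skipped: int) -> None:
            # total_files may be None until the number of files is known.
            total = "?" if total_files is None else str(total_files)
            log.append(f"{label}: {transformed}/{total} transformed, "
                       f"{downloaded} downloaded, {skipped} skipped")

        return callback

    return factory


# Simulated updates (made-up values), first before the total is known.
messages: List[str] = []
factory = make_logging_callback_factory(messages)
notify = factory("hypothetical_dataset", "demo query", True)
notify(None, 1, 0, 0)
notify(10, 4, 2, 0)
```

A factory of this shape could presumably be passed as status_callback_factory, giving each query its own callback instance.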
Attributes
- dataset_as_name
Return the dataset name as a string for “human” consumption.
Note that this can be very very long!
- Returns:
str: The dataset name formatted as a string
Methods
- _abc_impl()
Internal state held by ABC machinery.
- _create_notifier(title: Optional[str], downloading: bool) -> _status_update_wrapper
Internal method to create an updater from the status callback.
- get_data_awkward(selection_query: str, title: Optional[str] = None) -> Dict[bytes, Array]
Fetch query data from ServiceX matching selection_query and return it as a dictionary of awkward arrays, with one entry for each column. The data is uniquely ordered (the same query will always return the same order).
- Arguments:
selection_query The qastle string specifying the data to be queried
title Title reported to the ServiceX backend for status reporting
- Returns:
A dictionary of jagged arrays (as needed), one for each column. The dictionary keys are bytes to support possible unicode characters.
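Because the returned dictionary is keyed by bytes, code that expects string column names needs a decoding step. The sketch below shows one way to do that; the helper `decode_column_names`, the UTF-8 encoding, and the sample column names are assumptions, and a plain dictionary of lists stands in for the awkward arrays a real query would return.

```python
from typing import Any, Dict, Mapping


def decode_column_names(columns: Mapping[bytes, Any],
                        encoding: str = "utf-8") -> Dict[str, Any]:
    """Return a copy of the mapping with bytes keys decoded to str."""
    return {name.decode(encoding): values for name, values in columns.items()}


# Stand-in for a query result: jagged and flat columns keyed by bytes.
result = {
    b"JetPt": [[55.1, 32.0], [], [41.7]],
    b"EventNumber": [1, 2, 3],
}
by_name = decode_column_names(result)
```

The values are passed through untouched, so the same helper should apply whether they are lists or array objects.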
- abstract async get_data_awkward_async(selection_query: str, title: Optional[str] = None) -> Dict[bytes, Array]
Fetch query data from ServiceX matching selection_query and return it as a dictionary of awkward arrays, with one entry for each column. The data is uniquely ordered (the same query will always return the same order).
- Arguments:
selection_query The qastle string specifying the data to be queried
title Title reported to the ServiceX backend for status reporting
- Returns:
A dictionary of jagged arrays (as needed), one for each column. The dictionary keys are bytes to support possible unicode characters.
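Each access method on this page comes as a pair: an abstract async method that subclasses must supply, and a blocking method with the same arguments. The sketch below illustrates that general pattern with a hypothetical class, `QuerySourceSketch`; it is not the ServiceX implementation, and the use of `asyncio.run` for the blocking wrapper is an assumption made for the example.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class QuerySourceSketch(ABC):
    """Hypothetical stand-in for the abstract-async / blocking-wrapper pattern."""

    @abstractmethod
    async def get_data_async(self, selection_query: str,
                             title: Optional[str] = None) -> Dict[bytes, List[float]]:
        """Subclasses supply the asynchronous implementation."""

    def get_data(self, selection_query: str,
                 title: Optional[str] = None) -> Dict[bytes, List[float]]:
        # Blocking convenience wrapper: run the coroutine to completion.
        return asyncio.run(self.get_data_async(selection_query, title))


class InMemorySource(QuerySourceSketch):
    """Toy subclass that fabricates a result instead of contacting a service."""

    async def get_data_async(self, selection_query, title=None):
        await asyncio.sleep(0)  # placeholder for real network I/O
        return {b"query_length": [float(len(selection_query))]}


source = InMemorySource()
data = source.get_data("(hypothetical qastle text)")
```

Attempting to instantiate `QuerySourceSketch` directly raises `TypeError`, since its abstract method has no implementation.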
- get_data_pandas_df(selection_query: str, title: Optional[str] = None) -> DataFrame
Fetch query data from ServiceX matching selection_query and return it as a pandas DataFrame. The data is uniquely ordered (the same query will always return the same order).
- Arguments:
selection_query The qastle string specifying the data to be queried
title Title reported to the ServiceX backend for status reporting
- Returns:
df The pandas DataFrame
- Exceptions:
xxx If the data does not have the correct shape (e.g. it is not a flat, rectangular table).
- abstract async get_data_pandas_df_async(selection_query: str, title: Optional[str] = None) -> DataFrame
Fetch query data from ServiceX matching selection_query and return it as a pandas DataFrame. The data is uniquely ordered (the same query will always return the same order).
- Arguments:
selection_query The qastle string specifying the data to be queried
title Title reported to the ServiceX backend for status reporting
- Returns:
df The pandas DataFrame
- Exceptions:
xxx If the data does not have the correct shape (e.g. it is not a flat, rectangular table).
- get_data_parquet(selection_query: str, title: Optional[str] = None) -> List[Path]
Fetch query data from ServiceX matching selection_query and return it as a list of parquet files. The files are uniquely ordered (the same query will always return the same order).
- Arguments:
selection_query The qastle string specifying the data to be queried
title Title reported to the ServiceX backend for status reporting
- Returns:
root_files The list of parquet files
- abstract async get_data_parquet_async(selection_query: str, title: Optional[str] = None) -> List[Path]
Fetch query data from ServiceX matching selection_query and return it as a list of parquet files. The files are uniquely ordered (the same query will always return the same order).
- Arguments:
selection_query The qastle string specifying the data to be queried
title Title reported to the ServiceX backend for status reporting
- Returns:
root_files The list of parquet files
- get_data_rootfiles(selection_query: str, title: Optional[str] = None) -> List[Path]
Fetch query data from ServiceX matching selection_query and return it as a list of root files. The files are uniquely ordered (the same query will always return the same order).
- Arguments:
selection_query The qastle string specifying the data to be queried
title Title reported to the ServiceX backend for status reporting
- Returns:
root_files The list of root files
- abstract async get_data_rootfiles_async(selection_query: str, title: Optional[str] = None) -> List[Path]
Fetch query data from ServiceX matching selection_query and return it as a list of root files. The files are uniquely ordered (the same query will always return the same order).
- Arguments:
selection_query The qastle string specifying the data to be queried
title Title reported to the ServiceX backend for status reporting
- Returns:
root_files The list of root files
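Since the async variants are coroutines, several queries can in principle be awaited together. The sketch below shows the idea with `asyncio.gather`; `fetch_rootfiles_stub` is a hypothetical stand-in for an awaitable such as get_data_rootfiles_async on a concrete subclass, and the query names and file paths are made up.

```python
import asyncio
from pathlib import Path
from typing import List


async def fetch_rootfiles_stub(selection_query: str) -> List[Path]:
    """Hypothetical stand-in for an awaitable that returns a list of files."""
    await asyncio.sleep(0)  # placeholder for the real transform and download
    return [Path(f"{selection_query}_part{i}.root") for i in range(2)]


async def fetch_all(queries: List[str]) -> List[List[Path]]:
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(fetch_rootfiles_stub(q) for q in queries))


all_files = asyncio.run(fetch_all(["query_a", "query_b"]))
```

Each inner list corresponds to one query, in the order the queries were submitted.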