DataRailsStep

This page describes the DataRailsStep class. It is intended to be used as the base class for all steps in a DataRails ETL workflow.

When a child class is created, all methods declared with the prefix step_ are registered and, when the step is run, executed in the order in which they are declared. Each step_ method should perform a single task, such as loading a file or transforming data. The smaller the unit of work, the better.
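The declaration-order behaviour can be illustrated with a small metaclass. This is only a sketch under assumptions: the real _StepMeta is not shown on this page, so StepMeta, Step and LoadAndClean below are hypothetical stand-ins, and the actual implementation may differ.

```python
class StepMeta(type):
    """Hypothetical stand-in for _StepMeta (the real one is not shown here)."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # A class namespace preserves insertion order, so iterating over it
        # yields the methods in the order they were written in the class body.
        cls.step_methods = [
            key
            for key, value in namespace.items()
            if key.startswith("step_") and callable(value)
        ]
        return cls


class Step(metaclass=StepMeta):
    """Hypothetical base class that runs the registered methods in order."""

    def run(self):
        for method_name in self.step_methods:
            getattr(self, method_name)()


class LoadAndClean(Step):
    def __init__(self):
        self.log = []

    def step_load(self):
        self.log.append("load")

    def step_clean(self):
        self.log.append("clean")

    def helper(self):
        # No step_ prefix, so this is never run automatically.
        self.log.append("helper")


pipeline = LoadAndClean()
pipeline.run()
print(pipeline.log)
```

Note that step_load runs before step_clean because it is declared first, not because of alphabetical ordering.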

Each step has an attribute called dbx, which is an instance of DataBox. It is the container for all the data being processed, and every step_ method can reach it through self.dbx as execution progresses.
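As a rough illustration of how step_ methods share state through dbx, the sketch below uses a plain dict in place of a DataBox. The DataBox API is not documented on this page, so the dict, the MiniStep class and the "rows" key are all assumptions made for the example.

```python
class MiniStep:
    """Hypothetical stand-in; a plain dict plays the role of the DataBox."""

    def __init__(self, dbx):
        self.dbx = dbx

    def step_load(self):
        # The first step puts raw data into the shared container.
        self.dbx["rows"] = [" a ", "b ", " c"]

    def step_strip(self):
        # The next step reads what the previous one stored and replaces it.
        self.dbx["rows"] = [row.strip() for row in self.dbx["rows"]]

    def run(self):
        for name in ("step_load", "step_strip"):
            getattr(self, name)()
        return self.dbx


result = MiniStep(dbx={}).run()
print(result)
```

Each method takes no arguments beyond self; the data travels between steps only through the shared container.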

To use this class, import it:

from datarails.step import DataRailsStep

Represents a step in a data pipeline process. This class is meant to be inherited from and not used directly. All methods that are declared with the prefix 'step_' in the child class will be run in the order they are declared.

Attributes:

dbx (DataBox): The data box object that stores and handles data for the step.
ctx (DataRailsContext): The context object that provides additional data and functionality.
on_entry_callback (Callable, optional): The function to call upon entering a step. Default is None.
on_exit_callback (Callable, optional): The function to call upon exiting a step. Default is None.
step_method_name_iterator (Iterator): An iterator that yields names of step methods.

Source code in datarails/step.py
class DataRailsStep(metaclass=_StepMeta):
    """
    Represents a step in a data pipeline process. This class is meant to be inherited from and not used directly.
    All methods that are declared with the prefix 'step_' in the child class will be run in the order they are declared.

    Attributes:
        dbx (DataBox): The data box object that stores and handles data for the step.
        ctx (DataRailsContext): The context object that provides additional data and functionality.
        on_entry_callback (Callable, optional): The function to call upon entering a step. Default is None.
        on_exit_callback (Callable, optional): The function to call upon exiting a step. Default is None.
        step_method_name_iterator (Iterator): An iterator that yields names of step methods.
    """

    def __init__(
        self,
        dbx: DataBox,
        ctx: DataRailsContext,
        on_entry_callback: Optional[Callable] = None,
        on_exit_callback: Optional[Callable] = None,
    ) -> None:
        """
        Constructs all the necessary attributes for the DataRailsStep object.

        Args:
            dbx (DataBox): The data box object that stores and handles data for the step.
            ctx (DataRailsContext): The context object that provides additional data and functionality.
            on_entry_callback (Callable, optional): The function to call upon entering a step. Default is None.
            on_exit_callback (Callable, optional): The function to call upon exiting a step. Default is None.
        """
        self.dbx = dbx
        self.ctx = ctx
        self.on_entry_callback = on_entry_callback
        self.on_exit_callback = on_exit_callback
        self.step_method_name_iterator = None

    def __str__(self) -> str:
        """
        Returns a user-friendly string representation of the instance, in this case, the class name.

        Returns:
            str: The name of the class.
        """
        return self.__class__.__name__

    def reset(self) -> None:
        """
        Resets the iterator that yields names of step methods, allowing the steps to be run from the beginning.
        """
        self.step_method_name_iterator = None

    def run(self) -> DataBox:
        """
        Runs all the steps in the order they were declared in the child class.
        The order is fixed and cannot be changed. Each 'step_' method is called in turn.

        on_entry_callback is called at the beginning, if provided.
        on_exit_callback is called at the end, if provided.

        Returns:
            DataBox: The updated DataBox after all steps have been run.
        """
        if self.on_entry_callback:
            self.on_entry_callback()

        for method_name in self.step_methods:
            method = getattr(self, method_name)
            method()

        if self.on_exit_callback:
            self.on_exit_callback()

        return self.dbx

    def advance(self) -> None:
        """
        Advances to the next 'step_' method in the pipeline, if available.
        This method can be used to control the execution of steps, for example in debugging scenarios.
        """

        if not self.step_method_name_iterator:
            self.step_method_name_iterator = iter(self.step_methods)

        method_name = next(self.step_method_name_iterator, None)

        if method_name:
            method = getattr(self, method_name)
            method()
        else:
            print("All steps have been executed.")

__init__(dbx, ctx, on_entry_callback=None, on_exit_callback=None)

Constructs all the necessary attributes for the DataRailsStep object.

Parameters:

dbx (DataBox): The data box object that stores and handles data for the step. Required.
ctx (DataRailsContext): The context object that provides additional data and functionality. Required.
on_entry_callback (Callable, optional): The function to call upon entering a step. Default is None.
on_exit_callback (Callable, optional): The function to call upon exiting a step. Default is None.

Source code in datarails/step.py
def __init__(
    self,
    dbx: DataBox,
    ctx: DataRailsContext,
    on_entry_callback: Optional[Callable] = None,
    on_exit_callback: Optional[Callable] = None,
) -> None:
    """
    Constructs all the necessary attributes for the DataRailsStep object.

    Args:
        dbx (DataBox): The data box object that stores and handles data for the step.
        ctx (DataRailsContext): The context object that provides additional data and functionality.
        on_entry_callback (Callable, optional): The function to call upon entering a step. Default is None.
        on_exit_callback (Callable, optional): The function to call upon exiting a step. Default is None.
    """
    self.dbx = dbx
    self.ctx = ctx
    self.on_entry_callback = on_entry_callback
    self.on_exit_callback = on_exit_callback
    self.step_method_name_iterator = None

__str__()

Returns a user-friendly string representation of the instance, in this case, the class name.

Returns:

str: The name of the class.

Source code in datarails/step.py
def __str__(self) -> str:
    """
    Returns a user-friendly string representation of the instance, in this case, the class name.

    Returns:
        str: The name of the class.
    """
    return self.__class__.__name__

advance()

Advances to the next 'step_' method in the pipeline, if available. This method can be used to control the execution of steps, for example in debugging scenarios.

Source code in datarails/step.py
def advance(self) -> None:
    """
    Advances to the next 'step_' method in the pipeline, if available.
    This method can be used to control the execution of steps, for example in debugging scenarios.
    """

    if not self.step_method_name_iterator:
        self.step_method_name_iterator = iter(self.step_methods)

    method_name = next(self.step_method_name_iterator, None)

    if method_name:
        method = getattr(self, method_name)
        method()
    else:
        print("All steps have been executed.")
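
To show how advance() walks through the steps one call at a time, here is a self-contained sketch. Stepper is a hypothetical stand-in with a hard-coded step_methods list (in the real class that list is provided elsewhere and is not shown on this page). The advance() body mirrors the source above, using an explicit None check.

```python
class Stepper:
    """Hypothetical stand-in that copies the advance() logic shown above."""

    step_methods = ["step_one", "step_two"]  # assumed; normally not hard-coded

    def __init__(self):
        self.step_method_name_iterator = None
        self.calls = []

    def step_one(self):
        self.calls.append("one")

    def step_two(self):
        self.calls.append("two")

    def advance(self):
        # Create the iterator lazily on the first call.
        if self.step_method_name_iterator is None:
            self.step_method_name_iterator = iter(self.step_methods)

        # next() with a default returns None once the iterator is exhausted.
        method_name = next(self.step_method_name_iterator, None)
        if method_name:
            getattr(self, method_name)()
        else:
            print("All steps have been executed.")


stepper = Stepper()
stepper.advance()
after_first = list(stepper.calls)  # state can be inspected between steps
stepper.advance()
stepper.advance()  # nothing left to run, so only the message is printed
```

As in the source above, advance() does not invoke on_entry_callback or on_exit_callback; only run() does.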

reset()

Resets the iterator that yields names of step methods, allowing the steps to be run from the beginning.

Source code in datarails/step.py
def reset(self) -> None:
    """
    Resets the iterator that yields names of step methods, allowing the steps to be run from the beginning.
    """
    self.step_method_name_iterator = None
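
The effect of reset() on a later advance() call can be sketched as follows. Rewinder is a hypothetical stand-in with an assumed, hard-coded step_methods list. Its reset() and advance() copy the logic shown on this page, without the final message.

```python
class Rewinder:
    """Hypothetical stand-in combining the advance() and reset() logic."""

    step_methods = ["step_one", "step_two"]  # assumed for the example

    def __init__(self):
        self.step_method_name_iterator = None
        self.calls = []

    def step_one(self):
        self.calls.append("one")

    def step_two(self):
        self.calls.append("two")

    def advance(self):
        if self.step_method_name_iterator is None:
            self.step_method_name_iterator = iter(self.step_methods)
        method_name = next(self.step_method_name_iterator, None)
        if method_name:
            getattr(self, method_name)()

    def reset(self):
        # Dropping the iterator makes the next advance() build a fresh one.
        self.step_method_name_iterator = None


rewinder = Rewinder()
rewinder.advance()  # runs step_one
rewinder.reset()
rewinder.advance()  # runs step_one again, not step_two
print(rewinder.calls)
```

Only the iterator is cleared. As the source above shows, reset() does not touch dbx, so any data already written by earlier steps stays in place.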

run()

Runs all the steps in the order they were declared in the child class. The order is fixed and cannot be changed. Each 'step_' method is called in turn.

on_entry_callback is called at the beginning, if provided. on_exit_callback is called at the end, if provided.

Returns:

DataBox: The updated DataBox after all steps have been run.

Source code in datarails/step.py
def run(self) -> DataBox:
    """
    Runs all the steps in the order they were declared in the child class.
    The order is fixed and cannot be changed. Each 'step_' method is called in turn.

    on_entry_callback is called at the beginning, if provided.
    on_exit_callback is called at the end, if provided.

    Returns:
        DataBox: The updated DataBox after all steps have been run.
    """
    if self.on_entry_callback:
        self.on_entry_callback()

    for method_name in self.step_methods:
        method = getattr(self, method_name)
        method()

    if self.on_exit_callback:
        self.on_exit_callback()

    return self.dbx
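
The ordering of callbacks and steps in run() can be checked with a small sketch. Pipeline is a hypothetical stand-in with an assumed step_methods list and no DataBox. Only the control flow of run() is copied from the source above.

```python
events = []


class Pipeline:
    """Hypothetical stand-in that copies the control flow of run()."""

    step_methods = ["step_a", "step_b"]  # assumed for the example

    def __init__(self, on_entry_callback=None, on_exit_callback=None):
        self.on_entry_callback = on_entry_callback
        self.on_exit_callback = on_exit_callback

    def step_a(self):
        events.append("step_a")

    def step_b(self):
        events.append("step_b")

    def run(self):
        # Callbacks take no arguments and fire once per run, not once per method.
        if self.on_entry_callback:
            self.on_entry_callback()
        for method_name in self.step_methods:
            getattr(self, method_name)()
        if self.on_exit_callback:
            self.on_exit_callback()


Pipeline(
    on_entry_callback=lambda: events.append("entry"),
    on_exit_callback=lambda: events.append("exit"),
).run()
print(events)
```

Because the source wraps nothing in try/finally, an exception raised inside a step_ method would propagate out of run() before on_exit_callback is reached.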