DataRailsStep
This page describes the DataRailsStep class. It is intended to be used as the base class for all steps in a DataRails ETL workflow.
In a child class, every method whose name starts with the prefix step_ is executed in the order in which it is declared when the step is run.
Each step should perform a single task, such as loading a file or transforming data. The smaller the piece of work, the better.
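One way declaration-order discovery of step_ methods could work is sketched below. This is an illustration only, not the datarails implementation: StepBase, step_method_names and LoadAndClean are hypothetical names. The sketch relies on the fact that a Python class body keeps its attributes in definition order.

```python
class StepBase:
    """Hypothetical stand-in for a step base class (not the datarails code)."""

    @classmethod
    def step_method_names(cls):
        # vars(cls) preserves the order in which the class body defined its
        # attributes, so 'step_' methods come out in declaration order.
        return [
            name
            for name, value in vars(cls).items()
            if name.startswith("step_") and callable(value)
        ]


class LoadAndClean(StepBase):
    def step_load(self):
        return "load"

    def step_clean(self):
        return "clean"

    def helper(self):  # no 'step_' prefix, so it is not picked up
        return "helper"


names = LoadAndClean.step_method_names()
print(names)  # ['step_load', 'step_clean']
```

Note that this sketch only sees methods defined directly on the child class. Whether the real class also collects step_ methods inherited from intermediate parents is not stated here.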
Each step has an attribute called dbx, which is an instance of a DataBox. This is a container for all the data that is being processed, and it is passed automatically to each step as the execution progresses.
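The idea of a shared container can be illustrated with a small sketch. A plain dict stands in for DataBox here, because the DataBox interface is not described in this page. The class CleanNames and its keys are hypothetical.

```python
class CleanNames:
    """Hypothetical step-like class; a plain dict stands in for DataBox."""

    def __init__(self, dbx):
        self.dbx = dbx  # shared container, visible to every step_ method

    def step_load(self):
        # The first step writes raw data into the container.
        self.dbx["raw"] = [" alice ", "BOB"]

    def step_normalise(self):
        # A later step reads what the earlier one stored and adds its result.
        self.dbx["names"] = [n.strip().lower() for n in self.dbx["raw"]]


step = CleanNames(dbx={})
step.step_load()
step.step_normalise()
print(step.dbx["names"])  # ['alice', 'bob']
```

The step methods are called by hand here to keep the sketch short. With the real base class they would be run for you.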
To use this class, import it:
from datarails.step import DataRailsStep
Represents a step in a data pipeline process. This class is meant to be inherited from and not used directly. All methods that are declared with the prefix 'step_' in the child class will be run in the order they are declared.
Attributes:
Name | Type | Description |
---|---|---|
dbx | DataBox | The data box object that stores and handles data for the step. |
ctx | DataRailsContext | The context object that provides additional data and functionality. |
on_entry_callback | Callable | The function to call upon entering a step. Default is None. |
on_exit_callback | Callable | The function to call upon exiting a step. Default is None. |
step_method_name_iterator | Iterator | An iterator that yields names of step methods. |
Source code in datarails/step.py
__init__(dbx, ctx, on_entry_callback=None, on_exit_callback=None)
Constructs all the necessary attributes for the DataRailsStep object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dbx | DataBox | The data box object that stores and handles data for the step. | required |
ctx | DataRailsContext | The context object that provides additional data and functionality. | required |
on_entry_callback | Callable | The function to call upon entering a step. Default is None. | None |
on_exit_callback | Callable | The function to call upon exiting a step. Default is None. | None |
__str__()
Returns a user-friendly string representation of the instance, in this case, the class name.
Returns:
Name | Type | Description |
---|---|---|
str | str | The name of the class. |
advance()
Advances to the next 'step_' method in the pipeline, if available. This method can be used to control the execution of steps, for example in debugging scenarios.
reset()
Resets the iterator that yields names of step methods, allowing the steps to be run from the beginning.
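The interplay of advance() and reset() can be sketched with a plain Python iterator over step names. This is a stand-in written for illustration, not the code in datarails/step.py: SteppableBase and Demo are hypothetical, and returning the step name from advance() is an assumption made so the behaviour is easy to observe.

```python
class SteppableBase:
    """Hypothetical stand-in; the real DataRailsStep may behave differently."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Rebuild the iterator so the next advance() starts at the first step.
        names = [
            name
            for name, value in vars(type(self)).items()
            if name.startswith("step_") and callable(value)
        ]
        self.step_method_name_iterator = iter(names)

    def advance(self):
        # Run the next step method if one remains.
        # Assumption: return its name, or None when the steps are exhausted.
        name = next(self.step_method_name_iterator, None)
        if name is not None:
            getattr(self, name)()
        return name


class Demo(SteppableBase):
    def __init__(self):
        super().__init__()
        self.log = []

    def step_one(self):
        self.log.append("one")

    def step_two(self):
        self.log.append("two")


demo = Demo()
first = demo.advance()   # runs step_one
second = demo.advance()  # runs step_two
third = demo.advance()   # nothing left to run
demo.reset()
again = demo.advance()   # runs step_one again
```

Stepping one method at a time like this makes it possible to inspect intermediate state between steps, which matches the debugging use case mentioned for advance().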
run()
Runs all the steps in the order they were declared in the child class. The order is fixed and cannot be changed. Each 'step_' method is called in turn.
on_entry_callback is called at the beginning, if provided. on_exit_callback is called at the end, if provided.
Returns:
Name | Type | Description |
---|---|---|
DataBox | DataBox | The updated DataBox after all steps have been run. |
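A minimal sketch of the run() behaviour described above follows: entry callback, each step_ method in declaration order, exit callback, then the container is returned. It is written under stated assumptions and is not the library's source. RunnableBase and Totals are hypothetical, a plain dict stands in for DataBox, ctx is passed as None as a placeholder, and the callbacks are assumed to receive the step instance (the real callback signature is not documented here).

```python
from typing import Callable, Optional


class RunnableBase:
    """Hypothetical stand-in mirroring the documented constructor signature."""

    def __init__(
        self,
        dbx,
        ctx,
        on_entry_callback: Optional[Callable] = None,
        on_exit_callback: Optional[Callable] = None,
    ):
        self.dbx = dbx
        self.ctx = ctx
        self.on_entry_callback = on_entry_callback
        self.on_exit_callback = on_exit_callback

    def run(self):
        if self.on_entry_callback is not None:
            self.on_entry_callback(self)  # assumption: receives the step
        # Call every 'step_' method in the order the child class declared it.
        for name, value in vars(type(self)).items():
            if name.startswith("step_") and callable(value):
                getattr(self, name)()
        if self.on_exit_callback is not None:
            self.on_exit_callback(self)  # assumption: receives the step
        return self.dbx


class Totals(RunnableBase):
    def step_load(self):
        self.dbx["values"] = [1, 2, 3]

    def step_sum(self):
        self.dbx["total"] = sum(self.dbx["values"])


events = []
step = Totals(
    dbx={},    # plain dict standing in for DataBox
    ctx=None,  # placeholder; the real class expects a DataRailsContext
    on_entry_callback=lambda s: events.append(f"enter {type(s).__name__}"),
    on_exit_callback=lambda s: events.append(f"exit {type(s).__name__}"),
)
result = step.run()
print(result)  # {'values': [1, 2, 3], 'total': 6}
print(events)  # ['enter Totals', 'exit Totals']
```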