Chapter 43. Subgraphs Overview
Subgraphs Introduction
Design & Execution
Subgraphs vs. Jobflow
Subgraphs Introduction
What is a Subgraph
A subgraph is a user-defined reusable component whose logic is implemented as a graph instead of Java code.
A subgraph definition is a regular graph and may use any graph elements (components, connections, lookups, sequences or parameters).
Subgraphs can be nested; a subgraph definition may use other subgraphs.
A subgraph definition is stored in a separate file with the *.sgrf extension.
In the default CloverDX project layout, the ${PROJECT}/graph/subgraph directory is created for storing subgraph files. You can reference this directory via the ${SUBGRAPH_DIR} parameter.
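To make the relationship between ${PROJECT} and ${SUBGRAPH_DIR} concrete, here is a minimal Python sketch of how such nested placeholders expand into a file path. It only imitates the idea and is not how CloverDX resolves parameters internally; the project path and the file name MyConnector.sgrf are hypothetical.

```python
from string import Template

# Hypothetical values; in a real project they come from the project's parameters.
params = {
    "PROJECT": "/home/user/workspace/MyProject",
    "SUBGRAPH_DIR": "${PROJECT}/graph/subgraph",
}

def resolve(text, params, max_depth=10):
    """Expand ${NAME} placeholders repeatedly until nothing changes."""
    for _ in range(max_depth):
        expanded = Template(text).substitute(params)
        if expanded == text:
            return expanded
        text = expanded
    raise ValueError("parameter references are nested too deeply or are cyclic")

# A subgraph file referenced relative to the subgraph directory.
subgraph_file = resolve("${SUBGRAPH_DIR}/MyConnector.sgrf", params)
print(subgraph_file)
```

Referencing subgraph files through the parameter rather than a hard-coded path keeps graphs portable if the project directory moves.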
Use the Subgraph component to reference a subgraph in a regular graph. Once configured with a subgraph file, the Subgraph component automatically updates its ports to match the ports of the subgraph definition.
What are Subgraphs Good for?
Simplifying Complex Transformation Logic
Use subgraphs to visually reduce the number of components in complex graphs and highlight important processing logic.
Creating Reusable Blocks of Logic
Subgraphs let you develop prefabricated blocks of logic that other members of the development team can use. This approach to transformation development promotes reusability and standardization.
Creating Connectors
Subgraphs provide an easy way to create new connectors for web services or databases. Web services communicate over HTTP and provide data in JSON or XML format, which needs to be preprocessed before it can be used in transformation logic. Subgraphs can hide the parsing logic and provide the data in an easy-to-consume format.
Similarly, for databases with a complex relational structure, DBAs can develop tuned queries that access data via optimized views and indices, then publish the queries as subgraphs that serve as easy-to-use connectors to common data entities.
Design & Execution
Create the body of a subgraph in the same way as an ordinary graph. You can use the same components, structure and overall approach.
Use connections, lookup tables, dictionary, etc. All these features are available in subgraphs as well as in graphs.
Define an input and output interface. The interface (the input and output ports of the Subgraph component) is defined by the SubgraphInput and SubgraphOutput components.
Launch as a single unit or from a graph. A subgraph can be launched as a standalone graph or as a component from a parent graph.
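The interface rule above can be sketched as a small Python model: the output ports of SubgraphInput become the input ports of the Subgraph component in the parent graph, and the input ports of SubgraphOutput become its output ports. The class and function names below are hypothetical and do not correspond to CloverDX classes.

```python
from dataclasses import dataclass

@dataclass
class SubgraphDefinition:
    """Toy stand-in for a parsed *.sgrf file (not a CloverDX class)."""
    subgraph_input_out_ports: int   # output ports of the SubgraphInput component
    subgraph_output_in_ports: int   # input ports of the SubgraphOutput component

@dataclass
class SubgraphComponentPorts:
    """Ports that the Subgraph component exposes in the parent graph."""
    input_ports: int
    output_ports: int

def derive_ports(definition):
    # Inputs of the subgraph mirror SubgraphInput's outputs,
    # outputs of the subgraph mirror SubgraphOutput's inputs.
    return SubgraphComponentPorts(
        input_ports=definition.subgraph_input_out_ports,
        output_ports=definition.subgraph_output_in_ports,
    )

# A subgraph with one input and two outputs.
ports = derive_ports(SubgraphDefinition(subgraph_input_out_ports=1,
                                        subgraph_output_in_ports=2))
print(ports)
```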
Anatomy of Subgraphs
A graph defining a subgraph contains the following sections:
Figure 43.1. Subgraph Layout
SubgraphInput
Represents the inputs of the subgraph
Each subgraph contains exactly one instance of the SubgraphInput component
The number of its output ports defines the number of the subgraph’s inputs
SubgraphOutput
Represents the outputs of the subgraph
Each subgraph contains exactly one instance of the SubgraphOutput component
The number of its input ports defines the number of the subgraph’s outputs
Body of Subgraph
Contains implementation of subgraph logic
Subgraph body can contain components (e.g. Reader) not connected to SubgraphInput or SubgraphOutput to access external data sources or static data sets.
The body of a subgraph may contain multiple phases and define component allocation for execution control. Phases and allocation are applied separately from the parent graph. For phases, this means that when a subgraph is started in a phase of its parent graph, the subgraph's first phase runs, then the second, the third, and so on. After all phases of the subgraph finish, the parent graph considers the subgraph finished, and the next phase of the parent graph can start.
Components in a subgraph body can use their own connections, lookups, metadata and parameters.
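The phase nesting described above can be illustrated with a short Python sketch that flattens the phases of a parent graph and of one subgraph into a single timeline. This is a simplified model, not CloverDX scheduling code: it ignores the fact that other components in the same parent phase run in parallel with the subgraph, and the component names are hypothetical.

```python
def execution_order(parent_phases, subgraph_phases, subgraph_name):
    """Return the order in which phases run.

    parent_phases:   dict mapping a parent phase number to its component names
    subgraph_phases: phase numbers used inside the subgraph
    subgraph_name:   name of the Subgraph component in the parent graph
    """
    order = []
    for phase in sorted(parent_phases):
        order.append(f"parent phase {phase} starts")
        if subgraph_name in parent_phases[phase]:
            # All subgraph phases run inside this single parent phase.
            for sub_phase in sorted(subgraph_phases):
                order.append(f"  {subgraph_name} phase {sub_phase}")
        order.append(f"parent phase {phase} ends")
    return order

timeline = execution_order(
    parent_phases={0: ["Reader", "MySubgraph"], 1: ["Writer"]},
    subgraph_phases=[0, 1, 2],
    subgraph_name="MySubgraph",
)
print("\n".join(timeline))
```

In this model, parent phase 1 cannot begin until all three phases of MySubgraph have finished inside parent phase 0.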
Debug Inputs
Any components connected to input ports of the SubgraphInput component.
Can be used to generate test data when developing and testing subgraph logic.
Components in the debug input section are automatically disabled when a subgraph is executed from a parent graph; this is visualized by graying out these components.
Debug Outputs
Any components connected to an output port of the SubgraphOutput component, or with higher phase than SubgraphOutput.
Can be used to inspect and store test data when developing and testing a subgraph.
Components in the debug output section are automatically disabled when a subgraph is executed from a parent graph; this is visualized by graying out these components.
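The rules for the debug sections can be sketched as a classification over a toy graph model. The Python below is only an illustration under stated assumptions: "connected to" is interpreted transitively (anything upstream of SubgraphInput or downstream of SubgraphOutput), and the component name AuditWriter is hypothetical.

```python
def debug_components(edges, phases):
    """Classify components into debug inputs and debug outputs.

    edges:  list of (source, target) component names
    phases: dict mapping each component name to its phase number
    """
    def reachable(start, forward):
        # Walk the edges forward (downstream) or backward (upstream).
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for src, dst in edges:
                current, nxt = (src, dst) if forward else (dst, src)
                if current == node and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    debug_inputs = reachable("SubgraphInput", forward=False)
    debug_outputs = reachable("SubgraphOutput", forward=True)
    # Components in a higher phase than SubgraphOutput are debug outputs too.
    out_phase = phases["SubgraphOutput"]
    debug_outputs |= {name for name, phase in phases.items() if phase > out_phase}
    return debug_inputs, debug_outputs

edges = [
    ("DataGenerator", "SubgraphInput"),
    ("SubgraphInput", "Reformat"),
    ("Reformat", "SubgraphOutput"),
    ("SubgraphOutput", "Trash"),
]
phases = {"DataGenerator": 0, "SubgraphInput": 0, "Reformat": 0,
          "SubgraphOutput": 0, "Trash": 0, "AuditWriter": 1}

debug_in, debug_out = debug_components(edges, phases)
print(sorted(debug_in), sorted(debug_out))
```

Both sets correspond to the components that would be disabled when the subgraph runs from a parent graph; only Reformat, the body, stays active between SubgraphInput and SubgraphOutput.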
Figure 43.2. Example of subgraph with multiple output ports
Subgraphs vs. Jobflow
While both subgraphs and jobflow provide a way of creating reusable processing logic, they serve different purposes.
Subgraphs behave the same as other built-in components; they stream data to a parent graph. When used in a graph, they execute in parallel with other components running in the graph.
Use a subgraph when you need to create a new component that will be used in ETL processing and will exchange large amounts of data with other components.
Jobflow, by its nature, provides step-by-step sequential processing. Individual steps in a jobflow do not exchange large amounts of data; instead, they pass status and configuration parameters to each other.
If you need to create logic that should be executed as one of several processing steps or you want to react to a job status after its execution, create a graph and call it from a jobflow via ExecuteGraph.
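The difference between the two styles can be illustrated with a loose Python analogy. It is not CloverDX code: a generator stands in for a subgraph that streams records to the next component, and a loop over functions stands in for a jobflow whose steps hand each other only a small status. All function names and fields are hypothetical.

```python
def read_records():
    """Source of records, produced one at a time."""
    for i in range(5):
        yield {"id": i, "amount": i * 10}

def subgraph_like(records):
    """Streams: each record is passed on as soon as it is processed."""
    for record in records:
        yield {**record, "doubled": record["amount"] * 2}

def jobflow_like(steps):
    """Runs steps one after another; each step receives the previous status."""
    status = {"ok": True}
    for step in steps:
        status = step(status)
        if not status["ok"]:
            break  # react to a failed step before continuing
    return status

def load_step(status):
    rows = sum(1 for _ in subgraph_like(read_records()))
    return {**status, "rows_loaded": rows}

def report_step(status):
    return {**status, "report": "sent"}

streamed = list(subgraph_like(read_records()))
final_status = jobflow_like([load_step, report_step])
print(streamed[1], final_status)
```

The streamed records carry the bulk data, while the jobflow-style steps exchange only a small status dictionary, mirroring the division of roles described above.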
Note
Unlike jobflow, graphs and subgraphs cannot contain cycles; thus, subgraphs cannot be called recursively.
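To see what a recursive call would look like among nested subgraphs, the following Python sketch searches a map of subgraph references for a cycle using depth-first search. It is an illustration only and makes no claim about how CloverDX validates graphs; the file names are hypothetical.

```python
def find_recursive_reference(references, start):
    """Return the chain of files forming a cycle, or None if there is none.

    references: dict mapping a subgraph file to the subgraph files it uses
    start:      file from which to begin the search
    """
    path = []        # current chain of nested subgraphs
    on_path = set()  # same content as path, for fast membership tests
    done = set()     # files already proven cycle-free

    def visit(name):
        if name in on_path:
            # The file is reached again from itself: recursion detected.
            return path[path.index(name):] + [name]
        if name in done:
            return None
        on_path.add(name)
        path.append(name)
        for child in references.get(name, []):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(name)
        done.add(name)
        return None

    return visit(start)

recursive = {"a.sgrf": ["b.sgrf"], "b.sgrf": ["c.sgrf"], "c.sgrf": ["a.sgrf"]}
nested = {"a.sgrf": ["b.sgrf", "c.sgrf"], "b.sgrf": ["c.sgrf"], "c.sgrf": []}

print(find_recursive_reference(recursive, "a.sgrf"))
print(find_recursive_reference(nested, "a.sgrf"))
```

Ordinary nesting, where several subgraphs share a common child, is reported as cycle-free; only a chain that leads back to a file already being expanded counts as recursion.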