CloverDX provides several job types for data processing and integration tasks. This section gives an overview of the main ones: Graphs, Subgraphs, Jobflows, and Data Services. Each job type serves a specific purpose and can be configured for a different aspect of the work: data transformation, workflow orchestration, or data access. Understanding these job types will help you design and manage your data processing solutions effectively.
A Graph is the fundamental building block of data transformations in CloverDX. It visually represents how data flows from one or more sources, through processing components, to one or more destinations, which makes transformations easier to understand and modify. A graph can be as simple or as complex as the task requires. Each graph consists of:
- Components: The individual building blocks that perform specific tasks, such as reading, transforming, or writing data. Components are connected by edges, which represent the flow of data between them.
- Edges: The connections through which data records move from one component to another. Each edge is assigned metadata that describes the structure of the records it carries.
- Metadata: Defines the structure and data types of the records being processed, so that data is handled consistently and correctly as it moves between components.
Graphs can also include other optional elements, such as connections, lookup tables, sequences, and parameters. For a list of the available elements in CloverDX, see Job elements.
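To make the relationship between components, edges, and metadata more concrete, here is a minimal Python sketch of a graph-like data flow. It is only an analogy, not CloverDX code or its API: the `Metadata` and `Edge` classes and the `reader`, `transformer`, and `writer` functions are hypothetical. For simplicity the components run one after another, which is not how the CloverDX runtime schedules them.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Metadata:
    """Hypothetical record structure: field names mapped to Python types."""
    name: str
    fields: dict[str, type]

    def validate(self, record: dict) -> dict:
        # Reject records whose fields or value types differ from the declaration.
        if set(record) != set(self.fields):
            raise ValueError(f"{self.name}: unexpected fields {sorted(record)}")
        for key, expected in self.fields.items():
            if not isinstance(record[key], expected):
                raise TypeError(f"{self.name}.{key}: expected {expected.__name__}")
        return record


@dataclass
class Edge:
    """Hypothetical edge: buffers records and checks them against its metadata."""
    metadata: Metadata
    records: list[dict] = field(default_factory=list)

    def write(self, record: dict) -> None:
        self.records.append(self.metadata.validate(record))

    def read(self) -> Iterator[dict]:
        yield from self.records


RAW = Metadata("raw_order", {"customer": str, "amount": str})
ORDER = Metadata("order", {"customer": str, "amount": float})


def reader(rows: Iterable[tuple[str, str]], out: Edge) -> None:
    # Reader component: turns raw input rows into records.
    for customer, amount in rows:
        out.write({"customer": customer, "amount": amount})


def transformer(inp: Edge, out: Edge) -> None:
    # Transformation component: cleans names and converts amounts to numbers.
    for rec in inp.read():
        out.write({"customer": rec["customer"].strip().title(),
                   "amount": float(rec["amount"])})


def writer(inp: Edge, sink: list[dict]) -> None:
    # Writer component: delivers records to a destination (here, a list).
    sink.extend(inp.read())


# Wire the components together with edges and run them in order.
raw_edge, order_edge, output = Edge(RAW), Edge(ORDER), []
reader([(" alice ", "10.5"), ("bob", "3")], raw_edge)
transformer(raw_edge, order_edge)
writer(order_edge, output)
print(output)  # [{'customer': 'Alice', 'amount': 10.5}, {'customer': 'Bob', 'amount': 3.0}]
```

Because each edge validates records against its own metadata, a record with the wrong structure fails at the edge where it enters, not somewhere further downstream.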
A Subgraph is a reusable, modular component that encapsulates part of the transformation or processing logic of a larger graph. In effect, it is a graph within a graph, letting you break complex jobs into smaller, more manageable units. You define a subgraph like a regular graph, with its own inputs and outputs, and then use it as a single component in other graphs wherever it is needed. The key features and benefits of subgraphs are:
- Reusability: Subgraphs can be used across multiple graphs, so you can create common processing routines and apply them to different jobs.
- Modularity: By encapsulating specific functionality in subgraphs, you can develop, test, and maintain different parts of your data processing logic independently.
- Simplification: Subgraphs simplify the main graph by abstracting complex operations into a single component, making the overall structure easier to understand and manage.
- Parameterization: Subgraphs can accept parameters, so they can be customized for different contexts while keeping a single implementation.
- Consistency: Using subgraphs ensures that the same logic is applied consistently across different jobs, reducing the risk of errors and inconsistencies.
Subgraphs can be included in CloverDX libraries to create reusable packages that are easy to share. They can also be used to develop custom Data Source or Data Target connectors for CloverDX Wrangler. For more information, see the documentation on building custom CloverDX libraries.
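The following Python sketch illustrates the idea of a subgraph as a parameterized, reusable unit with one input and one output, used by two different parent graphs. It is an analogy only: `dedupe_subgraph`, its `key` and `keep` parameters, and the two parent functions are hypothetical. Real CloverDX subgraphs are designed in CloverDX Designer, not written as Python functions.

```python
from typing import Iterable, Iterator


def dedupe_subgraph(records: Iterable[dict], key: str,
                    keep: str = "first") -> Iterator[dict]:
    """Hypothetical reusable unit with one input and one output.

    `key` and `keep` play the role of subgraph parameters. Output order
    follows the first appearance of each key value.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen: dict = {}
    for rec in records:
        value = rec[key]
        # Keep the first record seen, or overwrite with each later one.
        if keep == "last" or value not in chosen:
            chosen[value] = rec
    yield from chosen.values()


def customer_graph(rows: list[dict]) -> list[dict]:
    # Parent graph 1: reuses the subgraph to deduplicate on e-mail address.
    return list(dedupe_subgraph(rows, key="email"))


def order_graph(rows: list[dict]) -> list[dict]:
    # Parent graph 2: same logic, different parameters (latest status wins).
    return list(dedupe_subgraph(rows, key="order_id", keep="last"))


customers = [
    {"email": "ann@example.com", "name": "Ann"},
    {"email": "ann@example.com", "name": "Ann B."},
    {"email": "bo@example.com", "name": "Bo"},
]
orders = [
    {"order_id": 1, "status": "new"},
    {"order_id": 1, "status": "shipped"},
]
print(customer_graph(customers))  # first record for each e-mail is kept
print(order_graph(orders))        # [{'order_id': 1, 'status': 'shipped'}]
```

The deduplication logic lives in one place. Each parent graph changes only the parameters, which reflects the reusability, parameterization, and consistency benefits listed above.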
A Jobflow is a higher-level workflow that orchestrates the execution of graphs, scripts, and other jobflows. Jobflows manage processes that involve multiple steps or conditional logic and go beyond what a single graph can do, which enables more sophisticated automation and orchestration. Key features of jobflows include:
- Control flow: Jobflows control the sequence of execution, including branching based on conditions, looping, and error handling, which makes them well suited to automating end-to-end processes.
- Job dependencies: Jobflows can define dependencies between graphs or other tasks, ensuring that they run in the correct order or only when specific criteria are met.
- Parallel execution: Jobflows can run multiple tasks in parallel, which can reduce overall processing time.
- Integration with external systems: Jobflows can trigger external systems or respond to external events, making them suitable for integrating CloverDX into broader enterprise workflows.
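As a rough analogy for the control flow, dependency, and parallel execution features listed above, the following Python sketch orchestrates hypothetical job functions. Two extract jobs run in parallel. A load job runs only after both have finished, and any failure branches to an error-handling step. The job functions are invented for illustration, and the standard library's `concurrent.futures` stands in for the orchestration engine. This does not describe how CloverDX executes jobflows internally.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical jobs standing in for graphs or scripts a jobflow would run.
def extract_customers() -> list[str]:
    return ["alice", "bob"]


def extract_orders() -> list[str]:
    return ["o-1", "o-2", "o-3"]


def load(customers: list[str], orders: list[str]) -> str:
    if not orders:
        raise RuntimeError("no orders to load")
    return f"loaded {len(customers)} customers and {len(orders)} orders"


def notify_failure(error: Exception) -> str:
    # Error-handling branch, e.g. alerting an operator.
    return f"failed: {error}"


def run_jobflow(extract_a=extract_customers, extract_b=extract_orders) -> str:
    # Parallel execution: both extract jobs are submitted at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(extract_a)
        fut_b = pool.submit(extract_b)
        try:
            # Dependency: load waits for both extract results.
            customers, orders = fut_a.result(), fut_b.result()
            return load(customers, orders)
        except Exception as exc:
            # Control flow: any failure branches to the error handler.
            return notify_failure(exc)


print(run_jobflow())                          # loaded 2 customers and 3 orders
print(run_jobflow(extract_b=lambda: []))      # failed: no orders to load
```

`Future.result()` re-raises an exception thrown inside a job. A failure in either extract job therefore reaches the same error-handling branch as a failure in the load step.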
A Data Service exposes CloverDX data processing as a REST web service that external applications can call on demand. Data services suit scenarios that require real-time processing, such as API integrations, real-time analytics, or on-the-fly data transformations, and they let CloverDX act as a dynamic part of a larger service-oriented architecture. Data services allow external applications to interact with CloverDX processes and provide the following capabilities:
- Real-time data processing: Data is transformed on demand, as requests arrive from external systems.
- Integration: Other applications can request data processing, retrieve transformed data, or trigger specific workflows through web service calls.
- Parameterization: Data services can accept input parameters, so external clients can customize the behavior of the underlying graphs or jobflows.
- Security: Data services support security features such as authentication and authorization, so that only authorized users or systems can access the service.
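The request and response cycle behind a data service can be sketched as a plain Python function that maps an HTTP request to a status code and a JSON body. Everything here is hypothetical: the `handle_request` and `run_job` functions, the `X-API-Key` header check, the `region` parameter, and the in-memory sales data. They only illustrate how a web service call might combine an access check, client-supplied parameters, and the result of an underlying job. A real CloverDX data service is configured and published on CloverDX Server, not written like this.

```python
import json

# Hypothetical credentials and data backing the service.
API_KEYS = {"secret-key-1": "reporting-app"}
SALES = [
    {"region": "EU", "amount": 120.0},
    {"region": "US", "amount": 80.0},
    {"region": "EU", "amount": 30.0},
]


def run_job(region: str) -> dict:
    # Stand-in for the graph or jobflow behind the endpoint.
    total = sum(row["amount"] for row in SALES if row["region"] == region)
    return {"region": region, "total": total}


def handle_request(method: str, params: dict[str, str],
                   headers: dict[str, str]) -> tuple[int, str]:
    """Return an HTTP status code and a JSON body for one request."""
    # Security: reject callers that do not present a known API key.
    if headers.get("X-API-Key") not in API_KEYS:
        return 401, json.dumps({"error": "unauthorized"})
    if method != "GET":
        return 405, json.dumps({"error": "method not allowed"})
    # Parameterization: the client chooses which region to report on.
    region = params.get("region")
    if not region:
        return 400, json.dumps({"error": "missing parameter: region"})
    # Real-time processing: the job runs when the request arrives.
    return 200, json.dumps(run_job(region))


status, body = handle_request("GET", {"region": "EU"}, {"X-API-Key": "secret-key-1"})
print(status, body)  # 200 {"region": "EU", "total": 150.0}
```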