
Schema Management

Schema management in Nexla provides tools for defining, evolving, and validating data structures across your data processing workflows, helping keep data consistent, high-quality, and interoperable throughout your pipelines.

Schema Overview

Data schemas define the structure, types, and constraints of your data. They are the foundation for data validation, transformation, and integration across the Nexla platform.

Core Schema Capabilities

The schema management system provides three core capabilities: schema definition, schema evolution, and schema validation.

Schema Definition

Define comprehensive data structures:

  • Field Definitions: Specify field names, types, and constraints
  • Data Types: Support for primitive and complex data types
  • Validation Rules: Define data validation and business rules
  • Documentation: Include field descriptions and metadata

Schema Evolution

Manage schema changes over time:

  • Version Control: Track schema versions and changes
  • Backward Compatibility: Ensure compatibility with existing data
  • Migration Support: Support for schema migrations and updates
  • Change Tracking: Track and audit schema modifications

Schema Validation

Ensure data quality and consistency:

  • Data Validation: Validate data against schema definitions
  • Quality Checks: Perform data quality and integrity checks
  • Error Handling: Identify and handle data quality issues
  • Compliance: Ensure compliance with data standards

Schema Types

Nexla supports various schema types for different data processing needs.

Structured Schemas

Traditional structured data schemas:

  • JSON Schema: JSON-based schema definitions
  • Avro Schema: Apache Avro schema format
  • Parquet Schema: Apache Parquet columnar schema definitions
  • Database Schema: Relational database schema definitions
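To show how a field list like the one in the Create Schema request later on this page relates to standard JSON Schema, here is a minimal Python sketch that converts one into a JSON Schema document. The helper name to_json_schema and the type mapping (for example, date becoming a string with "format": "date") are assumptions for illustration, not Nexla's documented conversion.

```python
import json

# Assumed mapping from the field type names used on this page to JSON Schema keywords.
TYPE_MAP = {
    "string": {"type": "string"},
    "integer": {"type": "integer"},
    "float": {"type": "number"},
    "boolean": {"type": "boolean"},
    "date": {"type": "string", "format": "date"},
    "object": {"type": "object"},
    "array": {"type": "array"},
}


def to_json_schema(fields: list[dict]) -> dict:
    """Convert a list of field definitions into a JSON Schema document (hypothetical helper)."""
    properties = {}
    required = []
    for field in fields:
        prop = dict(TYPE_MAP[field["type"]])  # copy so TYPE_MAP itself is never mutated
        if "description" in field:
            prop["description"] = field["description"]
        if "max_length" in field:
            prop["maxLength"] = field["max_length"]
        if "pattern" in field:
            prop["pattern"] = field["pattern"]
        properties[field["name"]] = prop
        if field.get("required", False):
            required.append(field["name"])
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": properties,
        "required": required,
    }


fields = [
    {"name": "customer_id", "type": "string", "required": True},
    {"name": "email", "type": "string", "required": True, "pattern": "^[^@]+@[^@]+\\.[^@]+$"},
    {"name": "registration_date", "type": "date", "required": False},
]
print(json.dumps(to_json_schema(fields), indent=2))
```

Required fields move into JSON Schema's top-level "required" array, while per-field constraints such as max_length map onto the equivalent JSON Schema keywords.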

Flexible Schemas

Schema-less and flexible data structures:

  • Document Schemas: Document-oriented schema definitions
  • Dynamic Schemas: Runtime schema discovery and adaptation
  • Hybrid Schemas: Combination of structured and flexible approaches
  • Schema Inference: Automatic schema detection from data
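Schema inference can be illustrated with a small Python sketch that scans sample records, guesses a type per field, and marks a field as required only when every sample has a non-null value for it. The function names and the fallback rule (conflicting types become "string") are assumptions for illustration; Nexla's own inference logic is not described here.

```python
from datetime import date


def infer_type(value) -> str:
    """Map a Python value to one of the type names used on this page."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)  # ISO dates such as "2023-01-15"
            return "date"
        except ValueError:
            return "string"
    raise TypeError(f"unsupported value: {value!r}")


def infer_schema(records: list[dict]) -> list[dict]:
    """Infer field definitions from sample records (hypothetical helper)."""
    seen_types: dict[str, set[str]] = {}
    non_null_counts: dict[str, int] = {}
    for record in records:
        for name, value in record.items():
            seen_types.setdefault(name, set())
            non_null_counts.setdefault(name, 0)
            if value is not None:
                seen_types[name].add(infer_type(value))
                non_null_counts[name] += 1
    fields = []
    for name, types in seen_types.items():
        # Conflicting or all-null fields fall back to "string" in this sketch.
        field_type = next(iter(types)) if len(types) == 1 else "string"
        fields.append({
            "name": name,
            "type": field_type,
            # Required only if every sample record has a non-null value.
            "required": non_null_counts[name] == len(records),
        })
    return fields


samples = [
    {"customer_id": "CUST-001", "age": 34, "registration_date": "2023-01-15", "vip": True},
    {"customer_id": "CUST-002", "age": None, "registration_date": "2023-02-01", "vip": False},
]
print(infer_schema(samples))
```

Inference from samples is only as good as the samples: a field that happens to be populated in every sample will be marked required, so inferred schemas are worth reviewing before use.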

Schema Components

Understanding the key components of Nexla schemas helps you create effective data structures.

Field Definitions

Core building blocks of schemas:

  • Field Name: Unique identifier for each data field
  • Data Type: Primitive or complex data type specification
  • Constraints: Validation rules and business logic
  • Metadata: Additional field information and documentation

Data Types

Supported data type system:

  • Primitive Types: String, Integer, Float, Boolean, Date
  • Complex Types: Arrays, Objects, Unions, Enums
  • Custom Types: User-defined type definitions
  • Nullable Types: Optional field specifications

Validation Rules

Data quality and business rule enforcement:

  • Type Validation: Ensure data matches specified types
  • Range Validation: Validate numeric and date ranges
  • Pattern Validation: Enforce string patterns and formats
  • Business Rules: Custom validation logic and constraints
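The rule types above can be sketched as a local Python validator that checks one record against field definitions in the same shape as the Create Schema request (name, type, required, max_length, pattern). The minimum and maximum keys for range validation are hypothetical, since this page does not show how range rules are declared; the validator is an illustration, not Nexla's validation engine.

```python
import re
from datetime import date


def _matches_type(value, field_type: str) -> bool:
    """Type validation for the type names used on this page."""
    if field_type == "string":
        return isinstance(value, str)
    if field_type == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if field_type == "float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "boolean":
        return isinstance(value, bool)
    if field_type == "date":
        if not isinstance(value, str):
            return False
        try:
            date.fromisoformat(value)
            return True
        except ValueError:
            return False
    if field_type == "object":
        return isinstance(value, dict)
    if field_type == "array":
        return isinstance(value, list)
    return False  # unknown types are treated as invalid in this sketch


def validate_record(record: dict, fields: list[dict]) -> list[str]:
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    for field in fields:
        name = field["name"]
        value = record.get(name)
        if value is None:
            if field.get("required", False):
                errors.append(f"{name}: required field is missing")
            continue
        if not _matches_type(value, field["type"]):
            errors.append(f"{name}: expected {field['type']}")
            continue
        if isinstance(value, str) and "max_length" in field and len(value) > field["max_length"]:
            errors.append(f"{name}: longer than {field['max_length']} characters")
        if isinstance(value, str) and "pattern" in field and re.fullmatch(field["pattern"], value) is None:
            errors.append(f"{name}: does not match pattern")
        # Hypothetical range keys; this page does not document their names.
        if "minimum" in field and value < field["minimum"]:
            errors.append(f"{name}: below minimum {field['minimum']}")
        if "maximum" in field and value > field["maximum"]:
            errors.append(f"{name}: above maximum {field['maximum']}")
    return errors


fields = [
    {"name": "customer_id", "type": "string", "required": True},
    {"name": "phone_number", "type": "string", "required": False, "pattern": "^\\+?[1-9]\\d{1,14}$"},
    {"name": "age", "type": "integer", "required": False, "minimum": 0},  # hypothetical field
]
print(validate_record({"customer_id": "CUST-001", "phone_number": "+1-555-123-4567", "age": -1}, fields))
```

The phone_number pattern used in the Update Schema example allows an optional leading plus sign followed by digits only, so a formatted number such as +1-555-123-4567 fails it while +15551234567 passes.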

Schema Management Operations

Core operations for managing schemas in Nexla.

Create Schemas

Define new data structures:

POST /schemas
Create Schema: Request
{
  "name": "Customer Profile Schema",
  "description": "Schema for customer profile data",
  "version": "1.0.0",
  "fields": [
    {
      "name": "customer_id",
      "type": "string",
      "required": true,
      "description": "Unique customer identifier"
    },
    {
      "name": "first_name",
      "type": "string",
      "required": true,
      "max_length": 50
    },
    {
      "name": "last_name",
      "type": "string",
      "required": true,
      "max_length": 50
    },
    {
      "name": "email",
      "type": "string",
      "required": true,
      "pattern": "^[^@]+@[^@]+\\.[^@]+$"
    },
    {
      "name": "registration_date",
      "type": "date",
      "required": false
    }
  ],
  "metadata": {
    "owner": "data_team",
    "tags": ["customer", "profile", "identity"]
  }
}
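The following Python sketch prepares the Create Schema request above with the standard library's urllib.request, without sending it. The base URL https://api.example.com, the placeholder token, and the bearer-token Authorization header are assumptions for illustration; use the endpoint and authentication scheme documented for your Nexla account.

```python
import json
import urllib.request

# Hypothetical values: substitute your own API base URL and credentials.
API_BASE = "https://api.example.com"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"


def build_create_schema_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /schemas request."""
    return urllib.request.Request(
        url=f"{API_BASE}/schemas",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer-token auth is an assumption; check your account's auth scheme.
            "Authorization": f"Bearer {ACCESS_TOKEN}",
        },
        method="POST",
    )


payload = {
    "name": "Customer Profile Schema",
    "description": "Schema for customer profile data",
    "version": "1.0.0",
    "fields": [
        {"name": "customer_id", "type": "string", "required": True,
         "description": "Unique customer identifier"},
    ],
}
request = build_create_schema_request(payload)
# To actually send it: urllib.request.urlopen(request) returns the HTTP response.
print(request.get_method(), request.full_url)
```

Serializing with json.dumps turns Python's True and False into the JSON true and false shown in the request body.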

List Schemas

Retrieve available schemas:

GET /schemas
List Schemas: Response
{
  "schemas": [
    {
      "id": 5001,
      "name": "Customer Profile Schema",
      "version": "1.0.0",
      "description": "Schema for customer profile data",
      "created_at": "2023-01-15T10:00:00.000Z",
      "updated_at": "2023-01-15T10:00:00.000Z",
      "field_count": 5,
      "status": "ACTIVE",
      "owner": {
        "id": 42,
        "name": "John Doe"
      }
    },
    {
      "id": 5002,
      "name": "Order Schema",
      "version": "2.1.0",
      "description": "Schema for order data",
      "created_at": "2023-01-14T15:30:00.000Z",
      "updated_at": "2023-01-16T09:15:00.000Z",
      "field_count": 8,
      "status": "ACTIVE",
      "owner": {
        "id": 42,
        "name": "John Doe"
      }
    }
  ],
  "pagination": {
    "total": 2,
    "page": 1,
    "per_page": 20
  }
}
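The pagination object in the List Schemas response (total, page, per_page) is enough to walk every page. The Python sketch below assumes that GET /schemas accepts page and per_page query parameters; that is an inference from the response fields, not a documented contract. The HTTP call is injected as a function so the loop can be shown with a stub in place of the real API.

```python
import math
from typing import Callable, Iterator

# fetch_page(page, per_page) -> parsed JSON body shaped like the List Schemas response.
FetchPage = Callable[[int, int], dict]


def iter_schemas(fetch_page: FetchPage, per_page: int = 20) -> Iterator[dict]:
    """Yield every schema across all pages, using each response's pagination block."""
    page = 1
    while True:
        body = fetch_page(page, per_page)
        yield from body["schemas"]
        pagination = body["pagination"]
        total_pages = math.ceil(pagination["total"] / pagination["per_page"])
        if page >= total_pages:
            break
        page += 1


def fake_fetch(page: int, per_page: int) -> dict:
    """Stub standing in for GET /schemas; schema 5003 and its status are hypothetical."""
    all_schemas = [
        {"id": 5001, "status": "ACTIVE"},
        {"id": 5002, "status": "ACTIVE"},
        {"id": 5003, "status": "INACTIVE"},
    ]
    start = (page - 1) * per_page
    return {
        "schemas": all_schemas[start:start + per_page],
        "pagination": {"total": len(all_schemas), "page": page, "per_page": per_page},
    }


active_ids = [s["id"] for s in iter_schemas(fake_fetch, per_page=2) if s["status"] == "ACTIVE"]
print(active_ids)
```

Reading per_page back from the response rather than trusting the requested value keeps the page count correct if the server caps the page size.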

Update Schemas

Modify existing schema definitions:

PUT /schemas/{schema_id}
Update Schema: Request
{
  "version": "1.1.0",
  "description": "Updated customer profile schema with additional fields",
  "fields": [
    {
      "name": "customer_id",
      "type": "string",
      "required": true,
      "description": "Unique customer identifier"
    },
    {
      "name": "first_name",
      "type": "string",
      "required": true,
      "max_length": 50
    },
    {
      "name": "last_name",
      "type": "string",
      "required": true,
      "max_length": 50
    },
    {
      "name": "email",
      "type": "string",
      "required": true,
      "pattern": "^[^@]+@[^@]+\\.[^@]+$"
    },
    {
      "name": "registration_date",
      "type": "date",
      "required": false
    },
    {
      "name": "phone_number",
      "type": "string",
      "required": false,
      "pattern": "^\\+?[1-9]\\d{1,14}$"
    },
    {
      "name": "preferences",
      "type": "object",
      "required": false,
      "properties": {
        "newsletter": {"type": "boolean"},
        "marketing": {"type": "boolean"}
      }
    }
  ]
}
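In the Update Schema example, the request resends every existing field alongside the new ones and bumps the minor version. The Python sketch below builds such a body from the current definition, assuming that full-list behavior; the helper names bump_minor and build_update_payload are hypothetical.

```python
import copy


def bump_minor(version: str) -> str:
    """'1.0.0' -> '1.1.0': adding optional fields is a backward-compatible change."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def build_update_payload(current: dict, new_fields: list[dict], description: str) -> dict:
    """Return a PUT /schemas/{schema_id} body that keeps existing fields and appends new ones."""
    existing_names = {field["name"] for field in current["fields"]}
    for field in new_fields:
        if field["name"] in existing_names:
            raise ValueError(f"field already exists: {field['name']}")
    return {
        "version": bump_minor(current["version"]),
        "description": description,
        # deepcopy so later edits to the payload cannot mutate the current definition
        "fields": copy.deepcopy(current["fields"]) + copy.deepcopy(new_fields),
    }


current = {
    "version": "1.0.0",
    "fields": [{"name": "customer_id", "type": "string", "required": True}],
}
payload = build_update_payload(
    current,
    [{"name": "phone_number", "type": "string", "required": False,
      "pattern": "^\\+?[1-9]\\d{1,14}$"}],
    "Updated customer profile schema with additional fields",
)
print(payload["version"], [f["name"] for f in payload["fields"]])
```

Rejecting duplicate names up front avoids silently sending two definitions for the same field.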

Schema Validation

Validate data against schema definitions to ensure data quality and consistency.

Validate Data

Check data compliance with schemas:

POST /schemas/{schema_id}/validate
Validate Data: Request
{
  "data": {
    "customer_id": "CUST-001",
    "first_name": "Jane",
    "last_name": "Smith",
    "email": "jane.smith@example.com",
    "registration_date": "2023-01-15",
    "phone_number": "+15551234567",
    "preferences": {
      "newsletter": true,
      "marketing": false
    }
  },
  "validation_options": {
    "strict_mode": true,
    "include_details": true
  }
}

Validation Response

The response reports a result for each field. All seven fields pass here; the phone number is sent without separators because the phone_number pattern allows only an optional leading plus sign followed by digits:

Validate Data: Response
{
  "valid": true,
  "validation_summary": {
    "total_fields": 7,
    "valid_fields": 7,
    "invalid_fields": 0,
    "warnings": 0
  },
  "field_validations": [
    {
      "field": "customer_id",
      "valid": true,
      "value": "CUST-001"
    },
    {
      "field": "first_name",
      "valid": true,
      "value": "Jane"
    },
    {
      "field": "last_name",
      "valid": true,
      "value": "Smith"
    },
    {
      "field": "email",
      "valid": true,
      "value": "jane.smith@example.com"
    },
    {
      "field": "registration_date",
      "valid": true,
      "value": "2023-01-15"
    },
    {
      "field": "phone_number",
      "valid": true,
      "value": "+15551234567"
    },
    {
      "field": "preferences",
      "valid": true,
      "value": {
        "newsletter": true,
        "marketing": false
      }
    }
  ]
}
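On the client side, a validation response in this shape can be checked with a few lines of Python. The sketch relies only on the valid, validation_summary, and field_validations keys shown above; the failing response in the usage example is hypothetical, since this page does not show what an invalid result looks like beyond valid being false.

```python
def failed_fields(response: dict) -> list[str]:
    """Names of fields whose entry in field_validations has valid == false."""
    # field_validations may be absent if details were not requested.
    return [entry["field"] for entry in response.get("field_validations", []) if not entry["valid"]]


def check_response(response: dict) -> None:
    """Raise ValueError if the validation response reports any invalid field."""
    if not response["valid"]:
        summary = response["validation_summary"]
        names = ", ".join(failed_fields(response))
        raise ValueError(f"{summary['invalid_fields']} invalid field(s): {names}")


ok = {
    "valid": True,
    "validation_summary": {"total_fields": 2, "valid_fields": 2, "invalid_fields": 0, "warnings": 0},
    "field_validations": [{"field": "customer_id", "valid": True}, {"field": "email", "valid": True}],
}
# Hypothetical failing response, for illustration only.
bad = {
    "valid": False,
    "validation_summary": {"total_fields": 2, "valid_fields": 1, "invalid_fields": 1, "warnings": 0},
    "field_validations": [{"field": "customer_id", "valid": True}, {"field": "phone_number", "valid": False}],
}
check_response(ok)  # passes silently
print(failed_fields(bad))
```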

Schema Evolution

Manage schema changes and versioning as your data grows and your requirements change.

Version Management

Track schema versions and changes:

  • Version Numbering: Semantic versioning for schemas
  • Change Tracking: Track modifications and updates
  • Migration Support: Support for schema migrations
  • Rollback Capability: Ability to revert to previous versions
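With semantic versioning, versions must be compared numerically rather than as strings: "1.10.0" is newer than "1.2.0", but a plain string sort puts it first. The Python sketch below orders a version history and picks a rollback target; it handles only MAJOR.MINOR.PATCH, not pre-release or build suffixes, and the function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints so versions sort numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def rollback_target(history: list[str], current: str) -> str:
    """Return the highest version strictly older than `current`."""
    older = [v for v in history if parse_version(v) < parse_version(current)]
    if not older:
        raise ValueError(f"no version older than {current}")
    return max(older, key=parse_version)


history = ["1.0.0", "1.10.0", "1.2.0", "2.0.0"]
print(sorted(history, key=parse_version))  # ['1.0.0', '1.2.0', '1.10.0', '2.0.0']
print(rollback_target(history, "2.0.0"))   # 1.10.0
```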

Backward Compatibility

Ensure compatibility with existing data:

  • Field Addition: Add new fields without breaking existing data
  • Field Deprecation: Mark fields as deprecated
  • Type Evolution: Evolve field types safely
  • Default Values: Provide defaults for new fields
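These rules can be turned into an automated check that compares two field lists and reports changes likely to break existing data or consumers: removed fields, changed types, optional fields that became required, and new required fields without a default. The "default" key is hypothetical, since this page does not show how defaults are declared, and treating every type change as breaking is a deliberately conservative choice; some widenings (such as integer to float) may be safe in practice.

```python
def breaking_changes(old_fields: list[dict], new_fields: list[dict]) -> list[str]:
    """List backward-incompatible differences between two schema versions."""
    old = {f["name"]: f for f in old_fields}
    new = {f["name"]: f for f in new_fields}
    problems = []
    for name, old_field in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
            continue
        new_field = new[name]
        if new_field["type"] != old_field["type"]:
            problems.append(f"type changed: {name} ({old_field['type']} -> {new_field['type']})")
        if new_field.get("required", False) and not old_field.get("required", False):
            problems.append(f"field became required: {name}")
    for name, new_field in new.items():
        # Hypothetical "default" key: a new required field is safe only if it has one.
        if name not in old and new_field.get("required", False) and "default" not in new_field:
            problems.append(f"new required field without default: {name}")
    return problems


v1_0 = [
    {"name": "customer_id", "type": "string", "required": True},
    {"name": "registration_date", "type": "date", "required": False},
]
v1_1 = v1_0 + [{"name": "phone_number", "type": "string", "required": False}]
v2_0 = [{"name": "customer_id", "type": "integer", "required": True}]

print(breaking_changes(v1_0, v1_1))  # []
print(breaking_changes(v1_0, v2_0))
```

An empty result is consistent with a minor version bump (as in the 1.0.0 to 1.1.0 update example); any reported problem suggests a major version bump under semantic versioning.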

Schema Migration

Support for data structure changes:

  • Data Transformation: Transform data to new schemas
  • Migration Scripts: Automated migration procedures
  • Validation: Validate migrated data
  • Rollback: Support for migration rollbacks
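A migration between the 1.0.0 and 1.1.0 customer profile schemas shown on this page can be sketched as a pair of Python functions: one fills the fields added in 1.1.0 with defaults, the other rolls a record back by dropping them. The default values here are assumptions for illustration, and the rollback is lossy: any values stored in the new fields are discarded.

```python
import copy

# Fields added in the 1.0.0 -> 1.1.0 update, with assumed default values.
ADDED_IN_1_1_0 = {
    "phone_number": None,
    "preferences": {"newsletter": False, "marketing": False},
}


def migrate_up(record: dict) -> dict:
    """Return a 1.1.0 record: existing values kept, missing new fields filled with defaults."""
    migrated = dict(record)  # never mutate the input record
    for name, default in ADDED_IN_1_1_0.items():
        if name not in migrated:
            migrated[name] = copy.deepcopy(default)  # avoid sharing one mutable default
    return migrated


def migrate_down(record: dict) -> dict:
    """Roll back to 1.0.0 by dropping the fields that 1.1.0 introduced (lossy)."""
    return {k: v for k, v in record.items() if k not in ADDED_IN_1_1_0}


original = {"customer_id": "CUST-001", "first_name": "Jane"}
upgraded = migrate_up(original)
print(upgraded)
print(migrate_down(upgraded) == original)  # True
```

Running migrated records through validation against the target schema version is a natural follow-up step.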

Schema Integration

Integrate schemas with other Nexla components for comprehensive data management.

Nexset Integration

Use schemas with Nexsets:

  • Schema Binding: Bind schemas to Nexsets
  • Data Validation: Validate Nexset data against schemas
  • Schema Evolution: Evolve schemas with Nexsets
  • Quality Assurance: Ensure data quality through schemas

Transform Integration

Apply schemas in data transformations:

  • Schema Enforcement: Enforce schemas during transformations
  • Type Conversion: Convert data types according to schemas
  • Validation: Validate transformed data
  • Error Handling: Handle schema validation errors
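Type conversion according to a schema can be illustrated with a Python sketch that converts raw string values (for example, from a CSV file) to each field's declared type and collects conversion errors instead of stopping at the first one. The boolean spellings accepted, the empty-string-as-null rule, and the order_count field are assumptions for illustration.

```python
from datetime import date


def _to_bool(text: str) -> bool:
    """Accept a few common boolean spellings (an assumption of this sketch)."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


CONVERTERS = {
    "string": str,
    "integer": int,
    "float": float,
    "boolean": _to_bool,
    "date": date.fromisoformat,
}


def coerce_record(raw: dict[str, str], fields: list[dict]) -> tuple[dict, list[str]]:
    """Convert raw string values to declared types; return (converted, errors)."""
    converted, errors = {}, []
    for field in fields:
        name = field["name"]
        if raw.get(name, "") == "":
            converted[name] = None  # treat empty or missing values as null
            continue
        converter = CONVERTERS.get(field["type"])
        if converter is None:
            errors.append(f"{name}: no converter for type {field['type']}")
            continue
        try:
            converted[name] = converter(raw[name])
        except ValueError:
            errors.append(f"{name}: cannot convert {raw[name]!r} to {field['type']}")
    return converted, errors


fields = [
    {"name": "customer_id", "type": "string"},
    {"name": "order_count", "type": "integer"},  # hypothetical field
    {"name": "registration_date", "type": "date"},
    {"name": "newsletter", "type": "boolean"},
]
converted, errors = coerce_record(
    {"customer_id": "CUST-001", "order_count": "3.5",
     "registration_date": "2023-01-15", "newsletter": "yes"},
    fields,
)
print(converted)
print(errors)
```

Fields that fail conversion are left out of the converted record and reported in the error list, so a caller can route the record to error handling rather than silently passing bad data downstream.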

Flow Integration

Integrate schemas with data flows:

  • Source Validation: Validate source data against schemas
  • Destination Schemas: Apply schemas to destination data
  • Flow Validation: Validate data throughout flows
  • Schema Propagation: Propagate schemas across flows

Schema Best Practices

To manage schemas effectively in Nexla:

  1. Design for Evolution: Create schemas that can evolve over time
  2. Use Semantic Versioning: Implement proper versioning for schemas
  3. Document Thoroughly: Provide comprehensive field documentation
  4. Validate Early: Validate data as early as possible in the pipeline
  5. Monitor Changes: Track and monitor schema changes and their impact

Schema Workflows

Implement structured workflows for effective schema management.

Schema Creation Workflow

Standard workflow for creating new schemas:

  1. Requirements Analysis: Analyze data structure requirements
  2. Schema Design: Design comprehensive schema structure
  3. Field Definition: Define fields, types, and constraints
  4. Validation Rules: Implement validation and business rules
  5. Testing: Test schema with sample data
  6. Documentation: Document schema structure and usage

Schema Evolution Workflow

Workflow for evolving existing schemas:

  1. Change Analysis: Analyze required schema changes
  2. Impact Assessment: Assess impact on existing data and systems
  3. Version Planning: Plan new schema version
  4. Migration Planning: Plan data migration strategy
  5. Implementation: Implement schema changes
  6. Testing: Test new schema and migration
  7. Deployment: Deploy new schema version

Error Handling

Common schema management issues and solutions:

  • Validation Failures: Review data and schema compatibility
  • Version Conflicts: Resolve schema version conflicts
  • Migration Issues: Address data migration problems
  • Performance Issues: Optimize schema validation performance

After managing schemas, you may need to:

Validate Data

POST /schemas/{schema_id}/validate
POST /data/validate

Manage Nexsets

GET /nexsets
PUT /nexsets/{nexset_id}

Monitor Quality

GET /data/quality
GET /schemas/{schema_id}/usage