Rate Limiting and Throttling
Understanding Nexla's rate limiting and throttling policies is essential for building robust applications that respect platform limits and handle resource constraints gracefully.
Overview
Rate limiting and throttling are core mechanisms for maintaining platform stability and ensuring fair resource usage. This section explains Nexla's current policies and how to implement effective rate limiting in your applications.
Rate limiting and throttling in Nexla serve several important purposes:
- Platform Stability: Prevent resource exhaustion from excessive requests
- Fair Usage: Ensure equitable access for all users
- Security: Protect against abuse and automated attacks
- Performance: Maintain consistent API response times
Current Rate Limiting Policy
Understanding Nexla's current rate limiting approach helps you plan your application architecture and prepare for future changes. This section covers the current policy status and what it means for your applications.
Policy Status
Currently, Nexla does not enforce strict rate limits on API usage. However, the platform monitors API usage patterns and collects data to inform future rate limiting policies.
The current policy has several important implications for your applications:
- No Hard Limits: You can make API requests without hitting predefined rate limits
- Monitoring Active: Nexla tracks usage patterns for analysis
- Future Implementation: Rate limiting will be implemented based on collected data
- Responsible Usage: While not enforced, responsible usage is encouraged
Usage Monitoring
Nexla actively monitors API usage patterns to understand user behavior and inform future rate limiting policies. This data collection helps ensure that when rate limiting is implemented, it will be based on real usage patterns rather than arbitrary limits.
The collected data includes:
- Request Volume: Number of requests per user/organization over time periods
- Request Patterns: Timing and frequency of API calls, including peak usage hours
- Resource Usage: Which endpoints are accessed most frequently and their impact on system resources
- User Behavior: Normal vs. unusual usage patterns, including automated vs. manual request patterns
- Performance Metrics: Response times and error rates associated with different usage levels
- Geographic Distribution: Request patterns across different regions and time zones
Throttling Mechanisms
Throttling mechanisms provide Nexla with the ability to control excessive usage and maintain platform stability. This section explains when and how throttling is applied, and how to work with throttled resources.
When Throttling Occurs
Nexla may apply throttling measures to users or organizations whose usage exceeds acceptable levels. Throttling is typically applied in response to several types of problematic behavior:
- Excessive Requests: Unusually high request volumes
- Resource Abuse: Patterns that suggest automated abuse
- Service Impact: Behavior that affects platform performance
- Security Concerns: Suspicious or malicious activity patterns
Throttling Implementation
Throttling is implemented as a temporary restriction that limits the affected resource to 1 request per second until the specified time. This approach provides immediate control over problematic usage while allowing for automatic recovery once the throttling period expires.
Throttling characteristics:
- Rate limit: 1 request per second (strict enforcement)
- Duration: Configurable via the throttle_until parameter
- Scope: Can be applied to individual users or entire organizations
- Automatic recovery: Throttling automatically expires at the specified time
- Immediate effect: Throttling takes effect immediately upon application
User Throttling
Administrators can throttle individual users when their API usage patterns indicate problematic behavior. This targeted approach allows for precise control over specific users without affecting other users in the organization.
Use cases: Individual user abuse, excessive API usage, suspicious activity patterns
Impact: Only the specified user is affected; other users continue normal operation
Duration: Configurable via the throttle_until parameter
curl -X PUT <nexla-api-endpoint>/users/<user-id>/throttle \
-H "Authorization: Bearer <admin-access-token>" \
-H "Content-Type: application/json" \
-d '{"throttle_until": "2024-01-15T23:59:59.000Z"}'
Parameters:
- throttle_until: ISO 8601 timestamp when throttling should end (format: YYYY-MM-DDTHH:MM:SS.sssZ)
- user-id: ID of the user to throttle
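For scripted administration, the same request can be issued from Python. The sketch below assumes the requests library; the endpoint, user ID, and token values are placeholders, not real identifiers.

import requests

# Placeholder values -- substitute your API endpoint, user ID, and admin token
API_BASE = "https://<nexla-api-endpoint>"

response = requests.put(
    f"{API_BASE}/users/<user-id>/throttle",
    headers={"Authorization": "Bearer <admin-access-token>"},
    json={"throttle_until": "2024-01-15T23:59:59.000Z"},  # ISO 8601 end time
)
response.raise_for_status()  # surfaces 400/401/404 errors (see Throttling Response below)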
Organization Throttling
Administrators can throttle entire organizations when the organization's collective API usage exceeds acceptable limits or when there are widespread issues across multiple users. This broader approach affects all users within the organization.
Use cases: Organization-wide excessive usage, coordinated abuse, system-wide performance issues
Impact: All users within the organization are affected, regardless of individual behavior
Duration: Configurable via the throttle_until parameter
curl -X PUT <nexla-api-endpoint>/orgs/<org-id>/throttle \
-H "Authorization: Bearer <admin-access-token>" \
-H "Content-Type: application/json" \
-d '{"throttle_until": "2024-01-15T23:59:59.000Z"}'
Parameters:
- throttle_until: ISO 8601 timestamp when throttling should end (format: YYYY-MM-DDTHH:MM:SS.sssZ)
- org-id: ID of the organization to throttle
Throttling Response
Throttling endpoints provide clear feedback about the success or failure of the throttling operation. Understanding these response codes helps administrators troubleshoot issues and verify that throttling has been applied correctly.
Success responses:
- 200 OK: Throttling successfully applied to the specified resource
Error responses:
- 400 Bad Request: Invalid parameters (e.g., malformed timestamp format, missing required fields)
- 401 Unauthorized: Insufficient administrative permissions to apply throttling
- 404 Not Found: User or organization not found in the system
Rate Limit Configuration
Administrators can configure rate limits for users and organizations to control resource usage. This section covers how to set and manage rate limits, including the different categories and their configuration options.
Setting Rate Limits
Administrators can configure rate limits for users and organizations using the rate limits API endpoints. These endpoints support partial updates, allowing you to modify only specific rate limit categories without affecting others.
Key features:
- Partial updates: Only specified categories are updated, others remain unchanged
- Flexible configuration: Each category can be set independently
- Immediate application: Changes take effect immediately upon successful request
- Administrative access: Requires admin-level permissions to configure
curl -X PUT <nexla-api-endpoint>/users/<user-id>/rate_limits \
-H "Authorization: Bearer <admin-access-token>" \
-H "Content-Type: application/json" \
-d '{
"rate_limiting": {
"light": 100,
"medium": 50,
"high": 25,
"common": 200
}
}'
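Because these endpoints support partial updates, a request that names only one category leaves the others unchanged. A minimal Python sketch of such an update, assuming the requests library and the same placeholder endpoint and token as above:

import requests

# Update only the "high" category; "light", "medium", and "common" keep their current values
response = requests.put(
    "https://<nexla-api-endpoint>/users/<user-id>/rate_limits",
    headers={"Authorization": "Bearer <admin-access-token>"},
    json={"rate_limiting": {"high": 10}},
)
response.raise_for_status()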
Rate Limit Categories
Rate limits are organized into categories based on the impact and frequency of operations. This categorization allows for granular control over different types of API usage, ensuring that resource-intensive operations are properly managed while allowing efficient access to common operations.
Category definitions:
- light: Low-impact operations (e.g., read-only queries, metadata retrieval)
- medium: Moderate-impact operations (e.g., data updates, single-record operations)
- high: High-impact operations (e.g., bulk operations, deletions, data transformations)
- common: Frequently used operations (e.g., authentication, status checks, health monitoring)
Configuration flexibility: Each category can be configured independently, allowing administrators to set appropriate limits based on their specific use cases and resource constraints.
Rate Limit Values
Rate limit values provide specific constraints on API usage and can be customized to meet different organizational needs. These values represent the maximum number of requests allowed within a one-minute window for each category.
Value characteristics:
- Requests per minute: Maximum number of requests allowed within a 60-second window
- Configurable ranges: Values can be adjusted from very low (e.g., 10) to very high (e.g., 1000+) based on user needs
- Partial updates: Only specified categories are updated, preserving existing values for unchanged categories
- Immediate enforcement: New values take effect immediately upon successful configuration
Typical value ranges:
- Light operations: 50-500 requests per minute
- Medium operations: 25-250 requests per minute
- High operations: 10-100 requests per minute
- Common operations: 100-1000+ requests per minute
Setting Organization Rate Limits
Organization rate limits provide broader control over API usage across all users within an organization. These limits are typically higher than individual user limits to accommodate multiple users and organizational workflows.
Organization considerations:
- Higher limits: Organization limits are typically 5-10x higher than individual user limits
- Shared resources: All users within the organization share these rate limits
- Administrative control: Only organization administrators can modify these settings
- Cumulative usage: Limits apply to the total usage across all organization users
curl -X PUT <nexla-api-endpoint>/orgs/<org-id>/rate_limits \
-H "Authorization: Bearer <admin-access-token>" \
-H "Content-Type: application/json" \
-d '{
"rate_limiting": {
"light": 500,
"medium": 250,
"high": 100,
"common": 1000
}
}'
Client-Side Rate Limiting
Implementing client-side rate limiting helps your applications work efficiently within platform constraints. This section covers implementation strategies, adaptive rate limiting, and recommended practices.
Implementing Rate Limiting
While Nexla doesn't currently enforce rate limits, implementing client-side rate limiting is a best practice that prepares your applications for future rate limiting policies and helps maintain good API citizenship. This proactive approach ensures your applications will continue to work effectively when rate limiting is implemented.
Benefits of client-side rate limiting:
- Future-proofing: Applications are ready when rate limiting is enforced
- API citizenship: Demonstrates responsible usage patterns
- Performance optimization: Prevents overwhelming the API with requests
- Error reduction: Minimizes 429 errors and retry logic complexity
import time
from collections import deque
from datetime import datetime, timedelta
class RateLimiter:
def __init__(self, max_requests: int, time_window: int):
self.max_requests = max_requests
self.time_window = time_window # seconds
self.requests = deque()
def can_make_request(self) -> bool:
"""Check if a request can be made"""
now = datetime.now()
# Remove expired requests
while self.requests and (now - self.requests[0]) > timedelta(seconds=self.time_window):
self.requests.popleft()
return len(self.requests) < self.max_requests
def record_request(self):
"""Record a request"""
self.requests.append(datetime.now())
def wait_if_needed(self):
"""Wait if rate limit is exceeded"""
while not self.can_make_request():
time.sleep(1) # Wait 1 second before checking again
class NexlaClient:
def __init__(self, max_requests_per_minute: int = 60):
self.rate_limiter = RateLimiter(max_requests_per_minute, 60)
def make_request(self, endpoint, **kwargs):
"""Make a rate-limited API request"""
self.rate_limiter.wait_if_needed()
        try:
            # _make_actual_request is a placeholder for the real HTTP call
            response = self._make_actual_request(endpoint, **kwargs)
            self.rate_limiter.record_request()
            return response
        except Exception:
            # Intentionally don't count failed requests against the rate limit
            raise
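A usage sketch for the client above. Since _make_actual_request is left as a placeholder, this hypothetical subclass fills it in with the requests library; the base URL, token, and endpoint path are illustrative placeholders:

import requests

class SimpleNexlaClient(NexlaClient):
    """Hypothetical client that implements the placeholder request method."""

    def __init__(self, base_url: str, token: str, max_requests_per_minute: int = 60):
        super().__init__(max_requests_per_minute)
        self.base_url = base_url
        self.headers = {"Authorization": f"Bearer {token}"}

    def _make_actual_request(self, endpoint, **kwargs):
        # Pass extra arguments (params, json, ...) through to requests
        return requests.get(f"{self.base_url}{endpoint}", headers=self.headers, **kwargs)

client = SimpleNexlaClient("https://<nexla-api-endpoint>", "<access-token>")
for _ in range(5):
    response = client.make_request("/data_sources")  # blocks once the 60 req/min budget is spent
    print(response.status_code)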
Adaptive Rate Limiting
Adaptive rate limiting adjusts request rates dynamically based on API performance and success rates. This approach optimizes API usage by increasing rates when the API is performing well and decreasing them when issues are detected.
Adaptive rate limiting features:
- Success-based scaling: Increase rates when requests are consistently successful
- Failure-based backoff: Decrease rates when errors are detected
- Performance optimization: Automatically find the optimal request rate
- Self-healing: Recover from temporary API issues automatically
class AdaptiveRateLimiter:
def __init__(self, initial_rate: int, min_rate: int, max_rate: int):
self.current_rate = initial_rate
self.min_rate = min_rate
self.max_rate = max_rate
self.success_count = 0
self.failure_count = 0
def adjust_rate(self, success: bool):
"""Adjust rate based on success/failure"""
if success:
self.success_count += 1
self.failure_count = 0
# Increase rate if consistently successful
if self.success_count >= 10:
self.current_rate = min(self.current_rate * 1.1, self.max_rate)
self.success_count = 0
else:
self.failure_count += 1
self.success_count = 0
# Decrease rate on failures
if self.failure_count >= 3:
self.current_rate = max(self.current_rate * 0.8, self.min_rate)
self.failure_count = 0
def get_current_rate(self) -> int:
"""Get current rate limit"""
return int(self.current_rate)
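One way to connect the adaptive limiter to the earlier rate-limited client is to resize the fixed-window limiter whenever the adaptive rate changes. This wiring is an assumption for illustration, not part of either class above:

class AdaptiveNexlaClient(NexlaClient):
    """Hypothetical client that resizes its request window from adaptive feedback."""

    def __init__(self):
        super().__init__(max_requests_per_minute=60)
        self.adaptive = AdaptiveRateLimiter(initial_rate=60, min_rate=10, max_rate=300)

    def make_request(self, endpoint, **kwargs):
        try:
            response = super().make_request(endpoint, **kwargs)
            self.adaptive.adjust_rate(success=True)
            return response
        except Exception:
            self.adaptive.adjust_rate(success=False)
            raise
        finally:
            # Apply the (possibly adjusted) rate to the fixed-window limiter
            self.rate_limiter.max_requests = self.adaptive.get_current_rate()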
Best Practices
Following best practices ensures that your applications work efficiently within rate limiting constraints and maintain good performance. This section covers request management, error handling, and development practices for effective rate limiting.
Request Management
Effective request management helps you work efficiently within rate limiting constraints by optimizing how and when you make API requests. This approach minimizes the number of requests while maximizing the value of each request.
Key strategies:
- Batch Operations: Combine multiple operations when possible to reduce request overhead
- Efficient Queries: Use pagination and filtering to reduce request volume and transfer only necessary data
- Caching: Implement client-side caching for frequently accessed data to avoid redundant requests (see the TTL cache sketch after this list)
- Asynchronous Processing: Use webhooks and callbacks when available to reduce polling requirements
- Request Optimization: Design your API calls to return maximum useful data per request
- Connection Reuse: Maintain persistent connections to reduce connection overhead
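As a concrete example of the caching strategy above, here is a minimal time-based (TTL) cache for read-style requests. The cache duration and key scheme are illustrative assumptions:

import time

class TTLCache:
    """Minimal client-side cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: int = 300):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[0] > time.time():
            return entry[1]
        self.store.pop(key, None)  # drop expired or missing entries
        return None

    def set(self, key, value):
        self.store[key] = (time.time() + self.ttl, value)

cache = TTLCache(ttl_seconds=300)

def cached_fetch(client, endpoint):
    """Serve repeated reads from the cache instead of re-calling the API."""
    cached = cache.get(endpoint)
    if cached is not None:
        return cached
    response = client.make_request(endpoint)
    cache.set(endpoint, response)
    return response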
Error Handling
Robust error handling ensures your applications can recover from rate limiting issues gracefully and maintain service availability even when encountering API constraints.
Critical components:
- Retry Logic: Implement exponential backoff for failed requests to avoid overwhelming the API
- Circuit Breakers: Stop making requests when errors exceed thresholds to prevent cascading failures (see the sketch after this list)
- Fallback Strategies: Have alternative approaches when rate limits are hit (e.g., cached data, degraded functionality)
- Monitoring: Track request success rates and adjust accordingly to maintain optimal performance
- Graceful Degradation: Provide reduced functionality when rate limits prevent full operation
- User Feedback: Inform users when rate limiting affects service availability
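A minimal circuit breaker along the lines described above; the failure threshold and cool-off period are illustrative assumptions:

import time

class CircuitBreaker:
    """Open the circuit after repeated failures; allow a trial request after a cool-off."""

    def __init__(self, failure_threshold: int = 5, reset_after: float = 60.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.reset_after:
            # Half-open state: permit one trial request
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, success: bool):
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip the breaker

Call allow_request() before each API call and record() with the outcome afterward; while the circuit is open, fall back to cached data or degraded functionality.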
Development Practices
Good development practices help you build applications that work well with rate limiting from the start and can adapt to changing rate limiting policies over time.
Essential practices:
- Testing: Test with realistic request volumes to ensure your application handles expected load
- Monitoring: Monitor request patterns in development to identify optimization opportunities
- Documentation: Document expected request volumes and patterns for future reference
- Planning: Plan for future rate limiting implementation to avoid major refactoring
- Code Reviews: Include rate limiting considerations in code reviews
- Performance Testing: Regularly test application performance under various rate limiting scenarios
Monitoring and Alerts
Effective monitoring and alerting help you track your API usage patterns and respond to issues proactively. This section covers request monitoring, usage alerts, and how to set up comprehensive monitoring for your rate limiting implementation.
Request Monitoring
Comprehensive request monitoring is essential for understanding your API usage patterns and identifying potential issues before they become problems. This monitoring helps you optimize your application's API usage and prepare for future rate limiting policies.
Key monitoring aspects:
- Request volume tracking: Monitor total requests over time periods
- Response time analysis: Track API performance and identify slow endpoints
- Error rate monitoring: Identify patterns in failed requests
- Usage pattern analysis: Understand when and how your application uses the API
- Resource consumption: Track which endpoints consume the most resources
import logging
from datetime import datetime
from collections import defaultdict
class RequestMonitor:
def __init__(self):
self.logger = logging.getLogger('requests')
self.request_count = 0
self.start_time = datetime.now()
self.endpoint_stats = defaultdict(lambda: {'count': 0, 'errors': 0, 'total_time': 0})
def log_request(self, endpoint: str, status_code: int, response_time: float):
"""Log detailed request information"""
self.request_count += 1
# Update endpoint statistics
self.endpoint_stats[endpoint]['count'] += 1
self.endpoint_stats[endpoint]['total_time'] += response_time
if status_code >= 400:
self.endpoint_stats[endpoint]['errors'] += 1
log_entry = {
'timestamp': datetime.now().isoformat(),
'endpoint': endpoint,
'status_code': status_code,
'response_time': response_time,
'total_requests': self.request_count,
'endpoint_total': self.endpoint_stats[endpoint]['count']
}
self.logger.info(f"API Request: {log_entry}")
def get_request_rate(self) -> float:
"""Calculate current request rate (requests per minute)"""
elapsed = (datetime.now() - self.start_time).total_seconds() / 60
return self.request_count / elapsed if elapsed > 0 else 0
def get_endpoint_analysis(self):
"""Analyze endpoint usage patterns"""
analysis = {}
for endpoint, stats in self.endpoint_stats.items():
if stats['count'] > 0:
analysis[endpoint] = {
'total_requests': stats['count'],
'error_rate': stats['errors'] / stats['count'],
'avg_response_time': stats['total_time'] / stats['count'],
'percentage_of_total': (stats['count'] / self.request_count) * 100
}
return analysis
def check_usage_patterns(self):
"""Check for unusual usage patterns and provide recommendations"""
current_rate = self.get_request_rate()
endpoint_analysis = self.get_endpoint_analysis()
        # Rate-based warnings (check the stricter threshold first)
        if current_rate > 500:  # more than 500 requests per minute
            self.logger.error(f"Excessive request rate detected: {current_rate:.2f} req/min")
        elif current_rate > 100:  # more than 100 requests per minute
            self.logger.warning(f"High request rate detected: {current_rate:.2f} req/min")
# Endpoint-specific analysis
for endpoint, analysis in endpoint_analysis.items():
if analysis['error_rate'] > 0.1: # More than 10% error rate
self.logger.warning(f"High error rate for {endpoint}: {analysis['error_rate']:.2%}")
if analysis['avg_response_time'] > 2.0: # More than 2 seconds average
self.logger.warning(f"Slow response time for {endpoint}: {analysis['avg_response_time']:.2f}s")
if analysis['percentage_of_total'] > 50: # More than 50% of total requests
self.logger.info(f"Endpoint {endpoint} dominates usage: {analysis['percentage_of_total']:.1f}% of requests")
Alerting
Proactive alerting systems help you respond to API usage issues before they impact your application's performance or user experience. Effective alerting combines multiple data sources and provides actionable information to your team.
Alerting strategy:
- Multi-level alerts: Different severity levels for different types of issues
- Contextual information: Include relevant details to help with troubleshooting
- Escalation procedures: Define when and how to escalate critical issues
- Alert fatigue prevention: Avoid overwhelming teams with too many alerts
- Actionable alerts: Provide clear next steps for each alert type
from datetime import datetime, timedelta
class UsageAlerting:
def __init__(self):
        self.alert_thresholds = {
            'high_rate': 100,            # requests per minute
            'excessive_rate': 500,       # requests per minute
            'error_rate': 0.1,           # 10% error rate (fraction)
            'slow_response': 2.0,        # 2 seconds average response time
            'endpoint_dominance': 50.0   # 50% of total requests (percentage_of_total is in percent)
        }
self.alert_history = []
self.alert_cooldown = 300 # 5 minutes between similar alerts
def check_alerts(self, monitor: RequestMonitor, error_rate: float):
"""Comprehensive alert checking with cooldown protection"""
current_rate = monitor.get_request_rate()
endpoint_analysis = monitor.get_endpoint_analysis()
# Rate-based alerts
if current_rate > self.alert_thresholds['excessive_rate']:
self._send_alert('CRITICAL', f"Excessive request rate: {current_rate:.2f} req/min",
f"Immediate action required. Consider implementing rate limiting or reducing request frequency.")
elif current_rate > self.alert_thresholds['high_rate']:
self._send_alert('WARNING', f"High request rate: {current_rate:.2f} req/min",
f"Monitor closely. Consider optimizing request patterns.")
# Error rate alerts
if error_rate > self.alert_thresholds['error_rate']:
self._send_alert('WARNING', f"High error rate: {error_rate:.2%}",
f"Investigate API issues. Check authentication and request format.")
# Endpoint-specific alerts
for endpoint, analysis in endpoint_analysis.items():
if analysis['error_rate'] > self.alert_thresholds['error_rate']:
self._send_alert('WARNING', f"High error rate for {endpoint}: {analysis['error_rate']:.2%}",
f"Check endpoint configuration and request parameters.")
if analysis['avg_response_time'] > self.alert_thresholds['slow_response']:
self._send_alert('WARNING', f"Slow response time for {endpoint}: {analysis['avg_response_time']:.2f}s",
f"Consider optimizing requests or implementing caching.")
if analysis['percentage_of_total'] > self.alert_thresholds['endpoint_dominance']:
self._send_alert('INFO', f"Endpoint {endpoint} dominates usage: {analysis['percentage_of_total']:.1f}% of requests",
f"Consider if this usage pattern is optimal for your application.")
def _send_alert(self, level: str, message: str, recommendation: str = ""):
"""Send alert with cooldown protection and detailed information"""
alert_id = f"{level}_{message}"
# Check cooldown
if self._is_in_cooldown(alert_id):
return
alert_data = {
'timestamp': datetime.now().isoformat(),
'level': level,
'message': message,
'recommendation': recommendation,
'alert_id': alert_id
}
self.alert_history.append(alert_data)
# Send the alert
self._deliver_alert(alert_data)
def _is_in_cooldown(self, alert_id: str) -> bool:
"""Check if alert is in cooldown period"""
current_time = datetime.now()
cooldown_start = current_time - timedelta(seconds=self.alert_cooldown)
# Check recent alerts
recent_alerts = [alert for alert in self.alert_history
if alert['alert_id'] == alert_id and
datetime.fromisoformat(alert['timestamp']) > cooldown_start]
return len(recent_alerts) > 0
def _deliver_alert(self, alert_data: dict):
"""Deliver alert through configured channels"""
# This could send emails, Slack messages, PagerDuty alerts, etc.
# Implement based on your alerting infrastructure
print(f"[{alert_data['level']}] {alert_data['message']}")
if alert_data['recommendation']:
print(f"Recommendation: {alert_data['recommendation']}")
Future Considerations
Preparing for future rate limiting implementation ensures your applications will continue to work effectively when Nexla implements stricter rate limiting policies. This section covers preparation steps and what to expect from future changes.
Rate Limiting Implementation
When Nexla implements rate limiting, you'll need to adapt your applications to handle the new constraints gracefully. This transition period requires careful planning and systematic updates to ensure your applications continue to function effectively.
Implementation checklist:
- Monitor Announcements: Stay informed about rate limiting changes through official channels
- Update Clients: Modify your applications to handle rate limit responses (HTTP 429)
- Implement Backoff: Add exponential backoff for rate limit errors to prevent overwhelming the API (see the sketch after this checklist)
- Test Limits: Verify your applications work within new limits through comprehensive testing
- Update Monitoring: Enhance your monitoring to track rate limit usage and violations
- Optimize Requests: Review and optimize your request patterns to work within new constraints
- Plan Fallbacks: Develop fallback strategies for when rate limits are reached
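When 429 responses do appear, a common pattern is to honor the standard Retry-After header if present and otherwise back off exponentially. A sketch assuming the requests library; Nexla's eventual rate limit response format is not yet defined, so this relies only on standard HTTP semantics:

import time
import requests

def request_with_backoff(url: str, headers: dict, max_retries: int = 5):
    """Retry on HTTP 429, honoring Retry-After when the server provides it."""
    delay = 1.0
    for _ in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        retry_after = response.headers.get("Retry-After")
        # Retry-After may be in seconds (handled here) or an HTTP date (ignored here)
        if retry_after and retry_after.isdigit():
            wait = float(retry_after)
        else:
            wait = delay
        time.sleep(wait)
        delay = min(delay * 2, 60.0)  # exponential backoff, capped at 60s
    raise RuntimeError(f"Still rate limited after {max_retries} attempts: {url}")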
Preparation Steps
Taking these preparation steps ensures your applications will continue to work effectively when rate limiting is implemented. This proactive approach minimizes disruption and helps you maintain optimal performance.
Essential preparation activities:
- Document Current Usage: Understand your current request patterns and identify optimization opportunities
- Implement Monitoring: Add comprehensive request monitoring to track usage patterns and identify issues early
- Plan for Limits: Design your applications to handle rate limiting gracefully with proper error handling
- Test Scenarios: Test how your applications behave under various rate limits to ensure reliability
- Optimize Request Patterns: Review and optimize your API usage to minimize the impact of future rate limits
- Develop Fallback Strategies: Create alternative approaches for when rate limits prevent normal operation
- Train Your Team: Ensure your development team understands rate limiting concepts and best practices
Troubleshooting
When rate limiting or throttling issues arise, systematic troubleshooting helps identify and resolve problems quickly. This section covers common issues, their solutions, and how to get help when needed.
Common Issues
Unexpected Throttling
- Check if your organization has been throttled
- Review recent API usage patterns
- Contact support if throttling seems unwarranted
Rate Limit Configuration
- Verify you have administrative permissions
- Check that rate limit values are reasonable
- Ensure proper JSON formatting in requests
Monitoring Setup
- Verify logging configuration
- Check alert thresholds
- Test alert delivery mechanisms
Getting Help
If you encounter rate limiting or throttling issues, follow these steps to resolve them:
- Check Documentation: Review this guide for configuration details
- Monitor Usage: Review your API usage patterns
- Contact Support: Reach out with specific error details and usage information
- Review Logs: Check your application logs for unusual patterns
Next Steps
- Error Handling: Learn about authentication error handling
- Security Best Practices: Follow comprehensive security guidelines
- Connector Authentication: Handle connector-specific authentication
- Advanced Features: Explore organization context and impersonation
For comprehensive authentication information, return to the Authentication Overview.