Implement Comprehensive Node Info System for Monitoring and Management #20

Open
hathbanger opened this issue Aug 22, 2024 · 0 comments
Labels
feature New feature

Comments

@hathbanger
Collaborator

hathbanger commented Aug 22, 2024

Description:

We need to implement a comprehensive system for gathering, storing, updating, and retrieving node information. This system will provide real-time insights into the status and performance of our network nodes, enabling efficient management and monitoring.

1. Data Collection and Storage

  • Implement a system to discover or collect, and then store, the following node information:
    • Node ID
    • Current status (active, inactive, maintenance)
    • Hardware specifications (CPU, RAM, storage capacity)
    • Operating system and version (currently only compatible with Ubuntu Linux)
    • Geographic location
    • Current resource utilization (CPU, memory, storage, network)
    • Running processes
    • Replication factor
    • Uptime
    • Last seen timestamp
    • Performance metrics (e.g., response time)
    • Version of node software
    • Connected peers (quorum membership)
    • Roles/capabilities of the node (currently every node performs every role)
    • Any active alerts or warnings
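
As a starting point for discussion, the fields above could be modeled as a single record. This is only a sketch: every class name, field name, type, and unit here (`NodeInfo`, `ram_bytes`, `response_time_ms`, and so on) is an assumption and not an agreed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class NodeStatus(str, Enum):
    # The three statuses named in this issue.
    ACTIVE = "active"
    INACTIVE = "inactive"
    MAINTENANCE = "maintenance"


@dataclass
class HardwareSpec:
    cpu_cores: int        # assumed unit: logical cores
    ram_bytes: int
    storage_bytes: int


@dataclass
class ResourceUtilization:
    cpu_percent: float
    memory_percent: float
    storage_percent: float
    network_bytes_per_sec: float


@dataclass
class NodeInfo:
    node_id: str
    status: NodeStatus
    hardware: HardwareSpec
    os_name: str
    os_version: str
    location: str
    utilization: ResourceUtilization
    processes: List[str] = field(default_factory=list)
    replication_factor: int = 1
    uptime_seconds: float = 0.0
    last_seen: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    response_time_ms: Optional[float] = None
    software_version: str = ""
    connected_peers: List[str] = field(default_factory=list)
    roles: List[str] = field(default_factory=list)
    alerts: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to JSON, converting the enum and timestamp to strings."""
        record = asdict(self)
        record["status"] = self.status.value
        record["last_seen"] = self.last_seen.isoformat()
        return json.dumps(record)
```

Keeping the nested hardware and utilization groups as separate types would make it easier to store slow-changing data (hardware) apart from fast-changing data (utilization) later on.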

2. Node Status Monitoring Service

  • Develop a scalable background service that:
    • Regularly pings all nodes to check their status
    • Updates node information in the database
    • Calculates and updates derived metrics (e.g., uptime percentage)
    • Implements configurable check intervals
    • Handles a large number of nodes efficiently
    • Implements retry logic for temporarily unresponsive nodes
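
To make the retry and derived-metric requirements above concrete, here is a rough sketch of one check cycle. It assumes a caller-supplied `ping(node_id)` function that returns `True` on success or raises `OSError` on a network failure. That function, the `UptimeTracker` class, and all defaults (3 retries, exponential backoff, 32 workers) are hypothetical choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class UptimeTracker:
    checks_total: int = 0
    checks_ok: int = 0

    def record(self, ok: bool) -> None:
        self.checks_total += 1
        if ok:
            self.checks_ok += 1

    @property
    def uptime_percent(self) -> float:
        # Derived metric: share of successful checks so far.
        if self.checks_total == 0:
            return 0.0
        return 100.0 * self.checks_ok / self.checks_total


def check_with_retry(
    ping: Callable[[str], bool],
    node_id: str,
    retries: int = 3,
    backoff_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Ping a node up to `retries` times with exponential backoff."""
    for attempt in range(retries):
        try:
            if ping(node_id):
                return True
        except OSError:
            pass  # treat network errors as a failed attempt
        if attempt < retries - 1:
            sleep(backoff_seconds * (2 ** attempt))
    return False


def run_check_cycle(
    node_ids: Iterable[str],
    ping: Callable[[str], bool],
    trackers: Dict[str, UptimeTracker],
    max_workers: int = 32,
    **retry_kwargs,
) -> Dict[str, bool]:
    """Check all nodes concurrently and update their uptime trackers."""
    ids = list(node_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(
            pool.map(lambda n: check_with_retry(ping, n, **retry_kwargs), ids)
        )
    results = dict(zip(ids, outcomes))
    for node_id, ok in results.items():
        trackers.setdefault(node_id, UptimeTracker()).record(ok)
    return results
```

A scheduler would call `run_check_cycle` once per configured interval. A thread pool is just one way to handle many nodes; an async client could serve the same purpose.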

3. API Endpoints

  • Create RESTful API endpoints for:
    • Retrieving information for a specific node: GET /api/v1/nodes/{nodeId}
    • Listing all nodes with optional filtering: GET /api/v1/nodes
    • Updating node information: PUT /api/v1/nodes/{nodeId}
    • Reporting node status (for node self-reporting): POST /api/v1/nodes/{nodeId}/status
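
The four routes above could be prototyped without committing to a web framework. The sketch below dispatches on method and path against an in-memory dict. The `handle` function, the dict-based store, the merge-on-update behavior, and filtering by arbitrary query parameters such as `?status=active` are all assumptions, not decided behavior.

```python
import re
from urllib.parse import parse_qs, urlsplit

NODE_PATH = re.compile(r"^/api/v1/nodes/(?P<node_id>[^/]+)$")
STATUS_PATH = re.compile(r"^/api/v1/nodes/(?P<node_id>[^/]+)/status$")


def handle(method, url, body, store):
    """Return (http_status, payload) for a request against `store`.

    `store` maps node ID -> dict of node fields (a stand-in for the database).
    """
    parts = urlsplit(url)
    path, query = parts.path, parse_qs(parts.query)

    # GET /api/v1/nodes with optional filtering via query parameters.
    if method == "GET" and path == "/api/v1/nodes":
        nodes = list(store.values())
        for key, accepted in query.items():
            nodes = [n for n in nodes if str(n.get(key)) in accepted]
        return 200, nodes

    # POST /api/v1/nodes/{nodeId}/status: node self-reporting.
    match = STATUS_PATH.match(path)
    if match and method == "POST":
        node_id = match["node_id"]
        node = store.setdefault(node_id, {})
        node.update(body or {})
        node["node_id"] = node_id  # the path, not the body, decides the ID
        return 200, node

    match = NODE_PATH.match(path)
    if match and method in ("GET", "PUT"):
        node = store.get(match["node_id"])
        if node is None:
            return 404, {"error": "node not found"}
        if method == "PUT":
            node.update(body or {})
            node["node_id"] = match["node_id"]
        return 200, node

    return 404, {"error": "not found"}
```

For example, `handle("GET", "/api/v1/nodes?status=active", None, store)` would return only the nodes whose stored `status` field equals `active`.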

4. Authentication and Authorization

  • Leverage the PK in the LASR Wallet to handle authentication and request signatures for all protected endpoints.
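
The wallet's signature scheme is not specified in this issue, so the sketch below covers only the scheme-independent part: building a canonical message to sign and rejecting stale requests. The message layout, the 300-second skew window, and the injected `verify(public_key, message, signature)` callable are all assumptions. The real verification would come from whatever the LASR Wallet provides.

```python
import hashlib
import json
import time


def canonical_message(method, path, body, timestamp):
    """Build the bytes a client would sign for one request (assumed layout)."""
    body_bytes = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    body_hash = hashlib.sha256(body_bytes).hexdigest()
    return f"{method.upper()}\n{path}\n{timestamp}\n{body_hash}".encode()


def is_authorized(method, path, body, timestamp, signature, public_key,
                  verify, max_skew_seconds=300, now=None):
    """Check freshness, then delegate signature checking to `verify`.

    `verify` is a placeholder for the wallet's real verification routine.
    """
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew_seconds:
        return False  # stale or future-dated request; limits replay
    message = canonical_message(method, path, body, timestamp)
    return bool(verify(public_key, message, signature))
```

Signing the method, path, and a hash of the body together means a signature captured for one endpoint cannot be reused against another.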

5. Documentation

  • Provide comprehensive API documentation
  • Create system architecture documentation
  • Include deployment and configuration guides

Technical Notes:

  • Consider TiKV for storage (open question)
  • Use a time-series database for storing historical performance data
  • Use WebSockets or server-sent events for real-time updates where applicable
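
If server-sent events are chosen for real-time updates, each update is a small text frame on a long-lived HTTP response. The helper below shows the wire format. The function name `format_sse` and the `node_status` event name in the usage note are made up for illustration.

```python
import json


def format_sse(data, event=None, event_id=None):
    """Format one server-sent event frame with a JSON payload.

    json.dumps emits a single line by default, so one `data:` field is enough.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

A status change could then be pushed as `format_sse({"node_id": "n1", "status": "inactive"}, event="node_status")` on a response served with the `text/event-stream` content type.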
@hathbanger hathbanger added the feature New feature label Aug 22, 2024