
This page covers the usage of two tools: indexlogs, which creates an index of services and components, and queryindex, which returns data from that index.

IndexLogs

When executed, the indexlogs tool crawls the /service folder of the configured HDFS volume. This tool records the following information about each valid component that it finds:

  • DC (e.g. 'dc1')
  • Service (e.g. 'web')
  • Type (e.g. 'logs')
  • Component (e.g. 'applog')
  • Date (to the day) of earliest valid log data
  • Date (to the day) of latest valid log data
  • Date (to the day) of latest archived log data
  • Total size of component
  • Size of all 'data' (non-incoming, non-archived) logs
  • Size of all 'archived' data
  • Size of all 'incoming' data
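A minimal Python sketch of one index entry holding the fields listed above. The class and field names are hypothetical. The field names loosely follow the JSON schema further down. The property assumes the total size equals data + archived + incoming. That matches the example output (Total 578.05 GB = Data 578.05 GB with nothing archived or incoming), but the tool does not state it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ComponentRecord:
    """One indexed component (hypothetical class mirroring the list above)."""
    dc: str                       # e.g. 'dc1'
    service: str                  # e.g. 'web'
    log_type: str                 # e.g. 'logs'
    component: str                # e.g. 'applog'
    start_date: date              # earliest valid log data, to the day
    end_date: date                # latest valid log data, to the day
    archive_date: Optional[date]  # latest archived log data; None if nothing is archived
    data_size: float              # bytes that are neither incoming nor archived
    archive_size: float           # bytes of archived data
    incoming_size: float          # bytes of incoming data

    @property
    def total_size(self) -> float:
        # Assumption: the total is exactly the sum of the three categories.
        return self.data_size + self.archive_size + self.incoming_size


rec = ComponentRecord("dc1", "web", "logs", "applog",
                      date(2012, 2, 27), date(2012, 2, 29), None,
                      data_size=100.0, archive_size=20.0, incoming_size=5.0)
print(rec.total_size)  # 125.0
```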

This data is written to the /service/_index folder of the HDFS volume in CSV and JSON formats; the JSON schema is given below. The -t flag also prints a human-readable tree of all components to STDOUT.

Note that indexlogs requires permission to write to /service/_index.

Indexlogs Usage

Usage: indexlogs [-t -n]
    -t      Print results to STDOUT in human-readable tree
    -n      Don't write index files into HDFS
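The -t tree view amounts to grouping components by their four path levels. The sketch below assumes the index has already been read into (DC, service, type, component) tuples. render_tree is a hypothetical helper, and the real indexlogs tree layout may differ.

```python
from collections import defaultdict


def _tree():
    # A dict whose missing keys are created as nested trees on first access.
    return defaultdict(_tree)


def render_tree(paths):
    """Render (dc, service, type, component) tuples as an indented tree."""
    root = _tree()
    for path in paths:
        node = root
        for part in path:
            node = node[part]  # creates the child level if absent

    lines = []

    def walk(node, depth):
        # Children are printed in sorted order, indented four spaces per level.
        for name in sorted(node):
            lines.append("    " * depth + name)
            walk(node[name], depth + 1)

    walk(root, 0)
    return "\n".join(lines)


print(render_tree([("dc1", "web", "logs", "applog"),
                   ("dc1", "web", "logs", "access")]))
```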

Index JSON Schema

{
  "name": "DC",
  "type": {
    "name": "service",
    "type": {
      "name": "type",
      "type": {
        "name": "component",
        "fields": [
          {"name": "startDate", "type": "long", "description": "Date of earliest logs" },
          {"name": "endDate", "type": "long", "description": "Date of latest logs" },
          {"name": "archiveDate", "type": "long", "description": "Date before which logs are archived" },
          {"name": "totalSize", "type": "double", "description": "Total size in bytes of all logs" },
          {"name": "dataSize", "type": "double", "description": "Size in bytes of all non-incoming and non-archived logs" },
          {"name": "archiveSize", "type": "double", "description": "Size in bytes of all archived logs" },
          {"name": "incomingSize", "type": "double", "description": "Size in bytes of all incoming logs at time of index generation" }
        ]
      }
    }
  }
}
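The schema above describes four nested levels, with the per-component fields at the innermost one. It does not pin down the concrete file layout. The sketch below assumes a hypothetical layout in which each level is a JSON object keyed by name, and flattens it into one entry per component. The sample document and flatten_index are illustrative only.

```python
import json

# Hypothetical concrete index: objects nested by name, DC -> service -> type -> component.
SAMPLE = """
{"dc1": {"web": {"logs":    {"applog": {"totalSize": 100.0, "dataSize": 100.0}},
                 "metrics": {"cpu":    {"totalSize": 5.0,   "dataSize": 5.0}}}}}
"""


def flatten_index(doc):
    """Yield ((dc, service, type, component), fields) for every component."""
    for dc, services in doc.items():
        for service, types in services.items():
            for log_type, components in types.items():
                for component, fields in components.items():
                    yield (dc, service, log_type, component), fields


index = dict(flatten_index(json.loads(SAMPLE)))
print(index[("dc1", "web", "logs", "applog")]["totalSize"])  # 100.0
```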

QueryIndex

The queryindex tool reads data from the latest index generated by indexlogs. Run with no arguments, it returns the following information:

queryindex
  • Total size of all components in the index
  • Daily ingest rate of all components
  • Earliest date of log data
  • Latest date of log data
  • Total interval over which log data exists (in days)
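The summary fields can be derived from per-component dates and sizes. This sketch assumes the daily ingest rate is the total size divided by the inclusive number of days between the earliest and latest dates. Under that assumption it reproduces the totals in the example output (3 days, about 203.64 GB/day), up to rounding of the displayed sizes. summarize is a hypothetical helper.

```python
from datetime import date


def summarize(components):
    """Summarize (start_date, end_date, size) triples like a plain `queryindex` run."""
    components = list(components)
    total = sum(size for _, _, size in components)
    earliest = min(start for start, _, _ in components)
    latest = max(end for _, end, _ in components)
    days = (latest - earliest).days + 1  # inclusive: 02-27 through 02-29 is 3 days
    return {
        "total": total,
        "daily_rate": total / days,
        "earliest": earliest,
        "latest": latest,
        "days": days,
    }


# Sizes in GB, taken from the two components in the example output.
s = summarize([(date(2012, 2, 27), date(2012, 2, 29), 578.05),
               (date(2012, 2, 28), date(2012, 2, 28), 32.87)])
print(s["days"], round(s["daily_rate"], 2))  # 3 203.64
```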

Instead of summarizing all indexed components, queryindex can return the same data for only those components matching regular expressions for DC, service, type and component:

queryindex 'DC' 'service' 'type' 'component'
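This page does not say how each pattern is applied. The sketch below assumes unanchored matching with Python's re.search, and treats omitted trailing patterns as matching anything. Python's re rejects a bare '*' ("nothing to repeat"), so the sketch uses '.*' as its wildcard. matches is a hypothetical helper.

```python
import re


def matches(path, patterns):
    """True if each of (dc, service, type, component) matches its pattern.

    Patterns are applied with re.search (an assumption: unanchored matching).
    Omitted trailing patterns match anything, because zip() stops at the shorter input.
    """
    return all(re.search(pattern, name) for pattern, name in zip(patterns, path))


path = ("dc1", "web", "logs", "applog")
print(matches(path, ["dc1", "web", ".*", "app"]))  # True
print(matches(path, ["dc2"]))                      # False
```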

A list of all matching components can be generated with the -p flag. When using the -p flag, the -d (return start, end and archive dates), -s (return sizes of data) and -i (return daily ingest rate) flags can be used to print more detailed information for each component.

queryindex -p -s -i -d 'DC' 'service' 'type' 'component'

The -t flag further restricts the results to components that have valid data between the specified start and end dates:

queryindex -p -s -i -d -t 'start date' 'end date' 'DC' 'service' 'type' 'component' 
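"Valid data between these dates" is read here as the component's date range overlapping the query range, not being fully contained in it; that reading is an assumption. The sketch also parses the 'Feb 27, 2012' date style used in the example below. Other accepted formats are not documented, and parse_day and overlaps are hypothetical helpers.

```python
from datetime import date, datetime


def parse_day(text):
    # Assumes the 'Feb 27, 2012' style used in the example command.
    return datetime.strptime(text, "%b %d, %Y").date()


def overlaps(start, end, query_start, query_end):
    """True if [start, end] shares at least one day with [query_start, query_end]."""
    return start <= query_end and end >= query_start


q_start, q_end = parse_day("Feb 27, 2012"), parse_day("Mar 1, 2012")
print(overlaps(date(2012, 2, 28), date(2012, 2, 28), q_start, q_end))  # True
print(overlaps(date(2012, 1, 1), date(2012, 1, 31), q_start, q_end))   # False
```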

When -t is used, the -a flag generates hourly ingest statistics for each component over the specified interval: the average hourly ingest, plus the maximum and minimum hourly ingest rates. It also plots ingest as a function of time.

Note that the -a flag will cause queryindex to access HDFS. A large number of matching components and/or a wide date range can result in slow performance using this flag.
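The -a statistics reduce to a sum, mean, maximum and minimum over per-hour ingest. The plot in the example below squeezes 72 hours into 61 columns, or 1.18 hours per column. The sketch assumes each column shows the average rate over the hours it spans, with boundary hours weighted by the fraction that falls inside. That rule is an assumption, but it would explain why the plotted peaks (38.08 and 21.46 GB/hour) sit below the reported hourly peaks (38.15 and 32.87 GB/hour). hourly_stats and bin_columns are hypothetical helpers.

```python
def hourly_stats(hourly):
    """Total, average, peak and minimum of per-hour ingest values."""
    total = sum(hourly)
    return {"total": total, "average": total / len(hourly),
            "max": max(hourly), "min": min(hourly)}


def bin_columns(hourly, columns):
    """Resample per-hour ingest into `columns` equal-width plot columns.

    Each column holds the average rate over the hours it spans, weighting
    hours that straddle a column boundary by the fraction inside the column.
    """
    n = len(hourly)
    width = n / columns  # hours per column, e.g. 72 / 61 ≈ 1.18
    out = []
    for c in range(columns):
        lo, hi = c * width, (c + 1) * width
        acc = 0.0
        for h in range(int(lo), min(n, int(hi) + 1)):
            overlap = min(hi, h + 1) - max(lo, h)
            if overlap > 0:
                acc += hourly[h] * overlap
        out.append(acc / width)
    return out, width


hourly = [0.0] * 72
hourly[40] = 32.87           # a single one-hour spike, in GB
stats = hourly_stats(hourly)
cols, width = bin_columns(hourly, 61)
print(round(width, 2), round(max(cols), 2), stats["max"])
```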

Queryindex Usage

Usage: queryindex [options] ['DC'] ['service'] ['type'] ['component']
   Each of DC, service, type and component can be a regex
     -p                          Print names of individual matched components
     -t 'start' 'end'            Display only components with indexed data between these dates
   The following options require -p
     -d                          Print the available date range for each component
     -s                          Print the total size of each component
     -i                          Print average daily ingest for the lifetime of this component
     -l                          Print all data for each component on a single line
     -a                          Display ingest activity between the specified dates, requires -t*
     * Display of ingest activity requires multiple queries to HDFS (can be slow)
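The option dependencies in the usage text can be made concrete with Python's argparse. This is a sketch of the documented interface only, not queryindex's actual implementation, whose language and parser are not stated here. build_parser and check are hypothetical names. The sketch treats -a as requiring -t, which is how the -a flag is described above.

```python
import argparse


def build_parser():
    # Mirrors the documented usage text; not queryindex's actual implementation.
    p = argparse.ArgumentParser(prog="queryindex")
    p.add_argument("-p", action="store_true", help="print names of matched components")
    p.add_argument("-t", nargs=2, metavar=("START", "END"), dest="dates",
                   help="only components with indexed data between these dates")
    for flag, text in [("-d", "date range"), ("-s", "total size"),
                       ("-i", "average daily ingest"), ("-l", "one line per component"),
                       ("-a", "ingest activity between the -t dates")]:
        p.add_argument(flag, action="store_true", help=text)
    p.add_argument("patterns", nargs="*", metavar="REGEX",
                   help="DC, service, type and component regexes")
    return p


def check(args):
    """Enforce the dependencies listed in the usage text."""
    if len(args.patterns) > 4:
        raise ValueError("at most four patterns: DC, service, type, component")
    if (args.d or args.s or args.i or args.l or args.a) and not args.p:
        raise ValueError("-d, -s, -i, -l and -a require -p")
    if args.a and args.dates is None:
        raise ValueError("-a requires -t")
    return args


args = check(build_parser().parse_args(
    ["-p", "-d", "-s", "-i", "-t", "Feb 27, 2012", "Mar 1, 2012", "-a",
     "dc1", "web", ".*", "app"]))
print(args.dates, args.patterns)
```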

Example queryindex output

$ queryindex -p -d -s -i -t 'Feb 27, 2012' 'Mar 1, 2012' -a 'dc1' 'web' '*' 'app'

Index is from 2013-06-03 and is 6 hours old.

99 / web / logs / applog
    Logs exist for 3 days, from 2012-02-27 to 2012-02-29, none archived
    Total: 578.05 GB, Data: 578.05 GB, Archived: 0 B, Incoming: 0 B
    Average daily ingest over logged period: 192.68 GB/day

Gathering statistics...

    Activity from 2012-02-27 00h to 2012-03-01 00h inclusive, 72 hours total.
    Ingest over this period was a total of 578.05 GB at an average of 8.03 GB/hour.
    Peak ingest over this period was 38.15 GB/hour and minimum ingest was 0 B/hour.

 38.08 GB/hour -                               ▄██                            
                                              ████                            
                                           ▄▄██████                           
                                       ▄███████████                           
                                      █████████████                           
    Ingest                            ██████████████                          
                                      ██████████████                          
                                     ███████████████    ▄▄                    
                                     ████████████████   ██                    
                                     █████████████████ ▄██                    
                                     █████████████████████                    
      0 B/hour - █▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█
               00:00       14:00       04:00       18:00       08:00       22:00       
             2012-02-27  2012-02-27  2012-02-28  2012-02-28  2012-02-29  2012-02-29  
                               Time (GMT), 1.18 hours per column


99 / web / logs / applog
    Logs exist for 1 days, from 2012-02-28 to 2012-02-28, none archived
    Total: 32.87 GB, Data: 32.87 GB, Archived: 0 B, Incoming: 0 B
    Average daily ingest over logged period: 32.87 GB/day

Gathering statistics...

    Activity from 2012-02-27 00h to 2012-03-01 00h inclusive, 72 hours total.
    Ingest over this period was a total of 32.87 GB at an average of 467.45 MB/hour.
    Peak ingest over this period was 32.87 GB/hour and minimum ingest was 0 B/hour.

 21.46 GB/hour -                              █                               
                                              █                               
                                              █                               
                                              █                               
                                              █                               
    Ingest                                    █                               
                                              █                               
                                              █                               
                                             ██                               
                                             ██                               
                                             ██                               
      0 B/hour - █▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█
               00:00       14:00       04:00       18:00       08:00       22:00       
             2012-02-27  2012-02-27  2012-02-28  2012-02-28  2012-02-29  2012-02-29  
                               Time (GMT), 1.18 hours per column


Ingest for all matched components from 2012-02-27 to 2012-03-01, 72 hours total:

  Total Ingest:        610.91 GB
  Average Ingest Rate: 8.48 GB/hour
  Max Ingest Rate:     38.15 GB/hour
  Min Ingest Rate:     0 B/hour

Totals for all matched components:

  Total Size:    610.91 GB
  Ingest Rate:   203.64 GB/day
  Earliest Date: 2012-02-27
  Latest Date:   2012-02-29
  Total Time:    3 days
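The sizes in the example output appear to use 1024-based units. 32.87 GB over 72 hours is reported as 467.45 MB/hour, which matches 32.87 × 1024 / 72 ≈ 467.5; a 1000-based conversion would give about 456.5. The small gap from 467.45 comes from 32.87 GB itself being a rounded figure. A sketch of a formatter under that assumption follows. format_size is a hypothetical helper, and the tool's exact rounding may differ.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def format_size(num_bytes):
    """Format a byte count with 1024-based units and two decimals, e.g. '578.05 GB'."""
    value, unit = float(num_bytes), 0
    while value >= 1024 and unit < len(UNITS) - 1:
        value /= 1024
        unit += 1
    if unit == 0:
        return f"{int(value)} B"  # plain bytes are shown without decimals, as in '0 B'
    return f"{value:.2f} {UNITS[unit]}"


# 32.87 GB spread over 72 hours, as for the second component above.
print(format_size(32.87 * 1024**3 / 72))  # 467.48 MB
```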