This page covers two tools: indexlogs, which creates an index of services and components, and queryindex, which returns data from that index.
When executed, the indexlogs tool crawls the /service folder of the configured HDFS volume. This tool records the following information about each valid component that it finds:
- DC (e.g. 'dc1')
- Service (e.g. 'web')
- Type (e.g. 'logs')
- Component (e.g. 'applog')
- Date (to the day) of earliest valid log data
- Date (to the day) of latest valid log data
- Date (to the day) of latest archived log data
- Total size of component
- Size of all 'data' (non-incoming, non-archived) data
- Size of all 'archived' data
- Size of all 'incoming' data
This data is recorded to the /service/_index folder of the HDFS volume in CSV and JSON formats. The schema for the JSON format can be found below. The -t flag will return a human-readable tree of all components to the command line.
Note that indexlogs requires permission to write to /service/_index.
Usage: indexlogs [-t -n]
-t Print results to STDOUT in human-readable tree
-n Don't write index files into HDFS
{
"name": "DC",
"type": {
"name": "service",
"type": {
"name": "type",
"type": {
"name": "component",
"fields": [
{"name": "startDate", "type": "long", "description": "Date of earliest logs" },
{"name": "endDate", "type": "long", "description": "Date of latest logs" },
{"name": "archiveDate", "type": "long", "description": "Date before which logs are archived" },
{"name": "totalSize", "type": "double", "description": "Total size in bytes of all logs" },
{"name": "dataSize", "type": "double", "description": "Size in bytes of all non-incoming and non-archived logs" },
{"name": "archiveSize", "type": "double", "description": "Size in bytes of all archived logs" },
{"name": "incomingSize", "type": "double", "description": "Size in bytes of all incoming logs at time of index generation" }
]
}
}
}
}
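The nesting in this schema (DC, then service, then type, then component) can be walked with a short loop. The Python sketch below assumes the JSON index deserializes to nested dictionaries keyed by name at each level, with the seven fields on each component. The actual file layout is not shown on this page, so the sample data and the iter_components helper are hypothetical. The date values are placeholders, because the schema does not state the units of the long date fields.

```python
import json

# Hypothetical sample shaped like the schema: DC -> service -> type -> component.
# The date values are placeholders; the schema does not say what the longs encode.
SAMPLE_INDEX = json.loads("""
{
  "dc1": {
    "web": {
      "logs": {
        "applog": {
          "startDate": 0, "endDate": 0, "archiveDate": 0,
          "totalSize": 300.0, "dataSize": 200.0,
          "archiveSize": 75.0, "incomingSize": 25.0
        }
      }
    }
  }
}
""")


def iter_components(index):
    """Yield (dc, service, type, component, fields) for every component."""
    for dc, services in index.items():
        for service, types in services.items():
            for type_name, components in types.items():
                for component, fields in components.items():
                    yield dc, service, type_name, component, fields


for dc, service, type_name, component, fields in iter_components(SAMPLE_INDEX):
    print(f"{dc} / {service} / {type_name} / {component}: {fields['totalSize']} bytes")
```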
The queryindex tool gathers data from the latest index generated by indexlogs. Run with no arguments, it returns the following summary of all indexed components:
queryindex
- Total size of all components in the index
- Daily ingest rate of all components
- Earliest date of log data
- Latest date of log data
- Total interval over which log data exists (in days)
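This page does not state how the daily ingest rate is derived. The sample output further down is consistent with dividing the total size by the number of days in the log interval, counting both the first and the last day. The Python sketch below assumes that definition; the daily_ingest helper is hypothetical and is not part of queryindex.

```python
from datetime import date


def daily_ingest(total_size, start, end):
    """Average size per day, counting both the first and the last day."""
    days = (end - start).days + 1
    return total_size / days


# Figures (in GB) taken from the sample output on this page.
rate = daily_ingest(578.05, date(2012, 2, 27), date(2012, 2, 29))
print(f"{rate:.2f} GB/day")
```

With these inputs the interval is 3 days, which matches the "Logs exist for 3 days" line in the sample output.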
Instead of providing a summary of all indexed components, the above data can be returned for only those components matching regular expressions for DC, service, type and component.
queryindex 'DC' 'service' 'type' 'component'
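The four-pattern selection can be pictured as one regular expression per name part. The Python sketch below is an illustration only: it assumes an unanchored match (re.search), which may differ from how queryindex anchors its patterns, and the component names and the matches helper are hypothetical.

```python
import re


def matches(component, patterns):
    """True if each of the four name parts matches its pattern."""
    return all(re.search(p, part) for p, part in zip(patterns, component))


# Hypothetical (DC, service, type, component) tuples.
components = [
    ("dc1", "web", "logs", "applog"),
    ("dc1", "db", "logs", "slowquery"),
    ("dc2", "web", "logs", "access"),
]

selected = [c for c in components if matches(c, ("dc1", "web", ".*", "app"))]
print(selected)
```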
A list of all matching components can be generated with the -p flag. When using the -p flag, the -d (return start, end and archive dates), -s (return sizes of data) and -i (return daily ingest rate) flags can be used to print more detailed information for each component.
queryindex -p -s -i -d 'DC' 'service' 'type' 'component'
The -t flag further restricts the results to components that have valid data between the specified start and end dates.
queryindex -p -s -i -d -t 'start date' 'end date' 'DC' 'service' 'type' 'component'
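One way to read the date restriction is as an interval overlap test between a component's indexed date range and the requested range. The Python sketch below assumes inclusive bounds on both sides; how queryindex treats the boundaries is not documented here, and the has_data_between helper is hypothetical.

```python
from datetime import date


def has_data_between(comp_start, comp_end, query_start, query_end):
    """True if [comp_start, comp_end] overlaps [query_start, query_end]."""
    return comp_start <= query_end and comp_end >= query_start


query = (date(2012, 2, 27), date(2012, 3, 1))

# Component with logs from 2012-02-27 to 2012-02-29: inside the range.
inside = has_data_between(date(2012, 2, 27), date(2012, 2, 29), *query)

# Hypothetical component whose logs end before the range starts.
outside = has_data_between(date(2012, 1, 1), date(2012, 1, 31), *query)

print(inside, outside)
```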
If the -t flag is used, hourly ingest statistics can be generated for each component during the specified interval using the -a flag. This flag returns the average hourly ingest for each component over the time range, as well as the maximum and minimum ingest rates for the interval. It also generates a plot of ingest as a function of time.
Note that the -a flag will cause queryindex to access HDFS. A large number of matching components and/or a wide date range can result in slow performance using this flag.
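The statistics reported by -a can be reproduced from a list of per-hour ingest sizes. The Python sketch below assumes one size value per hour of the interval. How queryindex gathers those hourly values from HDFS is not described on this page, and the ingest_stats helper and sample values are hypothetical.

```python
def ingest_stats(hourly_sizes):
    """Return total, average, maximum and minimum of per-hour ingest sizes."""
    if not hourly_sizes:
        raise ValueError("at least one hourly value is required")
    total = sum(hourly_sizes)
    return {
        "total": total,
        "average": total / len(hourly_sizes),
        "max": max(hourly_sizes),
        "min": min(hourly_sizes),
    }


# Hypothetical ingest in GB for six consecutive hours.
stats = ingest_stats([0, 0, 12, 30, 6, 0])
print(stats)
```

Applied to the sample output on this page, a total of 578.05 GB over 72 hours gives an average of about 8.03 GB/hour.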
Usage: queryindex [options] ['DC'] ['service'] ['type'] ['component']
Each of DC, service, type and component can be a regex
-p Print names of individual matched components
-t 'start' 'end' Display only components with indexed data between these dates
The following options require -p
-d Print the available date range for each component
-s Print the total size of each component
-i Print average daily ingest for the lifetime of this component
-l Print all data for each component on a single line
-a Display ingest activity between the specified dates, requires -dates*
* Display of ingest activity requires multiple queries to HDFS (can be slow)
$ queryindex -p -d -s -i -t 'Feb 27, 2012' 'Mar 1, 2012' -a 'dc1' 'web' '*' 'app'
Index is from 2013-06-03 and is 6 hours old.
99 / web / logs / applog
Logs exist for 3 days, from 2012-02-27 to 2012-02-29, none archived
Total: 578.05 GB, Data: 578.05 GB, Archived: 0 B, Incoming: 0 B
Average daily ingest over logged period: 192.68 GB/day
Gathering statistics...
Activity from 2012-02-27 00h to 2012-03-01 00h inclusive, 72 hours total.
Ingest over this period was a total of 578.05 GB at an average of 8.03 GB/hour.
Peak ingest over this period was 38.15 GB/hour and minimum ingest was 0 B/hour.
38.08 GB/hour - ▄██
████
▄▄██████
▄███████████
█████████████
Ingest ██████████████
██████████████
███████████████ ▄▄
████████████████ ██
█████████████████ ▄██
█████████████████████
0 B/hour - █▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█
00:00 14:00 04:00 18:00 08:00 22:00
2012-02-27 2012-02-27 2012-02-28 2012-02-28 2012-02-29 2012-02-29
Time (GMT), 1.18 hours per column
99 / web / logs / applog
Logs exist for 1 days, from 2012-02-28 to 2012-02-28, none archived
Total: 32.87 GB, Data: 32.87 GB, Archived: 0 B, Incoming: 0 B
Average daily ingest over logged period: 32.87 GB/day
Gathering statistics...
Activity from 2012-02-27 00h to 2012-03-01 00h inclusive, 72 hours total.
Ingest over this period was a total of 32.87 GB at an average of 467.45 MB/hour.
Peak ingest over this period was 32.87 GB/hour and minimum ingest was 0 B/hour.
21.46 GB/hour - █
█
█
█
█
Ingest █
█
█
██
██
██
0 B/hour - █▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█▀▀▀▀▀▀▀▀▀▀▀█
00:00 14:00 04:00 18:00 08:00 22:00
2012-02-27 2012-02-27 2012-02-28 2012-02-28 2012-02-29 2012-02-29
Time (GMT), 1.18 hours per column
Ingest for all matched components from 2012-02-27 to 2012-03-01, 72 hours total:
Total Ingest: 610.91 GB
Average Ingest Rate: 8.48 GB/hour
Max Ingest Rate: 38.15 GB/hour
Min Ingest Rate: 0 B/hour
Totals for all matched components:
Total Size: 610.91 GB
Ingest Rate: 203.64 GB/day
Earliest Date: 2012-02-27
Latest Date: 2012-02-29
Total Time: 3 days