1
1
PRACTICAL GREMLIN: An Apache TinkerPop Tutorial
2
2
===============================================
3
3
Kelvin R. Lawrence <gfxman@yahoo.com>
4
- v278-preview, Mar 26, 2018
5
- // Mon Mar 26, 2018 17:00:31 CDT
4
+ v278-preview, Mar 27, 2018
5
+ // Tue Mar 27, 2018 08:14:10 CDT
6
6
//:Author: Kelvin R. Lawrence
7
7
//:Email: gfxman@yahoo.com
8
- //:Date: Mar 26 2018
8
+ //:Date: Mar 27 2018
9
9
:Numbered:
10
10
:source-highlighter: pygments
11
11
:pygments-style: paraiso-dark
@@ -11921,17 +11921,23 @@ interested in as follows. Only airports with 105 outgoing routes are selected.
11921
11921
// Which of these airports have 105 outgoing routes?
11922
11922
g.V().hasLabel('airport').
11923
11923
group().by(out().count()).by('code').next().get(105L)
11924
+ ----
11925
+
11926
+ NOTE: Currently 'select' can only take a string value as the key so we have to
11927
+ use the slightly awkward 'next().get()' syntax to get a numeric key from a
11928
+ result.
11924
11929
11930
+ This time the results only include the airports with 105 outgoing routes.
11931
+
11932
+ [source,groovy]
11933
+ ----
11925
11934
PHX
11926
11935
MEX
11927
11936
TLV
11928
11937
HAM
11929
11938
XIY
11930
11939
----
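The 'L' suffix in 'get(105L)' matters because 'count' produces a Long, so the keys of the grouped map are Longs and an Integer key will not match. The sketch below uses a plain Groovy map as a stand-in for the result of 'next()'. The map name is made up for illustration and the entry for key 2 is invented. Only the list for 105 comes from the results shown above.

```groovy
// Stand-in for the Map returned by next(); keys are Long because
// the Gremlin 'count' step yields Long values.
def byRouteCount = [
    (105L): ['PHX', 'MEX', 'TLV', 'HAM', 'XIY'],  // from the results above
    (2L)  : ['AAA', 'BBB']                        // hypothetical entry
]

// A Long key finds the entry.
def withLongKey = byRouteCount.get(105L)

// An Integer key does not, because Integer 105 is not equal to Long 105.
def withIntKey = byRouteCount.get(105)

println withLongKey
println withIntKey
```

If a lookup like this unexpectedly returns null, checking the type of the key is a good first step.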
11931
11940
11932
- NOTE: Currently 'select' can only take a string value as the key so we have to
11933
- use the slightly awkward 'next().get()' syntax to get a numeric key from a
11934
- result.
11935
11941
11936
11942
[[groupvar]]
11937
11943
Using groupCount with a traversal variable
@@ -12139,7 +12145,10 @@ between airports anywhere in Europe and airports in the USA.
12139
12145
[source,groovy]
12140
12146
----
12141
12147
// How many routes from anywhere in Europe to the USA?
12142
- g.V().has('continent','code','EU').out().out().has('country','US').count()
12148
+
12149
+ g.V().has('continent','code','EU').
12150
+ out().out().has('country','US').
12151
+ count()
12143
12152
12144
12153
351
12145
12154
----
@@ -12153,23 +12162,33 @@ US airports have flights that arrive from Europe.
12153
12162
[source,groovy]
12154
12163
----
12155
12164
// How many different US airports have routes from Europe?
12156
- g.V().has('continent','code','EU').out().out().has('country','US').dedup().count()
12165
+
12166
+ g.V().has('continent','code','EU').
12167
+ out().out().has('country','US').
12168
+ dedup().count()
12157
12169
12158
12170
38
12159
12171
----
12160
12172
12161
12173
So we can now see that the 351 routes from European airports arrive at one of 38
12162
12174
airports in the United States. We can dig a bit deeper and look at the distribution
12163
- of these routes across the 38 airports. John F. Kennedy airport (JFK) in New York
12164
- appears to have the most routes from Europe with Newark (EWR) having the second most.
12175
+ of these routes across the 38 airports.
12165
12176
12166
12177
[source,groovy]
12167
12178
----
12168
12179
//What is the distribution of the routes amongst those US airports?
12169
12180
12170
- g.V().has('continent','code','EU').out().out().has('country','US').
12171
- groupCount().by('code').order(local).by(values,incr)
12181
+ g.V().has('continent','code','EU').
12182
+ out().out().has('country','US').
12183
+ groupCount().by('code').
12184
+ order(local).by(values,incr)
12185
+ ----
12186
+
12187
+ John F. Kennedy airport (JFK) in New York appears to have the most routes from Europe
12188
+ with Newark (EWR) having the second most.
12172
12189
12190
+ [source,groovy]
12191
+ ----
12173
12192
[PHX:1,CVG:1,RSW:1,BDL:2,SJC:2,BWI:2,AUS:2,RDU:2,MSY:2,SAN:3,SLC:3,PDX:3,PIT:3,TPA:4,SFB:4,OAK:4,DTW:5,SWF:5,MSP:5,DEN:5,FLL:6,CLT:7,DFW:7,PVD:7,SEA:8,IAH:8,MCO:10,LAS:10,ATL:14,SFO:15,PHL:17,IAD:19,ORD:21,BOS:22,LAX:23,MIA:25,EWR:33,JFK:40]
12174
12193
----
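To make the shape of that result easier to picture, here is a minimal plain Groovy sketch of the same idea: count how often each code occurs, then sort the resulting map by its values in ascending order. It is an analogue of 'groupCount' followed by 'order(local)', not the Gremlin query itself, and the input list is hypothetical data invented for the illustration.

```groovy
// Hypothetical list of arrival airport codes, one entry per route.
def arrivals = ['JFK', 'EWR', 'JFK', 'MIA', 'JFK', 'EWR']

// countBy builds a map of code -> number of occurrences,
// much as groupCount().by('code') does in a traversal.
def counts = arrivals.countBy { it }

// Sorting the map by value (ascending) mirrors order(local).by(values,incr).
def distribution = counts.sort { it.value }

println distribution
```

Because the sort is ascending, the busiest airport ends up last, which is why JFK appears at the end of the real results above.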
12175
12194
@@ -12179,35 +12198,58 @@ all, we can calculate how many European airports have flights to the United Stat
12179
12198
[source,groovy]
12180
12199
----
12181
12200
// How many European airports have service to the USA?
12182
- g.V().has('continent','code','EU').out().as('a').out().has('country','US').
12201
+
12202
+ g.V().has('continent','code','EU').
12203
+ out().as('a').
12204
+ out().has('country','US').
12183
12205
select('a').dedup().count()
12184
12206
12185
12207
53
12186
12208
----
12187
12209
12188
12210
Just as we did for the airports in the US we can figure out the distribution of
12189
- routes for the European airports. It appears that London Heathrow (LHR) offers the
12190
- most US destinations and Frankfurt (FRA) the second most.
12211
+ routes for the European airports.
12191
12212
12192
12213
[source,groovy]
12193
12214
----
12194
- //What is the distribution of these routes amongst the European airports?
12195
- g.V().has('continent','code','EU').out().as('a').out().has('country','US').
12196
- select('a').groupCount().by('code').order(local).by(values,incr)
12215
+ // What is the distribution of US routes amongst
12216
+ // the European airports?
12217
+
12218
+ g.V().has('continent','code','EU').
12219
+ out().as('a').
12220
+ out().has('country','US').
12221
+ select('a').groupCount().by('code').
12222
+ order(local).by(values,incr)
12223
+ ----
12224
+
12225
+ It appears that London Heathrow (LHR) offers the
12226
+ most US destinations and Frankfurt (FRA) the second most.
12197
12227
12228
+ [source,groovy]
12229
+ ----
12198
12230
[RIX:1,TER:1,BRS:1,STN:1,NCE:1,KRK:1,ORK:1,KBP:1,PDL:1,DME:1,BEG:1,AGP:1,HAM:1,OPO:2,STR:2,VCE:2,BHX:2,ORY:2,ATH:2,MXP:3,BFS:3,BGO:3,GVA:3,VKO:3,HEL:3,WAW:4,SVO:4,GLA:4,EDI:5,SNN:5,CGN:5,TXL:6,VIE:6,LIS:6,BRU:7,ARN:7,OSL:8,IST:9,BCN:10,MAD:11,LGW:11,DUS:11,CPH:11,FCO:12,ZRH:13,MAN:13,DUB:15,MUC:16,AMS:18,KEF:18,CDG:22,FRA:24,LHR:27]
12199
12231
----
12200
12232
12201
- Lastly, we can find out what the list of routes flown is. For this example we just
12202
- return 10 of the 345 routes. Note how the 'path' step returns all parts of the
12203
- traversal including the continent code 'EU'.
12233
+ Lastly, we can find out what the list of routes flown is. For this example I decided
12234
+ to just return 10 of the 351 routes. Note how the 'path' step returns all parts of
12235
+ the traversal including the continent code 'EU'. We could remove that part of the
12236
+ result by adding a 'from' modulator as shown earlier in the "<<pathintro>>" section.
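As a rough sketch of what the 'from' modulator does, the example below builds a tiny in-memory TinkerGraph and labels the European airport step as 'a' so that the path starts there. It assumes it is pasted into the Gremlin Console, where 'TinkerGraph' is already imported. The three vertices, the 'contains' edge label and the step label 'a' are assumptions made for the illustration and are not the book's data set.

```groovy
// Assumes the Gremlin Console, which imports TinkerGraph automatically.
graph = TinkerGraph.open()
g = graph.traversal()

// A hypothetical three-vertex graph: continent -> airport -> airport.
g.addV('continent').property('code', 'EU').as('eu').
  addV('airport').property('code', 'WAW').as('waw').
  addV('airport').property('code', 'JFK').property('country', 'US').as('jfk').
  addE('contains').from('eu').to('waw').
  addE('route').from('waw').to('jfk').iterate()

// Label the European airport as 'a' and start the path from there,
// so the continent code no longer appears in each result.
paths = g.V().has('continent', 'code', 'EU').
          out().as('a').
          out().has('country', 'US').
          path().from('a').by('code').toList()
```

With the three vertices above, each path should contain only the two airport codes rather than starting with 'EU'.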
12204
12237
12205
12238
[source,groovy]
12206
12239
----
12207
- // What are some of these routes?
12208
- g.V().has('continent','code','EU').out().out().
12209
- has('country','US').path().by('code').limit(10)
12240
+ // Selected routes from Europe to the USA.
12241
+
12242
+ g.V().has('continent','code','EU').
12243
+ out().out().
12244
+ has('country','US').
12245
+ path().by('code').
12246
+ limit(10)
12247
+ ----
12210
12248
12249
+ The first 10 results returned feature routes from Warsaw, Belgrade and Istanbul.
12250
+
12251
+ [source,groovy]
12252
+ ----
12211
12253
[EU,WAW,JFK]
12212
12254
[EU,WAW,LAX]
12213
12255
[EU,WAW,ORD]
@@ -12228,25 +12270,34 @@ Earlier in the book we saw examples of 'sum' being used to count a
12228
12270
collection of values. You can also use 'fold' to do something similar but in a
12229
12271
more 'map-reduce' type of fashion.
12230
12272
12231
- First of all, here is a query that uses 'fold' in a way that we have already
12232
- seen. We find all routes from Austin and use 'fold' to return a nice list of
12233
- those names.
12273
+ First of all, here is a query that uses 'fold' in a way that we have already seen. It
12274
+ finds all routes from Austin and uses a 'fold' step to return a list of those
12275
+ names.
12234
12276
12235
12277
[source,groovy]
12236
12278
----
12237
- g.V().has('code','AUS').out('route').values('city').fold()
12279
+ g.V().has('code','AUS').
12280
+ out('route').
12281
+ values('city').fold()
12282
+ ----
12283
+
12284
+ As expected the results show all of the cities that you can fly to from Austin
12285
+ collected into a single list.
12238
12286
12287
+ [source,groovy]
12288
+ ----
12239
12289
[Toronto,London,Frankfurt,Mexico City,Pittsburgh,Portland,Charlotte,Cancun,Memphis,Cincinnati,Indianapolis,Kansas City,Dallas,St Louis,Albuquerque,Chicago,Lubbock,Harlingen,Guadalajara,Pensacola,Valparaiso,Orlando,Branson,St Petersburg-Clearwater,Atlanta,Nashville,Boston,Baltimore,Washington D.C.,Dallas,Fort Lauderdale,Washington D.C.,Houston,New York,Los Angeles,Orlando,Miami,Minneapolis,Chicago,Phoenix,Raleigh,Seattle,San Francisco,San Jose,Tampa,San Diego,Long Beach,Santa Ana,Salt Lake City,Las Vegas,Denver,New Orleans,Newark,Houston,El Paso,Cleveland,Oakland,Philadelphia,Detroit]
12240
12290
----
12241
12291
12242
12292
However, what if we wanted to reduce our results further? Take a look at the modified
12243
12293
version of our query below. It finds all routes from Austin and looks at the names of
12244
- the destination cities. However, rather than return all the names, 'fold' is used to
12245
- reduce the names to a value. That value being the total number of characters in all
12246
- of those city names. We have seen 'fold' used elsewhere in the book but this time
12247
- we provide 'fold' with a parameter and a closure. The parameter is passed to the
12248
- closure as the first variable and the name of the city as the second. The closure
12249
- then adds the zero and the length of each name effectively to a running total.
12294
+ the destination cities. However, rather than return all the names, this time the
12295
+ 'fold' step is used differently and effectively reduces the city names to a single
12296
+ value, namely the total number of characters in all of those city names. We
12297
+ have seen 'fold' used elsewhere in the book but this time we provide 'fold' with a
12298
+ parameter and a closure. The parameter is passed to the closure as the first variable
12299
+ and the name of the city as the second. The closure adds the length of each name to
12300
+ that value, which starts at zero, effectively producing a running total.
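The same seed-plus-closure pattern exists in plain Groovy as 'inject', which may help in picturing what the reduction does. The sketch below is an analogue rather than the Gremlin query itself. It uses only the first three city names from the results shown earlier, and the variable names are made up for the illustration.

```groovy
// First three city names from the Austin results shown earlier.
def cities = ['Toronto', 'London', 'Frankfurt']

// inject takes a seed (0) and a closure. The closure receives the
// running total first and the next city name second.
def totalChars = cities.inject(0) { runningTotal, city ->
    runningTotal + city.length()
}

println totalChars
```

Here the three lengths are 7, 6 and 9, so the running total goes 0, 7, 13, 22.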
12250
12301
12251
12302
[source,groovy]
12252
12303
----