Itisquitecommontohaveasituationwheretherearetwoparticularverticesofagraphandaneedtoexecutesometraversalonthepathsfoundbetweenthem.Considerthefollowingexamplesusingthemoderntoygraph:
gremlin>ids=[2,4,6].toArray()==>2==>4==>6gremlin>g.V(ids).as("a").repeat(bothE().otherV().simplePath()).times(5).emit(hasId(within(ids))).as("b").filter(select(last,"a","b").by(id).where("a",lt("b"))).path().by().by(label)==>[v[2],knows,v[1],knows,v[4]]==>[v[2],knows,v[1],created,v[3],created,v[4]]==>[v[2],knows,v[1],created,v[3],created,v[6]]==>[v[2],knows,v[1],knows,v[4],created,v[3],created,v[6]]==>[v[4],created,v[3],created,v[6]]==>[v[4],knows,v[1],created,v[3],created,v[6]]ids=[2,4,6].toArray()g.V(ids).as("a").repeat(bothE().otherV().simplePath()).times(5).emit(hasId(within(ids))).as("b").filter(select(last,"a","b").by(id).where("a",lt("b"))).path().by().by(label)Foranotherexample,considerthefollowingschema:
Assumethatthegoalistofindinformationaboutaknownjobandaknownperson.Specifically,theideawouldbetoextracttheknownjob,thecompanythatcreatedthejob,thedateitwascreatedbythecompanyandwhetherornottheknownpersoncompletedanapplication.
gremlin>g=traversal().withEmbedded(graph).withComputer()==>graphtraversalsource[tinkergraph[vertices:6edges:6],graphcomputer]gremlin>g.V().pageRank().with(PageRank.propertyName,'pageRank').values('pageRank')==>0.30472009079122503==>0.17579889899708237==>0.11375510357865543==>0.11375510357865543==>0.14598540152719108==>0.14598540152719108g=traversal().withEmbedded(graph).withComputer()g.V().pageRank().with(PageRank.propertyName,'pageRank').values('pageRank')Collections
TheappearanceofaListasatraverserinGremlinusuallyarisesasaresultofafold()operation,butmayalsoappearbywayofsomeside-effectstepslikeaggregate():
gremlin>g.V().fold()==>[v[1],v[2],v[3],v[4],v[5],v[6]]gremlin>g.V().aggregate(local,'a').cap('a')==>[v[1],v[2],v[3],v[4],v[5],v[6]]g.V().fold()g.V().aggregate(local,'a').cap('a')ItisworthnotingthatwhileaPathisnottechnicallyaListitdoespresentlikeoneandcanbemanipulatedinsimilarfashiontolists:
gremlin>g.V().out().out().path()==>[v[1],v[4],v[5]]==>[v[1],v[4],v[3]]g.V().out().out().path()TheseexamplesareobviouslytrivialandthereareotherwaysthatatraversermightendupinaListform,but,atthismoment,thepointhereistofocuslessonhowtogetaListandmoreonhowtomanipulateonewithintheGremlinlanguage.TheexamplesgoingforwardwillalsobesimilarlycontrivedinsofarasproducingausableListtomanipulate.Bearinmindthatitmaybequitepossibletogetthesameendresultsoftheseexamplesusingmoredirectmeansthanwhatisdemonstrated.
Itmayseemsimple,butthemostobviouschoicetomodifyingwhatisinalististosimplyunfold()theList:
gremlin>g.V().union(fold(),fold())==>[v[1],v[2],v[3],v[4],v[5],v[6]]==>[v[1],v[2],v[3],v[4],v[5],v[6]]gremlin>g.V().union(fold(),fold()).unfold().values('name')==>marko==>marko==>vadas==>vadas==>lop==>lop==>josh==>josh==>ripple==>ripple==>peter==>peterg.V().union(fold(),fold())g.V().union(fold(),fold()).unfold().values('name')ThetwoseparateListtraversersareflattenedtoasingletraversalstreamandalltheresultsaremixedtogether.Whilethisapproachmaybeacceptable,therearemanycaseswhereitmightnotbeso.TopreservetheindividualstructureoftheListtraversers"locally"unfold()theliststotransformthem:
gremlin>g.V().union(fold(),fold()).local(unfold().values('name').fold())==>[marko,vadas,lop,josh,ripple,peter]==>[marko,vadas,lop,josh,ripple,peter]g.V().union(fold(),fold()).local(unfold().values('name').fold())Thecalltolocal()executesitsanonymoussub-traversalovereachindividualListiteratorandasthesub-traversalendswithafold()-step,theresultsarereducedbackintoaListtopreservetheoriginalstructure,thusmaintainingtwotraverserresults.
ThispatternforunfoldingandfoldingListtraversersendsuphavingotherapplications:
AnotherinterestingmethodforListcreationwasdemonstratedabitearlierbutnotexaminedindetail-theuseofunion().ItwasshownearlierinthefollowingcontextwhereithelpedcreateaListoftwolistsofthreeverticeseach:
gremlin>g.V().union(limit(3).fold(),tail(3).fold())==>[v[1],v[2],v[3]]==>[v[4],v[5],v[6]]g.V().union(limit(3).fold(),tail(3).fold())Byfoldingtheresultsofunion(),itbecomespossibletoessentiallyconstructlistswitharbitrarytraversalresults.
gremlin>g.V().local(union(identity(),////(1)bothE().count()).fold())==>[v[1],3]==>[v[2],1]==>[v[3],3]==>[v[4],3]==>[v[5],1]==>[v[6],1]gremlin>g.V().aggregate(local,'x').by(union(select('x').count(local),////(2)identity(),bothE().count()).fold()).cap('x')==>[[0,v[1],3],[1,v[2],1],[2,v[3],3],[3,v[4],3],[4,v[5],1],[5,v[6],1]]g.V().local(union(identity(),////(1)bothE().count()).fold())g.V().aggregate(local,'x').by(union(select('x').count(local),////(2)identity(),bothE().count()).fold()).cap('x')Thepatternhereistouseunion()inconjunctionwithfold().Asexplainedearlier,thefold()operationreducesthestreamfromunion()toasingleListthatisthenfedforwardtothenextstepinthetraversal.
NowthatListpatternshavebeenexplained,therecannowbesomeattentiononMap.OneofthemostcommonwaystoendupwithaMapiswithvalueMap():
gremlin>g.V().has('name','marko').valueMap('name','age')==>[name:[marko],age:[29]]g.V().has('name','marko').valueMap('name','age')Theproblemisthatunlessthegraphismakinguseofmulti-properties,thereislittleneedtohavethevalueofeachpropertystoredasaList.OnewaytounwrapthisvaluefromthelististoavoidhavingitthereinthefirstplacebyavoidinguseofvalueMap():
gremlin>g.V().has('name','marko').valueMap('name','age').unfold().group().by(keys).by(select(values).unfold())==>[name:marko,age:29]g.V().has('name','marko').valueMap('name','age').unfold().group().by(keys).by(select(values).unfold())Thecodeabove,basicallydeconstructsthenreconstructstheMap.Thekeytothepatternistofirstunfold()theMapintoitskeyandvalueentries.ThenforeachkeyandvalueproduceanewMapusinggroup()wherethekeyforthatmapisthekeyoftheentry(thoseareobviouslyuniqueasyoupickedthemoutofthevalueMap())andthevalueissimplytheunfold()ofthelistofvaluesineachentry.Recallthattheselect(values).unfold()onlyreturnsonevalue(i.e.thefirst)notonlybecausethereisonlyone,butalsobecauseby()willonlycallnext()onthatsub-traversal(itdoesnotiteratetheentirething).
Inthefollowingcase,project()isusedtocreateaMapthatdoesnotmeetthisrequirementasitcontainssomeunavoidableextraneouskeysintheoutputMap:
gremlin>g.V().project('name','age','lang').by('name').by(coalesce(values('age'),constant('n/a'))).by(coalesce(values('lang'),constant('n/a')))==>[name:marko,age:29,lang:n/a]==>[name:vadas,age:27,lang:n/a]==>[name:lop,age:n/a,lang:java]==>[name:josh,age:32,lang:n/a]==>[name:ripple,age:n/a,lang:java]==>[name:peter,age:35,lang:n/a]g.V().project('name','age','lang').by('name').by(coalesce(values('age'),constant('n/a'))).by(coalesce(values('lang'),constant('n/a')))Theuseofcoalesce()worksaroundtheproblemwhere"age"and"lang"arenotnecessarilypropertykeyspresentoneverysinglevertexinthetraversalstream.Whenthe"age"or"lang"arenotpresent,theconstantof"n/a"issupplied.Whilethismaybeanacceptableoutput,itispossibletoshapetheMaptobe"nicer":
gremlin>g.V().project('name','age','lang').by('name').by(coalesce(values('age'),constant('n/a'))).by(coalesce(values('lang'),constant('n/a'))).local(unfold().filter(select(values).is(P.neq('n/a'))).group().by(keys).by(values))==>[name:[marko],age:[29]]==>[name:[vadas],age:[27]]==>[name:[lop],lang:[java]]==>[name:[josh],age:[32]]==>[name:[ripple],lang:[java]]==>[name:[peter],age:[35]]g.V().project('name','age','lang').by('name').by(coalesce(values('age'),constant('n/a'))).by(coalesce(values('lang'),constant('n/a'))).local(unfold().filter(select(values).is(P.neq('n/a'))).group().by(keys).by(values))Theadditionalstepsaboveunfold()theMaptokey-valueentriesandfilterthevaluesfor"n/a"andremovethempriortoreconstructingtheMapwiththemethodshownearlier.Togoastepfurther,applythepatternpresentedearliertoflattenListvalueswithinaMap:
gremlin>g.V().project('name','age','lang').by('name').by(coalesce(values('age'),constant('n/a'))).by(coalesce(values('lang'),constant('n/a'))).local(unfold().filter(select(values).is(P.neq('n/a'))).group().by(keys).by(select(values).unfold()))==>[name:marko,age:29]==>[name:vadas,age:27]==>[name:lop,lang:java]==>[name:josh,age:32]==>[name:ripple,lang:java]==>[name:peter,age:35]g.V().project('name','age','lang').by('name').by(coalesce(values('age'),constant('n/a'))).by(coalesce(values('lang'),constant('n/a'))).local(unfold().filter(select(values).is(P.neq('n/a'))).group().by(keys).by(select(values).unfold()))AstheremaybeadesiretoremoveentriesfromaMap,theremayalsobetheneedtoaddkeystoaMap.Thepatternhereinvolvestheuseofaunion()thatreturnstheMapinstanceswhichcanbeflattenedtoentriesandthenreconstructedasanewMapthathasbeenmergedtogether:
gremlin>g.V().has('name','marko').union(project('degree').////(1)by(bothE().count()),valueMap().with(WithOptions.tokens)).unfold().////(2)group().by(keys).by(select(values).unfold())==>[degree:3,name:marko,id:1,label:person,age:29]g.V().has('name','marko').union(project('degree').////(1)by(bothE().count()),valueMap().with(WithOptions.tokens)).unfold().////(2)group().by(keys).by(select(values).unfold())Whenusingthispattern,itisimportanttorecognizethatiftherearenon-uniquekeysproducedbythetraversalssuppliedtounion(),theywilloverwriteoneanothergiventhefinalby()modulatorabove.Ifchangedtoby(select(values).unfold().fold())theywillmergetoproduceaListofvalues.Ofcourse,thatchangewillbringaListbackforallthevaluesofthenewMap.WithsomeaddedlogictheMapvaluescanbeflattenedoutofListinstanceswhennecessary:
Dependingonthesizeofthegraph,threesolutionregimescanbediscriminated:
Theseregimesarediscussedseparatelyusingthefollowinggraphwiththreeweaklyconnectedcomponents:
gremlin>g.withComputer().V().connectedComponent().group().by(ConnectedComponent.component).select(values).unfold()==>[v[E],v[D]]==>[v[C],v[B],v[A]]==>[v[F]]g.withComputer().V().connectedComponent().group().by(ConnectedComponent.component).select(values).unfold()NoteThecomponentoptionpassedtoby()isstaticallyimportedfromConnectedComponentandreferstothedefaultpropertykeywithinwhichtheresultofthealgorithmisstored.AstraightforwardwaytodetectthevarioussubgraphswithanOLTPtraversalistodothis:
gremlin>g.V().emit(cyclicPath().or().not(both())).////(1)repeat(__.where(without('a')).aggregate(local,'a').both()).until(cyclicPath()).////(2)group().by(path().unfold().limit(1)).////(3)by(path().unfold().dedup().fold()).////(4)select(values).unfold()////(5)==>[v[A],v[B],v[C]]==>[v[D],v[E]]==>[v[F]]g.V().emit(cyclicPath().or().not(both())).////(1)repeat(__.where(without('a')).aggregate(local,'a').both()).until(cyclicPath()).////(2)group().by(path().unfold().limit(1)).////(3)by(path().unfold().dedup().fold()).////(4)select(values).unfold()//5SmallgraphscalabilityThescalabilityoftheOLTPtraversalandtheconnectedComponent()-stepforin-memorygraphsisshowninthefiguresbelow.
Therandomgraphsusedforthescalabilitytestscanbemodulatedwiththeedge/vertexratio.Forsmallratiosthecomponentsgeneratedaremorelint-likeandhardertoprocessbytheconnectedComponent()-step.Forhighratiosthecomponentsaremoremesh-likeandtheConnectedComponentVertexProgramneedsfewcyclestoprocessthegraph.Thesecharacteristicsshowclearlyfromthegraph.Indeed,foragivennumberofvertices,theruntimeoftheconnectedComponent()-stepdoesnotdependonthenumberofedges,butratheronthemaximumshortestpathlengthinthegraph.
LargegraphsinTinkerPoprequiredistributedprocessingbySparkGraphComputertogetresultsinareasonabletime(OLAPapproach).ThismeansthatthegraphmustbeavailableasHadoopGraph(thirdpartyTinkerPopimplementationsoftenallowtomakeagraphavailableasanHadoopGraphbyprovidinganHadoopInputFormat).RunningtheconnectedComponent()-steponanHadoopGraphworksthesameasforasmallgraph,providedthatSparkGraphComputerisspecifiedasthegraphcomputer,eitherwiththegremlin.hadoop.defaultGraphComputerpropertyoraspartofthewithComputer()-step.
ScalabilityofthetheconnectedComponent()-stepwithSparkGraphComputerishigh,butnotethat:
Acycleoccursinagraphwhereapathloopsbackonitselftotheoriginatingvertex.Forexample,inthegraphdepictedbelowGremlincouldbeusetodetectthecycleamongverticesA-B-C.
Gremlincandetectifsuchacycleexistswith:
The"modern"graphdoesnothaveanyduplicateedgesthatfitthatdefinition,sothefollowingexampleaddsonethatisduplicativeofthe"created"edgebetweenvertex"1"and"3".
gremlin>g.V(1).as("a").V(3).addE("created").from("a").iterate()gremlin>g.V(1).outE("created")==>e[0][1-created->3]==>e[9][1-created->3]g.V(1).as("a").V(3).addE("created").from("a").iterate()g.V(1).outE("created")Onewaytofindtheduplicateedgeswouldbetodosomethinglikethis:
gremlin>g.V().outE().project("a","b").////(1)by().by(inV().path().by().by(label)).group().////(2)by("b").by(select("a").fold()).unfold().////(3)select(values).////(4)where(count(local).is(gt(1)))==>[e[0][1-created->3],e[9][1-created->3]]g.V().outE().project("a","b").////(1)by().by(inV().path().by().by(label)).group().////(2)by("b").by(select("a").fold()).unfold().////(3)select(values).////(4)where(count(local).is(gt(1)))Thismethodfindtheduplicates,butdoesrequiremorememorythanotherapproaches.Aslightlymorecomplexapproachthatuseslessmemorymightlooklikethis:
gremlin>g.V().as("ov").outE().as("e").inV().as("iv").inE().////(1)where(neq("e")).////(2)where(eq("e")).by(label).where(outV().as("ov")).group().by(select("ov","e","iv").by().by(label)).////(3)unfold().////(4)select(values).where(count(local).is(gt(1)))==>[e[9][1-created->3],e[0][1-created->3]]g.V().as("ov").outE().as("e").inV().as("iv").inE().////(1)where(neq("e")).////(2)where(eq("e")).by(label).where(outV().as("ov")).group().by(select("ov","e","iv").by().by(label)).////(3)unfold().////(4)select(values).where(count(local).is(gt(1)))Notethattheabovetraversalcouldalsobewrittenusingmatchstep:
gremlin>g.withoutStrategies(LazyBarrierStrategy,PathRetractionStrategy).V().as("ov").////(1)outE().as("e1").inV().as("iv").inE().where(neq("e1")).where(outV().as("ov")).as("e2").////(2)where("e1",eq("e2")).by(label)////(3)==>e[9][1-created->3]==>e[0][1-created->3]g.withoutStrategies(LazyBarrierStrategy,PathRetractionStrategy).V().as("ov").////(1)outE().as("e1").inV().as("iv").inE().where(neq("e1")).where(outV().as("ov")).as("e2").////(2)where("e1",eq("e2")).by(label)//3Thebasicpatternatplayhereistocomparethepathoftheoutgoingvertex,itsoutgoingedgelabelandtheincomingvertex.Thismodelcanobviouslybecontractedorexpandedasneededtofitdifferentdefinitionsof"duplicate".Forexample,a"duplicate"definitioncouldextendedtothelabelandpropertiesoftheedge.Forpurposesofdemonstration,anadditionaledgeisaddedtothe"modern"graph:
gremlin>g.V(1).as("a").V(3).addE("created").property("weight",0.4d).from("a").iterate()gremlin>g.V(1).as("a").V(3).addE("created").property("weight",0.5d).from("a").iterate()gremlin>g.V(1).outE("created").elementMap()==>[id:0,label:created,IN:[id:3,label:software],OUT:[id:1,label:person],weight:0.4]==>[id:9,label:created,IN:[id:3,label:software],OUT:[id:1,label:person],weight:0.4]==>[id:13,label:created,IN:[id:3,label:software],OUT:[id:1,label:person],weight:0.5]g.V(1).as("a").V(3).addE("created").property("weight",0.4d).from("a").iterate()g.V(1).as("a").V(3).addE("created").property("weight",0.5d).from("a").iterate()g.V(1).outE("created").elementMap()Toidentifytheduplicatewiththisreviseddefinition,theprevioustraversalcanbemodifiedto:
Considerthefollowingexamplewithsomeduplicateverticesaddedtothe"modern"graph:
gremlin>g.V().hasLabel("person").group().by(values("name","age").fold()).unfold().filter(select(values).count(local).is(gt(1)))==>[vadas,27]=[v[0],v[2]]==>[marko,29]=[v[16],v[1]]g.V().hasLabel("person").group().by(values("name","age").fold()).unfold().filter(select(values).count(local).is(gt(1)))Thatfilter,extractsthevaluesoftheMapandcountstheverticeswithineachlist.Ifthatlistcontainsmorethanonevertexthenitisaduplicate.
Asidefromtheirproperties,edgesareimmutablestructureswherethelabelandtherelatedinandoutverticescannotbemodified.To"move"anedgefromonevertextoanother,itrequiresthattheedgebedroppedandanewedgebecreatedwiththesamepropertiesandlabel.Itispossibletosimulatethis"move"inasingletraversalasfollows:
gremlin>g.V().has('name','marko').outE().inV().path().by('name').by(elementMap())==>[marko,[id:9,label:created,IN:[id:3,label:software],OUT:[id:1,label:person],weight:0.4],lop]==>[marko,[id:7,label:knows,IN:[id:2,label:person],OUT:[id:1,label:person],weight:0.5],vadas]==>[marko,[id:8,label:knows,IN:[id:4,label:person],OUT:[id:1,label:person],weight:1.0],josh]g.V().has('name','marko').outE().inV().path().by('name').by(elementMap())The"marko"vertexcontainsa"knows"edgetothe"vadas"vertex.Thefollowingcodeshowshowto"move"thatedgetothe"peter"vertexinasingletraversal:
gremlin>g.V().has('person','name','marko').hasNext()==>truegremlin>g.V().has('person','name','stephen').hasNext()==>falseg.V().has('person','name','marko').hasNext()g.V().has('person','name','stephen').hasNext()Knowingthatanelementexistsornotisusuallyacommonpointofdecisionindeterminingtheappropriatepathofcodetotake.Intheexampleabove,thecheckisforvertexexistenceandatypicalreasontocheckforexistenceistodeterminewhetherornottoaddanewvertexortoreturntheonethatexists(i.e."getorcreate"pattern).Thisentireoperationcanoccurinasingletraversal.
gremlin>g.V().has('person','name','marko').fold().coalesce(unfold(),addV('person').property('name','marko').property('age',29))==>v[1]gremlin>g.V().has('person','name','stephen').fold().coalesce(unfold(),addV('person').property('name','stephen').property('age',34))==>v[0]g.V().has('person','name','marko').fold().coalesce(unfold(),addV('person').property('name','marko').property('age',29))g.V().has('person','name','stephen').fold().coalesce(unfold(),addV('person').property('name','stephen').property('age',34))Thisuseofcoalesce()shownaboveisthebasisforthispattern.Notethatattheendofhas()-stepthereiseitheravertexornot.Byusingfold(),"existence"or"notexistence"isreducedtoaListwiththevertexoraListwithnovalues.WithaListasthetraverserflowingintocoalesce()thefirstchildtraversaltoreturnsomethingwillexecute.IftheListhasavertexthenitwillunfold()andreturntheexistingone.Ifitisempty,thenthevertexdoesnotexistanditisaddedandreturned.
This"getorcreate"logiccanbeexpandedtobe"upsert"likefunctionalityasfollows:
gremlin>g.V().has('person','name','marko').fold().coalesce(unfold(),addV('person').property('name','marko')).property('age',29)==>v[1]gremlin>g.V().has('person','name','stephen').fold().coalesce(unfold(),addV('person').property('name','stephen')).property('age',34)==>v[13]g.V().has('person','name','marko').fold().coalesce(unfold(),addV('person').property('name','marko')).property('age',29)g.V().has('person','name','stephen').fold().coalesce(unfold(),addV('person').property('name','stephen')).property('age',34)Bymovingtheproperty()-stepthatsetthe"age"valueoutsideofcoalesce(),thepropertyisthensetforbothnewlycreatedverticesandforexistingones.
Itispossibletodosimilarsortsofoperationswithedgesusingthesamepattern:
gremlin>g.V().has('person','name','vadas').as('v').V().has('software','name','ripple').coalesce(__.inE('created').where(outV().as('v')),addE('created').from('v').property('weight',0.5))==>e[0][2-created->5]g.V().has('person','name','vadas').as('v').V().has('software','name','ripple').coalesce(__.inE('created').where(outV().as('v')),addE('created').from('v').property('weight',0.5))Inthiscase,theadjacentverticesoftheedgeareretrievedfirstandwithinthecoalesce(),theexistenceoftheedgeischeckedwithwhere()usingamatchingpatternonthe"v"labelandreturnediffound.Iftheedgeisnotfoundbetweenthesetwovertices,thenitiscreatedaspartofthesecondtraversalgiventocoalesce().
Considerthefollowingtraversaloverthe"modern"toygraph:
gremlin>g.V().hasLabel('person').groupCount().by('age')==>[32:1,35:1,27:1,29:1]g.V().hasLabel('person').groupCount().by('age')Theresultisanagedistributionthatsimplyshowsthatevery"person"inthegraphisofadifferentage.Insomecases,thisresultisexactlywhatisneeded,butsometimesagroupingmayneedtobetransformedtoprovideadifferentpictureoftheresult.Forexample,perhapsagroupingonthevalue"age"wouldbebetterrepresentedbyadomainconceptsuchas"young","old"and"veryold".
gremlin>g.V().hasLabel("person").groupCount().by(values("age").choose(is(lt(28)),constant("young"),choose(is(lt(30)),constant("old"),constant("veryold"))))==>[young:1,old:1,veryold:2]g.V().hasLabel("person").groupCount().by(values("age").choose(is(lt(28)),constant("young"),choose(is(lt(30)),constant("old"),constant("veryold"))))Notethatthebymodulatorhasbeenalteredfromsimplytakingastringkeyof"age"totakeaTraversal.ThatinnerTraversalutilizeschoosewhichislikeanif-then-elseclause.ThechooseisnestedandwouldlooklikethefollowinginJava:
if(age<28){return"young";}else{if(age<30){return"old";}else{return"veryold";}}TheuseofchooseisagoodintuitivechoiceforthisTraversalasitisanaturalmappingtoif-then-else,butthereisanotheroptiontoconsiderwithcoalesce:
gremlin>g.V().hasLabel("person").groupCount().by(values("age").coalesce(is(lt(28)).constant("young"),is(lt(30)).constant("old"),constant("veryold")))==>[young:1,old:1,veryold:2]g.V().hasLabel("person").groupCount().by(values("age").coalesce(is(lt(28)).constant("young"),is(lt(30)).constant("old"),constant("veryold")))Theansweristhesame,butthistraversalremovesthenestedchoose,whichmakesiteasiertoread.
OnecommonusecasewhenworkingwithGremlinistoperformcomplexloopingstatementsusingtherepeat()step.Whilemanyofthecommonpatternsforloopingwithintraversalsarediscussedinthedocumentationthereareseveralmorecomplexpatternsthatincludeadditionalstepswhicharealsocommonlyused.Thissectionattemptstodemonstratehowtousesomeofthesemorecomplexmeasurements.
Thefollowingexampleswillusethisgraphdepictedhere:
gremlin>g.V('A').repeat(out().simplePath()).until(hasId('G').or().loops().is(3))==>v[E]g.V('A').repeat(out().simplePath()).until(hasId('G').or().loops().is(3))Thekeyportionoftheabovetraversalistheuntil()step.ThisstepiscapableofacceptingaTraversalobjectwhichcontinuestherepeat()looptraversinguntiltheevaluatedoutputofthetraversalistrue,whenthetraverserexitstherepeat()loop.Thismethodologyallowscomplexlogicalconditionstobeusedastheexitcriteria.
Anothercommonsituationencounteredwhenusingrepeat()loopsisthedesiretoemitnotonlythevalueofthetraverserateachstep,butalsotoemitthedepthofthatelementintherepeatloop.Belowareseveraldifferentrecipesforaccomplishingthistaskbasedonwhatthedesiredoutput:
Ifthedesiredoutputistogeteachvertexanditsassociateddepththiscanbeaccomplishedusingthistraversal.
gremlin>g.withSack(1).V('A').repeat(both().simplePath().sack(assign).by(loops())).emit().project('vertex','depth').by().by(sack())==>[vertex:v[B],depth:0]==>[vertex:v[C],depth:1]==>[vertex:v[E],depth:2]==>[vertex:v[D],depth:2]==>[vertex:v[F],depth:3]==>[vertex:v[G],depth:4]g.withSack(1).V('A').repeat(both().simplePath().sack(assign).by(loops())).emit().project('vertex','depth').by().by(sack())However,ifthedesiredoutputistogetasingleresultperlevelcontainingalltheverticesatthatlevelthiscanbeaccomplishedwiththistraversal.
gremlin>g.V('A').repeat(both().simplePath().group('x').by(loops())).emit().times(3).cap('x')==>[0:[v[B]],1:[v[C]],2:[v[E],v[D]]]g.V('A').repeat(both().simplePath().group('x').by(loops())).emit().times(3).cap('x')OptionalLoopDepthAthirdcommonuseofrepeat()loopsistoloopthroughaseriesofstepsacertainnumberoftimesoruntiltherearenoadditionaledgestotraverse.Theexamplesbelowdemonstratehowthisloopterminationcanbeaccomplished.
Thefirstmethod,isanextensionofour"ConditionalLoopingwithMaxDepth"patternabovewheretheconditionintheuntil()checkingforavertexwithadegreeofzero.
gremlin>g.V('A').repeat(out().simplePath()).until(outE().count().is(0).or().loops().is(5))==>v[F]g.V('A').repeat(out().simplePath()).until(outE().count().is(0).or().loops().is(5))Anotherwaytoaccomplishthisrequirementistousetheoptional()step.Hereusetheoptional()stepwithinarepeat().times()loop.IntheexamplebelowwestartatvertexAandtraverseallout()edgesuptoamaximumof5times.
gremlin>g.V('A').repeat(optional(out())).times(5)==>v[F]g.V('A').repeat(optional(out())).times(5)Eachofthesepatternscanbecombinedwithanyvarietyofotherstepstosatisfymorecomplextraversalrequirements.Forexample,wecanfirstuseamoretraditionalrepeat()looptostartatvertexA,traverseallboth()edgestwotimes,andthentraverseout()edgesuptofivetimes.
gremlin>g.V('A').repeat(both()).times(2).repeat(optional(out())).times(5)==>v[F]==>v[F]g.V('A').repeat(both()).times(2).repeat(optional(out())).times(5)OperatingonDroppedElementsOnecommonscenariothathappenswhendroppingelementsusingatraversalisthedesiretoperformsomesortofoperationontheelementsbeingremoved.Thissimplestisreturningthenumberofelementsbeingdroppedfromthetraversal,whichisshownintheexamplebelowfromthe"modern"graph.
gremlin>g.V().has('lang','java').sideEffect(drop()).count()==>2g.V().has('lang','java').sideEffect(drop()).count()Whilethismayseemabitcounterintuitive,whatisoccurringinthisrecipeisthatthesideEffect()stepisperformingthedrop()whileeachoftheinputtraverserisstillpassedalongtotheremainderofthetraversal,whichinthiscaseisacount()operation.Thispatterncanbeextendedbeyondasimplecounttoperformavarietyofmorecomplextraversalsbasedonthedroppedtraversalsuchasremovingchildnodesinthecaseofatreeorsettingpropertiesonadjacentvertices.
Inmostdatabaseapplications,itisoftentimesdesirabletoreturndiscreteblocksofdataforaqueryratherthanallofthedatathatthetotalresultswouldcontain.Thisapproachtoreturningdataisreferredtoas"pagination"andtypicallyinvolvesasituationwheretheclientexecutingthequerycanspecifythestartpositionandendposition(ortheamountofdatatoreturninlieuoftheendposition)representingtheblockofdatatoreturn.Inthisway,onecouldreturnthefirsttenrecordsofonehundred,thenthesecondtenrecordsandsoon,untilpotentiallyallonehundredwerereturned.
InGremlin,abasicapproachtopagingwouldlooksomethinglikethefollowing:
Theonlywaytocompletelyavoidthatproblemistore-usethesametraversalinstance:
gremlin>t=g.V().hasLabel('person');[]gremlin>t.next(2)==>v[1]==>v[2]gremlin>t.next(2)==>v[4]==>v[6]t=g.V().hasLabel('person');[]t.next(2)t.next(2)Afurtherconsiderationrelatestotheorderinwhichresultsarereturned.TinkerPopdoesnotguaranteethattheorderoftheitemsreturnedonthesametraversalwillbethesameordereachtimethetraversalisiterated.TinkerPoponlyguaranteesthatitdoesnotre-shuffletheorderprovidedbytheunderlyinggraphdatabase.Thisguaranteehastwoimplications:
Asasimpleexample,consideragraphthatcontains"person"and"product"verticesconnectedby"bought"edges.Thefollowingscriptgeneratessomedataforthegraphusingthatbasicschema:
gremlin>g.V().has('name','alice').out('bought').values('name')==>product#5==>product#6==>product#7==>product#3==>product#4g.V().has('name','alice').out('bought').values('name')Thefollowingdiagramdepictsoneoftheedgestraversedintheaboveexamplebetween"alice"and"product#5".Obviously,theotherproducts"alice"boughtwouldhavesimilarrelations,butthisdiagramandthosetofollowwillfocusontheneighborhoodaroundthatproduct.
Thenextstepistodeterminewhoelsepurchasedthoseproducts:
gremlin>g.V().has('name','alice').out('bought').in('bought').dedup().values('name')==>alice==>bob==>jack==>jill==>jong.V().has('name','alice').out('bought').in('bought').dedup().values('name')Itisworthnotingthat"alice"isintheresultsabove.Sheshouldreallybeexcludedfromthelistastheinterestisinwhatindividualsotherthanherselfpurchased:
gremlin>g.V().has('name','alice').as('her').out('bought').in('bought').where(neq('her')).dedup().values('name')==>bob==>jack==>jill==>jong.V().has('name','alice').as('her').out('bought').in('bought').where(neq('her')).dedup().values('name')Thefollowingdiagramshows"alice"andthoseotherswhopurchased"product#5".
Theknowledgeofthepeoplewhoboughtthesamethingsas"alice"canthenbeusedtofindthesetofproductsthattheybought:
gremlin>g.V().has('name','alice').as('her').out('bought').in('bought').where(neq('her')).out('bought').dedup().values('name')==>product#1==>product#2==>product#3==>product#4==>product#5==>product#7==>product#9==>product#6==>product#8==>product#10g.V().has('name','alice').as('her').out('bought').in('bought').where(neq('her')).out('bought').dedup().values('name')
Thissetofproductscouldbethebasisforrecommendation,butitisimportanttorememberthat"alice"mayhavealreadypurchasedsomeoftheseproductsanditwouldbebettertonotpesterherwithrecommendationsforproductsthatshealreadyowns.Thoseproductsshealreadypurchasedcanbeexcludedasfollows:
gremlin>g.V().has('name','alice').as('her').out('bought').aggregate('self').in('bought').where(neq('her')).out('bought').where(without('self')).dedup().values('name')==>product#1==>product#2==>product#9==>product#8==>product#10g.V().has('name','alice').as('her').out('bought').aggregate('self').in('bought').where(neq('her')).out('bought').where(without('self')).dedup().values('name')
Thefinalstepwouldbetogrouptheremainingproducts(insteadofdedup()whichwasmostlydonefordemonstrationpurposes)toformaranking:
gremlin>g.V().has('person','name','alice').as('her').////(1)out('bought').aggregate('self').////(2)in('bought').where(neq('her')).////(3)out('bought').where(without('self')).////(4)groupCount().order(local).by(values,desc)////(5)==>[v[10]:6,v[26]:5,v[12]:5,v[24]:4,v[28]:2]g.V().has('person','name','alice').as('her').////(1)out('bought').aggregate('self').////(2)in('bought').where(neq('her')).////(3)out('bought').where(without('self')).////(4)groupCount().order(local).by(values,desc)//5Thepreviousexamplewasalreadydescribedas"basic"andobviouslycouldtakeintoaccountwhateverdataisavailabletofurtherimprovethequalityoftherecommendation(e.g.productratings,timesofpurchase,etc.).Oneoptiontoimprovethequalityofwhatisrecommended(withoutexpandingthepreviousdataset)mightbetochoosethepersonverticesthatmakeuptherecommendationto"alice"whohavethelargestcommonsetofpurchases.
Lookingbacktothepreviouscodeexample,consideritsmorestripdownrepresentationthatshowsthoseindividualswhohaveatleastoneproductincommon:
gremlin>g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup()==>v[2]==>v[6]==>v[8]==>v[4]g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup()Next,dosomegroupingtofindcounthowmanyproductstheyhaveincommon:
gremlin>g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup().group().by().by(out("bought").where(within("self")).count())==>[v[2]:3,v[4]:2,v[6]:3,v[8]:2]g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup().group().by().by(out("bought").where(within("self")).count())Theaboveoutputshowsthatthebestthatcanbeexpectedisthreecommonproducts.Thetraversalneedstobeawareofthatmaximum:
gremlin>g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup().group().by().by(out("bought").where(within("self")).count()).select(values).order(local).by(desc).limit(local,1)==>3g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup().group().by().by(out("bought").where(within("self")).count()).select(values).order(local).by(desc).limit(local,1)Withthemaximumvalueavailable,itcanbeusedtochosethose"person"verticesthathavethethreeproductsincommon:
gremlin>g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup().group().by().by(out("bought").where(within("self")).count()).as("g").select(values).order(local).by(desc).limit(local,1).as("m").select("g").unfold().where(select(values).as("m")).select(keys)==>v[2]==>v[6]g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup().group().by().by(out("bought").where(within("self")).count()).as("g").select(values).order(local).by(desc).limit(local,1).as("m").select("g").unfold().where(select(values).as("m")).select(keys)Nowthatthereisalistof"person"verticestobasetherecommendationon,traversetotheproductsthattheypurchased:
gremlin>g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup().group().by().by(out("bought").where(within("self")).count()).as("g").select(values).order(local).by(desc).limit(local,1).as("m").select("g").unfold().where(select(values).as("m")).select(keys).out("bought").where(without("self"))==>v[10]==>v[12]==>v[26]==>v[10]g.V().has("person","name","alice").as("alice").out("bought").aggregate("self").in("bought").where(neq("alice")).dedup().group().by().by(out("bought").where(within("self")).count()).as("g").select(values).order(local).by(desc).limit(local,1).as("m").select("g").unfold().where(select(values).as("m")).select(keys).out("bought").where(without("self"))Theaboveoutputshowsthatoneproductisheldincommonmakingitthetoprecommendation:
gremlin>g.V().has('person','name','alice').as('her').out('bought').aggregate('self').in('bought').where(neq('her')).out('bought').where(without('self')).timeLimit(1000).groupCount().order(local).by(values,desc)==>[v[10]:6,v[26]:5,v[12]:5,v[24]:4,v[28]:2]g.V().has('person','name','alice').as('her').out('bought').aggregate('self').in('bought').where(neq('her')).out('bought').where(without('self')).timeLimit(1000).groupCount().order(local).by(values,desc)Inusingsamplingmethods,itisimportanttoconsiderthatthenaturalorderingofedgesinthegraphmaynotproduceanidealsamplefortherecommendation.Forexample,iftheedgesendupbeingreturnedoldestfirst,thentherecommendationwillbebasedontheoldestdata,whichwouldnotbeideal.Aswithanytraversal,itisimportanttounderstandthenatureofthegraphbeingtraversedandthebehavioroftheunderlyinggraphdatabasetoproperlyachievethedesiredoutcome.
gremlin>g=g.withComputer()==>graphtraversalsource[tinkergraph[vertices:5edges:5],graphcomputer]gremlin>g.V(1).shortestPath().with(ShortestPath.edges,Direction.OUT).with(ShortestPath.target,hasId(5))==>[v[1],v[2],v[4],v[5]]g=g.withComputer()g.V(1).shortestPath().with(ShortestPath.edges,Direction.OUT).with(ShortestPath.target,hasId(5))Thepreviousexampledefinesthelengthofthepathbythenumberofverticesinthepath,butthe"path"mightalsobemeasuredbydatawithinthegraphitself.Thefollowingexampleusethesamegraphstructureasthepreviousexample,butincludesa"weight"ontheedges,thatwillbeusedtohelpdeterminethe"cost"ofaparticularpath:
gremlin>g=g.withComputer()==>graphtraversalsource[tinkergraph[vertices:5edges:5],graphcomputer]gremlin>g.V(1).shortestPath().with(ShortestPath.edges,Direction.OUT).with(ShortestPath.distance,'weight').with(ShortestPath.target,hasId(5))==>[v[1],v[2],v[3],v[4],v[5]]g=g.withComputer()g.V(1).shortestPath().with(ShortestPath.edges,Direction.OUT).with(ShortestPath.distance,'weight').with(ShortestPath.target,hasId(5))Thefollowingqueryillustrateshowselect(
gremlin>g.V().has('name','marko').out('knows').has('age',gt(29)).values('name')==>joshg.V().has('name','marko').out('knows').has('age',gt(29)).values('name')Inplainlanguage,theaboveGremlinasks,"WhatarethenamesofthepeoplewhoMarkoknowswhoareovertheageof29".Inthiscase,"29"isknownasaconstanttothetraversal.Ofcourse,ifthequestionischangedslightlytoinsteadask,"WhatarethenamesofthepeoplewhoMarkoknowswhoareolderthanheis",thehardcodingof"29"willnolongersuffice.TherearemultiplewaysGremlinwouldallowthissecondquestiontobeanswered.Thefirstisobvioustoanyprogrammer-useavariable:
gremlin>vMarko=g.V().has('name','marko').next()==>v[1]gremlin>g.V(vMarko).out('knows').has('age',gt(marko.value('age'))).values('name')==>joshvMarko=g.V().has('name','marko').next()g.V(vMarko).out('knows').has('age',gt(marko.value('age'))).values('name')Thedownsidetothisapproachisthatittakestwoseparatetraversalstoanswerthequestion.Ideally,thereshouldbeasingletraversal,thatcanquery"marko"once,determinehisageandthenusethatforthevaluesuppliedtofilterthepeopleheknows.Inthiswaythevaluefortheageinthehas()-filterisinducedfromtheTraversalitself.
gremlin>g.V().has('name','marko').as('marko').////(1)out('knows').as('friend').////(2)where('friend',gt('marko')).by('age').////(3)values('name')////(4)==>joshg.V().has('name','marko').as('marko').////(1)out('knows').as('friend').////(2)where('friend',gt('marko')).by('age').////(3)values('name')//4Asanotherexampleofhowtraversalinducedvaluescanbeused,considerascenariowheretherewasagraphthatcontainedpeople,theirfriendshiprelationships,andthemoviesthattheyliked.
gremlin>g.V().has("name","alice").out("friend").aggregate("friends").////(1)out("like").dedup().////(2)filter(__.in("like").where(within("friends")).count().as("a").////(3)select("friends").count(local).where(eq("a"))).////(4)values("name")==>younggunsg.V().has("name","alice").out("friend").aggregate("friends").////(1)out("like").dedup().////(2)filter(__.in("like").where(within("friends")).count().as("a").////(3)select("friends").count(local).where(eq("a"))).////(4)values("name")Traversalinducedvaluesarenotjustforfiltering.TheycanalsobeusedwhenwritingthevaluesofthepropertiesofoneVertextoanother:
gremlin>g.V().has('name','marko').as('marko').out('created').property('creator',select('marko').by('name'))==>v[3]gremlin>g.V().has('name','marko').out('created').elementMap()==>[id:3,label:software,creator:marko,name:lop,lang:java]g.V().has('name','marko').as('marko').out('created').property('creator',select('marko').by('name'))g.V().has('name','marko').out('created').elementMap()Inamorecomplexexampleofhowthismightwork,considerasituationwherethegoalistopropagateavaluestoredonaparticularvertexthroughoneofmoreadditionalconnectedverticesusingsomevalueontheconnectingedgestodeterminethevaluetoassign.Forexample,thefollowinggraphdepictsthree"tank"verticeswheretheedgesrepresentthedirectionaparticular"tank"shoulddrainandthe"factor"bywhichitshoulddoit:
Ifthetraversalstartedattank"a",thenthevalueof"amount"onthattankwouldbeusedtocalculatewhatthevalueoftank"b"wasbymultiplyingitbythevalueofthe"factor"propertyontheedgebetweenvertices"a"and"b".Inthiscasetheamountoftank"b"wouldthenbe50.Followingthispattern,whengoingfromtank"b"totank"c",thevalueofthe"amount"oftank"c"wouldbe5.
UsingGremlinsack(),thiskindofoperationcouldbespecifiedasasingletraversal:
Asshowninthepreviousexample,sack()isausefulwayto"carry"andmanipulateavaluethatcanbelaterusedelsewhereinthetraversal.Hereisanotherexampleofitsusagewhereitisutilizedtoincrementallthe"age"valuesinthemoderntoygraphby10:
gremlin>g.withSack(0).V().has("age").sack(assign).by("age").sack(sum).by(constant(10)).property("age",sack()).elementMap()==>[id:1,label:person,name:marko,age:39]==>[id:2,label:person,name:vadas,age:37]==>[id:4,label:person,name:josh,age:42]==>[id:6,label:person,name:peter,age:45]g.withSack(0).V().has("age").sack(assign).by("age").sack(sum).by(constant(10)).property("age",sack()).elementMap()Intheabovecase,thesackisinitializedtozeroandaseachvertexisiterated,the"age"isassignedtothesackwithsack(assign).by('age').Thatvalueinthesackisthenincrementedbythevalueconstant(10)andassignedtothe"age"propertyofthesamevertex.
Thisvaluethesackisincrementedbyneednotbeaconstant.Itcouldalsobederivedfromthetraversalitself.Usingthesameexample,the"weight"propertyontheincidentedgeswillbeusedasthevaluetoaddtothesack:
gremlin>g.withSack(0).V().has("age").sack(assign).by("age").sack(sum).by(bothE().values("weight").sum()).property("age",sack()).elementMap()==>[id:1,label:person,name:marko,age:30.9]==>[id:2,label:person,name:vadas,age:27.5]==>[id:4,label:person,name:josh,age:34.4]==>[id:6,label:person,name:peter,age:35.2]g.withSack(0).V().has("age").sack(assign).by("age").sack(sum).by(bothE().values("weight").sum()).property("age",sack()).elementMap()Tree
Thefollowingcodesimplysetsupthegraphdepictedaboveusing"hasParent"fortheedgelabel:
gremlin>g.V('A').repeat(out('hasParent')).emit().as('x').repeat(__.in('hasParent')).emit(hasId('D')).select('x').limit(1)==>v[C]g.V('A').repeat(out('hasParent')).emit().as('x').repeat(__.in('hasParent')).emit(hasId('D')).select('x').limit(1)TheabovetraversalisreasonablystraightforwardtofollowinthatitsimplytraversesupthetreefromtheAvertexandthentraversesdownfromeachancestoruntilitfindsthe"D"vertex.Thefirstpaththatuncoversthatmatchisthelowestcommonancestor.
Thecomplexityoffindingthelowestcommonancestorincreaseswhentryingtofindtheancestorsofthreeormorevertices.
gremlin>input=['A','B','D']==>A==>B==>Dgremlin>g.V(input.head()).repeat(out('hasParent')).emit().as('x').////(1)V().has(id,within(input.tail())).////(2)repeat(out('hasParent')).emit(where(eq('x'))).////(3)group().by(select('x')).by(path().count(local).fold()).////(4)unfold().filter(select(values).count(local).is(input.tail().size())).////(5)order().by(select(values).unfold().sum()).////(6)select(keys).limit(1)////(7)==>v[C]input=['A','B','D']g.V(input.head()).repeat(out('hasParent')).emit().as('x').////(1)V().has(id,within(input.tail())).////(2)repeat(out('hasParent')).emit(where(eq('x'))).////(3)group().by(select('x')).by(path().count(local).fold()).////(4)unfold().filter(select(values).count(local).is(input.tail().size())).////(5)order().by(select(values).unfold().sum()).////(6)select(keys).limit(1)//7Astheabovetraversalutilizesamid-traversalV(),itcannotbeusedforOLAP.InOLAP,thepatternchangesabit:
gremlin>g.withComputer().V().has(id,within(input)).aggregate('input').hasId(input.head()).////(1)repeat(out('hasParent')).emit().as('x').select('input').unfold().has(id,within(input.tail())).repeat(out('hasParent')).emit(where(eq('x'))).group().by(select('x')).by(path().count(local).fold()).unfold().filter(select(values).count(local).is(input.tail().size())).order().by(select(values).unfold().sum()).select(keys).limit(1)==>v[C]g.withComputer().V().has(id,within(input)).aggregate('input').hasId(input.head()).////(1)repeat(out('hasParent')).emit().as('x').select('input').unfold().has(id,within(input.tail())).repeat(out('hasParent')).emit(where(eq('x'))).group().by(select('x')).by(path().count(local).fold()).unfold().filter(select(values).count(local).is(input.tail().size())).order().by(select(values).unfold().sum()).select(keys).limit(1)MaximumDepthFindingthemaximumdepthofatreestartingfromaspecifiedrootvertexcanbedeterminedasfollows:
gremlin>g.V().has('name','F').repeat(__.in()).emit(__.not(inE())).tail(1).path().count(local)==>5gremlin>g.V().has('name','C').repeat(__.in()).emit(__.not(inE())).tail(1).path().count(local)==>3g.V().has('name','F').repeat(__.in()).emit(__.not(inE())).tail(1).path().count(local)g.V().has('name','C').repeat(__.in()).emit(__.not(inE())).tail(1).path().count(local)Therearetwooptimizationsatplay.First,thereisnoneedtoemitallthevertices,onlythe"leaf"vertices(i.e.thosewithoutincomingedges).Second,allresultssavethelastonecanbeignoredtothatpoint(i.e.thelastoneistheoneatthedeepestpointinthetree).Inthisway,thepathandpathlengthonlyneedtobecalculatedforasingleresult.
Thepreviousapproachestocalculatingthemaximumdepthusepathcalculationstoachievetheanswer.Pathcalculationscanbeexpensiveandifpossibleavoidediftheyarenotneeded.Anotherwaytoexpressatraversalthatcalculatesthemaximumdepthistousethesack()-step:
gremlin>g.withSack(1).V().has('name','F').repeat(__.in().sack(sum).by(constant(1))).emit().sack().max()==>5gremlin>g.withSack(1).V().has('name','C').repeat(__.in().sack(sum).by(constant(1))).emit().sack().max()==>3g.withSack(1).V().has('name','F').repeat(__.in().sack(sum).by(constant(1))).emit().sack().max()g.withSack(1).V().has('name','C').repeat(__.in().sack(sum).by(constant(1))).emit().sack().max()Time-basedIndexingTreescanbeusedformodelingtime-orienteddatainagraph.Modelingtimewherethereare"year","month"and"day"vertices(orlowergranularityasneeded)allowsthestructureofthegraphtoinherentlyindexdatatiedtothem.
TheGremlinscriptbelowcreatesthegraphdepictedinthegraphabove:
MostconfigurationproblemsofTinkerPopwithSparkonYARNstemfromthreereasons:
ThisrecipeissuitableforbotharealexternalandalocalpseudoHadoopcluster.WhiletherecipeismaintainedforthevanillaHadooppseudo-cluster,ithasbeenreportedtoworkonrealclusterswithHadoopdistributionsfromvariousvendors.
ForstartingtheGremlinConsoleintherightenvironment,createashellscript(e.g.bin/spark-yarn.sh)withthecontentsbelow.Ofcourse,actualvaluesforGREMLIN_HOME,HADOOP_HOMEandHADOOP_CONF_DIRneedtobeadaptedtoyourparticularenvironment.
#!/bin/bash#VariablestobeadaptedtotheactualenvironmentGREMLIN_HOME=/home/yourdir/lib/apache-tinkerpop-gremlin-console-3.7.3-standaloneexportHADOOP_HOME=/usr/local/lib/hadoop-3.3.1exportHADOOP_CONF_DIR=/usr/local/lib/hadoop-3.3.1/etc/hadoop#HaveTinkerPopfindthehadoopclusterconfigsandhadoopnativelibrariesexportCLASSPATH=$HADOOP_CONF_DIRexportJAVA_OPTIONS="-Djava.library.path=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/native/Linux-amd64-64"#Startgremlin-consolewithoutgettingtheHADOOP_GREMLIN_LIBSwarningcd$GREMLIN_HOME[!-eempty]&&mkdiremptyexportHADOOP_GREMLIN_LIBS=$GREMLIN_HOME/emptybin/gremlin.shRunningthejobYoucannowrunagremlinOLAPquerywithSparkonYARN:
Rather,itexploitsthespark.yarn.archiveproperty,whichpointstoanarchivewithjarsonthelocalfilesystemandisloadedintothevariousYARNcontainers.Asaresultthespark-gremlin.ziparchivebecomesavailableasthedirectorynamed__spark_libs__intheYARNcontainers.Thespark.executor.extraClassPathandspark.yarn.appMasterEnv.CLASSPATHpropertiespointtothejarsinsidethisdirectory.Thisiswhytheycontainthe./__spark_lib__/*item.JustbecauseaSparkexecutorgotthearchivewithjarsloadedintoitscontainer,doesnotmeanitknowshowtoaccessthem.
AlsotheHADOOP_GREMLIN_LIBSmechanismisnotusedbecauseitcannotworkforSparkonYARNasimplemented(jarsaddedtotheSparkContextarenotavailabletotheYARNapplicationmaster).
ThisrecipeusestheGremlinConsole,butthingsshouldnotbeverydifferentforyourownJVM-basedapplication,aslongasyoudonotusethespark-submitorspark-shellcommands.YouwillalsowanttochecktheadditionalruntimedependencieslistedintheGremlin-Plugin-Dependenciessectionofthemanifestfileinthespark-gremlinjar.
YoumaynotliketheideathattheHadoopandSparkjarsfromtheTinkerPopdistributiondifferfromtheversionsinyourcluster.Ifso,justbuildTinkerPopfromsourcewiththecorrespondingdependencieschangedinthevariouspom.xmlfiles(e.g.spark-core_2.11-2.2.0-some-vendor.jarinsteadofspark-core_2.11-2.2.0.jar).Ofcourse,TinkerPopwillonlybuildforexactlymatchingorslightlydifferingartifactversions.
Thefollowingcodeblockshowsafewexamplesthataregoodforprototypingorgraphdiscovery.
gremlin>g.V().has("person","name","marko").out()==>v[3]==>v[2]==>v[4]gremlin>g.V().has("person","name","marko").out("created").elementMap()==>[id:3,label:software,name:lop,lang:java]gremlin>g.V().has("software","name","ripple").inE().has("weight",gte(0.5)).outV().properties()==>vp[name->josh]==>vp[age->32]g.V().has("person","name","marko").out()g.V().has("person","name","marko").out("created").elementMap()g.V().has("software","name","ripple").inE().has("weight",gte(0.5)).outV().properties()Thenextcodeblockshowsthesamequeries,butwithspecifiedkeysandlabels.
Anoftenseenanti-patternistheonethatexplicitlytraversestoanedgeandthentoavertexwithoutusinganyfilters.
gremlin>g.V().hasLabel("person").outE("created").inV().dedup()////(1)==>v[3]==>v[5]gremlin>g.V().hasLabel("software").inE("created").outV().count()////(2)==>4g.V().hasLabel("person").outE("created").inV().dedup()////(1)g.V().hasLabel("software").inE("created").outV().count()//2Thenextcodeblockshowsthetwoaforementionedqueriesproperlyrewritten.
gremlin>g.V().hasLabel("person").out("created").dedup()==>v[3]==>v[5]gremlin>g.V().hasLabel("software").inE("created").count()==>4g.V().hasLabel("person").out("created").dedup()g.V().hasLabel("software").inE("created").count()Anotheranti-patternthatiscommonlyseenisthechainingofwhere()-stepsusingpredicates.Considerthefollowingtraversal:
gremlin>g.V().as('a').both().where(lt('a')).by(id).as('b').both().where(lt('a').and(gt('b'))).by(id).as('c').not(both().where(eq('a'))).select('a','b','c').by('name')==>[a:lop,b:marko,c:vadas]==>[a:josh,b:marko,c:vadas]==>[a:peter,b:lop,c:josh]g.V().as('a').both().where(lt('a')).by(id).as('b').both().where(lt('a').and(gt('b'))).by(id).as('c').not(both().where(eq('a'))).select('a','b','c').by('name')Theprofile()outputofbothqueriesshouldmakeclearwhythisisbetterthanusingtwowhere()-steps.
Theeasyfixfortheinitiallymentionedqueryfollowsinthecodeblockbelow.
gremlin>g.V().groupCount().by(label())==>[software:2,person:4]gremlin>g.V().group().by(label()).by(id().fold())==>[software:[3,5],person:[1,2,4,6]]gremlin>g.V().project("id","label").by(id()).by(label())==>[id:1,label:person]==>[id:2,label:person]==>[id:3,label:software]==>[id:4,label:person]==>[id:5,label:software]==>[id:6,label:person]gremlin>g.V().choose(label()).option("person",project("person").by(values("name"))).option("software",project("product").by(values("name")))==>[person:marko]==>[person:vadas]==>[product:lop]==>[person:josh]==>[product:ripple]==>[person:peter]g.V().groupCount().by(label())g.V().group().by(label()).by(id().fold())g.V().project("id","label").by(id()).by(label())g.V().choose(label()).option("person",project("person").by(values("name"))).option("software",project("product").by(values("name")))Withtokensusedinsteadofstepsthetraversalsbecomealittleshorterandmorereadable.
StartingwiththelatterissueofPandTraversalitshouldbenotedthatwhilePvaluestakeObjectandthusaTraversalitdoesnotmeantheTraversalwillberesolvedtoaresultthatwillbecomparable.PwillratherdoacompareontherawTraversalobjectwhichofcoursewillalwaysreturnfalse(unlessforsomeoddreasonyouhappentostorethatTraversalobjectinyourgraph):
gremlin>g.V().has('name',eq(constant('josh')))gremlin>eq(constant('josh'))==>eq([ConstantStep(josh)])g.V().has('name',eq(constant('josh')))eq(constant('josh'))Asfortheformerissuewithhas(String,Traversal),thisrequiresabitmoreexplanation.TheTraversalobjectismeanttobetreatedasaPredicate,meaningthatifitreturnsavaluethehas()willallowthetraversertopass:
gremlin>g.V().has('name',constant('josh'))////(1)==>v[1]==>v[2]==>v[3]==>v[4]==>v[5]==>v[6]gremlin>g.V().has('name',constant('josh').is('xyz'))////(2)g.V().has('name',constant('josh'))////(1)g.V().has('name',constant('josh').is('xyz'))//2Theseexamplesareabitcontrivedforsakeofdemonstration,butthecommonpatternfolksattemptappearsasfollows:
gremlin>g.withSideEffect('x',['name':'josh']).V().has('name',select('x').select('name'))==>v[1]==>v[2]==>v[3]==>v[4]==>v[5]==>v[6]g.withSideEffect('x',['name':'josh']).V().has('name',select('x').select('name'))Theaboveexamplerepresentsacommonlyseenmistakewherewetrytodynamicallyinjectthevalue"josh"fromaMapstoredinaside-effectnamed"x".Aswecansee,sinceselect('x').select('name')returnsavaluethehas()succeedsforeverysinglevertexwhichisunexpected.Thecorrectwaytodothisdynamicinjectioniswithwhere()asinthefollowingexample:
Goodsoftwaredevelopmentpracticesrequirereusetokeepsoftwaremaintainable.InGremlin,thereareoftenbitsoftraversallogicthatcouldberepresentedascomponentsthatmightbetestedindependentlyandutilizedaspartofothertraversals.OneapproachtodoingthiswouldbetoextractsuchlogicintoananonymoustraversalandprovideittoaparenttraversalthroughflatMap()-step.
Usingthemoderntoygraphasanexample,assumethattherearenumberoftraversalsthatareinterestedinfilteringonedgeswherethe"weight"propertyisgreaterthan"0.5".Aquerylikethatmightlooklikethis:
gremlin>g.V(1).outE("knows").has('weight',P.gt(0.5d)).inV().both()==>v[5]==>v[3]==>v[1]g.V(1).outE("knows").has('weight',P.gt(0.5d)).inV().both()Repeatedlyrequiringthatfilteron"weight"couldleadtoalotofduplicatecode,whichbecomesdifficulttomaintain.Itwouldbenicetoextractthatlogicsoastocentralizeitforreuseinallplaceswhereneeded.Ananonymoustraversalallowsthattohappenandcanbecreatedasfollows.
gremlin>weightFilter=outE("knows").has('weight',P.gt(0.5d)).inV();[]gremlin>g.V(1).flatMap(weightFilter).both()==>v[5]==>v[3]==>v[1]weightFilter=outE("knows").has('weight',P.gt(0.5d)).inV();[]g.V(1).flatMap(weightFilter).both()TheweightFilterisananonymoustraversalanditiscreatedbyway__class.The__isomittedabovefrominitalizationofweightFilterbecauseitisstaticallyimportedtotheGremlinConsole.TheweightFiltergetspassedtothe"full"traversalbywayforflatMap()-stepandtheresultsarethesame.Ofcourse,thereisaproblem.IfthereisanattempttousethatweightFilterasecondtime,thetraversalwiththrownanexceptionbecauseboththeweightFilterandparenttraversalhavebeen"compiled"whichpreventstheirre-use.AsimplefixtothiswouldbetoclonetheweightFilter.
gremlin>weightFilter=outE("knows").has('weight',P.gt(0.5d)).inV();[]gremlin>g.V(1).flatMap(weightFilter.clone()).both()==>v[5]==>v[3]==>v[1]gremlin>g.V(1).flatMap(weightFilter.clone()).bothE().otherV()==>v[5]==>v[3]==>v[1]gremlin>g.V(1).flatMap(weightFilter.clone()).groupCount()==>[v[4]:1]weightFilter=outE("knows").has('weight',P.gt(0.5d)).inV();[]g.V(1).flatMap(weightFilter.clone()).both()g.V(1).flatMap(weightFilter.clone()).bothE().otherV()g.V(1).flatMap(weightFilter.clone()).groupCount()NowtheweightFiltercanbereusedoverandoveragain.Rememberingtoclone()mightleadtoyetanothermaintenanceissueinthatfailingtorecallthatstepwouldlikelyresultinabug.OneoptionmightbetowraptheweightFiltercreationinafunctionthatreturnstheclone.Anotherapproachmightbetoparameterizethatfunctiontoconstructanewanonymoustraversaleachtimewiththeideabeingthatthismightgainevenmoreflexibilityinparameterizingtheanonymoustraversalitself.
Tocontributearecipe,firstclonetherepository:
lsdocs/src/recipesEachrecipeexistswithinaseparate.asciidocfile.Thefilenameshouldmatchthenameoftherecipe.Recipenamesshouldbeshort,butdescriptive(astheyneedtofitintheleft-handtableofcontentswhengenerated).Theindex.asciidocistheparentdocumentthat"includes"thecontentofeachindividualrecipefile.Arecipefileisincludedintheindex.asciidocwithanentrylikethis:include::my-recipe.asciidoc[]
Documentationshouldbegeneratedlocallyforreviewpriortosubmittingapullrequest.TinkerPopdocumentationis"live"inthatitisboundtoaspecificversionwhengenerated.Furthermore,codeexamples(thosethataregremlin-groovybased)areexecutedatdocumentgenerationtimewiththeresultswrittendirectlyintotheoutput.Thefollowingcommandwillgeneratethedocumentationwith:
bin/process-docs.shThegenerateddocumentationcanbefoundattarget/docs/htmlsingle/recipes.Thisprocesscanbelongonthefirstrunofthedocumentationasitisgeneratingallofthedocumentationlocally(e.g.referencedocumentation,tutorials,etc).Togeneratejusttherecipes,followthisprocess:
bin/process-docs.sh-fdocs/src/recipesThebin/process-docs.shapproachrequiresthatHadoopisinstalled.Toavoidthatprerequisite,tryusingDocker:
docker/build.sh-dThedownsidetousingDockeristhattheprocesswilltakelongeraseachrunwillrequiretheentiredocumentationsettobegenerated.
Foreachpersonina"follows"graph,determinethenumberoffollowersandlisttheirnames.
gremlin>g.V().group().by('name').by(project('numFollowers','followers').by(__.in('follows').count()).by(__.in('follows').values('name').fold())).next()==>daniel={numFollowers=0,followers=[]}==>matthias={numFollowers=0,followers=[]}==>josh={numFollowers=2,followers=[matthias,daniel]}==>marko={numFollowers=2,followers=[josh,daniel]}g.V().group().by('name').by(project('numFollowers','followers').by(__.in('follows').count()).by(__.in('follows').values('name').fold())).next()oreven:
gremlin>g.V().group().by('name').by(__.in('follows').values('name').fold().project('numFollowers','followers').by(count(local)).by()).next()==>daniel={numFollowers=0,followers=[]}==>matthias={numFollowers=0,followers=[]}==>josh={numFollowers=2,followers=[matthias,daniel]}==>marko={numFollowers=2,followers=[josh,daniel]}g.V().group().by('name').by(__.in('follows').values('name').fold().project('numFollowers','followers').by(count(local)).by()).next()Inthe"modern"graph,showeachperson,thesoftwaretheyworkedonandtheco-workercountforthesoftwareandthenamesofthoseco-workers.
gremlin>g.V().hasLabel("person").as("p").out("created").as("s").map(__.in("created").where(neq("p")).values("name").fold()).group().by(select("p").by("name")).by(group().by(select("s").by("name")).by(project("numCoworkers","coworkers").by(count(local)).by())).next()==>peter={lop={numCoworkers=2,coworkers=[marko,josh]}}==>josh={ripple={numCoworkers=0,coworkers=[]},lop={numCoworkers=2,coworkers=[marko,peter]}}==>marko={lop={numCoworkers=2,coworkers=[josh,peter]}}g.V().hasLabel("person").as("p").out("created").as("s").map(__.in("created").where(neq("p")).values("name").fold()).group().by(select("p").by("name")).by(group().by(select("s").by("name")).by(project("numCoworkers","coworkers").by(count(local)).by())).next()Assumingagraphofstudents,classesandtimes,detectstudentswhohaveaconflictingschedule.
gremlin>g.V(1).as("a").V(3).addE("created").property("weight",0.4d).from("a").iterate()gremlin>g.V(1).outE("created")==>e[0][1-created->3]==>e[9][1-created->3]gremlin>g.V().as("a").out().as("b").groupCount().by(select("a","b")).unfold().filter(select(values).is(gt(1))).select(keys)==>[a:v[1],b:v[3]]g.V(1).as("a").V(3).addE("created").property("weight",0.4d).from("a").iterate()g.V(1).outE("created")g.V().as("a").out().as("b").groupCount().by(select("a","b")).unfold().filter(select(values).is(gt(1))).select(keys)ThefollowingexampleassumesthattheedgespointintheOUTdirection.Assumingundirectededges:
gremlin>g.V().where(without("x")).as("a").outE().as("e").inV().as("b").filter(bothE().where(neq("e")).otherV().where(eq("a"))).aggregate(local,"x").select("a","b").dedup()g.V().where(without("x")).as("a").outE().as("e").inV().as("b").filter(bothE().where(neq("e")).otherV().where(eq("a"))).aggregate(local,"x").select("a","b").dedup()Inthe"crew"graph,findverticesthatmatchonacompletesetofmulti-properties.
gremlin>places=["centreville","dulles"];[]//willnotmatchas"purcellville"ismissinggremlin>g.V().not(has("location",without(places))).where(values("location").is(within(places)).count().is(places.size())).elementMap()gremlin>places=["centreville","dulles","purcellville"];[]gremlin>g.V().not(has("location",without(places))).where(values("location").is(within(places)).count().is(places.size())).elementMap()==>[id:7,label:person,name:stephen,location:purcellville]places=["centreville","dulles"];[]//willnotmatchas"purcellville"ismissingg.V().not(has("location",without(places))).where(values("location").is(within(places)).count().is(places.size())).elementMap()places=["centreville","dulles","purcellville"];[]g.V().not(has("location",without(places))).where(values("location").is(within(places)).count().is(places.size())).elementMap()Methodsforperformingsomebasicmathematicaloperationsinthe"modern"graph.
gremlin>g.V().values("age").sum()//sumallages==>123gremlin>g.V().values("age").fold(1,mult)//multiplyallages==>876960gremlin>g.withSack(0).V().values("age").sack(sum).sack(sum).by(constant(-1)).sack()//subtract1==>28==>26==>31==>34gremlin>g.withSack(0).V().values("age").sack(sum).sack(sum).sack()//multiplyby2(simple)==>58==>54==>64==>70gremlin>g.withSack(0).V().values("age").sack(sum).sack(mult).by(constant(2)).sack()//multiplyby2(generallyusefulformultiplicationsbyn)==>58==>54==>64==>70g.V().values("age").sum()//sumallagesg.V().values("age").fold(1,mult)//multiplyallagesg.withSack(0).V().values("age").sack(sum).sack(sum).by(constant(-1)).sack()//subtract1g.withSack(0).V().values("age").sack(sum).sack(sum).sack()//multiplyby2(simple)g.withSack(0).V().values("age").sack(sum).sack(mult).by(constant(2)).sack()//multiplyby2(generallyusefulformultiplicationsbyn)