Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about parse forest support #4

Open
randyv12 opened this issue Feb 6, 2018 · 5 comments
Open

Question about parse forest support #4

randyv12 opened this issue Feb 6, 2018 · 5 comments

Comments

@randyv12
Copy link

randyv12 commented Feb 6, 2018

Hey there, I really like this project, and it seems very challenging to implement, but rewarding once it works on a well written grammar. Good job on this and it's really good. I've started doing research into NLPs and remembered something that I've learned in college which is the earley parser :)

I have a few questions regarding parse forest traversal and how we can output multiple parse trees.

Is it fully supported to be able to parse ambiguous grammars and output multiple parse trees based on the grammar? A few months back I've tried using the parse forest for an ambiguous grammar. However I ran into a few issues with using the parse forest visitor to display multiple parse trees. Do we have any good tips on how to accomplish that task? ^_^

I'm not sure if it's truly implemented or I have not yet stumbled into the how-to of parse forest traversal. I've checked the specs in the code base, it seems covered but I still ran into a problem of a stack trace. That is Ok tho, I have created a different solution, currently in progress but it can work:

  1. Is there an already written way on how to solve this in the code base, or something we can learn from based on the current design of the parse forest (i.e. is it an adopted format where there exists a solution for its traversal)
  2. Written below is my sort-of working solution(not pseudo code) that works 80% of the time but still fixing bugs on it (lol -_-) -- i hope i am doing this right, haha. please tell me if this is the right direction to approach this

before i start..

definition:
path -- array of nodes (n1,n2,n3, ...) where n1 is connected to n2
paths -- array of path

Shying away from the subscriber/publisher architecture of traversing the parse forest. I tried traversing it with DFS, creating an array of many paths, and cloning the path per every ALT node :or refinement and then forking the traversal and re-continue traversal, that sort of gave me a starting point that is slow in performance but it can give me results. Then I realized that multiple non terminal S nodes with refinement :or can also provide multiple parse trees.

Doing it this way was ok, however, it presented a few issues for me :(

  1. previsit and postvisit during the traversal was if-casey, it had two or three conditionals
  2. it's slightly difficult to see the structure of the parse forest, since the root node can have multiple S nodes of a parse forest, i tried using the renderers that can output parse trees, but no success, i will try again this week
  3. it's hard to log the label of an alternative node if the paths are duplicated because it can generate a multiple parse trees. maybe when i create the new paths, i should store the label of the current node i am traversing, then store the labels of the subnodes of it.

I truly apologize if this is confusing to you but I just want to ask if we have already a fully working way of converting a parse forest to an array of parse trees (Rley::Ptree:ParseTree)

@famished-tiger
Copy link
Owner

famished-tiger commented Feb 9, 2018

Hi Randy,

Shame on me for answering to your questions so late. Just spotted them a couple minutes ago...

First, thanks for showing your interest in Rley...

You're right there is very limited documentation about the use of SPPFs (Shared Packed Parse Forest) with Rley. Here are a few (poor?) reasons for that situation:

  • Introducing the Rley parser and the use of parse trees in the README is already a kind of challenge in terms of readability and length. SPPFs are probably one of most useful data structure for parsing NL but they are really hard to tame beasts. So they should take their place in some advanced documentation that doesn't exist yet.
  • The last months I devoted my attention on parse trees generation (AST = Abstract Syntax Trees), easy to implement but not necessarily the most convenient to use in real parsers) and Concrete Syntax Trees (which are customized variant more practical for parser writers). I think that the SRL parser is already a nice showcase for AST. But this feature is also not yet documented..
  • Another reason for investing time in parse trees was to have familiar data structure that I could rely to test the correctnes of the parsing algorithm itself before jumping to SPPFs.
  • Last but not least reason: I my mind having ambiguous parses in NLP is a common situation: a sentence may lead to a large number of alternative parses. Representing the syntactical ambiguities with SPPF is one thing, what next then?... Which parse shall we select as the "right" one? Shall we implement:
    • A kind of enumerating interface that allows us to list possible parse trees (e.g. an array of parse trees)? Or,
    • Have some criteria that help to retain the most plausible parse?

I have serious doubts for approach 1 because it is not difficult to find ambiguous grammars where the number of interpretations explodes literally. You can see an example in the section
Pernicious Ambiguities.
I think that approach 2 is the way to go: we extend Rley so that it supports PCFG (Probabilistic Context-Free Grammar). Once a SPPFs is created, one selects the few most probable parse trees instead of listing all of them...
Easy to say... but not yet implemented.
Now enough talk... All the above reasons won't help you now.
So first what is the status of the SPPF codebase?
I've tested this on my side with a couple of little ambiguous grammars and the SPPFs creation seems to work fine. Also the ParseForestVisitor passed my tests.
I am willing to start the SPPF how-to documentation but this will be a work that will require several iterations before it reaches some maturity. I hope that will fit your agenda.

@randyv12
Copy link
Author

Hey, thanks for the response! I will have to do some more research on SPPFs and thank you for sharing the ntlk.org as a resource. To solve my problem it seems that it is easier to find the best possible parse tree in a parse forest rather than exhausting all the possible trees, (approach 2).

I will definitely read into PCFG's! Thanks a lot! I haven't said much but you definitely help point me in the right direction.

@famished-tiger
Copy link
Owner

Hi Randy,
Just realized that I my previous comment I swapped about AST and CST. No big deal but for the sake of accuracy I corrected my comment.
I will add in the examples an ambiguous grammar -not yet a NLP one- and show how to produce a SPPF.

@famished-tiger
Copy link
Owner

Hello Randy,

Was busy for a couple of months with another project about Simple Regex Language. It may look as a diversion but in fact, the Rley greatly benefited from being used in a situations that went well over the "toy" examples. A number of fixes and improvements were brought to Rley.
Today, I released a new version that contains:

  • A vastly reworked ParseForestVisitor class that is able to cope with cycles in the SPPF.
  • An improved Debug formatter that displays the visit events for parse trees and parse forest
  • An example showing how to generate a SPPF from a parse & how to visit it.

@randyv12
Copy link
Author

Hey there! @famished-tiger
Sorry for my super late response. Thank you for getting back at this topic. I'll definitely look into it. Many many thanks for the updates!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants