Speed & Languages #2
Comments
Hi Arjun, Sorry for my delayed answer; I was busy these last couple of days. Can you give more details about the context in which you experienced the slow output (which grammar was used, the input text...)? Concerning the language: I'm not 100% sure about your question. Do you mean extending Rley to an implementation language other than Ruby? Or do you mean extending it to a language other than Mini-English?
|
Hey
About the language, I was asking about a write-up, if you have one, on extending to other spoken languages. Also, when I started searching for more, I realised that instead of a context-free approach, current methods use shift-reduce parsers, which are apparently fast and provide very accurate dependency parsing. As you can guess, can we work on that? |
Hi Arjun, As I was anxious about Rley's speed, I wrote a benchmark script and just committed it to GitHub. In short: repeating the parse of a sentence 100,000 times took less than 40 seconds. Not too bad for an NLP-grade parse. As you can see, my configuration isn't geared for high performance. |
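For readers who want to reproduce this kind of measurement, here is a minimal sketch using Ruby's standard `Benchmark` module. It is not the committed benchmark script: `parse_once` is a hypothetical stand-in for `parser.parse(tokens)`, and the iteration counts are only illustrative.

```ruby
require 'benchmark'

# Hypothetical stand-in for `parser.parse(tokens)`; swap in the real call
# when Rley and a grammar are loaded.
parse_once = -> { 'John saw Mary'.scan(/\S+/).map(&:downcase) }

# Time the same workload at growing repetition counts.
# Benchmark.bm prints a table and returns one Benchmark::Tms per report.
timings = Benchmark.bm(18) do |x|
  [100, 1_000, 10_000].each do |count|
    x.report("Parse #{count} times") { count.times { parse_once.call } }
  end
end
```

If the workload is well behaved, the `real` column should grow roughly linearly with the repetition count, which makes a per-parse cost easy to estimate.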
Hey
irb(main):277:0> Benchmark.bmbm(100) do |x|
irb(main):278:1* x.report("actual result is") {result = parser.parse(tokens)}
irb(main):279:1> end
Rehearsal -----
actual result is 0.000000 0.000000 0.000000 ( 0.001110)
------
total: 0.000000sec
user system total real
actual result is 0.010000 0.000000 0.010000 ( 0.000792)
=> [#<Benchmark::Tms:0x005571fedae880 @label="actual result is", @real=0.0007919409999885829, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.00999999999999801, @total=0.00999999999999801>]
Your script runs like this
~/Desktop/ruby$ ruby benchmark_mini_en.rb
user system total real
Parse 100 times 0.030000 0.000000 0.030000 ( 0.034606)
Parse 1000 times 0.330000 0.000000 0.330000 ( 0.326775)
Parse 10000 times 3.210000 0.000000 3.210000 ( 3.205700)
Parse 1000000 times 32.490000 0.010000 32.500000 ( 32.499518)
~/Desktop/ruby$ ruby -v
ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-linux]
But outputting simply
Some notes
~/Desktop/ruby$ ruby benchmark_mini_en.rb
benchmark_mini_en.rb:60:in `block in tokenizer': uninitialized constant Rley::Lexical (NameError)
Did you mean? Lexicon
from benchmark_mini_en.rb:56:in `map'
from benchmark_mini_en.rb:56:in `tokenizer'
from benchmark_mini_en.rb:76:in `<main>'
~/Desktop/ruby$ ruby -v
ruby 2.3.3p222 (2016-11-21 revision 56859) [x86_64-linux]
In the
def tokenizer(aTextToParse, aGrammar)
  tokens = aTextToParse.scan(/\S+/).map do |word|
    term_name = Lexicon[word]
    raise StandardError, "Word '#{word}' not found in lexicon" if term_name.nil?
    terminal = aGrammar.name2symbol[term_name]
    Rley::Tokens::Token.new(word, terminal) # FROM README won't work in 2.2.3
    Rley::Lexical::Token.new(word, terminal) # FROM BENCHMARK won't work in 2.3.3
  end
  return tokens
end
In ruby 2.2.3
NameError: uninitialized constant Rley::Tokens
from (irb):267:in `block in tokenizer'
from (irb):263:in `map'
from (irb):263:in `tokenizer'
from (irb):272
from /home/arjun/.rbenv/versions/2.2.3/bin/irb:11:in `<main>'
In ruby 2.3.3
NameError: uninitialized constant Rley::Lexical
Did you mean? Lexicon
from (irb):151:in `block in tokenizer'
from (irb):147:in `map'
from (irb):147:in `tokenizer'
from (irb):161
from /home/arjun/.rbenv/versions/2.3.3/bin/irb:11:in `<main>'
PS - I am on an i3-3240 with 8GB RAM on Elementary 0.4.1 |
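The two tracebacks above differ only in which token constant is missing, `Rley::Tokens` or `Rley::Lexical`. As a rough sketch of how a script could tolerate either name, the helper below returns the first constant path that resolves. `first_defined_constant` is a hypothetical name and not part of Rley; the example call is left commented out because it assumes Rley is already loaded.

```ruby
# Return the first constant path that resolves, or nil when none does.
# Hypothetical helper for illustration; it is not part of Rley.
def first_defined_constant(*paths)
  paths.each do |path|
    begin
      return Object.const_get(path) # accepts namespaced paths such as 'A::B'
    rescue NameError
      next # this path is not defined; try the next candidate
    end
  end
  nil
end

# With Rley loaded, this would pick whichever token class is available:
# token_class = first_defined_constant('Rley::Lexical::Token', 'Rley::Tokens::Token')
# token_class.new(word, terminal)
```

This only hides the symptom. Aligning the script with the installed Rley release is the cleaner fix.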
Hi Arjun, The Ruby version should not be an issue: when I commit a new version of Rley, it is tested against several Rubies thanks to Travis CI and AppVeyor. |
Yeah, that was the case. I didn't check that you had recently uploaded a new version. |
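Since each Ruby managed by rbenv keeps its own set of installed gems, a quick check of which Rley release each interpreter sees can rule this out. The sketch below uses the standard RubyGems API; `installed_versions` is a hypothetical helper name, and an empty list simply means the gem is not installed for the running Ruby.

```ruby
require 'rubygems'

# List the installed versions of a gem for the Ruby currently running.
# Hypothetical helper for illustration.
def installed_versions(gem_name)
  Gem::Specification.find_all_by_name(gem_name).map { |spec| spec.version.to_s }
end

puts "Ruby #{RUBY_VERSION}: rley #{installed_versions('rley').inspect}"
```

Running it once under each Ruby version shows whether the two environments have the same release.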
Hi Arjun, In the meantime, I committed version 0.5.07 that contains a README.md with updated sample code. Thank you for helping me detect the issue. |
Cool. Thanks. |
Hi Arjun, Three points. |
Hey
Yup it does. That was really fun to see.
Okay. I didn't know that.
Okay, that would be really helpful. Will look into it now.
Shift-reduce. 🙏
That would be cool to see. Maybe I can work on a timeline with you. |
Hi Arjun, Executing the
Hey Arjun, if you have an idea for a nice example to demo Rley's capabilities, that would be cool. Probably the best approach is to start with something not overly ambitious... |
Hey
I will keep adding more examples, NLP and non-NLP, as I go along and get to know more.
That's fine. Since you have expressed your goal, it's cool to have a distinct pace of convergence.
I came across this, but it's in Prolog - https://github.com/cbaziotis/prolog-cfg-parser |
Hi Arjun, Really nice that you catch the ball... |
And that is why I poked you to give some thought to newer techniques ☝️. No need to be like spaCy or SyntaxNet, just something for Ruby to say, I got that. |
Thanks Arjun for your comments.
By the way, I looked rather briefly at the code of the dependency parser. My first impression was that many things are hardcoded. I'm not sure it is flexible enough to cope with the malleability of language. |
Hello Arjun, The speed issues with |
Hey
The parser tokens is slow to output,
result = parser.parse(tokens)