another form of the sentence splitting function (Testing...Does NOT work) #473

Closed
wants to merge 1 commit into from


DrewThomasson
Owner

No description provided.

@DrewThomasson DrewThomasson changed the title from "another form of the sentence splitting function (Testing..." to "another form of the sentence splitting function (Testing...unknown if works)" on Mar 11, 2025
@DrewThomasson
Owner Author

I'm also running the Dev test workflow for this file, as you can see here:

https://github.com/DrewThomasson/ebook2audiobook/actions/runs/13792746551

@DrewThomasson
Owner Author

Nope, testing it locally shows this function breaks, so I'm removing the workflow from the queue.

@DrewThomasson
Owner Author

Full log

drew@wmughal-CN4D09397T ebook2audiobook % ./ebook2audiobook.sh
v25.3.10 native mode
IPs available for connection:
['127.0.0.1', '::1', 'fe80::1%lo0', '10.5.167.48', 'fe80::825:8530:ff35:8d36%en0', 'fe80::f4d4:88ff:fe9d:7259%ap1', 'fe80::1891:7cff:fedd:85b8%awdl0', 'fe80::1891:7cff:fedd:85b8%llw0', 'fe80::a4fb:2110:4319:2225%utun0', 'fe80::ddfe:7bd7:2805:ee6e%utun1', 'fe80::ce81:b1c:bd2c:69e%utun2', 'fe80::fd5:243a:9e68:883b%utun3', 'fe80::bde5:2d9b:824a:4557%utun4', 'fe80::3667:9ce3:d5c6:d826%utun6']
Note: 0.0.0.0 is not the IP to connect. Instead use an IP above to connect.
* Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/gradio/queueing.py", line 625, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/gradio/blocks.py", line 2099, in process_api
    inputs = await self.preprocess_data(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/gradio/blocks.py", line 1794, in preprocess_data
    processed_input.append(block.preprocess(inputs_cached))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/gradio/components/dropdown.py", line 202, in preprocess
    raise Error(
gradio.exceptions.Error: 'Value: internal is not in the list of choices: []'
Processing eBook file: alice.txt
GPU is not available on your device!
Available Processor Unit: cpu
Running command: /opt/homebrew/bin/ebook-convert /Users/drew/ebook2audiobook/tmp/ebook-01205268-1020-4b59-84b2-29b7a9eb8fb0/0abcd0ee87c3df04dd58415c1f7788af/alice.txt /Users/drew/ebook2audiobook/tmp/ebook-01205268-1020-4b59-84b2-29b7a9eb8fb0/0abcd0ee87c3df04dd58415c1f7788af/__alice.epub
Conversion options changed from defaults:
  output_profile: 'generic_eink'
  input_encoding: 'utf-8'
  epub_version: '3'
  smarten_punctuation: True
  verbose: 1
  disable_font_rescaling: True
  flow_size: 0
1% Converting input to HTML...
InputFormatPlugin: TXT Input running
on /Users/drew/ebook2audiobook/tmp/ebook-01205268-1020-4b59-84b2-29b7a9eb8fb0/0abcd0ee87c3df04dd58415c1f7788af/alice.txt
Reading text from file...
Using user specified input encoding of utf-8
Auto detected paragraph type as unformatted
Auto detected formatting as heuristic
Running text through basic conversion...
Language not specified
Creator not specified
Building file list...
	Found files...
		 HTMLFile:0:a:'/Users/drew/ebook2audiobook/tmp/calibre_7.24.0_tmp_q157msxy/tiuz74so_plumber/index.html'
Normalizing filename cases
Rewriting HTML links
Parsing index.html ...
*********  Heuristic processing HTML  *********
There are 16 blank lines. 0.41025641025641024 percent blank
minimum chapters required are: 1
found 0 pre-existing headings
Total wordcount is: 1615, Average words per section is: 1615, Marked up 0 chapters
deleting blank lines
Hard line breaks check returned False
Median line length is 252, calculated with html format
Fixing hyphenated content
Looking for more split points based on punctuation, currently have 0
Formatting scene breaks
Forcing index.html into XHTML namespace
34% Running transforms on e-book...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 23 items of level: p_1
Ignoring level p_1
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
67% Running EPUB Output plugin
Splitting markup on page breaks and flow limits, if any...
Generating default cover
This EPUB file has no Table of Contents. Creating a default TOC
Upgrading to EPUB 3...
EPUB output written to /Users/drew/ebook2audiobook/tmp/ebook-01205268-1020-4b59-84b2-29b7a9eb8fb0/0abcd0ee87c3df04dd58415c1f7788af/__alice.epub
Output saved to   /Users/drew/ebook2audiobook/tmp/ebook-01205268-1020-4b59-84b2-29b7a9eb8fb0/0abcd0ee87c3df04dd58415c1f7788af/__alice.epub

/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/ebooklib/epub.py:1423: FutureWarning: This search incorrectly ignores the root element, and will be fixed in a future version.  If you rely on the current behaviour, change it to './/xmlns:rootfile[@media-type]'
  for root_file in tree.findall('//xmlns:rootfile[@media-type]', namespaces={'xmlns': NAMESPACES['CONTAINERNS']}):
******* NOTE: YOU CAN SAFELY IGNORE "Character xx not found in the vocabulary." *******
Error extracting main content pages: maximum recursion depth exceeded
Traceback (most recent call last):
  File "/Users/drew/ebook2audiobook/lib/functions.py", line 544, in get_chapters
    doc_cache[doc] = filter_chapter(doc, session['language'], session['language_iso1'], session['tts_engine'])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/drew/ebook2audiobook/lib/functions.py", line 591, in filter_chapter
    chapter_sentences = get_sentences(phoneme_list, max_tokens)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/drew/ebook2audiobook/lib/functions.py", line 688, in get_sentences
    sentences.extend(advanced_split(current_sentence.strip()))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/drew/ebook2audiobook/lib/functions.py", line 678, in advanced_split
    return advanced_split(part1) + advanced_split(part2)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/drew/ebook2audiobook/lib/functions.py", line 678, in advanced_split
    return advanced_split(part1) + advanced_split(part2)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/drew/ebook2audiobook/lib/functions.py", line 678, in advanced_split
    return advanced_split(part1) + advanced_split(part2)
           ^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 986 more times]
  File "/Users/drew/ebook2audiobook/lib/functions.py", line 671, in advanced_split
    if any(p in sentence for p in punctuation_split):
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded
Caught DependencyError: Error extracting main content pages: maximum recursion depth exceeded
get_chapters() failed!
^CKeyboard interruption in main thread... closing server.
^CServer interrupted by user. Shutting down...
Traceback (most recent call last):
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/gradio/blocks.py", line 2959, in block_thread
    time.sleep(0.1)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/drew/ebook2audiobook/lib/functions.py", line 2769, in web_interface
    outputs=[gr_voice_list, gr_custom_model_list, gr_audiobook_list, gr_modal]
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/gradio/blocks.py", line 2865, in launch
    self.block_thread()
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/gradio/blocks.py", line 2963, in block_thread
    self.server.close()
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/site-packages/gradio/http_server.py", line 69, in close
    self.thread.join(timeout=5)
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/threading.py", line 1153, in join
    self._wait_for_tstate_lock(timeout=max(timeout, 0))
  File "/Users/drew/ebook2audiobook/python_env/lib/python3.12/threading.py", line 1169, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
Caught DependencyError: Server interrupted by user. Shutting down...
^C
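
For reference, the traceback above shows `advanced_split` calling itself at line 678 until Python's recursion limit is hit, which suggests that some call keeps receiving a part that never gets shorter. The body of `advanced_split` is not shown in this log, so the following is only a minimal sketch of one way to avoid that failure mode, not the project's implementation. The name `split_by_length`, its parameters, and the character-based length limit are all hypothetical (the real code works with `max_tokens`). It uses a loop instead of recursion and forces a hard cut when no split character is found, so every pass consumes at least one character.

```python
def split_by_length(sentence, max_len, punctuation=(",", ";", ":", " ")):
    """Split text into parts of at most max_len characters without recursion.

    Hypothetical helper for illustration only; names and limits are assumptions.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")

    parts = []
    text = sentence.strip()
    while len(text) > max_len:
        # Find the last split character inside the allowed window.
        window = text[:max_len]
        cut = max(window.rfind(p) for p in punctuation)
        # No usable split point (not found, or only at index 0): hard cut.
        # Otherwise keep the split character with the left-hand part.
        cut = cut + 1 if cut > 0 else max_len
        parts.append(text[:cut].strip())
        # The remainder is strictly shorter than before, so the loop terminates.
        text = text[cut:].strip()
    if text:
        parts.append(text)
    return parts
```

Because the remainder shrinks on every pass, an input with no punctuation at all (the kind of case that could make a recursive splitter call itself forever) is simply cut into fixed-size pieces.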

@DrewThomasson DrewThomasson changed the title from "another form of the sentence splitting function (Testing...unknown if works)" to "another form of the sentence splitting function (Testing...Does NOT work)" on Mar 11, 2025
@DrewThomasson
Owner Author

So not merging then...
