java.lang.NoSuchMethodError: 'void org.apache.tika.parser.pdf.PDF2XHTML.setIgnoreContentStreamSpaceGlyphs(boolean)' #3265

Open
showmethemoney opened this issue May 21, 2025 · 2 comments · May be fixed by #3271

Comments

@showmethemoney

Bug description

Processing a PDF file with TikaDocumentReader throws a java.lang.NoSuchMethodError. The issue does not occur in version 1.0.0-M6.

Environment

Spring Boot version: 3.4.5
Spring AI version: 1.0.0-RC1
Java version: 21


  • Code
TikaDocumentReader reader = new TikaDocumentReader(resource);
reader.read(); // the call that throws, per the DocumentReader.read frame in the stack trace
  • Output
java.lang.NoSuchMethodError: 'void org.apache.tika.parser.pdf.PDF2XHTML.setIgnoreContentStreamSpaceGlyphs(boolean)'
        at org.apache.tika.parser.pdf.PDFParserConfig.configure(PDFParserConfig.java:229)
        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:105)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:219)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
        at org.springframework.ai.reader.tika.TikaDocumentReader.get(TikaDocumentReader.java:147)
        at org.springframework.ai.reader.tika.TikaDocumentReader.get(TikaDocumentReader.java:51)
        at org.springframework.ai.document.DocumentReader.read(DocumentReader.java:25)
  • Maven
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-vertex-ai-embedding</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-vertex-ai-gemini</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-advisors-vector-store</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-tika-document-reader</artifactId>
        </dependency>
@sunyuhan1998
Contributor

I tried to reproduce the issue using version 1.0.0-RC1, following the code and Maven dependencies you provided, but I was unable to replicate the problem. Could you provide a minimal demo project that can reproduce the issue? Additionally, Spring AI 1.0.0 GA has already been released — you could also try that version to see if the issue still persists.

@dafriz
Contributor

dafriz commented May 21, 2025

setIgnoreContentStreamSpaceGlyphs was added in PDFBox 3.0.4, which is the version Tika expects. However, spring-ai-pdf-document-reader is using 3.0.3, so when 3.0.3 ends up on the classpath, Tika's call to the method fails with NoSuchMethodError. We need to keep these versions in sync.
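
To confirm which PDFBox jar is actually loaded in an affected application, a small reflection check along these lines may help. This is only a diagnostic sketch: the class name PdfBoxClasspathCheck and its helper methods are made up for illustration, and it assumes the missing method is inherited from org.apache.pdfbox.text.PDFTextStripper.

```java
import java.security.CodeSource;

public class PdfBoxClasspathCheck {

    /** Returns true if the class is loadable and exposes a public method with this signature. */
    public static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class.forName(className).getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Returns the jar or directory a class was loaded from, or a placeholder if unknown. */
    public static String locationOf(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumption: PDF2XHTML inherits the setter from PDFBox's PDFTextStripper.
        String stripper = "org.apache.pdfbox.text.PDFTextStripper";
        System.out.println("PDFTextStripper loaded from: " + locationOf(stripper));
        System.out.println("has setIgnoreContentStreamSpaceGlyphs(boolean): "
                + hasMethod(stripper, "setIgnoreContentStreamSpaceGlyphs", boolean.class));
    }
}
```

If the second line prints false, the jar named in the first line is older than what Tika was compiled against. Running mvn dependency:tree should then show which dependency brings in that PDFBox version.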
