java.lang.NoSuchMethodError: 'void org.apache.tika.parser.pdf.PDF2XHTML.setIgnoreContentStreamSpaceGlyphs(boolean)'
at org.apache.tika.parser.pdf.PDFParserConfig.configure(PDFParserConfig.java:229)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:105)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:219)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
at org.springframework.ai.reader.tika.TikaDocumentReader.get(TikaDocumentReader.java:147)
at org.springframework.ai.reader.tika.TikaDocumentReader.get(TikaDocumentReader.java:51)
at org.springframework.ai.document.DocumentReader.read(DocumentReader.java:25)
I tried to reproduce the issue with version 1.0.0-RC1, using the code and Maven dependencies you provided, but could not. Could you provide a minimal demo project that reproduces it? Also, Spring AI 1.0.0 GA has been released, so you could try that version to see whether the issue persists.
setIgnoreContentStreamSpaceGlyphs was added in pdfbox 3.0.4, which is the version Tika expects. However, spring-ai-pdf-document-reader uses 3.0.3, so the older jar can end up on the classpath and the method is missing at runtime. We need to keep these versions in sync.
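To confirm which pdfbox version actually ends up on the classpath, a small reflection check along these lines may help. This is only a diagnostic sketch: the class name `MethodPresenceCheck` and the helper `hasPublicMethod` are made up for illustration, and it assumes the method is inherited from `org.apache.pdfbox.text.PDFTextStripper`, which Tika's `PDF2XHTML` extends.

```java
import java.lang.reflect.Method;

public class MethodPresenceCheck {

    // Returns true if the named class can be loaded and exposes a public
    // method with the given name (declared or inherited, any signature).
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            for (Method m : type.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing or not loadable: treat as "method not available".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the setter lives on PDFTextStripper in pdfbox 3.0.4+.
        String cls = "org.apache.pdfbox.text.PDFTextStripper";
        String method = "setIgnoreContentStreamSpaceGlyphs";
        System.out.println(cls + "#" + method + " present: "
                + hasPublicMethod(cls, method));
    }
}
```

If this prints `false` when run with the application's classpath, the resolved pdfbox is probably older than 3.0.4. Running `mvn dependency:tree` should show which dependency brings in the older version.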
Bug description
An error occurs when processing PDF files using TikaDocumentReader. This issue does not occur in version 1.0.0-M6.
Environment
Spring Boot version: 3.4.5
Spring AI version: 1.0.0-RC1
Java version: 21