Is there an existing issue for this?
I have searched the existing issues and did not find a match.
Who can help?
No response
What are you working on?
I am using the VisionEncoderDecoderForImageCaptioning annotator in my Java application. The issue surfaced after upgrading to a recent version of Spark NLP and attempting to use the newer image captioning models.
I am trying to use the model image_captioning_vit_gpt2_en. This model uses the ONNX backend (encoder_model.onnx, decoder_model.onnx) instead of the previous TensorFlow format (vision_encoder_decoder_tensorflow). The error occurs during the model loading/initialization phase, when the ONNX engine is detected and used.
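Roughly, the annotator is wired into the pipeline like this (a simplified sketch, not the exact code from my project; the class and column names below are illustrative placeholders):

import com.johnsnowlabs.nlp.ImageAssembler;
import com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning;
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineStage;

public class CaptioningPipelineSketch {

    // Builds the image-captioning stages; column names are placeholders
    public static Pipeline buildPipeline() {
        // Assemble raw Spark image rows into Spark NLP's IMAGE annotation type
        ImageAssembler imageAssembler = new ImageAssembler();
        imageAssembler.setInputCol("image");
        imageAssembler.setOutputCol("image_assembler");

        // Download/load the pretrained ONNX-backed captioning model
        VisionEncoderDecoderForImageCaptioning captioner =
                VisionEncoderDecoderForImageCaptioning.pretrained("image_captioning_vit_gpt2_en", "en");
        captioner.setInputCols(new String[] {"image_assembler"});
        captioner.setOutputCol("caption");

        return new Pipeline().setStages(new PipelineStage[] {imageAssembler, captioner});
    }
}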
Current Behavior
When initializing a VisionEncoderDecoderForImageCaptioning model that relies on the ONNX engine, the application crashes with a scala.MatchError. The error message indicates that the string "onnx" was not handled in a match statement within the com.johnsnowlabs.ml.ai.VisionEncoderDecoder.getModelOutput method.
The specific error is:
java.lang.ExceptionInInitializerError
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at java.base/java.util.Optional.orElseGet(Optional.java:364)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
...
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Caused by: java.lang.IllegalStateException: Unable to load model
at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.load(AbstractSparkAnalyzer.java:378)
at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.<init>(AbstractSparkAnalyzer.java:81)
at cz.xxx.xxx.sparknlp.ImageAnalyzer.<init>(ImageAnalyzer.java:78)
at cz.xxx.xxx.sparknlp.image.ImageCaptionGeneration.<init>(ImageCaptionGeneration.java:40)
at cz.xxx.xxx.sparknlp.images.ImageCaptionGenerationTest.<clinit>(ImageCaptionGenerationTest.java:52)
... 60 more
Caused by: scala.MatchError: onnx (of class java.lang.String)
at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.getModelOutput(VisionEncoderDecoder.scala:428)
at com.johnsnowlabs.ml.ai.util.Generation.Generate.$anonfun$beamSearch$7(Generate.scala:228)
at scala.util.control.Breaks.breakable(Breaks.scala:42)
at com.johnsnowlabs.ml.ai.util.Generation.Generate.beamSearch(Generate.scala:216)
at com.johnsnowlabs.ml.ai.util.Generation.Generate.beamSearch$(Generate.scala:184)
at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.beamSearch(VisionEncoderDecoder.scala:41)
at com.johnsnowlabs.ml.ai.util.Generation.Generate.generate(Generate.scala:153)
at com.johnsnowlabs.ml.ai.util.Generation.Generate.generate$(Generate.scala:85)
at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.generate(VisionEncoderDecoder.scala:41)
at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.$anonfun$generateFromImage$1(VisionEncoderDecoder.scala:321)
at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.$anonfun$generateFromImage$1$adapted(VisionEncoderDecoder.scala:271)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator.toStream(Iterator.scala:1417)
at scala.collection.Iterator.toStream$(Iterator.scala:1416)
at scala.collection.AbstractIterator.toStream(Iterator.scala:1431)
at scala.collection.TraversableOnce.toSeq(TraversableOnce.scala:354)
at scala.collection.TraversableOnce.toSeq$(TraversableOnce.scala:354)
at scala.collection.AbstractIterator.toSeq(Iterator.scala:1431)
at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.generateFromImage(VisionEncoderDecoder.scala:271)
at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.sessionWarmup(VisionEncoderDecoder.scala:90)
at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.<init>(VisionEncoderDecoder.scala:93)
at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning.setModelIfNotSet(VisionEncoderDecoderForImageCaptioning.scala:270)
at com.johnsnowlabs.nlp.annotators.cv.ReadVisionEncoderDecoderDLModel.readModel(VisionEncoderDecoderForImageCaptioning.scala:461)
at com.johnsnowlabs.nlp.annotators.cv.ReadVisionEncoderDecoderDLModel.readModel$(VisionEncoderDecoderForImageCaptioning.scala:427)
at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.readModel(VisionEncoderDecoderForImageCaptioning.scala:649)
at com.johnsnowlabs.nlp.annotators.cv.ReadVisionEncoderDecoderDLModel.$anonfun$$init$$1(VisionEncoderDecoderForImageCaptioning.scala:479)
at com.johnsnowlabs.nlp.annotators.cv.ReadVisionEncoderDecoderDLModel.$anonfun$$init$$1$adapted(VisionEncoderDecoderForImageCaptioning.scala:479)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:50)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:49)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:61)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:61)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:38)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:508)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:500)
at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:44)
at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:41)
at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.com$johnsnowlabs$nlp$annotators$cv$ReadablePretrainedVisionEncoderDecoderModel$$super$pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
at com.johnsnowlabs.nlp.annotators.cv.ReadablePretrainedVisionEncoderDecoderModel.pretrained(VisionEncoderDecoderForImageCaptioning.scala:413)
at com.johnsnowlabs.nlp.annotators.cv.ReadablePretrainedVisionEncoderDecoderModel.pretrained$(VisionEncoderDecoderForImageCaptioning.scala:409)
at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:52)
at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:51)
at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.com$johnsnowlabs$nlp$annotators$cv$ReadablePretrainedVisionEncoderDecoderModel$$super$pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
at com.johnsnowlabs.nlp.annotators.cv.ReadablePretrainedVisionEncoderDecoderModel.pretrained(VisionEncoderDecoderForImageCaptioning.scala:401)
at com.johnsnowlabs.nlp.annotators.cv.ReadablePretrainedVisionEncoderDecoderModel.pretrained$(VisionEncoderDecoderForImageCaptioning.scala:401)
at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning.pretrained(VisionEncoderDecoderForImageCaptioning.scala)
at cz.xxx.xxx.sparknlp.image.ImageCaptionGeneration.initPipelineStages(ImageCaptionGeneration.java:63)
at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.train(AbstractSparkAnalyzer.java:419)
at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.trainAndSaveModel(AbstractSparkAnalyzer.java:392)
at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.load(AbstractSparkAnalyzer.java:367)
... 64 more
Expected Behavior
The VisionEncoderDecoderForImageCaptioning model using the ONNX backend (like image_captioning_vit_gpt2_en) should load and perform inference without crashing. The internal com.johnsnowlabs.ml.ai.VisionEncoderDecoder.getModelOutput method should correctly handle the case where detectedEngine is "onnx".
Steps To Reproduce
The issue occurs simply by attempting to load an ONNX-based VisionEncoderDecoderForImageCaptioning model using .pretrained(). The error happens during this call because the internal code is missing logic to handle the ONNX engine.
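For illustration, a minimal standalone reproduction would look roughly like this (the class name and Spark settings are placeholders; Spark NLP 5.5.3 is assumed to be on the classpath via Maven):

import com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning;
import org.apache.spark.sql.SparkSession;

public class OnnxCaptioningCrashRepro {

    public static void main(String[] args) {
        // Local Spark session; Spark NLP is expected on the classpath through the Maven dependency
        SparkSession spark = SparkSession.builder()
                .appName("onnx-captioning-crash-repro")
                .master("local[*]")
                .getOrCreate();

        // The download/load alone is enough to crash: the scala.MatchError("onnx") is thrown
        // from the session warmup that runs inside pretrained()
        VisionEncoderDecoderForImageCaptioning model =
                VisionEncoderDecoderForImageCaptioning.pretrained("image_captioning_vit_gpt2_en", "en");

        spark.stop();
    }
}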
Spark NLP version and Apache Spark
Spark NLP Version: 5.5.3 (Scala 2.12 artifact)
Apache Spark Version: 3.5.5
Type of Spark Application
Java Application
Java Version
openjdk 17.0.12 2024-07-16 OpenJDK Runtime Environment Temurin-17.0.12+7 (build 17.0.12+7) OpenJDK 64-Bit Server VM Temurin-17.0.12+7 (build 17.0.12+7, mixed mode, sharing)
Java Home Directory
C:\Progs\Java\jdk-17.0.12+7\
Setup and installation
Maven
Operating System and Version
Windows 11 Pro 24H2 26100.3624
Link to your project (if available)
No response
Additional Information
This issue seems directly linked to the transition of certain models (like image_captioning_vit_gpt2_en) from TensorFlow to ONNX backends. The existing code in VisionEncoderDecoder was not prepared to handle the "onnx" engine type in the overridden getModelOutput method.
The likely fix involves replacing the duplicate Openvino.name case with the missing case ONNX.name in the match statement, potentially calling the overloaded getModelOutput method similar to the TensorFlow case.
Suggested code change (conceptual):
// In com.johnsnowlabs.ml.ai.VisionEncoderDecoder
override def getModelOutput(/*...*/): Array[Array[Float]] = {
  detectedEngine match {
    case Openvino.name =>
      getDecoderOutputsOv(decoderInputIds, ovInferRequest.get)
    // Replace the duplicate Openvino case with the missing ONNX case
    case ONNX.name =>
      getModelOutput(decoderInputIds, decoderEncoderStateTensors, session, ovInferRequest)
    case TensorFlow.name =>
      getModelOutput(decoderInputIds, decoderEncoderStateTensors, session, ovInferRequest)
  }
}