MatchError in VisionEncoderDecoder.getModelOutput due to missing ONNX case #14546


Closed

krumpi36 opened this issue Apr 1, 2025 · 2 comments
krumpi36 commented Apr 1, 2025

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Who can help?

No response

What are you working on?

I am using the VisionEncoderDecoderForImageCaptioning annotator in my Java application. The issue surfaced after upgrading to a recent version of Spark NLP and attempting to use the newer image captioning models.

I am trying to use the model image_captioning_vit_gpt2_en. This model utilizes the ONNX backend (encoder_model.onnx, decoder_model.onnx) instead of the previous TensorFlow format (vision_encoder_decoder_tensorflow). The error occurs during the model loading/initialization phase when the ONNX engine is detected and used.

Current Behavior

When initializing a VisionEncoderDecoderForImageCaptioning model that relies on the ONNX engine, the application crashes with a scala.MatchError. The error message indicates that the string "onnx" was not handled by a match statement in the com.johnsnowlabs.ml.ai.VisionEncoderDecoder.getModelOutput method.

The specific error is:


java.lang.ExceptionInInitializerError
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
	at java.base/java.util.Optional.orElseGet(Optional.java:364)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	...
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Caused by: java.lang.IllegalStateException: Unable to load model
	at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.load(AbstractSparkAnalyzer.java:378)
	at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.<init>(AbstractSparkAnalyzer.java:81)
	at cz.xxx.xxx.sparknlp.ImageAnalyzer.<init>(ImageAnalyzer.java:78)
	at cz.xxx.xxx.sparknlp.image.ImageCaptionGeneration.<init>(ImageCaptionGeneration.java:40)
	at cz.xxx.xxx.sparknlp.images.ImageCaptionGenerationTest.<clinit>(ImageCaptionGenerationTest.java:52)
	... 60 more
Caused by: scala.MatchError: onnx (of class java.lang.String)
	at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.getModelOutput(VisionEncoderDecoder.scala:428)
	at com.johnsnowlabs.ml.ai.util.Generation.Generate.$anonfun$beamSearch$7(Generate.scala:228)
	at scala.util.control.Breaks.breakable(Breaks.scala:42)
	at com.johnsnowlabs.ml.ai.util.Generation.Generate.beamSearch(Generate.scala:216)
	at com.johnsnowlabs.ml.ai.util.Generation.Generate.beamSearch$(Generate.scala:184)
	at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.beamSearch(VisionEncoderDecoder.scala:41)
	at com.johnsnowlabs.ml.ai.util.Generation.Generate.generate(Generate.scala:153)
	at com.johnsnowlabs.ml.ai.util.Generation.Generate.generate$(Generate.scala:85)
	at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.generate(VisionEncoderDecoder.scala:41)
	at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.$anonfun$generateFromImage$1(VisionEncoderDecoder.scala:321)
	at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.$anonfun$generateFromImage$1$adapted(VisionEncoderDecoder.scala:271)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at scala.collection.Iterator.toStream(Iterator.scala:1417)
	at scala.collection.Iterator.toStream$(Iterator.scala:1416)
	at scala.collection.AbstractIterator.toStream(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toSeq(TraversableOnce.scala:354)
	at scala.collection.TraversableOnce.toSeq$(TraversableOnce.scala:354)
	at scala.collection.AbstractIterator.toSeq(Iterator.scala:1431)
	at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.generateFromImage(VisionEncoderDecoder.scala:271)
	at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.sessionWarmup(VisionEncoderDecoder.scala:90)
	at com.johnsnowlabs.ml.ai.VisionEncoderDecoder.<init>(VisionEncoderDecoder.scala:93)
	at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning.setModelIfNotSet(VisionEncoderDecoderForImageCaptioning.scala:270)
	at com.johnsnowlabs.nlp.annotators.cv.ReadVisionEncoderDecoderDLModel.readModel(VisionEncoderDecoderForImageCaptioning.scala:461)
	at com.johnsnowlabs.nlp.annotators.cv.ReadVisionEncoderDecoderDLModel.readModel$(VisionEncoderDecoderForImageCaptioning.scala:427)
	at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.readModel(VisionEncoderDecoderForImageCaptioning.scala:649)
	at com.johnsnowlabs.nlp.annotators.cv.ReadVisionEncoderDecoderDLModel.$anonfun$$init$$1(VisionEncoderDecoderForImageCaptioning.scala:479)
	at com.johnsnowlabs.nlp.annotators.cv.ReadVisionEncoderDecoderDLModel.$anonfun$$init$$1$adapted(VisionEncoderDecoderForImageCaptioning.scala:479)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:50)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:49)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:49)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:61)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:61)
	at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:38)
	at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:508)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:500)
	at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:44)
	at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:41)
	at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.com$johnsnowlabs$nlp$annotators$cv$ReadablePretrainedVisionEncoderDecoderModel$$super$pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
	at com.johnsnowlabs.nlp.annotators.cv.ReadablePretrainedVisionEncoderDecoderModel.pretrained(VisionEncoderDecoderForImageCaptioning.scala:413)
	at com.johnsnowlabs.nlp.annotators.cv.ReadablePretrainedVisionEncoderDecoderModel.pretrained$(VisionEncoderDecoderForImageCaptioning.scala:409)
	at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
	at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
	at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:52)
	at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:51)
	at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.com$johnsnowlabs$nlp$annotators$cv$ReadablePretrainedVisionEncoderDecoderModel$$super$pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
	at com.johnsnowlabs.nlp.annotators.cv.ReadablePretrainedVisionEncoderDecoderModel.pretrained(VisionEncoderDecoderForImageCaptioning.scala:401)
	at com.johnsnowlabs.nlp.annotators.cv.ReadablePretrainedVisionEncoderDecoderModel.pretrained$(VisionEncoderDecoderForImageCaptioning.scala:401)
	at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning$.pretrained(VisionEncoderDecoderForImageCaptioning.scala:649)
	at com.johnsnowlabs.nlp.annotators.cv.VisionEncoderDecoderForImageCaptioning.pretrained(VisionEncoderDecoderForImageCaptioning.scala)
	at cz.xxx.xxx.sparknlp.image.ImageCaptionGeneration.initPipelineStages(ImageCaptionGeneration.java:63)
	at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.train(AbstractSparkAnalyzer.java:419)
	at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.trainAndSaveModel(AbstractSparkAnalyzer.java:392)
	at cz.xxx.xxx.sparknlp.AbstractSparkAnalyzer.load(AbstractSparkAnalyzer.java:367)
	... 64 more
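The failure mode can be illustrated outside Spark NLP. The following is a minimal, self-contained Java sketch (hypothetical names, for illustration only, not Spark NLP source) of a dispatch on the detected engine string that, like the non-exhaustive Scala match above, has no branch for "onnx":

```java
// Stand-alone illustration only -- not Spark NLP source. A dispatch on the
// detected engine string without a branch for "onnx" fails at runtime,
// analogous to the non-exhaustive Scala match that throws scala.MatchError.
public class EngineDispatchDemo {
    static String getModelOutput(String detectedEngine) {
        switch (detectedEngine) {
            case "tensorflow": return "tensorflow-decoder-output";
            case "openvino":   return "openvino-decoder-output";
            default:
                // Scala's match with no matching case throws scala.MatchError;
                // modeled here with an unchecked exception.
                throw new IllegalStateException(
                        "scala.MatchError: " + detectedEngine + " (of class java.lang.String)");
        }
    }

    public static void main(String[] args) {
        System.out.println(getModelOutput("tensorflow")); // handled engine: works
        try {
            getModelOutput("onnx"); // unhandled engine: throws, mirroring the crash
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```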

Expected Behavior

The VisionEncoderDecoderForImageCaptioning model using the ONNX backend (like image_captioning_vit_gpt2_en) should load and perform inference without crashing. The internal com.johnsnowlabs.ml.ai.VisionEncoderDecoder.getModelOutput method should correctly handle the case where detectedEngine is "onnx".

Steps To Reproduce

The issue occurs simply by attempting to load an ONNX-based VisionEncoderDecoderForImageCaptioning model using .pretrained(). The error happens during this call because the internal code is missing logic to handle the ONNX engine.

Minimal code example demonstrating the failure:

import com.johnsnowlabs.nlp.annotator._

val imageClassifier = VisionEncoderDecoderForImageCaptioning
  .pretrained()

Spark NLP version and Apache Spark

Spark NLP Version: 5.5.3 (Scala 2.12 artifact)
Apache Spark Version: 3.5.5

Type of Spark Application

Java Application

Java Version

openjdk 17.0.12 2024-07-16 OpenJDK Runtime Environment Temurin-17.0.12+7 (build 17.0.12+7) OpenJDK 64-Bit Server VM Temurin-17.0.12+7 (build 17.0.12+7, mixed mode, sharing)

Java Home Directory

C:\Progs\Java\jdk-17.0.12+7\

Setup and installation

Maven

Operating System and Version

Windows 11 Pro 24H2 26100.3624

Link to your project (if available)

No response

Additional Information

This issue seems directly linked to the transition of certain models (like image_captioning_vit_gpt2_en) from TensorFlow to ONNX backends. The existing code in VisionEncoderDecoder was not prepared to handle the "onnx" engine type in the overridden getModelOutput method.

The likely fix is to replace the duplicated Openvino.name case with the missing ONNX.name case in the match statement, dispatching to the overloaded getModelOutput method in the same way as the TensorFlow case.

Suggested code change (conceptual):

  // In com.johnsnowlabs.ml.ai.VisionEncoderDecoder
  override def getModelOutput(/*...*/): Array[Array[Float]] = {
    detectedEngine match {
      case Openvino.name =>
        getDecoderOutputsOv(decoderInputIds, ovInferRequest.get)

      // Previously a duplicated Openvino.name case; replace it with ONNX
      case ONNX.name =>
        getModelOutput(decoderInputIds, decoderEncoderStateTensors, session, ovInferRequest)

      case TensorFlow.name =>
        getModelOutput(decoderInputIds, decoderEncoderStateTensors, session, ovInferRequest)
    }
  }
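As a sanity check, here is a self-contained Java sketch of the corrected dispatch (hypothetical names; the real Scala method takes tensors and sessions, not strings). With the "onnx" branch in place, every supported engine string resolves instead of falling through:

```java
// Stand-alone sketch of the corrected dispatch -- hypothetical names only,
// not Spark NLP source. With the "onnx" branch added, every supported
// engine string resolves instead of raising a MatchError-style failure.
public class FixedDispatchSketch {
    static String getModelOutput(String detectedEngine) {
        switch (detectedEngine) {
            case "openvino":   return "openvino-decoder-output";
            case "onnx":       return "onnx-decoder-output"; // previously missing branch
            case "tensorflow": return "tensorflow-decoder-output";
            default:
                throw new IllegalArgumentException("Unknown engine: " + detectedEngine);
        }
    }

    public static void main(String[] args) {
        for (String engine : new String[] {"openvino", "onnx", "tensorflow"}) {
            System.out.println(engine + " -> " + getModelOutput(engine));
        }
    }
}
```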
@DevinTDHa (Member)

Hi @krumpi36, thanks a lot for reporting in this much detail.

You pinpointed the exact issue and we are taking a look.

@ahmedlone127 (Contributor)

Hey @krumpi36, the error has been fixed.
