Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FilteredStream: unable to read content of stream multiple times #98

Closed
mxr576 opened this issue Mar 13, 2018 · 10 comments · Fixed by #167
Closed

FilteredStream: unable to read content of stream multiple times #98

mxr576 opened this issue Mar 13, 2018 · 10 comments · Fixed by #167

Comments

@mxr576
Copy link
Contributor

mxr576 commented Mar 13, 2018

Q A
Bug? yes
New Feature? no
Version 1.7.0

I am working on a 3rd party integration for my library that uses HTTplug and this library. When I integrated my library to Drupal 8 that has its own Guzzle client wrapper I faced with an interesting problem. By using the \Http\Client\Common\Plugin\DecoderPlugin sometimes I got back an empty string instead of the JSON that API has actually returned.

I think this is the combination of two possible issues:

  1. If something reads the content of the full response body before the \Http\Client\Common\Plugin\DecoderPlugin plugin can do that then when it starts processing the response (ex.: with the \Http\Message\Encoding\DechunkStream) it does not rewinds the stream, therefore when the \Http\Message\Encoding\FilteredStream::getContents() is being called the while loop does not do anything because the wrapped (underlying) stream has already in the end.
    Maybe this can be solved by adding this condition to the constructor of the FilteredStream class.
if ($this->stream->isSeekable()) {
  $this->stream->rewind();
}
  1. If the wrapped stream was being closed then any of the \Http\Message\Encoding\FilteredStream implementations could not read the content of the stream again. It means that if I use the \Http\Client\Common\Plugin\DecoderPlugin on my client an try to get the body of the response multiple times like this: (string) $response->getBody() then after the first call I always get back an empty string. (Or if we add the suggested condition to the FilteredStream class' constructor an exception, because the read() method tries to read on a closed stream after the first (string) $response->getBody()). If I remove the plugin then the originally wrapped \GuzzleHttp\Psr7\Stream object inside an instance of \GuzzleHttp\Psr7\Response can always return the same response body (string) multiple times.
@mxr576
Copy link
Contributor Author

mxr576 commented Mar 13, 2018

In my opinion a possible solution for the 2nd point could be storing only a copy of the $stream parameter inside an instance of FilteredStream class by using the MessageFactoryDiscovery to find a suitable class that can "deep copy" a stream. That would introduce a new dependency to the php-http/discovery in this library.

@mxr576
Copy link
Contributor Author

mxr576 commented Mar 13, 2018

Actually, what reads the response earlier than the DecoderPlugin is a custom Guzzle on_stats callback that uses the \Http\Message\Formatter\FullHttpMessageFormatter formatter to log debug performance data and information about the request and response body. So if you would like to reproduce this problem maybe the easiest way to do something similar.

@joelwurtz
Copy link
Member

When doing (string) $response->getBody() the body should always be rewinded if it's seekable as stated by PSR7

Problem comes from using stream that cannot be seek, DecoderPlugin may decorate it like that because it's hard to rewind a stream that is gzip encoded (how to rewind correctly 4 decoded characeters, need to have a mapping between encoded / decoded : a true mess)

However you can decorate your stream with this implementation : https://github.com/php-http/message/blob/master/src/Stream/BufferedStream.php that allows to buffer a stream so it can transform a not seekable stream into a seekable one (but it may cause slowdown / memory consumption on big stream)

There is no plugin for that stream, but if you want to do a PR with it we will certainly merge this. Also this may be done directly in decoder plugin / others plugins with a specific option (like buffer => true)

@mxr576
Copy link
Contributor Author

mxr576 commented Mar 13, 2018

The code is using the built-in Stream implementation provided by Guzzle which is seekable. The first problem that I mentioned could be caused the fact that this implementation automatically closes the underlying resource when the object is destructed.
Luckily Guzzle out of the box does gzip decoding so the Decorator plugin does not have to take care of that.
But basically what your answered indicates that the Decorator plugin is currently useless because of the stream wrapper implementations in this library unable to handle seekable streams.

@joelwurtz
Copy link
Member

But basically what your answered indicates that the Decorator plugin is currently useless because of the stream wrapper implementations in this library unable to handle seekable streams.

This is by design, handling seekability in this case means buffering : that the only option possible. And buffering means keeping all the content into memory which is bad for performance and resource management (and may also be a ddos vulnerability in some cases).

That's why by default there is no seekable stream as user learn to use their request / response without buffering which result in better perfomance and less problems over time. Futhermore it is easier to add buffering than removing it.

If you really need the content of the body multiple times you can store it in a variable, or use the BufferedStream proposed (so you can still transfer a psr7 response across your services).

@mxr576
Copy link
Contributor Author

mxr576 commented Mar 14, 2018

Opened a PR for the message library with a for the __toString() implementation: php-http/message#95

Tried BufferedStream in the FilteredStream implementation as a wrapper around Guzzle's Stream implementation. It did not work because when Guzzle closed the stream the BufferedStream also become unreadable (of course).

If you really need the content of the body multiple times you can store it in a variable,

The problem is not that I need this value multiple times, the problem is that if something reads the value of the response body earlier than my code in the call chain then my code gets back an empty string and it becomes broken if I use the Decorator plugin in my API client.

After this conversation it seems reasonable for me to remove the Decorator plugin from library. if someone tries to use the Curl or any other HTTP client with my library that does not have built-in gzip or dechunk implementation and runs into an issue with this Decorator plugin then I'll suggest creating a custom implementation or use another HTTP client, like Guzzle.

mxr576 added a commit to mxr576/edge-php-sdk that referenced this issue Mar 14, 2018
@joelwurtz
Copy link
Member

joelwurtz commented Mar 14, 2018

BufferedStream should decorate the FIlteredStream, not the guzzle one. So it should be done just after the decoder plugin like the following :

Guzzle -> Decoder -> BufferedStream

Once you use the Decoder plugin and this one decode your response (as it's chunked / encoded) it will make it not seekable,

@joelwurtz
Copy link
Member

I made a PR here it miss a lot of things, but it's a good start and should fix your use case : #99

@mxr576
Copy link
Contributor Author

mxr576 commented Mar 15, 2018

Thanks, but I have already removed this plugin from my library.

The idea is good but I am afraid that you will run into the same issue as I did. BufferedStream/FilteredSteam will only decorate the original Guzzle Stream which automatically closes the resource when it is not needed anymore.

@joelwurtz
Copy link
Member

which automatically closes the resource when it is not needed anymore

It is problematic for FilteredStream but not for BufferedStream which have its own stream (and on which guzzle cannot do anything)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants