Add `ExternalRef`, which specifies a byte array in a way which allows sharing
it with other objects without copying if that is considered more efficient than
copying. It mediates between the producer and the consumer of the data during
transfer; it is not suitable for longer storage. Creating an `ExternalRef` is
usually more efficient than creating a `Chain` or `absl::Cord` if the data will
ultimately be copied rather than shared.
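
The copy-or-share mediation described above can be illustrated with a minimal, self-contained sketch. `ToyRef`, `ToySink`, and `kMinBytesToShare` are hypothetical names invented for this example and are not Riegeli's API; the threshold value is arbitrary. The sketch only shows the idea of a short-lived reference that lets the consumer decide whether to copy the bytes or to take over their owner.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for the idea behind `ExternalRef`; not Riegeli's API.
// It borrows an owner and a substring of its bytes for a single transfer.
class ToyRef {
 public:
  // `owner` must outlive the `ToyRef`; the range must lie within `*owner`.
  ToyRef(std::string* owner, size_t offset, size_t length)
      : owner_(owner), offset_(offset), length_(length) {
    assert(offset <= owner->size() && length <= owner->size() - offset);
  }

  size_t offset() const { return offset_; }
  size_t size() const { return length_; }
  std::string_view view() const {
    return std::string_view(*owner_).substr(offset_, length_);
  }

  // Steals the owner, so that the bytes stay alive without being copied.
  std::shared_ptr<const std::string> TakeOwner() && {
    return std::make_shared<std::string>(std::move(*owner_));
  }

 private:
  std::string* owner_;
  size_t offset_;
  size_t length_;
};

// Hypothetical consumer: copies short data, shares long data.
class ToySink {
 public:
  static constexpr size_t kMinBytesToShare = 256;  // Arbitrary for the sketch.

  void Write(ToyRef src) {
    Block block;
    block.length = src.size();
    if (src.size() < kMinBytesToShare) {
      // Copy only the referenced bytes; the producer keeps its object.
      block.owner = std::make_shared<std::string>(src.view());
      ++num_copied_;
    } else {
      // Share: remember where the substring is, then take over the owner.
      block.offset = src.offset();
      block.owner = std::move(src).TakeOwner();
      ++num_shared_;
    }
    blocks_.push_back(std::move(block));
  }

  // Concatenates all stored blocks, for inspection.
  std::string Flatten() const {
    std::string result;
    for (const Block& block : blocks_) {
      result.append(*block.owner, block.offset, block.length);
    }
    return result;
  }

  size_t num_copied() const { return num_copied_; }
  size_t num_shared() const { return num_shared_; }

 private:
  struct Block {
    std::shared_ptr<const std::string> owner;
    size_t offset = 0;
    size_t length = 0;
  };

  std::vector<Block> blocks_;
  size_t num_copied_ = 0;
  size_t num_shared_ = 0;
};
```

In this sketch the producer never builds a shared representation up front: when the sink decides to copy, the only cost is the copy itself, which is the situation where the commit message expects `ExternalRef` to be cheaper than a `Chain` or `absl::Cord`.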

`{,Backward}Writer::Write(ExternalRef)` replace
`{,Backward}Writer::SupportsCopying()` as a mechanism to choose between copying
and sharing. At the same time the mechanism is extended to allow expressing a
preference for producing an `absl::Cord` or a pointer with a deleter.

Add `ExternalStorage` alias for specifying a pointer with a deleter, and
`ExternalData` struct for specifying that together with a substring of a byte
array it owns.
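
As a rough sketch of what "a pointer with a deleter" plus a substring can look like, the following self-contained example mirrors the `RiegeliToExternalStorage(Buffer*)` hook visible in the `buffer.h` diff. It is an assumption that the alias is shaped like `std::unique_ptr<void, void (*)(void*)>`; the names `ToyExternalStorage`, `ToyExternalData`, and `ToyBuffer` are hypothetical, and `std::string_view` stands in for `absl::string_view`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <string_view>
#include <utility>

// Assumed shape of a type-erased "pointer with a deleter"; not copied from
// Riegeli.
using ToyExternalStorage = std::unique_ptr<void, void (*)(void*)>;

// Assumed shape: the storage together with a substring of the bytes it owns.
struct ToyExternalData {
  ToyExternalStorage storage;
  std::string_view substr;
};

// Hypothetical buffer which can hand its allocation over to the storage.
class ToyBuffer {
 public:
  explicit ToyBuffer(size_t capacity)
      : data_(static_cast<char*>(operator new(capacity))),
        capacity_(capacity) {}
  ToyBuffer(const ToyBuffer&) = delete;
  ToyBuffer& operator=(const ToyBuffer&) = delete;
  ~ToyBuffer() { operator delete(data_); }  // Deleting nullptr is a no-op.

  char* data() const { return data_; }
  size_t capacity() const { return capacity_; }

  // Releases ownership; the buffer is left deallocated.
  ToyExternalStorage ReleaseToStorage() {
    capacity_ = 0;
    return ToyExternalStorage(std::exchange(data_, nullptr),
                              [](void* ptr) { operator delete(ptr); });
  }

 private:
  char* data_;
  size_t capacity_;
};
```

After `ReleaseToStorage()`, a `string_view` taken earlier from the buffer stays valid for as long as the returned storage is alive, which is why the two are bundled in one struct.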

Add
`{Chain::BlockIterator,Buffer,SharedBuffer,SizedSharedBuffer,CompactString}::ToExternalRef()`
to wrap typical usages of the `ExternalRef` constructor.

Remove functionality subsumed by `ExternalRef`:
* `Buffer::{Release,DeleteReleased}()`
* `SharedBuffer::{Share,DeleteShared}()`
* `{Buffer,SharedBuffer}::{ToCord,AppendSubstrTo,PrependSubstrTo}()`
* `SizedSharedBuffer::{Substr,storage,operator absl::Cord,AppendTo,PrependTo}()`
* `Chain::{Chain,Append,Prepend}(SizedSharedBuffer)`
* `Chain::BlockIterator::Pin()` with `Chain::PinnedBlock`
* `Chain::BlockIterator::{Append,Prepend}{,Substr}To()`
* `Chain::RawBlock::{Append,Prepend}{,Substr}To()` (internal)
* `FlatCordRef::{Append,Prepend}{,Substr}To()` (internal)
* `SharedBufferRef` (internal)

Remove publicly unused and not publicly useful `SizedSharedBuffer::IsUnique()`.

Tweak conditions for choosing between copying and sharing across classes to be
more uniform. In particular, prefer sharing if the size hint indicates that this
is the last write, or, in the absence of a size hint, if the current position is
0, which indicates that this is the first and perhaps the only write. In both
cases the data being written is less likely to be concatenated with other data.
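
The positional part of this rule can be sketched as a small predicate. This is only an illustration: `PrefersSharing` is a hypothetical helper, and how the library actually detects "the last write" from the size hint, or combines this with length thresholds, is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper, not Riegeli's code. Returns true if a write of
// `length` bytes at position `pos` is unlikely to be followed or preceded by
// other data, so sharing is preferred over copying.
inline bool PrefersSharing(uint64_t pos, size_t length,
                           std::optional<uint64_t> size_hint) {
  if (size_hint.has_value()) {
    // With a size hint: this is the last write if it reaches the hinted size.
    if (pos >= *size_hint) return true;
    return length >= *size_hint - pos;  // Written this way to avoid overflow.
  }
  // Without a size hint: position 0 means the first, perhaps only, write.
  return pos == 0;
}
```

For example, under these assumptions a 100-byte write at position 900 with a size hint of 1000 prefers sharing, while the same write at position 0 does not.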

Split `chain.h` into `chain_base.h` and `chain_details.h` because of the mutual
dependency between `Chain` and `ExternalRef`.

Minor changes:

* Change the split of responsibility between the fast path of
  `{,Backward}Writer::Write(std::string&&)` and `WriteStringSlow()` so that the
  fast path covers a short string only when it fits in the buffer.

* In `Chain::FromExternal()` which converts the data from the object, require
  the conversion to `absl::string_view` to be const. This makes the expected
  interface for `ExternalRef` a superset of the interface for
  `Chain::FromExternal()`. In practice it is unlikely that such a conversion
  might benefit from being non-const.

* Rename the `absl::string_view data` parameter of functions associated with
  external objects to `absl::string_view substr`, to highlight that it might
  represent a substring of the owned data.
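
The const-conversion requirement from the list above can be checked at compile time. The sketch below is a hedged illustration using `std::string_view` in place of `absl::string_view`; `ConstViewOwner`, `NonConstViewOwner`, `kHasConstView`, and `ViewOf` are hypothetical names, and the way the library itself enforces the requirement is not shown in this commit excerpt.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Convertible to a view through a const conversion: acceptable.
struct ConstViewOwner {
  std::string bytes;
  operator std::string_view() const { return bytes; }
};

// Convertible only through a non-const conversion: rejected.
struct NonConstViewOwner {
  std::string bytes;
  operator std::string_view() { return bytes; }
};

// True if a `const T&` can be converted to `std::string_view`.
template <typename T>
inline constexpr bool kHasConstView =
    std::is_constructible_v<std::string_view, const T&>;

static_assert(kHasConstView<ConstViewOwner>);
static_assert(!kHasConstView<NonConstViewOwner>);

// Hypothetical accessor which only accepts objects with a const conversion.
template <typename T>
std::string_view ViewOf(const T& object) {
  static_assert(kHasConstView<T>,
                "the conversion to string_view must be const");
  return std::string_view(object);
}
```

Requiring the const form means that any object accepted by one interface can be inspected through a `const` reference by the other, which is what makes one interface a superset of the other.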

PiperOrigin-RevId: 646086366
QrczakMK committed Jun 24, 2024
1 parent 32627d8 commit e72e741
Showing 88 changed files with 4,621 additions and 3,143 deletions.
47 changes: 37 additions & 10 deletions riegeli/base/BUILD
@@ -348,6 +348,13 @@ cc_library(
],
)

cc_library(
name = "external_data",
srcs = ["external_data.cc"],
hdrs = ["external_data.h"],
deps = ["@com_google_absl//absl/strings"],
)

cc_library(
name = "shared_ptr",
hdrs = [
@@ -359,6 +366,7 @@ cc_library(
":arithmetic",
":assert",
":compare",
":external_data",
":initializer",
":new_aligned",
"@com_google_absl//absl/base:core_headers",
@@ -371,13 +379,14 @@ cc_library(
srcs = ["buffer.cc"],
hdrs = ["buffer.h"],
deps = [
":arithmetic",
":assert",
":buffering",
":cord_utils",
":estimated_allocated_size",
":external_data",
":external_ref",
"@com_google_absl//absl/base:core_headers",
"@com_google_absl//absl/strings",
"@com_google_absl//absl/strings:cord",
],
)

@@ -386,15 +395,15 @@ cc_library(
srcs = ["shared_buffer.cc"],
hdrs = ["shared_buffer.h"],
deps = [
":arithmetic",
":assert",
":buffer",
":buffering",
":cord_utils",
":external_data",
":external_ref",
":initializer",
":shared_ptr",
"@com_google_absl//absl/base:core_headers",
"@com_google_absl//absl/strings",
"@com_google_absl//absl/strings:cord",
],
)

@@ -406,31 +415,35 @@ cc_library(
":arithmetic",
":assert",
":buffering",
":external_ref",
":shared_buffer",
"@com_google_absl//absl/base:core_headers",
"@com_google_absl//absl/strings",
"@com_google_absl//absl/strings:cord",
"@com_google_absl//absl/types:span",
],
)

cc_library(
name = "chain",
name = "chain_and_external_ref",
srcs = ["chain.cc"],
hdrs = ["chain.h"],
hdrs = [
"chain_base.h",
"chain_details.h",
"external_ref_base.h",
],
visibility = ["//visibility:private"],
deps = [
":arithmetic",
":assert",
":buffering",
":compare",
":cord_utils",
":external_data",
":global",
":initializer",
":memory_estimator",
":new_aligned",
":shared_buffer",
":shared_ptr",
":sized_shared_buffer",
":string_utils",
":zeros",
"@com_google_absl//absl/base:core_headers",
@@ -443,6 +456,18 @@ cc_library(
],
)

cc_library(
name = "chain",
hdrs = ["chain.h"],
deps = [":chain_and_external_ref"],
)

cc_library(
name = "external_ref",
hdrs = ["external_ref.h"],
deps = [":chain_and_external_ref"],
)

cc_library(
name = "compact_string",
srcs = ["compact_string.cc"],
@@ -452,6 +477,8 @@ cc_library(
":assert",
":compare",
":estimated_allocated_size",
":external_data",
":external_ref",
":new_aligned",
"@com_google_absl//absl/base:config",
"@com_google_absl//absl/base:core_headers",
91 changes: 12 additions & 79 deletions riegeli/base/buffer.cc
@@ -1,4 +1,4 @@
// Copyright 2017 Google LLC
// Copyright 2023 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -14,90 +14,23 @@

#include "riegeli/base/buffer.h"

#include <stddef.h>
#include <ostream>

#include <functional>
#include <utility>

#include "absl/strings/cord.h"
#include "absl/strings/string_view.h"
#include "riegeli/base/assert.h"
#include "riegeli/base/buffering.h"
#include "riegeli/base/cord_utils.h"
#include "riegeli/base/arithmetic.h"

namespace riegeli {

namespace {

// A releasing callback for embedding a `Buffer` in an `absl::Cord`.
struct Releaser {
void operator()() const {
// Nothing to do: the destructor does the work.
}
Buffer buffer;
};

} // namespace

absl::Cord Buffer::ToCord(const char* data, size_t length) && {
if (data != nullptr || length > 0) {
RIEGELI_ASSERT(std::greater_equal<>()(data, data_))
<< "Failed precondition of Buffer::ToCord(): "
"substring not contained in the buffer";
RIEGELI_ASSERT(std::less_equal<>()(data + length, data_ + capacity_))
<< "Failed precondition of Buffer::ToCord(): "
"substring not contained in the buffer";
}
if (length <= cord_internal::kMaxInline ||
Wasteful(
cord_internal::kSizeOfCordRepExternal + sizeof(Releaser) + capacity_,
length)) {
return cord_internal::MakeBlockyCord(absl::string_view(data, length));
}
return absl::MakeCordFromExternal(absl::string_view(data, length),
Releaser{std::move(*this)});
}

void Buffer::AppendSubstrTo(const char* data, size_t length,
absl::Cord& dest) && {
if (data != nullptr || length > 0) {
RIEGELI_ASSERT(std::greater_equal<>()(data, data_))
<< "Failed precondition of Buffer::AppendSubstrTo(): "
"substring not contained in the buffer";
RIEGELI_ASSERT(std::less_equal<>()(data + length, data_ + capacity_))
<< "Failed precondition of Buffer::AppendSubstrTo(): "
"substring not contained in the buffer";
}
if (length <= cord_internal::MaxBytesToCopyToCord(dest) ||
Wasteful(
cord_internal::kSizeOfCordRepExternal + sizeof(Releaser) + capacity_,
length)) {
cord_internal::AppendToBlockyCord(absl::string_view(data, length), dest);
return;
}
dest.Append(absl::MakeCordFromExternal(absl::string_view(data, length),
Releaser{std::move(*this)}));
}

void Buffer::PrependSubstrTo(const char* data, size_t length,
absl::Cord& dest) && {
if (data != nullptr || length > 0) {
RIEGELI_ASSERT(std::greater_equal<>()(data, data_))
<< "Failed precondition of Buffer::PrependSubstrTo(): "
"substring not contained in the buffer";
RIEGELI_ASSERT(std::less_equal<>()(data + length, data_ + capacity_))
<< "Failed precondition of Buffer::PrependSubstrTo(): "
"substring not contained in the buffer";
}
if (length <= cord_internal::MaxBytesToCopyToCord(dest) ||
Wasteful(
cord_internal::kSizeOfCordRepExternal + sizeof(Releaser) + capacity_,
length)) {
cord_internal::PrependToBlockyCord(absl::string_view(data, length), dest);
return;
void Buffer::DumpStructure(absl::string_view substr, std::ostream& out) const {
out << "[buffer] {";
if (!substr.empty()) {
if (substr.data() != data()) {
out << " space_before: " << PtrDistance(data(), substr.data());
}
out << " space_after: "
<< PtrDistance(substr.data() + substr.size(), data() + capacity());
}
dest.Prepend(absl::MakeCordFromExternal(absl::string_view(data, length),
Releaser{std::move(*this)}));
out << " }";
}

} // namespace riegeli
80 changes: 42 additions & 38 deletions riegeli/base/buffer.h
@@ -17,12 +17,17 @@

#include <stddef.h>

#include <functional>
#include <iosfwd>
#include <utility>

#include "absl/base/attributes.h"
#include "absl/strings/cord.h"
#include "absl/strings/string_view.h"
#include "riegeli/base/assert.h"
#include "riegeli/base/buffering.h"
#include "riegeli/base/estimated_allocated_size.h"
#include "riegeli/base/external_data.h"
#include "riegeli/base/external_ref.h"

namespace riegeli {

@@ -56,41 +61,46 @@ class
// Returns the usable data size. It can be greater than the requested size.
size_t capacity() const { return capacity_; }

// Returns the data pointer, releasing its ownership; the `Buffer` is left
// deallocated. The returned pointer must be deleted using `DeleteReleased()`.
// Converts a substring of `*this` to `ExternalRef`.
//
// If the returned pointer is `nullptr`, it allowed but not required to call
// `DeleteReleased()`.
char* Release();

// Deletes the pointer obtained by `Release()`.
// `storage` must outlive usages of the returned `ExternalRef`.
//
// Does nothing if `ptr == nullptr`.
static void DeleteReleased(void* ptr);
// Precondition:
// if `!substr.empty()` then `substr` is a substring of
// [`data()`..`data() + capacity()`).
ExternalRef ToExternalRef(absl::string_view substr,
ExternalRef::StorageSubstr<Buffer&&>&& storage
ABSL_ATTRIBUTE_LIFETIME_BOUND =
ExternalRef::StorageSubstr<Buffer&&>()) && {
if (!substr.empty()) {
RIEGELI_ASSERT(std::greater_equal<>()(substr.data(), data()))
<< "Failed precondition of Buffer::ToExternalRef(): "
"substring not contained in the buffer";
RIEGELI_ASSERT(std::less_equal<>()(substr.data() + substr.size(),
data() + capacity()))
<< "Failed precondition of Buffer::ToExternalRef(): "
"substring not contained in the buffer";
}
return ExternalRef(std::move(*this), substr, std::move(storage));
}

// Converts [`data`..`data + length`) to `absl::Cord`.
//
// If `data != nullptr || length > 0` then [`data`..`data + length`) must be
// contained in `*this`.
//
// `*this` is left unchanged or deallocated.
absl::Cord ToCord(const char* data, size_t length) &&;
// Support `ExternalRef`.
friend size_t RiegeliAllocatedMemory(const Buffer* self) {
return self->capacity();
}

// Appends [`data`..`data + length`) to `dest`.
//
// If `data != nullptr || length > 0` then [`data`..`data + length`) must be
// contained in `*this`.
//
// `*this` is left unchanged or deallocated.
void AppendSubstrTo(const char* data, size_t length, absl::Cord& dest) &&;
// Support `ExternalRef`.
friend ExternalStorage RiegeliToExternalStorage(Buffer* self) {
self->capacity_ = 0;
return ExternalStorage(
std::exchange(self->data_, nullptr), operator delete);
}

// Prepends [`data`..`data + length`) to `dest`.
//
// If `data != nullptr || length > 0` then [`data`..`data + length`) must be
// contained in `*this`.
//
// `*this` is left unchanged or deallocated.
void PrependSubstrTo(const char* data, size_t length, absl::Cord& dest) &&;
// Support `Chain::FromExternal()` and `ExternalRef`.
friend void RiegeliDumpStructure(const Buffer* self, absl::string_view substr,
std::ostream& out) {
self->DumpStructure(substr, out);
}

// Support `MemoryEstimator`.
template <typename MemoryEstimator>
@@ -102,6 +112,7 @@ class
private:
void AllocateInternal(size_t min_capacity);
void DeleteInternal();
void DumpStructure(absl::string_view substr, std::ostream& out) const;

char* data_ = nullptr;
size_t capacity_ = 0;
@@ -151,13 +162,6 @@ inline void Buffer::DeleteInternal() {
#endif
}

inline char* Buffer::Release() {
capacity_ = 0;
return std::exchange(data_, nullptr);
}

inline void Buffer::DeleteReleased(void* ptr) { operator delete(ptr); }

} // namespace riegeli

#endif // RIEGELI_BASE_BUFFER_H_