DynamoDB is a good choice for a pay-as-you-go, highly concurrent key-value database. Sometimes you want to store a large binary object along with a DynamoDB item, especially in a web crawler app. But DynamoDB has a limitation: a single item cannot be larger than 400 KB. How do you solve this problem?
An easy solution is to store the large binary object in S3 and keep only the S3 URI in DynamoDB. The pynamodb_mate library provides this feature on top of the pynamodb project (a DynamoDB ORM layer in Python).
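Under the hood the pattern is simple: write the payload to S3, then persist only its URI in the item. Here is a minimal sketch of the idea, using plain Python dicts to stand in for the S3 bucket and the DynamoDB table; the names and the key-mapping helper are illustrative, not the library's actual API:

```python
# Sketch of the "store the blob in S3, keep the URI in DynamoDB" pattern.
# fake_s3 and fake_table are in-memory stand-ins for the real services.
fake_s3 = {}     # maps S3 key -> bytes
fake_table = {}  # maps hash key -> item dict

BUCKET = "my-bucket"

def save_page(url: str, html: bytes) -> dict:
    """Store the large payload in (fake) S3 and only its URI in the item."""
    s3_key = url.replace("/", "_") + ".html"  # any collision-free mapping works
    fake_s3[s3_key] = html                    # s3.put_object(...) in real code
    item = {"url": url, "html_uri": "s3://{}/{}".format(BUCKET, s3_key)}
    fake_table[url] = item                    # table.put_item(...) in real code
    return item

item = save_page("http://www.python.org", b"<html>...</html>")
```

The item stays tiny no matter how big the payload gets, which is exactly what keeps it under the DynamoDB size limit.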
Here's how you define your ORM layer:
from pynamodb.models import Model
from pynamodb.attributes import UnicodeAttribute
from pynamodb_mate.s3_backed_attribute import (
    S3BackedBinaryAttribute,
    S3BackedUnicodeAttribute,
    S3BackedMixin,
    s3_key_safe_b64encode,
)

BUCKET_NAME = "my-bucket"
URI_PREFIX = "s3://{BUCKET_NAME}/".format(BUCKET_NAME=BUCKET_NAME)


class PageModel(Model, S3BackedMixin):
    class Meta:
        table_name = "pynamodb_mate-pages"
        region = "us-east-1"

    url = UnicodeAttribute(hash_key=True)
    cover_image_url = UnicodeAttribute(null=True)

    # this field is for the html content string
    html_content = S3BackedUnicodeAttribute(
        s3_uri_getter=lambda obj: URI_PREFIX + s3_key_safe_b64encode(obj.url) + ".html",
        compress=True,
    )
    # this field is for the image binary content
    cover_image_content = S3BackedBinaryAttribute(
        s3_uri_getter=lambda obj: URI_PREFIX + s3_key_safe_b64encode(obj.cover_image_url) + ".jpg",
        compress=True,
    )
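The s3_key_safe_b64encode helper turns an arbitrary URL into a string that is safe to use as an S3 key. I have not checked the library's exact implementation, but URL-safe Base64 is the usual trick for this, roughly:

```python
import base64

def url_safe_key(text: str) -> str:
    # Hypothetical stand-in for s3_key_safe_b64encode: URL-safe Base64
    # uses "-" and "_" instead of "+" and "/", so the encoded URL never
    # introduces accidental "directory" separators into the S3 key.
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")

key = url_safe_key("http://www.python.org")
```

Because the mapping is deterministic and reversible, the same page URL always resolves to the same S3 object.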
Here's how you store large binary objects to S3:
url = "http://www.python.org"
url_cover_image = "http://www.python.org/logo.jpg"
html_content = "Hello World!\n" * 1000
cover_image_content = ("this is a dummy image!\n" * 1000).encode("utf-8")

page = PageModel(url=url, cover_image_url=url_cover_image)

# create; if something goes wrong with s3.put_object in the middle,
# the dirty S3 objects will be cleaned up
page.atomic_save(
    s3_backed_data=[
        page.html_content.set_to(html_content),
        page.cover_image_content.set_to(cover_image_content),
    ]
)

# update; if something goes wrong with s3.put_object in the middle,
# the partially written new S3 objects will be rolled back
html_content_new = "Good Bye!\n" * 1000
cover_image_content_new = ("this is another dummy image!\n" * 1000).encode("utf-8")
page.atomic_update(
    s3_backed_data=[
        page.html_content.set_to(html_content_new),
        page.cover_image_content.set_to(cover_image_content_new),
    ]
)

# delete; make sure the S3 objects are all gone
page.atomic_delete()
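The "atomic" part means the S3 writes either all succeed or are all undone. Here is a sketch of that rollback idea against an in-memory stand-in for S3; write_all and the fake store are illustrative, not pynamodb_mate internals:

```python
fake_s3 = {}  # in-memory stand-in for the bucket: key -> bytes

def write_all(payloads, fail_on=None):
    """Write every payload; on any failure, delete what was already written."""
    written = []
    try:
        for key, body in payloads.items():
            if key == fail_on:  # simulate s3.put_object blowing up mid-way
                raise IOError("put_object failed for {}".format(key))
            fake_s3[key] = body
            written.append(key)
    except Exception:
        for key in written:  # roll back the partially written objects
            del fake_s3[key]
        raise

# happy path: both objects land
write_all({"a.html": b"hello", "b.jpg": b"img"})

# failure path: "c.html" is written first, then rolled back when "d.jpg" fails
try:
    write_all({"c.html": b"new", "d.jpg": b"boom"}, fail_on="d.jpg")
except IOError:
    pass
```

After the failed call, the store holds only the two objects from the successful call; no half-finished state is left behind.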
pynamodb_mate is released on PyPI, so all you need is:
$ pip install pynamodb_mate
To upgrade to the latest version:
$ pip install --upgrade pynamodb_mate