Commit 5d2fce2

Merge branch 'ckirby-master'
2 parents b5e5f51 + f0fd9a5 commit 5d2fce2

File tree

7 files changed

+289
-106
lines changed

.travis.yml

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -8,11 +8,13 @@ cache:
88
python:
99
- "2.7"
1010
- "3.5"
11+
- "3.6"
1112

1213
env:
13-
- DJANGO_VERSION=1.8.15
14-
- DJANGO_VERSION=1.9.10
15-
- DJANGO_VERSION=1.10.5
14+
- DJANGO_VERSION=1.8.17
15+
- DJANGO_VERSION=1.9.12
16+
- DJANGO_VERSION=1.10.6
17+
- DJANGO_VERSION=1.11b1
1618

1719
install:
1820
- pip install -r requirements.txt

Makefile

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,5 +9,6 @@ ship:
99
test:
1010
clear;
1111
flake8 postgres_copy;
12+
flake8 tests;
1213
coverage run setup.py test;
1314
coverage report -m;

docs/index.rst

Lines changed: 72 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -7,11 +7,9 @@ Quickly load comma-delimited data into a Django model using PostgreSQL's COPY co
77
Why and what for?
88
-----------------
99

10-
`The people <http://www.californiacivicdata.org/about/>`_ who made this library are data journalists.
11-
We are often downloading, cleaning and analyzing new data.
10+
`The people <http://www.californiacivicdata.org/about/>`_ who made this library are data journalists. We are often downloading, cleaning and analyzing new data.
1211

13-
That means we write a load of loaders. You can usually do this by looping through each row
14-
and saving it to the database using the Django's ORM `create method <https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.create>`_.
12+
That means we write a load of loaders. You can usually do this by looping through each row and saving it to the database using Django's ORM `create method <https://docs.djangoproject.com/en/1.10/ref/models/querysets/#django.db.models.query.QuerySet.create>`_.
1513

1614
.. code-block:: python
1715
@@ -24,12 +22,9 @@ and saving it to the database using the Django's ORM `create method <https://doc
2422
2523
But if you have a big CSV, Django will rack up database queries and it can take a long time to finish.
2624

27-
Lucky for us, PostgreSQL has a built-in tool called `COPY <http://www.postgresql.org/docs/9.4/static/sql-copy.html>`_ that will hammer data into the
28-
database with one quick query.
25+
Lucky for us, PostgreSQL has a built-in tool called `COPY <http://www.postgresql.org/docs/9.4/static/sql-copy.html>`_ that will hammer data into the database with one quick query.
2926

30-
This package tries to make using COPY as easy any other database routine supported by Django. It is
31-
largely based on the design of the `LayerMapping <https://docs.djangoproject.com/en/1.8/ref/contrib/gis/layermapping/>`_
32-
utility for importing geospatial data.
27+
This package tries to make using COPY as easy as any other database routine supported by Django. It is largely based on the design of the `LayerMapping <https://docs.djangoproject.com/en/1.8/ref/contrib/gis/layermapping/>`_ utility for importing geospatial data.
3328

3429
.. code-block:: python
3530
@@ -53,14 +48,12 @@ The package can be installed from the Python Package Index with `pip`.
5348
5449
$ pip install django-postgres-copy
5550
56-
You will of course have to have Django, PostgreSQL and an adapter between the
57-
two (like psycopg2) already installed to put this library to use.
51+
You will of course have to have Django, PostgreSQL and an adapter between the two (like psycopg2) already installed to put this library to use.
5852

5953
An example
6054
----------
6155

62-
It all starts with a CSV file you'd like to load into your database. This library
63-
is intended to be used with large files but here's something simple as an example.
56+
It all starts with a CSV file you'd like to load into your database. This library is intended to be used with large files but here's something simple as an example.
6457

6558
.. code-block:: text
6659
@@ -87,8 +80,7 @@ If the model hasn't been created in your database, that needs to happen.
8780
8881
$ python manage.py migrate
8982
90-
Create a loader that uses this library to load CSV data into the model. One place you could
91-
put it is in a Django management command.
83+
Create a loader that uses this library to load CSV data into the model. One place you could put it is in a Django management command.
9284

9385
.. code-block:: python
9486
@@ -125,7 +117,7 @@ Like I said, that's it!
125117
``CopyMapping`` API
126118
-------------------
127119

128-
.. class:: CopyMapping(model, csv_path, mapping[, using=None, delimiter=',', null=None, encoding=None])
120+
.. class:: CopyMapping(model, csv_path, mapping[, using=None, delimiter=',', null=None, encoding=None, static_mapping=None])
129121

130122
The following are the arguments and keywords that may be used during
131123
instantiation of ``CopyMapping`` objects.
@@ -175,8 +167,7 @@ Keyword Arguments
175167

176168
.. method:: CopyMapping.save([silent=False, stream=sys.stdout])
177169

178-
The ``save()`` method also accepts keywords. These keywords are
179-
used for controlling output logging and error handling.
170+
The ``save()`` method also accepts keywords. These keywords are used for controlling output logging and error handling.
180171

181172
=========================== =================================================
182173
Keyword Arguments Description
@@ -194,12 +185,9 @@ Keyword Arguments Description
194185
Transforming data
195186
-----------------
196187

197-
By default, the COPY command cannot transform data on-the-fly as it is loaded into
198-
the database.
188+
By default, the COPY command cannot transform data on-the-fly as it is loaded into the database.
199189

200-
This library first loads the data into a temporary table
201-
before inserting all records into the model table. So it is possible to use PostgreSQL's
202-
built-in SQL methods to modify values during the insert.
190+
This library first loads the data into a temporary table before inserting all records into the model table. So it is possible to use PostgreSQL's built-in SQL methods to modify values during the insert.
203191

204192
As an example, imagine a CSV that includes a column of yes and no values that you wanted to store in the database as 1 or 0 in an integer field.
205193

@@ -230,9 +218,9 @@ Custom-field transformations
230218

231219
One approach is to create a custom Django field.
232220

233-
You can set a temporary data type for a column when it is first loaded, and then provide a SQL string for how to transform it during the insert into the model table. The transformation must include a string interpolation keyed to "name", where the name of the database column will be slotted.
221+
You can provide a SQL statement for how to transform the data during the insert into the model table. The transformation must include a string interpolation keyed to "name", where the title of the database column will be slotted.
234222

235-
This example loads in the column as the forgiving `text <http://www.postgresql.org/docs/9.4/static/datatype-character.html>`_ data type and then uses a `CASE statement <http://www.postgresql.org/docs/9.4/static/plpgsql-control-structures.html>`_ to transforms the data using a CASE statement.
223+
This example uses a `CASE statement <http://www.postgresql.org/docs/9.4/static/plpgsql-control-structures.html>`_ to transform the data.
236224

237225
.. code-block:: python
238226
@@ -264,9 +252,9 @@ Run your loader and it should finish fine.
264252
Model-method transformations
265253
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
266254

267-
A second approach is to provide a SQL string for how to transform a field during the insert on the model itself. This lets you specific different transformations for different fields of the same type.
255+
A second approach is to provide a SQL string for how to transform a field during the insert on the model itself. This lets you specify different transformations for different fields of the same type.
268256

269-
You must name the method so that the field name is sandwiched between ``copy_`` and ``_template``. It must return a string interpolation keyed to "name", where the name of the database column will be slotted.
257+
You must name the method so that the field name is sandwiched between ``copy_`` and ``_template``. It must return a SQL statement with a string interpolation keyed to "name", where the name of the database column will be slotted.
270258

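As a rough sketch of what "a string interpolation keyed to name" means in practice, the snippet below fills a template with a column name using Python's ``%`` formatting. The template text and the column name ``value`` are assumptions made up for illustration; they are not taken from the library's source.

```python
# Hypothetical SQL template: every %(name)s placeholder is where the
# database column name gets slotted in.
copy_template = """
    CASE
        WHEN "%(name)s" = 'yes' THEN 1
        WHEN "%(name)s" = 'no' THEN 0
    END
"""

# "value" is an assumed column name, used only for this demonstration.
sql = copy_template % {"name": "value"}
print(sql)
```

Both placeholders are replaced by the same column name, so one template can be reused for any field that needs the same yes/no conversion.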
271259
For the example above, the model might be modified to look like this.
272260

@@ -292,12 +280,9 @@ And that's it.
292280
Inserting static values
293281
-----------------------
294282

295-
If your model has columns that are not in the CSV, you can set static values
296-
for what is inserted using the ``static_mapping`` keyword argument. It will
297-
insert the provided values into every row in the database.
283+
If your model has columns that are not in the CSV, you can set static values for what is inserted using the ``static_mapping`` keyword argument. It will insert the provided values into every row in the database.
298284

299-
An example could be if you want to include the name of the source CSV file
300-
along with each row.
285+
An example could be if you want to include the name of the source CSV file along with each row.
301286

302287
Your model might look like this:
303288

@@ -338,6 +323,61 @@ And your loader would look like this:
338323
# Then save it.
339324
c.save()
340325
326+
327+
Extending with hooks
328+
--------------------
329+
330+
The ``CopyMapping`` loader includes optional hooks that run before and after the COPY statement that loads your CSV into a temporary table, and again before and after the INSERT statement that then slots it into your model.
331+
332+
If you have extra steps or more complicated logic you'd like to work into a loading routine, these hooks provide an opportunity to extend the base library.
333+
334+
To try them out, subclass ``CopyMapping`` and fill in as many of the optional hook methods below as you need.
335+
336+
.. code-block:: python
337+
338+
from postgres_copy import CopyMapping
339+
340+
341+
class HookedCopyMapping(CopyMapping):
342+
def pre_copy(self, cursor):
343+
print("pre_copy!")
344+
# Doing whatever you'd like here
345+
346+
def post_copy(self, cursor):
347+
print("post_copy!")
348+
# And here
349+
350+
def pre_insert(self, cursor):
351+
print("pre_insert!")
352+
# And here
353+
354+
def post_insert(self, cursor):
355+
print("post_insert!")
356+
# And finally here
357+
358+
359+
Now you can run that subclass as you normally would its parent.
360+
361+
.. code-block:: python
362+
363+
from myapp.models import Person
364+
from myapp.loaders import HookedCopyMapping
365+
from django.core.management.base import BaseCommand
366+
367+
368+
class Command(BaseCommand):
369+
370+
def handle(self, *args, **kwargs):
371+
# Note that we're using HookedCopyMapping here
372+
c = HookedCopyMapping(
373+
Person,
374+
'/path/to/my/data.csv',
375+
dict(name='NAME', number='NUMBER'),
376+
)
377+
# Then save it.
378+
c.save()
379+
380+
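To make the order of the four hooks concrete, here is a minimal sketch that runs without Django or a database. ``StandInCopyMapping`` is a hypothetical stand-in written for this illustration, not the library's implementation; it only mimics the sequence described above (hooks around the COPY, then hooks around the INSERT) and passes ``None`` where a real cursor would go.

```python
class StandInCopyMapping(object):
    """Hypothetical stand-in that only mimics the documented hook order."""

    def save(self):
        cursor = None  # a real loader would hand over a database cursor
        self.pre_copy(cursor)
        # ... the COPY into the temporary table would happen here ...
        self.post_copy(cursor)
        self.pre_insert(cursor)
        # ... the INSERT into the model table would happen here ...
        self.post_insert(cursor)

    # Hooks do nothing by default, so subclasses override only what they need.
    def pre_copy(self, cursor):
        pass

    def post_copy(self, cursor):
        pass

    def pre_insert(self, cursor):
        pass

    def post_insert(self, cursor):
        pass


class RecordingCopyMapping(StandInCopyMapping):
    """Records each hook call so the sequence can be inspected afterwards."""

    def __init__(self):
        self.calls = []

    def pre_copy(self, cursor):
        self.calls.append("pre_copy")

    def post_copy(self, cursor):
        self.calls.append("post_copy")

    def pre_insert(self, cursor):
        self.calls.append("pre_insert")

    def post_insert(self, cursor):
        self.calls.append("post_insert")


loader = RecordingCopyMapping()
loader.save()
print(loader.calls)
```

Leaving the default hooks as no-ops means a subclass can fill in a single hook, such as ``post_copy``, without touching the others.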
341381
Open-source resources
342382
---------------------
343383
