docs/index.rst
+72 -32 (72 additions & 32 deletions)
@@ -7,11 +7,9 @@ Quickly load comma-delimited data into a Django model using PostgreSQL's COPY co
 Why and what for?
 -----------------
 
-`The people <http://www.californiacivicdata.org/about/>`_ who made this library are data journalists.
-We are often downloading, cleaning and analyzing new data.
+`The people <http://www.californiacivicdata.org/about/>`_ who made this library are data journalists. We are often downloading, cleaning and analyzing new data.
 
-That means we write a load of loaders. You can usually do this by looping through each row
-and saving it to the database using the Django's ORM `create method <https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.create>`_.
+That means we write a load of loaders. You can usually do this by looping through each row and saving it to the database using Django's ORM `create method <https://docs.djangoproject.com/en/1.10/ref/models/querysets/#django.db.models.query.QuerySet.create>`_.
 
 .. code-block:: python
 
@@ -24,12 +22,9 @@ and saving it to the database using the Django's ORM `create method <https://doc
 
 But if you have a big CSV, Django will rack up database queries and it can take a long time to finish.
 
-Lucky for us, PostgreSQL has a built-in tool called `COPY <http://www.postgresql.org/docs/9.4/static/sql-copy.html>`_ that will hammer data into the
-database with one quick query.
+Lucky for us, PostgreSQL has a built-in tool called `COPY <http://www.postgresql.org/docs/9.4/static/sql-copy.html>`_ that will hammer data into the database with one quick query.
 
-This package tries to make using COPY as easy any other database routine supported by Django. It is
-largely based on the design of the `LayerMapping <https://docs.djangoproject.com/en/1.8/ref/contrib/gis/layermapping/>`_
-utility for importing geospatial data.
+This package tries to make using COPY as easy as any other database routine supported by Django. It is largely based on the design of the `LayerMapping <https://docs.djangoproject.com/en/1.8/ref/contrib/gis/layermapping/>`_ utility for importing geospatial data.
 
 .. code-block:: python
 
@@ -53,14 +48,12 @@ The package can be installed from the Python Package Index with `pip`.
 
     $ pip install django-postgres-copy
 
-You will of course have to have Django, PostgreSQL and an adapter between the
-two (like psycopg2) already installed to put this library to use.
+You will of course have to have Django, PostgreSQL and an adapter between the two (like psycopg2) already installed to put this library to use.
 
 An example
 ----------
 
-It all starts with a CSV file you'd like to load into your database. This library
-is intended to be used with large files but here's something simple as an example.
+It all starts with a CSV file you'd like to load into your database. This library is intended to be used with large files but here's something simple as an example.
 
 .. code-block:: text
 
@@ -87,8 +80,7 @@ If the model hasn't been created in your database, that needs to happen.
 
     $ python manage.py migrate
 
-Create a loader that uses this library to load CSV data into the model. One place you could
-put it is in a Django management command.
+Create a loader that uses this library to load CSV data into the model. One place you could put it is in a Django management command.
[...]
-By default, the COPY command cannot transform data on-the-fly as it is loaded into
-the database.
+By default, the COPY command cannot transform data on-the-fly as it is loaded into the database.
 
-This library first loads the data into a temporary table
-before inserting all records into the model table. So it is possible to use PostgreSQL's
-built-in SQL methods to modify values during the insert.
+This library first loads the data into a temporary table before inserting all records into the model table. So it is possible to use PostgreSQL's built-in SQL methods to modify values during the insert.
 
 As an example, imagine a CSV that includes a column of yes and no values that you wanted to store in the database as 1 or 0 in an integer field.
 
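The two-step load described in this hunk (COPY into a temporary table, then an INSERT into the model table) can be sketched as plain string building. This is a minimal illustration, not the library's code: the helper ``build_insert_sql`` and the table and column names are hypothetical. It shows how a SQL expression can sit in the SELECT list so that yes and no values become 1 and 0 during the insert.

```python
def build_insert_sql(model_table, temp_table, columns):
    """Build an INSERT ... SELECT that moves rows out of a staging table.

    `columns` is a list of (column_name, select_expression) pairs, so each
    value can pass through a SQL expression on its way into the model table.
    """
    names = ", ".join('"%s"' % name for name, _ in columns)
    expressions = ", ".join(expression for _, expression in columns)
    return 'INSERT INTO "%s" (%s) SELECT %s FROM "%s";' % (
        model_table, names, expressions, temp_table,
    )


# Hypothetical table and column names, for illustration only.
sql = build_insert_sql(
    "myapp_person",
    "temp_myapp_person",
    [
        # Copied across unchanged.
        ("name", '"name"'),
        # Transformed from text to an integer during the insert.
        (
            "is_active",
            "CASE WHEN \"is_active\" = 'yes' THEN 1 "
            "WHEN \"is_active\" = 'no' THEN 0 END",
        ),
    ],
)
print(sql)
```

Because the transformation lives in the SELECT list, the COPY step itself stays a plain bulk load into the temporary table.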
@@ -230,9 +218,9 @@ Custom-field transformations
 
 One approach is to create a custom Django field.
 
-You can set a temporary data type for a column when it is first loaded, and then provide a SQL string for how to transform it during the insert into the model table. The transformation must include a string interpolation keyed to "name", where the name of the database column will be slotted.
+You can provide a SQL statement for how to transform the data during the insert into the model table. The transformation must include a string interpolation keyed to "name", where the title of the database column will be slotted.
 
-This example loads in the column as the forgiving `text <http://www.postgresql.org/docs/9.4/static/datatype-character.html>`_ data type and then uses a `CASE statement <http://www.postgresql.org/docs/9.4/static/plpgsql-control-structures.html>`_ to transforms the data using a CASE statement.
+This example uses a `CASE statement <http://www.postgresql.org/docs/9.4/static/plpgsql-control-structures.html>`_ to transform the data.
 
 .. code-block:: python
 
@@ -264,9 +252,9 @@ Run your loader and it should finish fine.
 Model-method transformations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-A second approach is to provide a SQL string for how to transform a field during the insert on the model itself. This lets you specific different transformations for different fields of the same type.
+A second approach is to provide a SQL string for how to transform a field during the insert on the model itself. This lets you specify different transformations for different fields of the same type.
 
-You must name the method so that the field name is sandwiched between ``copy_`` and ``_template``. It must return a string interpolation keyed to "name", where the name of the database column will be slotted.
+You must name the method so that the field name is sandwiched between ``copy_`` and ``_template``. It must return a SQL statement with a string interpolation keyed to "name", where the name of the database column will be slotted.
 
 For the example above, the model might be modified to look like this.
 
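The naming rule in this hunk (``copy_`` plus the field name plus ``_template``, returning SQL with a "name" placeholder) can be tried outside Django. The sketch below is an assumption-laden illustration: ``Person`` is a plain class rather than a Django model, ``render_column`` is a hypothetical helper, and the ``%(name)s`` placeholder syntax is assumed from the phrase "string interpolation keyed to name". It is not the library's lookup code.

```python
class Person(object):
    """Plain stand-in for a Django model; no database is involved."""

    def copy_name_template(self):
        # The field name "name" is sandwiched between "copy_" and "_template".
        return 'upper("%(name)s")'


def render_column(model, field_name, column_name):
    """Return the SQL used to select one column during the insert.

    Hypothetical helper: looks for a copy_<field>_template method and,
    when there is none, falls back to the bare quoted column.
    """
    method = getattr(model, "copy_%s_template" % field_name, None)
    template = method() if method else '"%(name)s"'
    # Slot the database column name into the "name" placeholder.
    return template % {"name": column_name}


person = Person()
name_sql = render_column(person, "name", "name")
number_sql = render_column(person, "number", "number")
print(name_sql)
print(number_sql)
```

Only the field with a matching method is transformed, which is what lets two fields of the same type be handled differently.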
@@ -292,12 +280,9 @@ And that's it.
 Inserting static values
 -----------------------
 
-If your model has columns that are not in the CSV, you can set static values
-for what is inserted using the ``static_mapping`` keyword argument. It will
-insert the provided values into every row in the database.
+If your model has columns that are not in the CSV, you can set static values for what is inserted using the ``static_mapping`` keyword argument. It will insert the provided values into every row in the database.
 
-An example could be if you want to include the name of the source CSV file
-along with each row.
+An example could be if you want to include the name of the source CSV file along with each row.
 
 Your model might look like this:
 
@@ -338,6 +323,61 @@ And your loader would look like this:
             # Then save it.
             c.save()
 
+
+Extending with hooks
+--------------------
+
+The ``CopyMapping`` loader includes optional hooks that run before and after the COPY statement that loads your CSV into a temporary table, and again before and after the INSERT statement that then slots it into your model.
+
+If you have extra steps or more complicated logic you'd like to work into a loading routine, these hooks provide an opportunity to extend the base library.
+
+To try them out, subclass ``CopyMapping`` and fill in as many of the optional hook methods below as you need.
+
+.. code-block:: python
+
+    from postgres_copy import CopyMapping
+
+
+    class HookedCopyMapping(CopyMapping):
+        def pre_copy(self, cursor):
+            print("pre_copy!")
+            # Doing whatever you'd like here
+
+        def post_copy(self, cursor):
+            print("post_copy!")
+            # And here
+
+        def pre_insert(self, cursor):
+            print("pre_insert!")
+            # And here
+
+        def post_insert(self, cursor):
+            print("post_insert!")
+            # And finally here
+
+
+Now you can run that subclass as you normally would its parent.
+
+.. code-block:: python
+
+    from myapp.models import Person
+    from myapp.loaders import HookedCopyMapping
+    from django.core.management.base import BaseCommand
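
The order in which the four hooks fire can be seen without a database. The sketch below assumes the sequence described in the "Extending with hooks" text (``pre_copy``, COPY, ``post_copy``, ``pre_insert``, INSERT, ``post_insert``). ``FakeCopyMapping`` is a hypothetical stand-in written for this illustration, not the real ``CopyMapping`` class, and it passes ``None`` where a real loader would pass a cursor.

```python
class FakeCopyMapping(object):
    """Hypothetical stand-in for CopyMapping that only records call order."""

    def __init__(self):
        self.calls = []

    # Hooks do nothing by default; subclasses override the ones they need.
    def pre_copy(self, cursor):
        pass

    def post_copy(self, cursor):
        pass

    def pre_insert(self, cursor):
        pass

    def post_insert(self, cursor):
        pass

    def save(self):
        cursor = None  # a real loader would hand over a database cursor
        self.pre_copy(cursor)
        self.calls.append("COPY")  # stands in for the COPY statement
        self.post_copy(cursor)
        self.pre_insert(cursor)
        self.calls.append("INSERT")  # stands in for the INSERT statement
        self.post_insert(cursor)


class HookedMapping(FakeCopyMapping):
    # Only two of the four optional hooks are filled in.
    def pre_copy(self, cursor):
        self.calls.append("pre_copy")

    def post_insert(self, cursor):
        self.calls.append("post_insert")


loader = HookedMapping()
loader.save()
print(loader.calls)
```

Leaving the base hooks as no-ops is what makes each one optional: a subclass overrides only the steps it cares about.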