-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtagging.html
1019 lines (1014 loc) · 36.4 KB
/
tagging.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<div>
<p>
<i>Release 1.1, 2018-04-30 (<a
href="http://hxlstandard.org/standard/tagging">permalink</a>, <a
href="http://hxlstandard.org/standard/1_0final/tagging/">previous
release</a>)</i>
</p>
<p>
<i>Part of the <a
href="http://hxlstandard.org/standard/1_1final/">HXL 1.1
standard</a>.</i>
</p>
<section id="intro">
<h1>1. Introduction</h1>
<p>
This document is part of the <a
href="http://hxlstandard.org">Humanitarian Exchange Language</a>
(HXL) version 1.1, a standard for increasing the efficiency and
effectiveness of data exchange during humanitarian crises. This
new version is fully backwards-compatible with data produced using
HXL 1.0 (released 18 March 2016), and adds several new features,
including JSON-based encodings and a standard way to refer to
taxonomies/controlled vocabularies. There are also several new
hashtags and attributes in the <a
href="http://hxlstandard.org/standard/1_1final/dictionary/">hashtag
dictionary</a>.
</p>
<p>
The intended audience for this specification is
information-management professionals and software developers who
require a formal definition of the HXL syntax. Most users who
simply want to add hashtags to their data may prefer the <a
href="http://hxlstandard.org/postcards/">HXL postcards</a> and the
tutorial information at <a
href="http://hxlstandard.org">hxlstandard.org</a>, as well as
interactive HXL tool support under development at the <a
href="https://tools.humdata.org">Humanitarian Data Exchange</a>
(HDX).
</p>
<p>
The HXL standard consists of two normative parts:
</p>
<ol>
<li>
HXL tagging conventions (this document) — instructions for
adding HXL hashtags to spreadsheets.
</li>
<li>
<a
href="http://hxlstandard.org/standard/1_1final/dictionary/">HXL
hashtag dictionary</a> — a list of hashtags for
identifying humanitarian data fields.
</li>
</ol>
<section id="design">
<h4>1.1. Design philosophy</h4>
<p>
HXL is a lightweight standard by design. Most data standards
dictate to users how they should collect and format their
data; HXL, on the other hand, encourages organisations to add
hashtags to their existing datasets, without requiring new
skills or software tools, and interferes as little as possible
in their current ways of working.
</p>
<p>
The primary focus of HXL is tabular-style data such as
spreadsheets or API output from database tables, which
represent the vast majority of the operational data collected
in the humanitarian sphere; however, HXL hashtags can
potentially have other applications, including labelling
attributes for map layers or identifying data types in SMS
messages.
</p>
</section>
<section id="audience">
<h3>1.2. Target audience</h3>
<p>
The standard's primary audiences are information-management
specialists who are familiar with spreadsheets or relational
databases, and computer programmers and database specialists
looking to consume data produced by those
information-management specialists.
</p>
</section>
<section id="terms">
<h3>1.3. Terms of use</h3>
<p>
HXL is available as an open standard — the working
groups have designed it for use with humanitarian data, but
people and organisations are welcome to use it for any purpose
they choose. Note, however, that users may not claim support
or endorsement from any members of the HXL working group or
the organisations for which they work. The authors offer no
warranty of any kind, so implementors use the standard at
their own risk.
</p>
<p>
The text of the standard itself is released into the public domain.
</p>
</section>
</section>
<section id="tagging">
<h2>2. Adding HXL hashtags to data</h2>
<section id="tagging.spreadsheet">
<h3>2.1 Spreadsheet Data eg. csv, xls, xlsx</h3>
<p>
Consider the following simple spreadsheet:
</p>
<table>
<thead>
<tr>
<th>LOCATION NAME</th>
<th>LOCATION CODE</th>
<th>NUMBER AFFECTED</th>
</tr>
</thead>
<tbody>
<tr>
<td>Camp A</td>
<td>01000001</td>
<td>2000</td>
</tr>
<tr>
<td>Camp B</td>
<td>01000002</td>
<td>750</td>
</tr>
<tr>
<td>Camp C</td>
<td>01000003</td>
<td>1920</td>
</tr>
</tbody>
</table>
<p>
Datasets like this — longer, of course, and with more
columns — are the backbone of humanitarian information
management, and they provide the input for most reports, maps,
and visualisations coming out of a crisis. Unfortunately,
creating those data products is time-consuming, and responders
have to duplicate the work from crisis to crisis and even
dataset to dataset, because it is hard to build reusable
software tools that can understand the many different ways
responders may choose to label their data. For example, the
text header of the last column could have appeared in dozens
of variants, and in several different languages:
</p>
<ul>
<li>Number affected</li>
<li>Affected</li>
<li>People affected</li>
<li># de personnes concernées</li>
<li>Afectadas/os</li>
<li>عدد الأشخاص المتضررين</li>
</ul>
<p>
Software tools need to be able to recognise that the figures
in the third column refer to the number of people affected
— regardless of how the data provider has decided to
label it in the spreadsheet — so HXL adds a second
header row containing short hashtags:
</p>
<table>
<thead>
<tr>
<th>LOCATION NAME</th>
<th>LOCATION CODE</th>
<th>NUMBER AFFECTED</th>
</tr>
<tr class="hxl">
<td>#loc +name</td>
<td>#loc +code</td>
<td>#affected</td>
</tr>
</thead>
<tbody>
<tr>
<td>Camp A</td>
<td>01000001</td>
<td>2000</td>
</tr>
<tr>
<td>Camp B</td>
<td>01000002</td>
<td>750</td>
</tr>
<tr>
<td>Camp C</td>
<td>01000003</td>
<td>1920</td>
</tr>
</tbody>
</table>
<p>
Now, whether the text at the top of the column reads
"Number affected" or "عدد الأشخاص المتضررين",
software for cleaning, validating, analysing, mapping, or
visualising the data can automatically recognise the hashtag
#affected and use the figures below accordingly.
</p>
<p>
More than one row of headers may appear above the HXL hashtag
row — the hashtags themselves act as a marker to show
automated systems where the headers end and the data begins:
</p>
<table>
<thead>
<tr>
<th colspan="2">CAMP INFORMATION</th>
<th>NEEDS</th>
</tr>
<tr>
<th>LOCATION NAME</th>
<th>LOCATION CODE</th>
<th>NUMBER AFFECTED</th>
</tr>
<tr class="hxl">
<td>#loc +name</td>
<td>#loc +code</td>
<td>#affected</td>
</tr>
</thead>
<tbody>
<tr>
<td>Camp A</td>
<td>01000001</td>
<td>2000</td>
</tr>
<tr>
<td>Camp B</td>
<td>01000002</td>
<td>750</td>
</tr>
<tr>
<td>Camp C</td>
<td>01000003</td>
<td>1920</td>
</tr>
</tbody>
</table>
<p>
HXL software should expect to find the hashtag row anywhere
within the first 25 rows of a dataset and should assume that
all rows below the hashtag row contain data.
</p>
</section>
<section id="tagging.json">
<h3>2.2 JSON data</h3>
<p>
It is becoming increasingly common for organisations to share
data through APIs. HXL is well placed to add interoperability
to that data through its support for JSON, the format
most-commonly used by APIs. HXL is purposely restricted to a
simplified subset of the full JSON specification. In this
simplified subset, the data must be laid out in a
non-hierarchical and tabular form. Two such forms are
currently supported.
</p>
<section id="tagging.json.objects">
<h4>2.2.1. Array of objects JSON style</h4>
<p>
This is a very common way for data to be presented where
each row is a lookup between a hashtag key and a value:
</p>
<pre>[
{
"#hashtag": value,
"#hashtag": value
},
{
"#hashtag": value,
"#hashtag": value
}
]</pre>
<p>
An example of this is shown below:
</p>
<pre>[
{
"#event+id": 1,
"#affected+killed": 1,
"#region": "Mediterranean",
"#meta+source+reliability": "Verified",
"#date+reported": "05/11/2015",
"#geo+lat": 36.891500,
"#geo+lon": 27.287700
},
{
"#event+id": 3,
"#affected+killed": 1,
"#region": "Central America incl. Mexico",
"#meta+source+reliability": "Partially Verified",
"#date+reported": "03/11/2015",
"#geo+lat": 15.956400,
"#geo+lon": -93.663100
}
]</pre>
<p>
For repeated same named hashtags eg. to express multiple
sectors using repeated #sector columns, the equivalent in
this format is a comma separated list of sectors (see <a
href="#special.repeating.list">4.1.1. The +list
attribute</a>), e.g.
</p>
<pre>"#sector": "WASH,health"</pre>
<p>
Note that the array of objects form does not allow for
human-readable headers. If there is a demand for these
— and the array of arrays form outlined below does not
suffice — then they will appear in a future version of
the standard.
</p>
<p>
<b>Note:</b> HXL allows hashtag attributes to appear in any
order, case-insensitive, with or without whitespace
separating them, so these are all considered equivalent:
"#affected+f+children", "#affected
+children +f", and
"#affected+Children+F". In JSON objects, it is
essential that the property names be consistent, so you
should take the following steps when converting a HXL
hashtag specification (hashtag and attributes) for use as a
JSON object property:
</p>
<ol>
<li>Convert to lowercase.</li>
<li>Remove all whitespace.</li>
<li>Present the attributes in US-ASCII alphabetical
order.</li>
</ol>
<p>
Following these rules, the JSON property-name representation
of the above HXL hashtag specification will always be
"#affected+children+f".
</p>
</section>
<section id="tagging.json.arrays">
<h4>2.2.2. Array of arrays JSON style</h4>
<p>
Although not widely used, this form is ideally suited to
visualisations because it is significantly more compact than
the Array of Objects format as the hashtags are only defined
once in the first element of the Array:
</p>
<pre>[
["#hashtag", "#hashtag"],
[value,value],
[value,value]
]</pre>
<p>
Below is an example:
</p>
<pre>[
["#event+id","#affected+killed","#region","#meta+source+reliability", "#date+reported","#geo+lat","#geo+lon"],
[1, 1, "Mediterranean", "Verified", "2015-11-05", 36.891500,27.287700],
[3, 1, "Central America incl. Mexico", "Partially Verified", "2015-11-03", 15.956400, -93.663099]
]</pre>
<p>
If headers are needed, they can be added as an extra array prior to the hashtags.
</p>
</section>
</section>
</section>
<section id="hashtags">
<h2>3. Structure of a HXL hashtag</h2>
<p>
The root of a HXL hashtag follows the same syntactic rules as a
Twitter hashtag: it begins with the octothorpe/pound sign
("#") and contains only unaccented Roman alphabetic
characters (so-called "ASCII letters,"
"a" to "z"), Arabic numerals
("0" to "9"), and the underscore symbol
("_"). The first character must be alphabetic, and
character case does not matter (#ADM1 and #adm1 are the same
hashtag, though lowercase is preferred for stylistic
reasons). Here are some examples of syntactically-valid HXL
hashtags:
</p>
<ul>
<li>#sector</li>
<li>#org</li>
<li>#households</li>
<li>#impact</li>
</ul>
<section id="hashtags.attributes">
<h3>3.1. Hashtag attributes</h3>
<p>
The core shared HXL hashtags describe high-level concepts like
an organisation (#org), geographical coordinates (#geo), a
humanitarian cluster or sector (#sector), the number of people
affected (#affected), or a subdivision of a country
(#adm1). Humanitarian datasets, however, often need to make
finer-grained distinctions. For example, is an organisation
the funder (donor) or the implementing agency? Does a column
contain the name of an administrative subdivision or its code?
</p>
<p>
HXL allows data providers to make these distinctions by attaching attributes to a hashtag.
</p>
<section id="hashtags.attributes.syntax">
<h4>3.1.1. Attribute syntax</h4>
<p>
Attributes follow the same syntactic rules as hashtags,
except that they begin with plus ("+") rather
than the octothorpe/pound sign ("#"), and follow
the hashtag , with optional whitespace separating the
attributes. A hashtag may have any number of attributes, and
order does not matter, so #org +funder +code has exactly the
same meaning as #org +code +funder. The following examples
show attributes attached to hashtags to refine their
meaning:
</p>
<ul>
<li>
#org +funder — the funding organisation (e.g. a donor)
</li>
<li>
#org +funder +code — a machine-readable code (of
some type) for a funding organisation
</li>
<li>
#adm1 +name +i_fr — the name of an administrative
level-one subdivision, in French
</li>
<li>
#adm1 +code +v_pcode — the P-code (place code) of an
administrative level-one subdivision (adding +v_pcode to
+code to further refine the code type)
</li>
<li>
#adm1 +code +v_iso3 — the ISO code of an
administrative level-one subdivision (adding +v_iso3 to
+code to further refine the code type)
</li>
</ul>
<p>
Software processing HXL data may ignore any attributes it
does not recognise and simply process the core hashtag. For
more information on +v_ attributes, see <a
href="#hashtags.attributes.vocab">3.1.4. Attributes for
controlled vocabularies</a>.
</p>
<p>
<b>Note:</b> all attributes beginning with a single alphabetic
character followed by an underscore (e.g. +v_ and +x_) are
reserved for special use in HXL.
</p>
</section>
<section id="hashtags.attributes.common">
<h4>3.1.2. Common attributes</h4>
<p>
Data providers may invent their own attributes to suit their
local data needs; however, there are some recommended common
attributes that will be useful across many data types. There
is a full list in the HXL core tagset<sup><a
href="#cmnt1">[a]</a></sup>, of which the following are some
highlights:
</p>
<blockquote>
+displaced +idps +injured +reached +refugees
</blockquote>
<p>
Classifications for counts or descriptions of people,
e.g. #affected +idps for the number of internally-displaced
people.
</p>
<blockquote>
+code
</blockquote>
<p>
The value is a unique, machine-readable code,
e.g. #adm1+code for an administrative level-one P-code.
</p>
<blockquote>
+f +m +i
</blockquote>
<p>
The value (usually a number) refers specifically to people
of a specific gender (or +i for non-binary), e.g. +affected
+f for the number of female people affected.
</p>
<blockquote>
+start +end
</blockquote>
<p>
The value refers to the beginning or end, e.g. #date+end for
the end date of an activity.
</p>
</section>
<section id="hashtags.attributes.lang">
<h4>3.1.3. Attributes for languages</h4>
<p>
Humanitarian crises often take place in multicultural areas,
where different local groups speak different languages;
furthermore, international responders helping with a crisis
may need to work in their own languages as well. As a
result, humanitarian datasets are sometimes multilingual,
listing the same information in e.g. French and Arabic, or
Dari and Pashto.
</p>
<p>
To make it easy to identify languages in HXL, the standard
recommends that all two-character alphabetic attributes be
reserved to represent ISO 639-1 language codes, such as
+i_en for English. Data providers can use the attributes to
mark the language of a column:
</p>
<table>
<thead>
<tr>
<th>PROJECT TITLE</th>
<th>TITRE DU PROJET</th>
</tr>
<tr class="hxl">
<td>#activity +i_en</td>
<td>#activity +i_fr</td>
</tr>
</thead>
<tbody>
<tr>
<td>Malaria treatments</td>
<td>Traitement du paludisme</td>
</tr>
<tr>
<td>Teacher training</td>
<td>Formation des enseignant(e)s</td>
</tr>
</tbody>
</table>
<p>
The following language attributes (not a comprehensive list)
are examples of those that might appear in international
humanitarian datasets:
</p>
<table>
<tbody>
<tr>
<th>+i_en</th>
<td>English</td>
<th>+i_fa</th>
<td>Dari / Farsi / Persian</td>
</tr>
<tr>
<th>+i_fr</th>
<td>French</td>
<th>+i_ps</th>
<td>Pashto</td>
</tr>
<tr>
<th>+i_ar</th>
<td>Arabic</td>
<th>+i_ms</th>
<td>Malay</td>
</tr>
<tr>
<th>+i_es</th>
<td>Spanish</td>
<th>+i_ur</th>
<td>Urdu</td>
</tr>
<tr>
<th>+i_ru</th>
<td>Russian</td>
<th>+i_tl</th>
<td>Tagalog</td>
</tr>
</tbody>
</table>
</section>
<section id="hashtags.attributes.vocab">
<h4>3.1.4. Attributes for controlled vocabularies</h4>
<p>
New in version 1.1.
</p>
<p>
While the +code HXL attribute indicates that a value is a
machine-readable code of some kind, it does not tell exactly
what vocabulary, code list, or taxonomy is in use. Beginning
with release 1.1, any HXL attribute beginning in +v_
represents a short identifier for a controlled vocabulary,
which you can look up (if desired) from a master HXL dataset
at [url]<sup><a href="#cmnt2">[b]</a></sup>.
</p>
<p>
For example, the HXL hashtag specification #country +code
+v_iso3 indicates that the column contains country codes
from the "v_iso3" vocabulary. To find more
information about that vocabulary, software (or a human
reader) may look it up in the master dataset, and find
information like this:
</p>
<table>
<thead>
<tr>
<th>Vocabulary identifier</th>
<th>Vocabulary name</th>
<th>Controlling org</th>
<th>Home page</th>
</tr>
<tr class="hxl">
<td>#vocab +att</td>
<td>#vocab +name</td>
<td>#org +controlling</td>
<td>#vocab +url +home</td>
</tr>
</thead>
<tbody>
<tr>
<td>+v_iso3</td>
<td>
ISO 3166-1 alpha 3: Codes for the representation of
names of countries and their subdivisions -- Part 1:
Country codes (3-letter identifiers)
</td>
<td>International Organization for Standardization (ISO)</td>
<td>https://www.iso.org/standard/63545.html</td>
</tr>
</tbody>
</table>
<p>
The #vocab hashtag is specific to this purpose, and not part
of the core HXL hashtags. The maintainers will be adding
additional columns to the dataset in the future, but these
four core columns, at a minimum, should always be present.
</p>
<p>
To request registering a new vocabulary identifier, please
post a message to the public <a
href="https://groups.google.com/forum/#!forum/hxlproject">hxlproject@googlegroups.com</a>
mailing list for discussion.
</p>
</section>
</section>
<section id="hashtags.extensions">
<h3>3.2. Creating extension hashtags</h3>
<p>
The HXL core hashtags include hashtags that will be generally
applicable to many humanitarian datasets, but it is impossible
to anticipate every hashtag for every humanitarian need. This
standard makes the following four recommendations<sup><a
href="#cmnt3">[c]</a></sup><sup><a
href="#cmnt4">[d]</a></sup><sup><a href="#cmnt5">[e]</a></sup>
for extending HXL hashtags and attributes:
</p>
<ol start="1">
<li>
Whenever possible, take an existing hashtag with a broader
meaning, and narrow it down with an attribute, e.g. #loc
+hospital.
</li>
<li>
When there is no applicable core hashtag, begin an extension
hashtag with "x_", so that it will not conflict
with any future HXL core hashtags, e.g. #x_toxicity.
</li>
<li>
When software finds a HXL hashtag that it does not
recognise, e.g. #x_toxicity, it should simply ignore the
column of data.
</li>
<li>
When software finds a HXL attribute that it does not
recognise, e.g. #loc+hospital, it should ignore the
attribute but still process the hashtag and any other
attributes it does recognise: in this case, it should behave
as if the dataset had contained simply #loc.
</li>
</ol>
<p>
<b>Note:</b> software designers may choose to warn about
unrecognised hashtags and attributes, to help with error
detection and quality control.
</p>
</section>
<section id="special">
<h1>4. Special cases</h1>
<p>
This section describes how to use HXL to deal with special
cases that do not normally fit well into a tabular data model.
</p>
<section id="special.repeating">
<h3>4.1. Repeating fields</h3>
<p>
A tabular format works poorly for repeated fields (e.g. an
activity taking place in more than one location); however,
using HXL hashtags, it is possible to design a spreadsheet
format that allows for a fixed upper-limit of repetition.
</p>
<p>
For example, the HXL hashtag for a generic geographical code
(like a P-code) is #loc+code. A 3W spreadsheet for a
specific country could allow room for up to three geocodes,
like this:
</p>
<table>
<thead>
<tr>
<th>P-CODE 1</th>
<th>P-CODE 2</th>
<th>P-CODE 3</th>
</tr>
<tr class="hxl">
<td>#loc+code</td>
<td>#loc+code</td>
<td>#loc+code</td>
</tr>
</thead>
<tbody>
<tr>
<td>020503</td>
</tr>
<tr>
<td>060107</td>
<td>060108</td>
</tr>
<tr>
<td>173219</td>
</tr>
<tr>
<td>530012</td>
</tr>
<tr>
<td>530013</td>
<td>530015</td>
<td>279333</td>
</tr>
</tbody>
</table>
<p>
This approach is not currently possible with the <a
href="#tagging.json.objects">Array of objects</a> JSON
style, since any key may appear only once in a JSON
object. Using the Array of arrays JSON encoding, the above
example would look like this:
</p>
<pre>[
["P-CODE 1", "P-CODE 2", "P-CODE 3"],
["#loc+code", "#loc+code", "#loc+code"],
["020503"],
["060107", "060108"],
["173219"],
["530013", "530015", "279333"]
]</pre>
<p>
By reading the HXL hashtag, processing software can easily
recognize that the three columns represent (up to) three
values for the same field, even though the full column
titles differ, and even if the authors of the processing
software knew nothing about the specific conventions in use
in this country.
</p>
<section id="special.repeating.list">
<h4>4.1.1. The +list attribute</h4>
<p>
New in version 1.1.
</p>
<p>
As an alternative, HXL also supports the inclusion of
multiple values in a single field, such as a spreadsheet
cell, separated by a comma (or optionally, other
punctuation). Note that this method—while popular with
spreadsheet users—is not as reliable as using separate
columns, and will not allow you to use standard spreadsheet
functions for filtering and sorting while creating data. It
may also not be as well-supported by HXL-aware tools, such
as mapping and other visualisation services.
</p>
<p>
To include multiple values in a single cell, add the +list
attribute to the hashtag. Here is the earlier example recast
with multiple values in a single cell:
</p>
<table>
<thead>
<tr>
<th>P-CODE</th>
</tr>
<tr class="hxl">
<td>#loc +code +list</td>
</tr>
</thead>
<tbody>
<tr>
<td>020503</td>
</tr>
<tr>
<td>060107, 060108</td>
</tr>
<tr>
<td>173219</td>
</tr>
<tr>
<td>530012</td>
</tr>
<tr>
<td>530013, 530015, 279333</td>
</tr>
</tbody>
</table>
<p>
This is the only approach currently possible for
representing repeating fields using the Array of objects
JSON style:
</p>
<pre>[
{
"#loc+code+list": "020503"
},
{
"#loc+code+list": "060107,060108"
},
{
"#loc+code+list": "173219"
},
{
"#loc+code+list": "530012"
},
{
"#loc+code+list": "530013,530015,279333"
}
]</pre>
<p>
With the <a href="#tagging.json.arrays">Array of arrays</a>
JSON style, the above example would look much like the
spreadsheet table:
</p>
<pre>[
["#loc+code"],
["020503"],
["060107,060108"],
["173219"],
["530013,530015,279333"]
]</pre>
<p>
Note, again, that some HXL-aware software might not process
this approach correctly, and that it can affect the display
of reports, charts, and maps.
</p>
</section>
</section>
<section id="special.wide">
<h3>4.2. "Wide" (series) data</h3>
<p>
"Wide" datasets are optimised for reading rather
than machine processing. They place a series of data (usually
numbers) across a row, showing how information varies over
time, geographical area, demographic groups, or some other
criterion. Here is a simple, non-HXL example listing the
number of people of concern in each region during four
different years.
</p>
<table>
<thead>
<tr>
<th>REGION</th>
<th>2008</th>
<th>2009</th>
<th>2010</th>
<th>2011</th>
</tr>
</thead>
<tbody>
<tr>
<td>Coast District</td>
<td>0</td>
<td>30</td>
<td>100</td>
<td>250</td>
</tr>
<tr>
<td>Mountain District</td>
<td>15</td>
<td>75</td>
<td>30</td>
<td>45</td>
</tr>
</tbody>
</table>
<p>
This table presents some special challenges for tagging,
because the columns headed "2008" to "2011" all represent the
same kind of data, but in different years. Tagging them all as
simply #affected loses important information about the
series. The solution is to use an extra attribute, +label, to
specify that the information in the header is a label for the
data series:
</p>
<table>
<thead>
<tr>
<th>REGION</th>
<th>2008</th>
<th>2009</th>
<th>2010</th>
<th>2011</th>
</tr>
<tr class="hxl">
<td>#adm1 +name</td>
<td>#affected+label</td>
<td>#affected+label</td>
<td>#affected+label</td>
<td>#affected+label</td>
</tr>
</thead>
<tbody>
<tr>
<td>Coast District</td>
<td>0</td>
<td>30</td>
<td>100</td>
<td>250</td>
</tr>
<tr>
<td>Mountain District</td>
<td>15</td>
<td>75</td>
<td>30</td>
<td>45</td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="changes">
<h2>Appendix A: Changes from previous versions</h2>
<p>
Major changes from <a
href="https://hxlstandard.org/standard/1_0final">1.0 final</a>
to 1.1 final:
</p>
<ul>
<li>Added JSON encodings.</li>
<li>Added +v_* vocabulary attributes.</li>
<li>Prefix language attributes with +i_ so that +fr in HXL 1.0 becomes +i_fr in HXL 1.1.</li>
<li>Added +list attribute and convention for multiple values in a single cell.</li>
<li>Reserved all attributes beginning with a single letter
followed by underscore for future use.</li>
</ul>
<p>
Changes from <a href="https://hxlstandard.org/standard/1_0beta/tagging/">1.0 beta</a> to 1.0 final:
</p>
<ul>
<li>Explicitly state that the text of the standard is released into the public domain.</li>
<li>Many minor copy-editing and text-formatting changes.</li>
</ul>
<p>
Changes from <a href="https://hxlstandard.org/standard/1_0alpha/tagging/">1.0 alpha</a> to 1.0 beta:
</p>
<ul>
<li>Removed compact disaggregated syntax and language extensions.</li>
<li>Added hashtag attributes.</li>
<li>Added recommendations for language attributes.</li>
</ul>
</section>
<section id="grammar">
<h2>Appendix B: Formal grammar of a HXL hashtag </h2>
<p>
The following Backus-Naur Form grammar, with regular
expressions, defines the allowed content of a HXL hashtag
(terminals are in uppercase):
</p>
<blockqoute>
<pre><hxl-tagspec> ::= <hashtag>
| <hashtag> <attributes>
| <hashtag> WHITESPACE <attributes>
<attributes> ::= <attribute>
| <attributes> <attribute>
| <attributes> WHITESPACE <attribute>
<hashtag> ::= "#" TOKEN
<attribute> ::= "+" TOKEN
TOKEN ::= /[a-zA-Z][a-zA-Z0-9_]*/
WHITESPACE ::= /[ \t\n\r]+/</pre>
</blockqoute>
</section>
<section id="credits">
<h2>Appendix C: Credits</h2>
<p>
HXL is a group effort of many people and organisations,
including a wider community of over 100 members of the <a
href="mailto:hxlproject@googlegroups.com">hxlproject@googlegroups.com</a>
public mailing list. C.J. Hendrix (OCHA) was the founder of the
HXL standards effort, and Carsten Keßler (Hunter College) was
the original technical lead. Since 2013, Sarah Telford (OCHA)
has been overall programme manager and David Megginson (OCHA)
has served as standards lead and chair, while John Crowley
helped with outreach and governance in late 2014 and early 2015,
and Aidan McGuire (ScraperWiki) has provided project management
since 2015. Generous funding for HXL research and development in
2014 came from the Humanitarian Innovation Fund, and OCHA has
supported continuing work. The Paul Allen Foundation has
generously agreed to fund HXL work through 2016.
</p>
<p>
During 2014, the HXL Working Group included Albert Gembara
(USAID), Andrej Verity (OCHA), Andrew Alspach (UNHCR), David