
Commit 5865ed9

djtfmartin, charvolant, brucehyslop, matthewandrews, and sbearcsiro authored
Playbooks to support pipelines (#421)
* SOLR quoll updated to 8.6.0; flag for switching JTS version 'jts_use_1_16', see AtlasOfLivingAustralia/la-pipelines#108
* Checkpoint commit for the work on AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27; these scripts and inventories are a work in progress
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: fixes for SSH key creation and HDFS startup; additional 'pipelines' role for pipelines installation and additional setup of docker and SHP file resources; these scripts and inventories are a work in progress
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: fixes SHP file download
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: jenkins install and loading of jobs; removal of some config files after debian fixes
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: tags and additional spark env options
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: jenkins job definitions
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: override for inputPath for 'dataset-count-dump'
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: descriptions for datasetId property
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: script fixes to create HDFS directories and to move the docker directories off the root partition
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: added a check for an existing installation, skipping if HDFS data directories are already present, to avoid wiping existing HDFS installs
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: extra steps added to DWCA export job; fixes for hadoop slave config; AWS deployment for spark/hadoop
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: extra steps added to help set up jenkins master/slave
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: extra steps added to help set up jenkins jobs
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: private key locations and additional jenkins jobs
* Work for AtlasOfLivingAustralia/la-pipelines#77 and AtlasOfLivingAustralia/la-pipelines#27: zk_host value for quoll environment; dataset-archive-list config; adding hdfs tools to PATH
* Updates for jenkins jobs and collectory API keys
* aws-image-service-pipelines-test installation; updates to stored procedures (tied to changes on the image-service#pipelines-bulk-services branch); removed GC configuration (incompatible with openJDK); additional log files for image-service for indexing and batch service monitoring
* API key for image service
* Service installs for docker instances
* Set work DNS name to allow admin console navigation
* Update sensitive data service version
* No X here
* Additional local config
* Work for AtlasOfLivingAustralia/la-pipelines#232: separate master and name node for HDFS
* Work for AtlasOfLivingAustralia/la-pipelines#232: config for JackKnife; changed deprecated HDFS properties (e.g. dfs.data.dir) to the new versions
* Additions for clustering and solrcloud 8.8
* includeClustering typo
* Life stage vocab config
* Life stage vocab config fix and sampling config
* Use SNAPSHOT 3 version for biocache-service; additional config files required for field mapping; updated facts.json; SOLR 8.8.1
* Additional deprecated field mappings
* record_type, suitable_modelling mappings
* Mappings for IBRA, IMCRA
* Sort stateProvince by count by default (so that Aussie states float to the top)
* date_precision mapping
* Added mapping for occurrence_decade_i
* Additional deprecated mappings for fields noticed in configs
* Additional field mappings, see AtlasOfLivingAustralia/la-pipelines#308
* Removed the join for the export, as this was blocking the table export while image loading is happening; dwcaImportPath setting
* Mapping of rankID to taxonRankID for consistency with the taxonRank term; userAssertions field mapping
* assertionUserId mapping; enum mapping for deprecated month values
* Changes for biocache quoll to use namematching-ws-test.ala.org.au, and config changes for pipelines-field-config.json
* assertionUserId mapping
* Use http2; auth-test API keys; lastAssertionDate
* SOLR Cloud install for m6 servers; removal of GC options not compatible with Java 11; pipelines config changes; changes to add X-Request-ID tracking for nginx; groups.json; changes to biocache logging to reduce noise
* Fixes for AtlasOfLivingAustralia/la-pipelines#353
* Updated name matching docker image
* Updated name matching service docker image
* Cassandra 3 install for pipelines
* groups.json fixes
* ala-namematching-service:v20200214-15
* Reinstated output/heatmap directory
* Copy of pipelines config from biocache-service/config
* mulgara inventory and JTS 1.18.1
* Sync field mappings for AtlasOfLivingAustralia/la-pipelines#385
* Updated mapping from AtlasOfLivingAustralia/biocache-service#626
* Version 3.0.11-SNAPSHOT to biocache-test
* (minor) Remove nonprinting character
* jenkins role: update to new repo, support nginx
* AtlasOfLivingAustralia/la-pipelines#295: updated config field names to use pipelines fields
* Use http2 and solr_downloadquery_maxthreads=10
* biocache version bump and additional pipelines-field-config.json
* Mapping of Fish -> Fishes, to fix the SP Area report
* Content types facet
* Version 3.0.14 and use http2; increase in solr_connection_pool_size
* Docker image bump
* Include Bryophytes
* Plant as parent for Bryophytes
* image service: increased timeout
* solrcloud: add default for jts_use_1_18
* raw_name mapping and nameNotRecognised assertion mapping
* pipelines jenkins job updates
* More work to help set up the test environment
* Setting banners in SOLR; more work to help set up the test environment
* SDS configuration for the test environment
* NCI3 biocache set to use https://lists-test.ala.org.au; additional tags for convenience
* AtlasOfLivingAustralia/la-pipelines#479: sync of pipelines mapping from biocache-service
* tomcat/9_connector: last fix, works now; tomcat: add support for tomcat_max_post_size; tomcat9_connector: support for max_post_size; tomcat9_connector: another fix; tomcat_connector: add support for max_post_size
* solr.targetPath needed for dwca export
* Permission on java_home script and script approvals
* solr.in.sh fix and solr_version=8.9.0 for mulgara
* hadoop/spark/pipelines role fixes
* Fixes for the databox environment, including changing the jenkins port to avoid conflicts and ARM docker images; AtlasOfLivingAustralia/la-pipelines#519
* Fixes for the databox environment, including changing the jenkins port to avoid conflicts and ARM docker images; AtlasOfLivingAustralia/la-pipelines#519 and HA
* AtlasOfLivingAustralia/ala-infrastructure#847: added cron to purge biocache downloads greater than 180 days old
* Fix for dwca export
* AtlasOfLivingAustralia/ala-infrastructure#847: removed user for cron (defaults to root)
* Updates for image-service 1.1.0
* localsost -> localhost
* Work for AtlasOfLivingAustralia/la-pipelines#232: separate master and name node for HDFS
* Removed the join for the export, as this was blocking the table export while image loading is happening; dwcaImportPath setting
* SOLR Cloud install for m6 servers; removal of GC options not compatible with Java 11; pipelines config changes; changes to add X-Request-ID tracking for nginx; groups.json; changes to biocache logging to reduce noise
* Rebased with master and created separate copies of the biocache install to avoid breaking existing installs
* Following PR feedback

Co-authored-by: pal155 <Doug.Palmer@csiro.au>
Co-authored-by: Bruce Hyslop <bruce.hyslop@csiro.au>
Co-authored-by: Matt Andrews <Matt.Andrews@csiro.au>
Co-authored-by: Simon Bear <simon.bear@csiro.au>
1 parent 0bb66f0 commit 5865ed9

File tree

147 files changed: +9413 −48 lines

+9
@@ -0,0 +1,9 @@
+- hosts: biocache-service-clusterdb
+  roles:
+    - common
+    - java
+    - { role: tomcat, tomcat: tomcat9 }
+    - webserver
+    - biocache3-properties
+    - biocache3-service
+    - logger-client

ansible/hadoop.yml

+4
@@ -0,0 +1,4 @@
+- hosts: hadoop
+  roles:
+    - java
+    - hadoop

ansible/library/tomcat9_connector

+10 −2
@@ -33,6 +33,10 @@ options:
       - the port to use for connector
       - will use default port for protocol (8080 for HTTP or 8009 for AJP) if omitted
     required: false
+  max_post_size:
+    description:
+      - max size for a post request
+      - defaults to 2097152 (2MB)
   relaxed_query_chars:
     description:
       - A string containing the chars to relax to avoid requiring URL encoding
@@ -128,7 +132,7 @@ def default_port(name):
     return 8080 if name == "HTTP/1.1" else 8009


-def add_connector(aug, protocol, service, bind_addr, port, connection_timeout=20000, uri_encoding="UTF-8", redirect_port=443, relaxed_query_chars=""):
+def add_connector(aug, protocol, service, bind_addr, port, max_post_size, connection_timeout=20000, uri_encoding="UTF-8", redirect_port=443, relaxed_query_chars=""):

     aug.defvar("service", "/files/server.xml/Server/Service[#attribute/name=\"%(service)s\"]" % locals())

@@ -137,6 +141,8 @@ def add_connector(aug, protocol, service, bind_addr, port, connection_timeout=20

     aug.set("$service/Connector[#attribute/port=\"%(port)s\"]/#attribute/port" % locals(), str(port))

+    aug.set("$service/Connector[#attribute/port=\"%(port)s\"]/#attribute/maxPostSize" % locals(), str(max_post_size))
+
     aug.set("$service/Connector[#attribute/port=\"%(port)s\"]/#attribute/protocol" % locals(), protocol)

     aug.set("$service/Connector[#attribute/port=\"%(port)s\"]/#attribute/address" % locals(), bind_addr)
@@ -167,6 +173,7 @@ def main():
         bind_addr=dict(default="127.0.0.1"),
         relaxed_query_chars=dict(default=""),
         port=dict(type="int"),
+        max_post_size=dict(default=2097152, type="int"),
         connection_timeout=dict(default=20000, type="int"),
         uri_encoding=dict(default="UTF-8"),
         redirect_port=dict(default=443, type="int"),
@@ -185,6 +192,7 @@ def main():
     bind_addr = module.params["bind_addr"]
     relaxed_query_chars = module.params["relaxed_query_chars"]
     port = module.params["port"]
+    max_post_size = module.params["max_post_size"]
     connection_timeout = module.params["connection_timeout"]
     uri_encoding = module.params["uri_encoding"]
     redirect_port = module.params["redirect_port"]
@@ -209,7 +217,7 @@ def main():
     if state == "absent":
         remove_connector(aug, name, port)
     else:
-        add_connector(aug, name, service, bind_addr, port, connection_timeout, uri_encoding, redirect_port, relaxed_query_chars)
+        add_connector(aug, name, service, bind_addr, port, max_post_size, connection_timeout, uri_encoding, redirect_port, relaxed_query_chars)

     try:
         aug.save()

ansible/library/tomcat_connector

+10 −2
@@ -33,6 +33,10 @@ options:
       - the port to use for connector
      - will use default port for protocol (8080 for HTTP or 8009 for AJP) if omitted
     required: false
+  max_post_size:
+    description:
+      - max size for a post request
+      - defaults to 2097152 (2MB)
   relaxed_query_chars:
     description:
       - A string containing the chars to relax to avoid requiring URL encoding
@@ -128,7 +132,7 @@ def default_port(name):
     return 8080 if name == "HTTP/1.1" else 8009


-def add_connector(aug, protocol, service, bind_addr, port, connection_timeout=20000, uri_encoding="UTF-8", redirect_port=443, relaxed_query_chars=""):
+def add_connector(aug, protocol, service, bind_addr, port, max_post_size, connection_timeout=20000, uri_encoding="UTF-8", redirect_port=443, relaxed_query_chars=""):

     aug.defvar("service", "/files/server.xml/Server/Service[#attribute/name=\"%(service)s\"]" % locals())

@@ -137,6 +141,8 @@ def add_connector(aug, protocol, service, bind_addr, port, connection_timeout=20

     aug.set("$service/Connector[#attribute/port=\"%(port)s\"]/#attribute/port" % locals(), str(port))

+    aug.set("$service/Connector[#attribute/port=\"%(port)s\"]/#attribute/maxPostSize" % locals(), str(max_post_size))
+
     aug.set("$service/Connector[#attribute/port=\"%(port)s\"]/#attribute/protocol" % locals(), protocol)

     aug.set("$service/Connector[#attribute/port=\"%(port)s\"]/#attribute/address" % locals(), bind_addr)
@@ -168,6 +174,7 @@ def main():
         bind_addr=dict(default="127.0.0.1"),
         relaxed_query_chars=dict(default=""),
         port=dict(type="int"),
+        max_post_size=dict(default=2097152, type="int"),
         connection_timeout=dict(default=20000, type="int"),
         uri_encoding=dict(default="UTF-8"),
         redirect_port=dict(default=443, type="int"),
@@ -186,6 +193,7 @@ def main():
     bind_addr = module.params["bind_addr"]
     relaxed_query_chars = module.params["relaxed_query_chars"]
     port = module.params["port"]
+    max_post_size = module.params["max_post_size"]
     connection_timeout = module.params["connection_timeout"]
     uri_encoding = module.params["uri_encoding"]
     redirect_port = module.params["redirect_port"]
@@ -210,7 +218,7 @@ def main():
     if state == "absent":
         remove_connector(aug, name, port)
     else:
-        add_connector(aug, name, service, bind_addr, port, connection_timeout, uri_encoding, redirect_port, relaxed_query_chars)
+        add_connector(aug, name, service, bind_addr, port, max_post_size, connection_timeout, uri_encoding, redirect_port, relaxed_query_chars)

     try:
         aug.save()
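Both connector modules now accept a max_post_size parameter, written through Augeas to the Connector's maxPostSize attribute in server.xml. A minimal sketch of a playbook task using it follows; the name, service and state parameter names are inferred from the module's main() and add_connector() code above, and the values are illustrative, not taken from this commit:

- name: configure HTTP connector with a larger POST limit
  tomcat9_connector:
    name: HTTP/1.1            # connector protocol; inferred parameter name
    service: Catalina         # assumed Tomcat service name
    port: 8080
    max_post_size: 10485760   # bytes; the module default is 2097152 (2 MB)
    state: present

The same invocation should work with tomcat_connector for older Tomcat versions, since both modules expose an identical max_post_size parameter.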

ansible/pipelines.yml

+27
@@ -0,0 +1,27 @@
+- hosts: all
+  roles:
+    - java
+
+- hosts: all
+  roles:
+    - i18n
+
+- hosts: hadoop
+  roles:
+    - hadoop
+
+- hosts: spark
+  roles:
+    - spark
+
+- hosts: jenkins
+  roles:
+    - jenkins-simple
+
+- hosts: pipelines
+  roles:
+    - pipelines
+
+- hosts: pipelines_jenkins
+  roles:
+    - pipelines_jenkins
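This playbook assumes an inventory that defines the hadoop, spark, jenkins, pipelines and pipelines_jenkins groups (the java and i18n plays target all hosts). A minimal YAML inventory sketch with hypothetical hostnames, not part of this commit:

all:
  children:
    hadoop:
      hosts:
        hadoop-1.example.org:
    spark:
      hosts:
        spark-1.example.org:
    jenkins:
      hosts:
        jenkins-1.example.org:
    pipelines:
      hosts:
        pipelines-1.example.org:
    pipelines_jenkins:
      hosts:
        jenkins-1.example.org:   # hypothetical: same host reused for pipelines jenkins jobs

A host may belong to several groups; the hosts: all plays still run once per host regardless of group membership.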

ansible/roles/biocache-hub/templates/config/grouped_facets_ala.json

+5
@@ -233,6 +233,11 @@
       {
         "sort":"index",
         "field":"occurrence_status"
+      },
+      {
+        "sort": "index",
+        "description": "Content types",
+        "field": "contentTypes"
       }
     ]
   },
@@ -0,0 +1,122 @@
+/* Creates the cassandra 0.7.x schema necessary for biocache-store
+   Run this file using:
+   ./cassandra-cli --host localhost --batch < create_cass_schema.txt
+*/
+
+/* all keyspaces are created using the ByteOrderPreservingPartitioner see the cassandra.yaml file */
+create keyspace occ;
+
+use occ;
+
+create column family occ with comparator=UTF8Type and default_validation_class=UTF8Type
+    and comment='The column family for occurrence records'
+    and key_validation_class = 'UTF8Type'
+    and compaction_strategy=LeveledCompactionStrategy
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'}
+    and column_metadata=[{column_name: portalId, validation_class: UTF8Type, index_type: KEYS},
+                         {column_name: uuid, validation_class: UTF8Type, index_type: KEYS}];
+
+create column family loc with comparator=UTF8Type
+    and default_validation_class=UTF8Type
+    and key_validation_class = 'UTF8Type'
+    and comment='The column family for locations'
+    and compaction_strategy=LeveledCompactionStrategy
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'};
+
+create column family attr with comparator=UTF8Type
+    and default_validation_class=UTF8Type
+    and key_validation_class = 'UTF8Type'
+    and comment='The column family for attribution tracking'
+    and compaction_strategy=LeveledCompactionStrategy;
+
+create column family taxon with comparator=UTF8Type
+    and default_validation_class=UTF8Type
+    and key_validation_class = 'UTF8Type'
+    and comment='The column family for taxon profile information'
+    and compaction_strategy=LeveledCompactionStrategy
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'};
+
+
+/* update column family loc with comparator=UTF8Type and keys_cached=1.0 */
+
+create column family qa with comparator=UTF8Type
+    and default_validation_class=UTF8Type
+    and key_validation_class = 'UTF8Type'
+    and comment='The column family for quality assertions'
+    and column_metadata=[{column_name: userId, validation_class: UTF8Type, index_type: KEYS},
+                         {column_name: code, validation_class: UTF8Type, index_type: KEYS}]
+    and compaction_strategy=LeveledCompactionStrategy
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'};
+
+create column family dellog
+    with comparator = 'UTF8Type'
+    and default_validation_class = 'UTF8Type'
+    and key_validation_class = 'UTF8Type'
+    and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'}
+    and comment = 'The column family to log deleted information';
+
+create column family duplicates
+    with comparator = 'UTF8Type'
+    and default_validation_class = 'UTF8Type'
+    and key_validation_class = 'UTF8Type'
+    and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'}
+    and comment = 'The column family to store information about duplicates';
+
+create column family occ_duplicates
+    with comparator = 'UTF8Type'
+    and default_validation_class = 'UTF8Type'
+    and key_validation_class = 'UTF8Type'
+    and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'}
+    and comment = 'The column family to store information about duplicates';
+
+create column family upload
+    with comparator = 'UTF8Type'
+    and default_validation_class = 'UTF8Type'
+    and key_validation_class = 'UTF8Type'
+    and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'}
+    and comment = 'The column family to store information about dynamically uploaded datasets';
+
+create column family outliers with comparator = 'UTF8Type' and default_validation_class = 'UTF8Type'
+    and comment='The column family for occurrence records' and gc_grace=2000;
+
+create column family occ_outliers with comparator = 'UTF8Type' and default_validation_class = 'UTF8Type'
+    and comment='The column family for occurrence records' and gc_grace=2000;
+
+update column family outliers with comparator = 'UTF8Type' and default_validation_class = 'UTF8Type'
+    and column_metadata=[{column_name: portalId, validation_class: UTF8Type, index_type: KEYS},
+                         {column_name: uuid, validation_class: UTF8Type, index_type: KEYS}];
+
+update column family occ_outliers with comparator = 'UTF8Type' and default_validation_class = 'UTF8Type'
+    and column_metadata=[{column_name: portalId, validation_class: UTF8Type, index_type: KEYS},
+                         {column_name: uuid, validation_class: UTF8Type, index_type: KEYS}];
+
+
+create column family queryassert
+    with comparator = 'UTF8Type'
+    and default_validation_class = 'UTF8Type'
+    and key_validation_class = 'UTF8Type'
+    and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'}
+    and comment = 'The column family to store information about query based assertions';
+
+update column family queryassert with column_metadata=[{column_name: uuid, validation_class: UTF8Type, index_type: KEYS}];
+
+create column family distribution_outliers
+    with comparator = 'UTF8Type'
+    and default_validation_class = 'UTF8Type'
+    and key_validation_class = 'UTF8Type'
+    and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'}
+    and comment = 'The column family to store information about expert distribution outlier records';
+
+create column family qid
+    with comparator = 'UTF8Type'
+    and default_validation_class = 'UTF8Type'
+    and key_validation_class = 'UTF8Type'
+    and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
+    and compaction_strategy_options = {'sstable_size_in_mb' : '200'}
+    and comment = 'The column family to store information about stored query requests (qid)';
@@ -0,0 +1 @@
+CREATE KEYSPACE IF NOT EXISTS biocache WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'} AND durable_writes = true;
@@ -0,0 +1,8 @@
+- name: restart cassandra
+  service: name=cassandra state=restarted enabled="yes"
+
+- name: configure cassandra
+  shell: 'cassandra-cli < /tmp/cassandra-schema.txt'
+
+- name: configure cassandra
+  shell: 'cqlsh < /tmp/cassandra3-schema.txt'
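These handlers run when notified by a task. A sketch, not part of this commit, of a task that uploads a refreshed schema and triggers the CQL handler:

- name: copy cassandra 3 schema
  copy:
    src: cassandra/cassandra3-schema.txt
    dest: /tmp/cassandra3-schema.txt
  notify: configure cassandra   # handlers run once at the end of the play

Note that two handlers above share the name 'configure cassandra'; since Ansible resolves notifications by name and a later definition shadows an earlier one with the same name, a notify here would reach only the last-defined (cqlsh) variant.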
+33
@@ -0,0 +1,33 @@
+- include: ../../common/tasks/setfacts.yml
+
+- name: disable swap
+  shell: "swapoff --all"
+  # This fails in LXC containers, see:
+  # https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/930652
+  when: ansible_virtualization_type != 'lxc'
+  tags:
+    - biocache_db
+
+- name: copy transient files to tmp (schemas etc)
+  copy: src={{item}} dest=/tmp
+  with_items:
+    - cassandra/cassandra-schema.txt
+    - cassandra/cassandra3-schema.txt
+  tags:
+    - biocache_db
+
+- name: restart cassandra
+  service: name=cassandra state=restarted enabled="yes"
+  tags:
+    - biocache_db
+
+- name: ensure cassandra 1.x is running
+  wait_for: port=9160 delay=30
+  when: use_cassandra3 is not defined
+  tags:
+    - biocache_db
+
+- name: create schema (cassandra 3)
+  shell: "cqlsh < /tmp/cassandra3-schema.txt"
+  tags:
+    - biocache_db
