
Commit 491a428

dnskr authored and steveburnett committed
Move Presto on Spark page to Administration and refine details
1 parent 1332125 commit 491a428

File tree

3 files changed: +26 −28 lines


presto-docs/src/main/sphinx/admin.rst

Lines changed: 1 addition & 0 deletions
@@ -17,4 +17,5 @@ Administration
    admin/session-property-managers
    admin/function-namespace-managers
    admin/dist-sort
+   admin/spark
    admin/verifier
presto-docs/src/main/sphinx/{installation → admin}/spark.rst

Lines changed: 25 additions & 27 deletions
@@ -1,41 +1,41 @@
-=========================
-Executing Presto on Spark
-=========================
+===============
+Presto on Spark
+===============

-Presto on Spark makes it possible to leverage Spark as an execution framework
-for Presto queries. This is useful for queries that we want to run on thousands
-of nodes, requires 10s or 100s of terabytes of memory, and consume many CPU years.
+Presto on Spark makes it possible to leverage Spark as an execution engine for Presto queries.
+This is useful for queries that need to run on thousands of nodes,
+require 10s or 100s of terabytes of memory, and consume many CPU years.

 Spark adds several useful features like resource isolation, fine grained resource
 management, and a scalable materialized exchange mechanism.

-Steps
------
+Installation
+------------

 Download the Presto Spark package tarball, :maven_download:`spark-package`
-and the Presto Spark launcher, :maven_download:`spark-launcher`. Keep both the
-files at, say, *example* directory. We assume here a two node Spark cluster
-with four cores each, thus giving us eight total cores.
+and the Presto Spark launcher, :maven_download:`spark-launcher`. Keep both files in the same directory.
+The example assumes there is a two-node Spark cluster with four cores each, which gives a total of eight cores.

 The following is an example ``config.properties``:

-.. code-block:: none
+.. code-block:: properties

     task.concurrency=4
     task.max-worker-threads=4
     task.writer-count=4
-
+
 The details about properties are available at :doc:`/admin/properties`.
-Note that ``task.concurrency``, ``task.writer-count`` and
-``task.max-worker-threads`` are set to 4 each, since we have four cores per executor
-and want to synchronize with the relevant Spark submit arguments below.
-These values should be adjusted to keep all executor cores busy and
+Note that ``task.concurrency``, ``task.writer-count`` and ``task.max-worker-threads`` are set to 4 each,
+since there are four cores per executor and they align with the Spark submit arguments below.
+These values should be adjusted to keep all executor cores busy and
 synchronize with :command:`spark-submit` parameters.

-To execute Presto on Spark, first start your Spark cluster, which we will
-assume have the URL *spark://spark-master:7077*. Keep your
-time consuming query in a file called, say, *query.sql*. Run :command:`spark-submit`
-command from the *example* directory created earlier:
+Execution
+---------
+
+To execute Presto on Spark, first start the Spark cluster, which is assumed to have
+the URL *spark://spark-master:7077*. Save the query in a file named, for example, *query.sql*.
+Run the :command:`spark-submit` command from the directory where Presto on Spark is installed:

 .. parsed-literal::

@@ -52,12 +52,10 @@ command from the *example* directory created earlier:
     --schema default \\
     --file query.sql

-The details about configuring catalogs are at :ref:`catalog_properties`. In
-Spark submit arguments, note the values of *executor-cores* (number of cores per
+The details about configuring catalogs are at :ref:`catalog_properties`.
+In Spark submit arguments, note the values of *executor-cores* (number of cores per
 executor in Spark) and *spark.task.cpus* (number of cores to allocate to each task
-in Spark). These are also equal to the number of cores (4 in this case) and are
+in Spark). These are also equal to the number of cores (4 in the example) and are
 same as some of the ``config.properties`` settings discussed above. This is to ensure that
 a single Presto on Spark task is run in a single Spark executor (This limitation may be
-temporary and is introduced to avoid duplicating broadcasted hash tables for every
-task).
-
+temporary and is introduced to avoid duplicating broadcasted hash tables for every task).
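
Note that the second hunk shows only the tail of the :command:`spark-submit` invocation; the middle of the command (file lines 42-51) is not part of this diff. As a rough sketch, a full invocation consistent with the surrounding text (the *spark://spark-master:7077* master URL, four cores per executor, and the *query.sql* file) could look like the following. The launcher class, the jar and package file names, and the ``--config``/``--catalogs`` paths are illustrative assumptions, not taken from this diff:

    # Hypothetical full invocation; jar/package names and config paths
    # are placeholders. --executor-cores and spark.task.cpus are both 4
    # to match the config.properties values in the hunk above.
    spark-submit \
        --master spark://spark-master:7077 \
        --executor-cores 4 \
        --conf spark.task.cpus=4 \
        --class com.facebook.presto.spark.launcher.PrestoSparkLauncher \
        presto-spark-launcher-*.jar \
        --package presto-spark-package-*.tar.gz \
        --config ./config.properties \
        --catalogs ./catalogs \
        --catalog hive \
        --schema default \
        --file query.sql

Setting *executor-cores* and *spark.task.cpus* to the same value as ``task.concurrency``, ``task.max-worker-threads``, and ``task.writer-count`` is what, per the text above, keeps a single Presto on Spark task running in a single Spark executor.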

presto-docs/src/main/sphinx/installation.rst

Lines changed: 0 additions & 1 deletion
@@ -6,7 +6,6 @@ Installation
    :maxdepth: 1

    installation/deployment
-   installation/spark
    installation/deploy-docker
    installation/deploy-brew
    installation/deploy-helm
