
Commit 0376055

Release 7.051
1 parent b7c38c6 commit 0376055

40 files changed: 46 additions and 42 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -1,4 +1,7 @@
 # Changelog
+# 7.051
+* Much faster string table clone and much faster arrow write of string tables.
+
 # 7.050
 * fix bug in stringtable clone.

deps.edn

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
             :exec-fn codox.main/-main
             :exec-args {:group-id "techascent"
                         :artifact-id "tech.ml.dataset"
-                        :version "7.050"
+                        :version "7.051"
                         :name "TMD"
                         :description "A Clojure high performance data processing system"
                         :metadata {:doc/format :markdown}

Generated documentation under docs/ (large diffs are not rendered by default):

docs/000-getting-started.html: 1 addition & 1 deletion
docs/100-walkthrough.html: 1 addition & 1 deletion
docs/200-quick-reference.html: 1 addition & 1 deletion
docs/columns-readers-and-datatypes.html: 1 addition & 1 deletion
docs/index.html: 2 additions & 2 deletions
docs/nippy-serialization-rocks.html: 1 addition & 1 deletion
docs/supported-datatypes.html: 1 addition & 1 deletion
docs/tech.v3.dataset.categorical.html: 1 addition & 1 deletion
docs/tech.v3.dataset.clipboard.html: 1 addition & 1 deletion
docs/tech.v3.dataset.column-filters.html: 1 addition & 1 deletion
docs/tech.v3.dataset.column.html: 1 addition & 1 deletion
docs/tech.v3.dataset.html: 1 addition & 1 deletion
docs/tech.v3.dataset.io.csv.html: 1 addition & 1 deletion
docs/tech.v3.dataset.io.datetime.html: 1 addition & 1 deletion
docs/tech.v3.dataset.io.string-row-parser.html: 1 addition & 1 deletion
docs/tech.v3.dataset.io.univocity.html: 1 addition & 1 deletion
docs/tech.v3.dataset.join.html: 1 addition & 1 deletion
docs/tech.v3.dataset.math.html: 1 addition & 1 deletion
docs/tech.v3.dataset.metamorph.html: 1 addition & 1 deletion
docs/tech.v3.dataset.modelling.html: 1 addition & 1 deletion
docs/tech.v3.dataset.print.html: 1 addition & 1 deletion
docs/tech.v3.dataset.reductions.apache-data-sketch.html: 1 addition & 1 deletion
docs/tech.v3.dataset.reductions.html: 1 addition & 1 deletion
docs/tech.v3.dataset.rolling.html: 1 addition & 1 deletion
docs/tech.v3.dataset.set.html: 1 addition & 1 deletion
docs/tech.v3.dataset.tensor.html: 1 addition & 1 deletion
docs/tech.v3.dataset.zip.html: 1 addition & 1 deletion
docs/tech.v3.libs.arrow.html: 1 addition & 1 deletion
docs/tech.v3.libs.clj-transit.html: 1 addition & 1 deletion
docs/tech.v3.libs.fastexcel.html: 1 addition & 1 deletion
docs/tech.v3.libs.guava.cache.html: 1 addition & 1 deletion
docs/tech.v3.libs.parquet.html: 1 addition & 1 deletion
docs/tech.v3.libs.poi.html: 1 addition & 1 deletion
docs/tech.v3.libs.tribuo.html: 1 addition & 1 deletion

src/tech/v3/dataset/base.clj

Lines changed: 1 addition & 1 deletion
@@ -1032,7 +1032,7 @@
      :columns
      (->>
       (columns ds)
-      (pmap column->data)
+      (hamf/pmap column->data)
       (vec))})

src/tech/v3/dataset/reductions.clj

Lines changed: 1 addition & 1 deletion
@@ -599,7 +599,7 @@ _unnamed [7 3]:
         rs-rfn (hamf-proto/->rfn rs)
         rs-init (hamf-proto/->init-val-fn rs)
         init-fn (hamf-fn/function _k (rs-init))]
-    (doall (hamf/pgroups
+    (dorun (hamf/pgroups
             (ds-base/row-count ds)
             (fn [^long sidx ^long eidx]
               (let [tid (.getId (Thread/currentThread))]
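
Note on the doall -> dorun change above: a minimal sketch of the difference, using only clojure.core. The helper side-effecting-groups is a hypothetical stand-in for a grouped computation that is run for its side effects; it is not part of tech.ml.dataset or ham-fisted. Both forms force every element, but doall returns the realized sequence while dorun returns nil and retains nothing.

```clojure
;; Counts how many tasks have actually run.
(def calls (atom 0))

(defn side-effecting-groups
  "Hypothetical stand-in: lazily runs n side-effecting tasks."
  [n]
  (map (fn [i] (swap! calls inc) i) (range n)))

;; doall forces the whole sequence and returns it, so every result is retained.
(def kept (doall (side-effecting-groups 5)))

;; dorun forces the whole sequence too, but returns nil.
(def dropped (dorun (side-effecting-groups 5)))

kept    ;; => (0 1 2 3 4)
dropped ;; => nil
@calls  ;; => 10
```

When the per-group results are never read, dorun expresses that intent directly.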

src/tech/v3/dataset/string_table.clj

Lines changed: 3 additions & 2 deletions
@@ -37,12 +37,13 @@
           ^objects rv (make-array String sz)
           local-int->str int->str
           local-data data]
-      (dorun (hamf/pgroups sz (fn string-table-clone [^long sidx ^long eidx]
+      (StringTable. (ArrayLists/toList (hamf/object-array int->str)) nil (dtype-proto/clone data))
+      #_(dorun (hamf/pgroups sz (fn string-table-clone [^long sidx ^long eidx]
                                 (loop [sidx sidx]
                                   (when (< sidx eidx)
                                     (ArrayHelpers/aset rv sidx (.get int->str (.getLong local-data sidx)))
                                     (recur (inc sidx)))))))
-      (ArrayLists/toList rv)))
+      #_(ArrayLists/toList rv)))
   PStrTable
   (get-str-table [_this] {:int->str int->str
                           :str->int str->int})
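
Note on the clone change above: the old body looked up one string per row, while the new body copies the int->str dictionary and clones the index data. The sketch below models that idea with plain Clojure data only. The map layout and the names make-table, clone-by-rows and clone-by-dictionary are assumptions made for illustration; they are not the library's StringTable type or API.

```clojure
(defn make-table
  "Hypothetical dictionary-encoded column: unique strings plus one int index per row."
  [strings]
  (let [int->str (vec (distinct strings))
        str->int (zipmap int->str (range))]
    {:int->str int->str
     :data     (int-array (map str->int strings))}))

(defn clone-by-rows
  "Row-wise style: one dictionary lookup per row, producing a full vector of strings."
  [{:keys [int->str data]}]
  (mapv #(nth int->str %) data))

(defn clone-by-dictionary
  "Dictionary style: keep the dictionary and copy only the index array."
  [{:keys [int->str data]}]
  {:int->str int->str                ; persistent vector, safe to share
   :data     (aclone ^ints data)})   ; independent copy of the row indexes

(def table (make-table ["a" "b" "a" "a"]))
(def copy  (clone-by-dictionary table))

(clone-by-rows table) ;; => ["a" "b" "a" "a"]
(vec (:data copy))    ;; => [0 1 0 0]
```

The dictionary-style clone does work proportional to the number of unique strings plus one array copy, rather than one lookup per row.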

src/tech/v3/libs/arrow.clj

Lines changed: 2 additions & 2 deletions
@@ -585,7 +585,7 @@ Dependent block frames are not supported!!")
         offsets (dtype/make-list :int32)]
     (if (nil? prev-str-t)
       (dotimes [str-idx (count int->str)]
-        (let [strdata (int->str str-idx)
+        (let [strdata (.get int->str str-idx)
               _ (when (= strdata :failure)
                   (throw (Exception. "Invalid string table - missing entries.")))
               str-bytes (.getBytes (str strdata))
@@ -1990,7 +1990,7 @@ Please use stream->dataset-seq.")))
                     metadata (meta col)]
                 (if (nil? prev-ds)
                   (assoc ds (metadata :name)
-                         #:tech.v3.dataset{:data (str-table/string-table-from-strings col)
+                         #:tech.v3.dataset{:data (tech.v3.dataset.base/ensure-column-string-table col)
                                            :missing missing
                                            :metadata metadata
                                            :name (metadata :name)})
