updated to dataframes v1
xiaodaigh committed Apr 25, 2021
1 parent 46a3aa1 commit 12034a4
Showing 8 changed files with 490 additions and 667 deletions.
4 changes: 2 additions & 2 deletions Project.toml
@@ -1,7 +1,7 @@
name = "JDF"
uuid = "babc3d20-cd49-4f60-a736-a8f9c08892d3"
authors = ["Dai ZJ <zhuojia.dai@gmail.com>"]
-version = "0.4.1"
+version = "0.4.2"

[deps]
Blosc = "a74b3585-a348-5f62-a45c-50e91977d574"
@@ -21,7 +21,7 @@ Blosc = "0.5, 0.6, 0.7"
BufferedStreams = "1.0"
CategoricalArrays = "0.5, 0.6, 0.7, 0.8, 0.9"
DataAPI = "1"
-Missings = "0.4"
+Missings = "1"
PooledArrays = "1"
StatsBase = "0.32, 0.33"
Tables = "1"
6 changes: 3 additions & 3 deletions README.jl
@@ -6,15 +6,15 @@ a = dataset("datasets", "iris");
first(a, 2)


-@time jdffile = savejdf("iris.jdf", a)
-@time a2 = loadjdf("iris.jdf")
+@time jdffile = JDF.save("iris.jdf", a)
+@time a2 = DataFrame(JDF.load("iris.jdf"))


all(names(a2) .== names(a)) # true
all(skipmissing([all(a2[!,name] .== Array(a[!,name])) for name in names(a2)])) #true


-a2_selected = loadjdf("iris.jdf", cols = [:Species, :SepalLength, :PetalWidth])
+a2_selected = DataFrame(JDF.load("iris.jdf", cols = [:Species, :SepalLength, :PetalWidth]))


jdf"path/to/JDF.jdf"
32 changes: 6 additions & 26 deletions README.jmd
@@ -69,25 +69,16 @@ path_to_JDF = "path/to/JDF.jdf"
JDFFile(path_to_JDF)
```

-#### Using `df[rows, cols]` syntax
-You can load arbitrary `rows` and `cols` using the `df[rows, cols]` syntax. However, some of these operations are not yet optimized and hence may not be efficient.
+#### Using `df[col::Symbol]` syntax
+You can load arbitrary `col` using the `df[col]` syntax. However, some of these operations are not
+yet optimized and hence may not be efficient.

```julia
afile = JDFFile("iris.jdf")

-afile[!, :Species] # load Species column
-afile[!, [:Species, :PetalLength]] # load Species and PetalLength column
-
-afile[:, :Species] # load Species column
-afile[:, [:Species, :PetalLength]] # load Species and PetalLength column
-
-@view(afile[!, :Species]) # load Species column
-@view(afile[!, [:Species, :PetalLength]]) # load Species and PetalLength column
+afile[:Species] # load Species column
```

-In fact most syntax for `a[rows, cols]` will work **except** for assignments i.e. `a[!, cols] = something` will **not** work.
-
-This was developed to make it possible for [JLBoost.jl](https://github.com/xiaodaigh/JLBoost.jl) to fit models without loading the whole data into memory, and so the functionalities is kept to a minimum for now.

#### JDFFile is Table.jl columm-accessible

@@ -132,28 +123,17 @@ end
```

#### Metadata Names & Size from disk
-You can obtain the column names and size (`nrow` and `ncol`) of a JDF, for
+You can obtain the column names and number of columns `ncol` of a JDF, for
example:


```julia
using JDF, DataFrames
df = DataFrame(a = 1:3, b = 1:3)
-savejdf(df, "plsdel.jdf")

+JDF.save(df, "plsdel.jdf")

names(jdf"plsdel.jdf") # [:a, :b]

-nrow(jdf"plsdel.jdf") # 3

ncol(jdf"plsdel.jdf") # 2

-size(jdf"plsdel.jdf") # (3, 2)
-
-size(jdf"plsdel.jdf", 1) # 2
-
-size(jdf"plsdel.jdf", 2) # 3

# clean up
rm("plsdel.jdf", force = true, recursive = true)
```

2 comments on commit 12034a4

@xiaodaigh
Owner Author

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/35239

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.4.2 -m "<description of version>" 12034a4ac7486de449fb798e5c0e1437e47d6089
git push origin v0.4.2
