Make column-based DataFrame? #20

Open
singularperturbation opened this issue Oct 1, 2017 · 4 comments

@singularperturbation

What do you think about creating a DataFrame type that's more column-based (list of Column types) rather than row-based (seq[T] of tuples based on schema)?

Taking some of the example operations from example_01.nim:

# Existing
df.map(x => x.age).mean()

# New
df["age"].mean()

This (and other numeric operations on an individual column) should be faster: we could return a shallowCopy view of the DataFrame's column instead of creating a new MappedDataFrame (which copies the needed data) and then running the summary function on that new dataframe.
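
To make the comparison concrete, here is a minimal sketch of the two layouts. `ColumnFrame` and its `[]` accessor are hypothetical names for illustration, not part of NimData, and the sketch is limited to `float` columns. It also does not attempt the shallowCopy view: the accessor below may copy the seq, depending on the Nim version and memory management mode.

```nim
import std/[sequtils, stats, tables]

# Row-based layout: one tuple per row, as with a seq[T] of schema tuples.
let rows = @[(name: "Ann", age: 30.0),
             (name: "Bob", age: 40.0),
             (name: "Cy", age: 50.0)]

# Extracting a field materialises a new seq before the summary runs.
let rowMean = rows.mapIt(it.age).mean()

# Column-based layout (hypothetical type, numeric columns only).
type
  ColumnFrame = object
    cols: Table[string, seq[float]]

proc `[]`(df: ColumnFrame, name: string): seq[float] =
  ## Looks up a column by name. A real implementation would return a view
  ## here to avoid copying; this sketch does not.
  df.cols[name]

let df = ColumnFrame(cols: {"age": @[30.0, 40.0, 50.0]}.toTable)

# The summary function runs on the stored column, with no per-row mapping step.
let colMean = df["age"].mean()

echo rowMean, " ", colMean
```

Both calls compute the same value; the difference is only in how the data for the `age` field is reached.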

I'd like to try and do this as part of Hacktoberfest.

@narimiran
Contributor

> I'd like to try and do this as part of Hacktoberfest.

Any news/progress about this maybe?

@bluenote10
Owner

bluenote10 commented Mar 8, 2018

I had started working on a more traditional (i.e. column-based) data frame library here. I haven't had much time to work on it recently, so it is still very far from usable, but I plan to pick it up again soon(ish).

@narimiran
Contributor

Sorry to bring this up again (one whole year after the original post :)), but is there any progress with this?

It seems that kadro development is not very active. I guess the recommendation is to use NimData? Will it be developed further, or is your focus on kadro?

@bluenote10
Owner

@narimiran This is also still very high on my wishlist, but I'm just not sure when I will find the time. With kadro I'm still experimenting with how to get dynamically typed columns right (probably using Arraymancer under the hood). Overall, NimData and kadro address different use cases: NimData is iterator-based with support for statically typed schemas, while kadro is in-memory with dynamically typed schemas.
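
As an illustration of what "dynamically typed columns" can mean in Nim, here is a minimal sketch using an object variant whose element type is chosen at run time. All names (`ColKind`, `Column`, `colMean`) are hypothetical; this is not kadro's actual design and it does not use Arraymancer.

```nim
import std/tables

type
  ColKind = enum
    ckInt, ckFloat, ckString

  # The element type of a column is a run-time tag, not a compile-time schema.
  Column = object
    case kind: ColKind
    of ckInt: ints: seq[int]
    of ckFloat: floats: seq[float]
    of ckString: strings: seq[string]

proc colMean(c: Column): float =
  ## Mean of a numeric column; raises ValueError for string columns.
  var total = 0.0
  case c.kind
  of ckInt:
    for v in c.ints: total += v.float
    result = total / c.ints.len.float
  of ckFloat:
    for v in c.floats: total += v
    result = total / c.floats.len.float
  of ckString:
    raise newException(ValueError, "mean is undefined for string columns")

# An in-memory frame: column names map to columns of differing kinds.
let frame = {
  "age": Column(kind: ckInt, ints: @[30, 40, 50]),
  "score": Column(kind: ckFloat, floats: @[1.5, 2.5]),
  "name": Column(kind: ckString, strings: @["Ann", "Bob", "Cy"])
}.toTable

echo frame["age"].colMean()
```

The trade-off relative to a statically typed schema is that a type mismatch, such as asking for the mean of `name`, surfaces at run time rather than at compile time.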
