Error changing Dataset using Part

Question

Update

It was a bug in the documentation of V10.0: this functionality was not implemented yet, and V10.1 changed the documentation to remove it. It's a pity, because it's a very useful operation, common in other languages like R. I miss data.frame-like notation in Mathematica.


In the new guide Computation With Structured Datasets we can find this part, on how to change a Dataset:

[Screenshot of the guide's section on changing a Dataset]

But if we create a Dataset like:

ds=Dataset[{<|"a"->1,"b"->"x"|>,<|"a"->2,"b"->"y"|>,<|"a"->6,"b"->"z"|>}];

And then evaluate:

ds[[1, 1]] = 2

Or, closer to my real use case:

ds[[All, "a"]] = Accumulate@Normal@ds[[All, "a"]]

We get an error:

"Part specification ds[[1,1]] is longer than depth of object"

"Part specification ds[[All,1]] is longer than depth of object. "

Is this a bug?

Assignment with Set does not work on a Dataset the way the documentation says it should.

This post on Wolfram Community

This is no longer documented to work as of 10.1.0. As Tali mentions in his answer below, the inclusion of this comment in the original documentation was erroneous.
– Stefan R, Commented Jun 8, 2015 at 15:07

@StefanR I know about that. But this would be a nice way to handle data, and should be considered in the future. In R, it's a very natural way to do data frame manipulations.
– Murta, Commented Jun 8, 2015 at 18:41

Taliesin Beynon · Accepted Answer · 2014-07-10 18:43Z (edited 2017-04-15 15:49:57Z by J. M.'s missing motivation)

35

I'm the developer of Dataset.

Yes, this is a gross documentation oversight. We planned this functionality but had to push it back to a point release. Somehow no-one caught this piece of legacy documentation.

I've just filed a bug on the documentation problem; it's easy to fix.

As for when L-value assignment will be available, I'm hoping 10.0.1 or 10.0.2, which are in the next month or two. It gets complicated, because you might well want to write things like:

dataset[ Select[#age > 30&] , "salary"] *= 2

That's certainly a powerful kind of operation, but also hard to implement. Even part-like assignments can get complicated when you are assigning multidimensional datasets to each other.
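Until assignment of that kind exists, the same effect can be approximated without mutation by rebuilding the rows and reassigning the result. The following is only a minimal sketch: the sample data and its field values are made up for illustration, and this is a workaround pattern rather than the planned L-value implementation.

```mathematica
(* Hypothetical sample data; names and figures are invented for illustration *)
dataset = Dataset[{
    <|"name" -> "Ann", "age" -> 25, "salary" -> 1000|>,
    <|"name" -> "Bob", "age" -> 40, "salary" -> 2000|>}];

(* Non-destructive stand-in for dataset[Select[#age > 30 &], "salary"] *= 2:
   rebuild each row, doubling "salary" only where age > 30, then reassign *)
dataset = dataset[All, If[#age > 30, <|#, "salary" -> 2 #salary|>, #] &];

(* Inspect the salary column: Bob's salary is doubled, Ann's is unchanged *)
Normal[dataset[All, "salary"]]
```

Because the whole Dataset is rebuilt, this costs a full pass over the data even when only a few rows change.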

Thanks for trying the functionality, though!

edited Apr 15, 2017 at 15:49 by J. M.'s missing motivation

answered Jul 10, 2014 at 18:43 by Taliesin Beynon

Thanks for your clarification. I'll wait for it; it's a very useful operation, and I'm happy that I won't need to wait for V11. While I have the opportunity: have you seen this post in Wolfram Community about Dataset memory consumption? Are there plans for efficient tabular data in V10?
– Murta, Commented Jul 10, 2014 at 19:31

@Murta Yes, moving to column-oriented will make things much better. But before I could do that I had to lay the groundwork in the form of a type system that could represent the "logical shape", even if the "physical layout" is different. And of course Leonid is working on making this whole process scale to out-of-core computation against data that lives on disk.
– Taliesin Beynon, Commented Jul 10, 2014 at 20:26

I'm wondering what the status of this problem is. It looks like it's not in V11.1.
– xslittlegrass, Commented Apr 1, 2017 at 16:47

@xslittlegrass Which problem? Mutable updating of Datasets? I've implemented the kernel functionality that is required for it, but I don't have immediate plans to do it for Dataset. However, see the answer I just posted.
– Taliesin Beynon, Commented Apr 5, 2017 at 15:55

@xslittlegrass Mr. Wizard's answer is correct. With a lot of optimization work we could make Dataset opportunistically store tables in column-oriented form. Indeed that's always been the plan. But that will take months to implement properly and currently my priorities lie with neural networks.
– Taliesin Beynon, Commented Apr 5, 2017 at 16:59

Taliesin Beynon · Accepted Answer · 2017-04-05 15:58:17Z

22

I have implemented the underlying kernel functionality that is needed to make this possible. However, it is not yet implemented on the Dataset side. I don't think this will happen in the immediate future owing to other priorities.

Here is a stop-gap that implements a simple version of mutable updating. It is of course not production-grade. I'm happy for anyone who wants to modify this answer to extend its functionality, add error handling, etc.

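(* Unprotect Dataset so that a mutation handler can be attached to it *)
Unprotect[Dataset];
Language`SetMutationHandler[Dataset, DatasetMutationHandler];

(* The handler must receive the whole assignment unevaluated *)
SetAttributes[DatasetMutationHandler, HoldAllComplete];
(* Handle sym[[...]] = value: unwrap to plain data, assign there, then rewrap *)
DatasetMutationHandler[Set[sym_Symbol[[args___]], newvalue_]] := Block[{tmp},
    tmp = Normal[sym];
    (* If the new value is itself a Dataset, assign its underlying data *)
    tmp[[args]] = If[Dataset`ValidDatasetQ[newvalue], Normal[newvalue], newvalue];
    sym = Dataset[tmp];
];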

You can use it as follows:

In[51]:= d = Dataset[{1, 2, 3}];
d[[2 ;; 3]] = 99;
Normal[d]

Out[53]= {1, 99, 99}

answered Apr 5, 2017 at 15:58 by Taliesin Beynon

I see that this function was added in 10.4. Is it already reliable and usable there? (I do not have a use for it at this moment, just asking for the future.)
– Szabolcs, Commented Apr 5, 2017 at 18:35

@Szabolcs Yes, it is. It's used in production to implement CloudExpression. You may notice, however, that Language`HasMutationHandlerQ returns the opposite of the correct answer, but it's not a very important function.
– Taliesin Beynon, Commented Apr 6, 2017 at 0:25

WReach · Accepted Answer · 2014-07-27 00:10:43Z

19

In lieu of Set, the Query syntax offers various ways to selectively update elements of a dataset. For example, we can change the value of the field "a" in the first row like this:

ds[{1 -> (<| #, "a" -> 999|> &)}]

dataset screenshot

or like this:

ds[{1 -> Query[{"a" -> (999 &)}]}]

dataset screenshot

Multiple fields can be updated simultaneously:

ds[{1 -> (<| #, "a" -> 999, "b" -> "ZZZ" |> &)}]

dataset screenshot

We can update selected rows, in this case field "b" in rows where "a" is even:

ds[All, If[EvenQ[#a], <| #, "b" -> "!!!!"|>, #] &]

dataset screenshot

The accumulation use case can be accomplished like this:

With[{a = ds[Accumulate, "a"]}
, ds @ MapIndexed[<| #, "a" -> a[[First@#2]] |> &]
]

dataset screenshot

or like this:

Module[{acc = 0}, ds[All, {"a" -> (acc += # &)}]]

dataset screenshot

Note that none of these operations destructively alters the dataset, so each should be written as ds = ds[...] if the change is meant to persist. Presumably Set will eventually perform destructive updates in those restricted circumstances in which Mathematica tolerates mutation.
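To make the non-destructive behaviour concrete, here is a small sketch that reuses the question's sample data. The variable name updated is introduced only for this illustration.

```mathematica
ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}];

(* The query returns a new Dataset; ds itself is left untouched *)
updated = ds[{1 -> (<|#, "a" -> 999|> &)}];

ds[1, "a"]       (* still the original value, 1 *)
updated[1, "a"]  (* the new value, 999 *)

(* Reassign the result to ds to make the change stick *)
ds = ds[{1 -> (<|#, "a" -> 999|> &)}];
```

Keeping the result in a separate variable first is a convenient way to check an update before overwriting ds.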

answered Jul 27, 2014 at 0:10 by WReach

Examples such as these are sure to increase the fun factor for the WRI employees working to compile the Query language into SQL ;)
– WReach, Commented Jul 27, 2014 at 0:11

Nice examples. +1.
– Murta, Commented Jul 27, 2014 at 0:15

@WReach, are WRI employees working to compile the Query language into SQL? Will this be brought into DatabaseLink?
– ArgentoSapiens, Commented Nov 6, 2014 at 16:24

@ArgentoSapiens I have no current information about this. My glib comment was based upon the fact that pre-release versions of the V10 documentation contained extensive references to such capability. Those references were withdrawn very late, just before the official V10 release. I speculate that the functionality under discussion in the question would (or did) prove to be challenging to support across multiple back-end technologies.
– WReach, Commented Nov 6, 2014 at 16:55

Silvia · Accepted Answer · 2014-07-10 10:45:27Z

17

Though I don't know what its efficiency impact is, a workaround is to convert the Dataset to ordinary lists and Associations with Normal, make the update there, and then convert the result back to a Dataset.

ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}]

ds = Module[{temp = Normal[ds]},
            temp[[All, "a"]] = Accumulate[temp[[All, "a"]]];
            temp // Dataset]

Dataset updating
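If this pattern is needed often, it can be wrapped in a small function. The sketch below is one possible shape for such a helper; updateColumn is a hypothetical name, not a built-in, and it assumes the Dataset is a list of Associations that all contain the given key.

```mathematica
(* Hypothetical helper: apply f to one whole column and return a new Dataset *)
updateColumn[ds_Dataset, col_String, f_] := Module[{rows = Normal[ds]},
    (* rows is a plain list of Associations, so Part assignment works on it *)
    rows[[All, col]] = f[rows[[All, col]]];
    Dataset[rows]]

ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}];

(* Replace column "a" by its running total; the original ds is not modified *)
ds2 = updateColumn[ds, "a", Accumulate];
Normal[ds2[All, "a"]]  (* {1, 3, 9} *)
```

As with the code above, the helper round-trips through Normal, so its cost grows with the size of the whole Dataset rather than with the size of the change.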

answered Jul 10, 2014 at 10:45 by Silvia

kilasuelika · Accepted Answer · 2022-03-01 04:56:09Z

2

It looks like there is still no easy way to modify values in a Dataset in Mathematica 13.

answered Mar 1, 2022 at 4:56 by kilasuelika

Could you please add a short illustrative example using v13? Thanks.
– Syed, Commented Mar 1, 2022 at 5:08
