41
$\begingroup$

Update

This was a bug in the V10.0 documentation: the functionality had not been implemented yet, and V10.1 removed it from the documentation. It's a pity, because it's a very useful operation, common in other languages such as R. I miss data.frame-like notation in Mathematica.

Mathematica graphics


In the new guide Computation With Structured Datasets there is a part on how to change a Dataset:

[screenshot of the documentation section]

But if we create a Dataset like:

ds=Dataset[{<|"a"->1,"b"->"x"|>,<|"a"->2,"b"->"y"|>,<|"a"->6,"b"->"z"|>}];

And then evaluate:

ds[[1, 1]] = 2

Or, closer to my real use case:

ds[[All, "a"]] = Accumulate@Normal@ds[[All, "a"]]

We get an error:

"Part specification ds[[1,1]] is longer than depth of object"

"Part specification ds[[All,1]] is longer than depth of object. "

Is this a bug?

Assignment with Set does not work on Dataset the way the documentation says it should.

This post on Wolfram Community

$\endgroup$
7
  • 2
    $\begingroup$ Unfortunately not in V10.0.1 yet... $\endgroup$ Commented Sep 17, 2014 at 1:42
  • 1
    $\begingroup$ StringReplace[%,"V10.0.1"-> "V10.0.2"] $\endgroup$ Commented Dec 11, 2014 at 1:35
  • 1
    $\begingroup$ StringReplace[%%,"V10.0.1"-> "V10.1.0"] $\endgroup$ Commented Mar 30, 2015 at 21:01
  • 1
    $\begingroup$ This is no longer documented to work as of 10.1.0. As Tali mentions in his answer below, the inclusion of this comment in the original documentation was erroneous. $\endgroup$ Commented Jun 8, 2015 at 15:07
  • 1
    $\begingroup$ @StefanR I know about that. But this would be a nice way to handle data, and it should be considered in the future. In R, it's a very natural way to do data frame manipulations. $\endgroup$ Commented Jun 8, 2015 at 18:41

5 Answers

35
$\begingroup$

I'm the developer of Dataset.

Yes, this is a gross documentation oversight. We planned this functionality but had to push it back to a point release. Somehow no-one caught this piece of legacy documentation.

I've just filed a bug on the documentation problem; it's easy to fix.

As for when L-value assignment will be available, I'm hoping 10.0.1 or 10.0.2, which are in the next month or two. It gets complicated, because you might well want to write things like:

dataset[ Select[#age > 30&] , "salary"] *= 2

That's certainly a powerful kind of operation, but also hard to implement. Even part-like assignments can get complicated when you are assigning multidimensional datasets to each other.
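Until assignment of that kind exists, the same effect can be approximated non-destructively with the existing query syntax. The following is only a minimal sketch: the sample rows are made up for illustration, and only the keys "age" and "salary" come from the example above.

```mathematica
(* Hypothetical sample data; only the keys "age" and "salary" come from the example above *)
dataset = Dataset[{
    <|"age" -> 45, "salary" -> 100|>,
    <|"age" -> 25, "salary" -> 80|>}];

(* Rebuild each row: double "salary" where age > 30, leave other rows untouched.
   The result is a new Dataset, so it is assigned back to the same symbol. *)
dataset = dataset[All, If[#age > 30, <|#, "salary" -> 2 #salary|>, #] &];

salaries = Normal[dataset[All, "salary"]]
```

This rebuilds every row rather than mutating in place, so it says nothing about how a real `*=` on a Dataset would be implemented.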

Thanks for trying the functionality, though!

$\endgroup$
8
  • 3
    $\begingroup$ Thanks for your clarification. I'll wait for it; it's a very useful operation and I'm happy that I won't need to wait for V11. While I have the opportunity: have you seen this post in Wolfram Community about Dataset memory consumption? Are there plans for efficient tabular data in V10? $\endgroup$ Commented Jul 10, 2014 at 19:31
  • 2
    $\begingroup$ @Murta Yes, moving to column-oriented will make things much better. But before I could do that I had to lay the groundwork in the form of a type system that could represent the "logical shape", even if the "physical layout" is different. And of course Leonid is working on making this whole process scale to out-of-core computation against data that lives on disk. $\endgroup$ Commented Jul 10, 2014 at 20:26
  • 3
    $\begingroup$ I'm wondering what's the status of the problem. Looks like it's not in V11.1. $\endgroup$ Commented Apr 1, 2017 at 16:47
  • 1
    $\begingroup$ @xslittlegrass Which problem? Mutable updating of Datasets? I've implemented the kernel functionality that is required for it, but I don't have immediate plans to do it for Dataset. However, see the answer I just posted. $\endgroup$ Commented Apr 5, 2017 at 15:55
  • 3
    $\begingroup$ @xslittlegrass Mr. Wizard's answer is correct. With a lot of optimization work we could make Dataset opportunistically store tables in column-oriented form. Indeed that's always been the plan. But that will take months to implement properly and currently my priorities lie with neural networks. $\endgroup$ Commented Apr 5, 2017 at 16:59
22
$\begingroup$

I have implemented the underlying kernel functionality that is needed to make this possible. However it is not yet implemented on the Dataset side. I don't think this will happen in the immediate future owing to other priorities.

Here is a stop-gap that implements a simple version of mutable updating. It is of course not production-grade. I'm happy for anyone to modify this answer to extend its functionality, add error handling, etc.

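(* Dataset is Protected; unprotect it before registering the handler *)
Unprotect[Dataset];
Language`SetMutationHandler[Dataset, DatasetMutationHandler];

(* Hold the whole Set[...] expression so the handler can inspect it unevaluated *)
SetAttributes[DatasetMutationHandler, HoldAllComplete];
DatasetMutationHandler[Set[sym_Symbol[[args___]], newvalue_]] := Block[{tmp},
    tmp = Normal[sym];  (* unwrap to ordinary lists/associations *)
    tmp[[args]] = If[Dataset`ValidDatasetQ[newvalue], Normal[newvalue], newvalue];
    sym = Dataset[tmp];  (* rewrap and store back in the symbol *)
];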

You can use it as follows:

In[51]:= d = Dataset[{1, 2, 3}];
d[[2 ;; 3]] = 99;
Normal[d]

Out[53]= {1, 99, 99}
$\endgroup$
2
  • $\begingroup$ I see that this function was added in 10.4. Is it already reliable and usable there? (I do not have a use for it at this moment, just asking for the future.) $\endgroup$ Commented Apr 5, 2017 at 18:35
  • 2
    $\begingroup$ @Szabolcs Yes, it is. It's used in production to implement CloudExpression. You may notice, however, that Language`HasMutationHandlerQ returns the opposite of the correct answer, but it's not a very important function. $\endgroup$ Commented Apr 6, 2017 at 0:25
19
$\begingroup$

In lieu of Set, the Query syntax offers various ways to selectively update elements of a dataset. For example, we can change the value of the field "a" in the first row like this:

ds[{1 -> (<| #, "a" -> 999|> &)}]

dataset screenshot

or like this:

ds[{1 -> Query[{"a" -> (999 &)}]}]

dataset screenshot

Multiple fields can be updated simultaneously:

ds[{1 -> (<| #, "a" -> 999, "b" -> "ZZZ" |> &)}]

dataset screenshot

We can update selected rows, in this case field "b" in rows with even "a":

ds[All, If[EvenQ[#a], <| #, "b" -> "!!!!"|>, #] &]

dataset screenshot

The accumulation use case can be accomplished like this:

With[{a = ds[Accumulate, "a"]}
, ds @ MapIndexed[<| #, "a" -> a[[First@#2]] |> &]
]

dataset screenshot

or like this:

Module[{acc = 0}, ds[All, {"a" -> (acc += # &)}]]

dataset screenshot

Note that none of these operations destructively alters the dataset: each returns a new Dataset and leaves ds unchanged, so they should all read ds = ds[...] if an in-place update is desired. Presumably Set will eventually perform destructive updates in those restricted circumstances in which Mathematica tolerates mutation.
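As a minimal sketch of that last point, using the even-"a" update from above (the variable name `before` is introduced here purely for illustration):

```mathematica
ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}];

(* The query alone returns a new Dataset; ds itself is unchanged *)
ds[All, If[EvenQ[#a], <|#, "b" -> "!!!!"|>, #] &];
before = Normal[ds[All, "b"]];

(* Assigning the result back makes the update stick *)
ds = ds[All, If[EvenQ[#a], <|#, "b" -> "!!!!"|>, #] &];
after = Normal[ds[All, "b"]]
```

Here `before` still holds the original values of "b", while `after` reflects the update.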

$\endgroup$
4
  • 1
    $\begingroup$ Examples such as these are sure to increase the fun factor for the WRI employees working to compile the Query language into SQL ;) $\endgroup$ Commented Jul 27, 2014 at 0:11
  • 1
    $\begingroup$ Nice examples. +1. $\endgroup$ Commented Jul 27, 2014 at 0:15
  • $\begingroup$ @WReach, are WRI employees working to compile the Query language into SQL? Will this be brought into DatabaseLink? $\endgroup$ Commented Nov 6, 2014 at 16:24
  • 1
    $\begingroup$ @ArgentoSapiens I have no current information about this. My glib comment was based upon the fact that pre-release versions of the V10 documentation contained extensive references to such capability. Those references were withdrawn very late, just before the official V10 release. I speculate that the functionality under discussion in the question would (or did) prove to be challenging to support across multiple back-end technologies. $\endgroup$ Commented Nov 6, 2014 at 16:55
17
$\begingroup$

I don't know what the efficiency impact is, but a workaround is to convert the Dataset to ordinary lists and associations with Normal, make the update there, and then convert the result back to a Dataset.

ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}]

ds = Module[{temp = Normal[ds]},
            temp[[All, "a"]] = Accumulate[temp[[All, "a"]]];
            temp // Dataset]

Dataset updating
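If this round trip is needed often, it can be wrapped in a small helper. This is only a sketch: `updateColumn` is a hypothetical name chosen here, not a built-in, and it assumes the Dataset is a list of associations as in the example above.

```mathematica
(* updateColumn is a hypothetical helper, not part of the Wolfram Language.
   It applies f to the whole column `key` and returns a new Dataset. *)
updateColumn[data_Dataset, key_, f_] := Module[{temp = Normal[data]},
    temp[[All, key]] = f[temp[[All, key]]];
    Dataset[temp]]

ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}];
ds = updateColumn[ds, "a", Accumulate];

result = Normal[ds[All, "a"]]
```

Like the Module version above, this copies the data through Normal on every call, so the unknown efficiency cost applies here too.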

$\endgroup$
2
$\begingroup$

It looks like there is still no easy way to modify values in a Dataset in Mathematica 13.

$\endgroup$
1
  • 1
    $\begingroup$ Could you please add a short illustrative example using v13? Thanks. $\endgroup$ Commented Mar 1, 2022 at 5:08
