3

This problem involves 3 separate dataframes:

  • df1 represents the 'Total' for products 1, 2, 3, with columns 'Value1' and 'Value2'.
  • df2 represents 'Customer1' for products 1, 2, 3, with columns 'Value1' and 'Value2'.
  • df3 represents 'Customer2' for products 1, 2, 3, with columns 'Value1' and 'Value2'.

df2 & df3 are essentially subsets of df1.

I would like to create another dataframe, df4, by subtracting df2 and df3 from df1. In df4, the 'Market' column should read 'remaining customers'.

This is what I have done so far:

import pandas as pd


d1 = {'Market': ['Total', 'Total','Total'], 'Product Code': [1, 2, 3], 
'Value1':[10, 20, 30], 'Value2':[5, 15, 25]}
df1 = pd.DataFrame(data=d1)
df1



d2 = {'Market': ['Customer1', 'Customer1','Customer1'], 'Product Code': [1, 
2, 3], 'Value1':[3, 14, 10], 'Value2':[2, 4, 6]}
df2 = pd.DataFrame(data=d2)
df2


d3 = {'Market': ['Customer2', 'Customer2','Customer2'], 'Product Code': [1, 
2, 3], 'Value1':[3, 3, 4], 'Value2':[2, 6, 10]}
df3 = pd.DataFrame(data=d3)
df3

This produces the following result:

  Market  Product Code  Value1  Value2
0  Total             1      10       5
1  Total             2      20      15
2  Total             3      30      25
  Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6
  Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10

To create df4, I tried the following code and got the error "TypeError: unsupported operand type(s) for -: 'str' and 'str'". Can anyone help?

df4 = df1-(df2+df3)

print(df4)

4 Answers

3

Drop Market, set Product Code as the index, and perform index aligned arithmetic on the product codes. After that, just reset the index and insert Market into the result.

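# Note: this rebinds df1, df2, df3 to numeric-only frames indexed by Product Code.
# drop(columns=...) replaces the positional axis argument, which newer pandas rejects.
df1, df2, df3 = [
      df.drop(columns='Market').set_index('Product Code') for df in [df1, df2, df3]
]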

df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
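
For context, the original expression fails because Market holds strings, and subtraction is not defined for str. The steps above can also be wrapped in a small function so that the input frames are not overwritten. This is only a sketch: `remaining` and its `label` parameter are hypothetical names, not part of pandas, and it assumes every frame has the same 'Market' and 'Product Code' columns as in the question.

```python
import pandas as pd

# Data from the question.
df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})


def remaining(total, parts, label='RemainingCustomers'):
    """Hypothetical helper: subtract every frame in `parts` from `total`,
    aligning rows on 'Product Code' rather than on row position."""
    result = total.drop(columns='Market').set_index('Product Code')
    for part in parts:
        result = result - part.drop(columns='Market').set_index('Product Code')
    result = result.reset_index()
    result.insert(0, 'Market', label)
    return result


df4 = remaining(df1, [df2, df3])
print(df4)
```

Because the arithmetic aligns on the Product Code index, the rows of df2 and df3 do not need to be in the same order as df1, and df1, df2, df3 keep their original columns.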

3

Not exactly what the OP asked for, but this seems to me a better way to manage the data:

df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df

Market       Customer1        Customer2         Total        RemainingCustomers       
                Value1 Value2    Value1 Value2 Value1 Value2             Value1 Value2
Product Code                                                                          
1                    3      2         3      2     10      5                  4      1
2                   14      4         3      6     20     15                  3      5
3                   10      6         4     10     30     25                 16      9

And

df['RemainingCustomers']

              Value1  Value2
Product Code                
1                  4       1
2                  3       5
3                 16       9

If we insist on the requested output:

df.stack(0).reset_index().query(
    'Market == "RemainingCustomers"').reindex(columns=df1.columns)

                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9

Or

df.stack(0).xs(
    'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
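
If the wide table is not needed, the same long-format idea can be sketched without stack/unstack. Note this is a different technique from the chain above: it gives 'Total' rows a weight of +1 and every other market a weight of -1, then sums per product. It assumes, as in the question, that 'Total' is the only row type to subtract from; the names `long`, `sign` and `rem` are illustrative only.

```python
import pandas as pd

# Data from the question.
df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# One long table with a fresh 0..n-1 index so the row-wise multiply is unambiguous.
long = pd.concat([df1, df2, df3], ignore_index=True)

# +1 for 'Total' rows, -1 for every customer row.
sign = long['Market'].eq('Total').map({True: 1, False: -1})

# Signed sum per product gives Total - Customer1 - Customer2.
rem = (long[['Value1', 'Value2']]
       .mul(sign, axis=0)
       .groupby(long['Product Code'])
       .sum()
       .reset_index())
rem.insert(0, 'Market', 'RemainingCustomers')
print(rem)
```

This scales to any number of customer frames without editing a formula string, at the cost of not producing the side-by-side view shown above.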

2

Here is one way:

cols = ['Value1', 'Value2']
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))\
               .assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})\
               .sort_index(axis=1)

#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9

Explanation

  • df1[cols].subtract(df2[cols].add(df3[cols])) performs calculation on specified columns only.
  • assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]}) adds extra columns required for results dataframe.
  • sort_index(axis=1) sorts the columns alphabetically, which here happens to match the desired column order.
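
A possible variation, sketched under the same assumption as the code above (all three frames list the products in the same row order): take the product codes from df1 instead of hardcoding [1, 2, 3], and fix the column order explicitly with reindex instead of relying on alphabetical sorting.

```python
import pandas as pd

# Data from the question.
df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

cols = ['Value1', 'Value2']

df4 = (
    df1[cols]
    .subtract(df2[cols].add(df3[cols]))   # arithmetic on the numeric columns only
    .assign(**{'Market': 'RemainingCustomers',
               'Product Code': df1['Product Code']})  # codes copied from df1
    .reindex(columns=df1.columns)         # same column order as the inputs
)
print(df4)
```

This keeps working if the value columns are later renamed to something that no longer sorts after 'Market' and 'Product Code'.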

1 Comment

Works perfectly. Thanks

2

Maybe we can use select_dtypes:

((df1.select_dtypes(exclude='object')
  - df2.select_dtypes(exclude='object')
  - df3.select_dtypes(exclude='object'))
 .drop(columns='Product Code')   # Product Code was subtracted too, so discard it
 .combine_first(df1)             # restore Market and Product Code from df1
 .assign(Market='remaining customers'))
Out[133]: 
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9

1 Comment

This definitely works but you should segment your answer across multiple lines :)
