1 vote
0 answers
48 views

I am trying to explain() a Polars query to see which operations can be executed using the streaming engine. Currently, I am only able to do this using show_graph(). From sources on the web, I see that ...
gaut
  • 6,038
1 vote
1 answer
66 views

I have a string column in a Polars dataframe with multiple datetime formats, and I am using the following code to convert the column's datatype from string to datetime. import polars as pl df = pl.from_dict({'...
dikesh
  • 3,135
0 votes
0 answers
72 views

I have a Python file import polars as pl import requests from pathlib import Path url = "https://raw.githubusercontent.com/leanhdung1994/files/main/processedStep1_enwiktionary_namespace_0_43....
Akira
  • 2,820
1 vote
3 answers
168 views

I have an .ndjson file with millions of rows. Each row has a field html which contains an HTML string. I would like to write all of these HTML strings into a .txt file, one per line. I ...
Akira
  • 2,820
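A minimal sketch of one way to approach the question above, using only the standard library rather than Polars. It streams the file line by line so memory use stays flat. The function name `write_html_lines`, the file names, and the sample records are hypothetical, and it assumes every `html` value is a string.

```python
import json
import tempfile
from pathlib import Path


def write_html_lines(src: Path, dst: Path, field: str = "html") -> int:
    """Stream an NDJSON file and write one `field` value per output line."""
    written = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            html = json.loads(line).get(field)
            if html is None:
                continue  # skip records without the field
            # Collapse embedded newlines so each record stays on one line.
            fout.write(" ".join(html.splitlines()) + "\n")
            written += 1
    return written


# Tiny demo with made-up data in a temporary directory.
tmp = Path(tempfile.mkdtemp())
src, dst = tmp / "sample.ndjson", tmp / "out.txt"
src.write_text(
    '{"id": 1, "html": "<p>a</p>"}\n{"id": 2, "html": "<p>b\\nc</p>"}\n',
    encoding="utf-8",
)
count = write_html_lines(src, dst)
```

Collapsing embedded newlines is a design choice made here so that the "one HTML per line" requirement holds even when a string spans several lines. Drop that step if the strings are known to be single-line.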
2 votes
1 answer
133 views

I am looking for the nearest non-exact match on the dates column: import polars as pl df = pl.from_repr(""" ┌─────┬────────────┐ │ uid ┆ dates │ │ --- ┆ --- │ │ i64 ┆ date ...
rainerpf
-2 votes
1 answer
81 views

I have a dictionary of polars.DataFrames called data_dict. All dataframes in the dict values have an extra index column ''. I want to drop that column and set a new column named 'name_ID' ...
Tudi72
  • 31
2 votes
1 answer
78 views

Imagine having the following polars dataframe "df" that contains the temperature of a machine that is either "active" or "inactive": import polars as pl from datetime ...
the_economist
1 vote
0 answers
76 views

I have a table that looks like this import polars as pl df = pl.DataFrame( { "col1": [1, 2, 3, 4, 5], "col2": [10, 20, 30, 40, 50], "col3": [...
Lethnis
  • 31
Advice
0 votes
7 replies
115 views

I use the polars, urllib and tldextract packages in Python to parse 2 columns of URL strings in zstd-compressed parquet files (averaging 8GB, 40 million rows). The parsed output includes the scheme, ...
norcalpedaler
12 votes
0 answers
331 views

I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering some columns to see the features. When I opened a dataframe in it, it used to ...
Javad Faraji
1 vote
1 answer
106 views

I've built a dataset in Polars (Python) and am attempting to plot it as a stacked horizontal bar chart using Polars' built-in Altair plot function. However, trying to specify a custom sort order for the ...
ExactaBox
  • 3,425
1 vote
1 answer
111 views

Given two Polars dataframes of the same shape, I would like to print the number of values that differ between the two, including values that are missing in one dataframe but not in the other. I came up ...
robertspierre
2 votes
2 answers
91 views

I have a CSV of energy consumption data over time (each month for several years). I want to determine the percentage (decimal portion) for each month across that year; e.g., August was 12.3% of the ...
Buckley
  • 151
1 vote
3 answers
100 views

When you join two tables, Stata prints the number of rows merged and unmerged. For instance, take Example 1 on page 13 of the Stata merge doc: use https://www.stata-press.com/data/r19/autosize merge 1:...
robertspierre
3 votes
0 answers
147 views

I noticed a significant performance deterioration when using the Polars DataFrame join function after upgrading Polars from 1.30.0 to 1.31.0. The code snippet is below: import polars as pl import time ...
Y. Gao
  • 1,049
