2,817 questions
1
vote
0
answers
48
views
How to show the streaming parts of a polars query using explain()?
I am trying to explain() a Polars query to see which operations can be executed using the streaming engine. Currently, I am only able to do this using show_graph().
From sources on the web, I see that ...
1
vote
1
answer
66
views
Polars: parse multiple datetime formats [duplicate]
I have a string column in a Polars dataframe with multiple datetime formats, and I am using the following code to convert the column's datatype from string to datetime.
import polars as pl
df = pl.from_dict({'...
0
votes
0
answers
72
views
polars.LazyFrame.sink_csv does not give CRLF line termination [duplicate]
I have a Python file
import polars as pl
import requests
from pathlib import Path
url = "https://raw.githubusercontent.com/leanhdung1994/files/main/processedStep1_enwiktionary_namespace_0_43....
1
vote
3
answers
168
views
Polars: how to write a column of strings into a txt file without escaping?
I have a .ndjson file with millions of rows. Each row has a field html which contains HTML strings. I would like to write all of this HTML into a .txt file, with each HTML string on its own line of the .txt file. I ...
2
votes
1
answer
133
views
Why does a nearest join_asof() return exact matches despite allow_exact_matches=False?
I am looking for the nearest non-exact match on the dates column:
import polars as pl
df = pl.from_repr("""
┌─────┬────────────┐
│ uid ┆ dates │
│ --- ┆ --- │
│ i64 ┆ date ...
-2
votes
1
answer
81
views
polars.exceptions.DuplicateError: column with name 'name_ID' has more than one occurrence [closed]
I have a dictionary of polars.DataFrames called data_dict.
Every dataframe in the dict's values has an extra index column ''.
I want to drop that column and set a new column named 'name_ID'
...
2
votes
1
answer
78
views
Change color of single line in altair line chart based on other indicator column
Imagine having the following polars dataframe "df" that contains the temperature of a machine that is either "active" or "inactive":
import polars as pl
from datetime ...
1
vote
0
answers
76
views
Is it possible to drop/select columns where col.n_unique > 1 with native polars syntax [duplicate]
I have a table that looks like this
import polars as pl
df = pl.DataFrame(
{
"col1": [1, 2, 3, 4, 5],
"col2": [10, 20, 30, 40, 50],
"col3": [...
Advice
0
votes
7
replies
115
views
High volume URL parsing in Python
I use the polars, urllib and tldextract packages in Python to parse 2 columns of URL strings in zstd-compressed parquet files (averaging 8GB, 40 million rows). The parsed output includes the scheme, ...
12
votes
0
answers
331
views
Not displaying DataFrame's name in Data Wrangler extension of VSCode, displaying "Data grid"
I have been using the Data Wrangler extension in VS Code for a while; it is very useful for analyzing datasets and filtering columns to inspect their features. When I opened a dataframe in it, it used to ...
1
vote
1
answer
106
views
Altair stacked bar chart in custom order
I've built a dataset in Polars (Python) and am attempting to plot it as a stacked horizontal bar chart using Polars' built-in Altair plot function; however, trying to specify a custom sort order for the ...
1
vote
1
answer
111
views
Polars print changed values between 2 dataframes
Given two Polars dataframes of the same shape, I would like to print the number of values that differ between the two, including values that are missing in one dataframe but not in the other.
I came up ...
2
votes
2
answers
91
views
Seeking more efficient method in Python & Polars to perform monthly comparison within each year
I have a CSV of energy consumption data over time (each month for several years).
I want to determine each month's percentage (as a decimal fraction) of that year; e.g., August was 12.3% of the ...
1
vote
3
answers
100
views
Show matched rows in polars join
When you join two tables, Stata prints the number of rows merged and unmerged.
For instance, take Example 1 on page 13 of the Stata merge doc:
use https://www.stata-press.com/data/r19/autosize
merge 1:...
3
votes
0
answers
147
views
Why does Polars join performance deteriorate so much from version 1.30.0 to 1.31.0?
I noticed a significant performance deterioration in the Polars dataframe join function after upgrading Polars from 1.30.0 to 1.31.0. The code snippet is below:
import polars as pl
import time
...