0 votes
1 answer
50 views

// Enable all bucketing optimizations spark.conf.set("spark.sql.requireAllClusterKeysForDistribution", "false") spark.conf.set("spark.sql.sources.bucketing.enabled"...
user2417458
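The two settings quoted in the excerpt can be gathered in one place before applying them. A minimal sketch; the keys come from the excerpt, the values shown are illustrative, and applying them assumes a live `SparkSession` named `spark`:

```python
# Bucketing-related settings from the excerpt, collected as a dict so they
# can be inspected or applied in one pass. Values here are illustrative.
bucketing_conf = {
    "spark.sql.sources.bucketing.enabled": "true",
    "spark.sql.requireAllClusterKeysForDistribution": "false",
}

# With a live SparkSession this would be applied as:
# for key, value in bucketing_conf.items():
#     spark.conf.set(key, value)
```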
2 votes
0 answers
59 views

I have set up a small Xubuntu machine with the intention of making it my single-node play-around Spark cluster. The cluster seems to be set up correctly: I can access the WebUI at port 8080, it shows a ...
Paweł Sopel
0 votes
0 answers
17 views

I have an EMR Spark cluster on which I have enabled EMR managed auto scaling, with Primary: c5a.xlarge, Core: c5a.xlarge, Task: c5a.xlarge. With these cluster ...
Koushik
0 votes
1 answer
101 views

I have two Spark scripts; the first, a bronze script, needs to read data from Kafka topics. Each topic holds ads-platform data (tiktok_insights, meta_insights, google_insights). The structure is the same (id, ...
Kuldeep KV
0 votes
0 answers
70 views

First, my question is not about increasing disk space to avoid the "no space left" error, but about understanding what Spark does, and hopefully how to improve my code. In short, here is the pseudocode: JavaRDD<...
Juh_
  • 15.8k
1 vote
2 answers
101 views

I want to use compression in big-data processing, but there are two compression codecs. Does anyone know the difference?
Angle Tom
  • 1,150
Advice
0 votes
4 replies
83 views

I want to connect to a Snowflake database from a Databricks notebook. I have an RSA key (.pem file) and I don't want to use a traditional method like username and password, as it is not as secure as ...
Prafulla
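For key-pair auth like the question above describes, the Snowflake Spark connector accepts a `pem_private_key` option whose value is the base64 body of an unencrypted PKCS#8 key, i.e. the .pem file with its header/footer lines and line breaks stripped (check the connector docs for your version). A minimal sketch of that stripping step in plain Python; the sample key body below is fake:

```python
def pem_body(pem_text: str) -> str:
    """Strip PEM header/footer lines and newlines, leaving the base64 body."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    return "".join(line for line in lines if not line.startswith("-----"))

# Fake key material, for illustration only.
sample = """-----BEGIN PRIVATE KEY-----
MIIBVAIBADANBg
kqhkiG9w0BAQEF
-----END PRIVATE KEY-----"""
key_body = pem_body(sample)
```

The result would then be passed to the reader, roughly as `.option("pem_private_key", key_body)` alongside the usual `sfURL` / `sfUser` options.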
0 votes
1 answer
110 views

I'm using Databricks SQL and have SQL UDFs for GeoIP / ISP lookups. Each UDF branches on IPv4 vs IPv6 using a CASE expression like: CASE WHEN ip_address LIKE '%:%:%' THEN -- IPv6 path ... ...
YJCMS
  • 3
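The LIKE-based IPv4/IPv6 branch in the question can be cross-checked against Python's stdlib `ipaddress` module (every textual IPv6 address contains colons, which is what the SQL pattern relies on). A minimal sketch, independent of the author's UDFs:

```python
import ipaddress

def ip_kind(s: str) -> str:
    """Classify an address string the way the SQL CASE branch intends."""
    try:
        return "ipv6" if ipaddress.ip_address(s).version == 6 else "ipv4"
    except ValueError:
        return "invalid"

assert ip_kind("192.168.0.1") == "ipv4"
assert ip_kind("2001:db8::1") == "ipv6"
assert ip_kind("not-an-ip") == "invalid"
```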
1 vote
0 answers
120 views

Why do I get multiple warnings WARN delta_kernel::engine::default::json] read_json receiver end of channel dropped before sending completed when scanning a Delta table with pl.scan_delta(temp_path) that ...
gaut
  • 6,048
1 vote
1 answer
52 views

I have a class that extends SparkListener and has access to SparkContext. I'm wondering if there is any way to check in onApplicationEnd whether the Spark application stopped because of an error or ...
tnazarew
0 votes
0 answers
39 views

I am working on a custom materialization in dbt using the dbt-spark adapter (writing to Delta tables on S3). The goal is to handle a hybrid SCD Type 1 and Type 2 strategy. The logic: I compare the ...
HoanggLB2k2
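The hybrid SCD behaviour described above can be sketched independently of dbt: type-1 columns are overwritten in place, while a change in any type-2 column closes the current row and opens a new one. A minimal sketch with illustrative column names (`tier`, `email`, `is_current`), not the author's actual schema:

```python
def apply_change(current: dict, incoming: dict, type2_cols: set) -> list:
    """Hybrid SCD: version the row on type-2 changes, overwrite otherwise."""
    if any(current[c] != incoming[c] for c in type2_cols):
        closed = {**current, "is_current": False}          # SCD Type 2: close old row
        opened = {**incoming, "is_current": True}          # ... and open a new one
        return [closed, opened]
    return [{**current, **incoming, "is_current": True}]   # SCD Type 1: overwrite in place

cur = {"id": 1, "email": "a@x", "tier": "gold", "is_current": True}

# Type-1 change (email only): one row, updated in place.
rows = apply_change(cur, {"id": 1, "email": "b@x", "tier": "gold"}, {"tier"})
assert len(rows) == 1 and rows[0]["email"] == "b@x"

# Type-2 change (tier): old row closed, new current row emitted.
rows = apply_change(cur, {"id": 1, "email": "a@x", "tier": "silver"}, {"tier"})
assert len(rows) == 2 and rows[0]["is_current"] is False and rows[1]["tier"] == "silver"
```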
2 votes
0 answers
64 views

I have the following setup: a Kubernetes cluster with Spark Connect 4.0.1 and an MLflow tracking server 3.5.0. The MLflow tracking server should serve all artifacts and is configured this way: --backend-store-...
hage
  • 6,213
0 votes
1 answer
74 views

I have a spark job that runs daily to load data from S3. These data are composed of thousands of gzip files. However, in some cases, there is one or two corrupted files in S3, and it causes the whole ...
Nakeuh
  • 1,933
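For the corrupted-gzip case above, Spark has a built-in escape hatch: setting `spark.sql.files.ignoreCorruptFiles` to `"true"` makes file-based reads skip unreadable files instead of failing the job (check the behaviour for your source and Spark version). Independently of Spark, a file can also be validated up front by decompressing it to the end; a minimal pure-Python sketch:

```python
import gzip
import io

def is_valid_gzip(data: bytes) -> bool:
    """Return True if the byte stream decompresses cleanly to the end."""
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
            while f.read(1 << 16):  # drain in 64 KiB chunks
                pass
        return True
    except (OSError, EOFError):  # bad magic/CRC -> OSError, truncation -> EOFError
        return False

good = gzip.compress(b"some log lines\n" * 100)
assert is_valid_gzip(good)
assert not is_valid_gzip(good[:-8])        # trailer (CRC32 + ISIZE) cut off
assert not is_valid_gzip(b"not gzip at all")
```

Pre-scanning thousands of S3 objects this way costs an extra pass, so the config flag is usually the cheaper option; the sketch is mainly useful for quarantining the bad files.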
-1 votes
2 answers
63 views

In an Azure VM, I have installed standalone Spark 4.0. On the same VM I have Python 3.11 with Jupyter deployed. In my notebook I submitted the following program: from pyspark.sql import SparkSession ...
Ziggy
  • 43
1 vote
1 answer
144 views

I am very new to Spark (I have just started learning), and I have encountered a recursion error in very simple code. Background: Spark version 3.5.7, Java version 11.0.29 (Eclipse ...
GINzzZ100
