1 vote · 0 answers · 49 views

I have been trying hard to test my PySpark transformation on my local Windows machine. Here is what I have done so far. I installed the latest version of Spark, downloaded hadoop.dll and winutils, ...
asked by Rishabh
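
A minimal sketch of a local Windows test setup, assuming HADOOP_HOME points at a directory whose bin folder contains winutils.exe and hadoop.dll; the path below is a placeholder, not taken from the question:

    import os
    from pyspark.sql import SparkSession

    # Placeholder path: its bin\ subfolder must hold winutils.exe and hadoop.dll
    os.environ["HADOOP_HOME"] = r"C:\hadoop"
    os.environ["PATH"] += os.pathsep + os.path.join(os.environ["HADOOP_HOME"], "bin")

    # Purely local session for unit-testing transformations; no cluster needed
    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("local-transformation-tests")
        .getOrCreate()
    )

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.show()
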
2 votes · 0 answers · 59 views

I have set up a small Xubuntu machine with the intention of making it my single-node Spark playground cluster. The cluster seems to be set up correctly: I can access the WebUI at port 8080, and it shows a ...
asked by Paweł Sopel
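
A common gotcha with a standalone setup is pointing applications at the WebUI port instead of the master's RPC port. A minimal sketch, assuming the default master port 7077 on the same machine (the hostname is a placeholder):

    from pyspark.sql import SparkSession

    # Port 8080 is only the monitoring WebUI; applications connect to 7077
    spark = (
        SparkSession.builder
        .master("spark://localhost:7077")   # placeholder host
        .appName("single-node-smoke-test")
        .getOrCreate()
    )

    print(spark.range(1_000_000).selectExpr("sum(id)").collect())
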
1 vote · 1 answer · 56 views

I have an AWS Glue job that processes thousands of small JSON files from S3 (historical data load for Adobe Experience Platform). The job is taking approximately 4 hours to complete, which is ...
asked by Jayron Soares
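
For thousands of small S3 files, one common mitigation is to let Glue group input files into larger splits and cap the output file count. A sketch under those assumptions; the bucket, prefix, and sizes below are placeholders:

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Group many small JSON objects into larger input splits instead of
    # scheduling one task per file; paths are placeholders.
    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={
            "paths": ["s3://my-bucket/adobe-historical/"],
            "recurse": True,
            "groupFiles": "inPartition",
            "groupSize": "134217728",  # ~128 MB per grouped split
        },
        format="json",
    )

    # Also reduce the number of files written downstream
    df = dyf.toDF().coalesce(64)
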
0 votes · 1 answer · 101 views

I have two Spark scripts. The first is a bronze script that needs to read data from Kafka topics; each topic has ads platform data (tiktok_insights, meta_insights, google_insights). The structure is the same (id, ...
asked by Kuldeep KV
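
Since the three topics share the same structure, a single bronze stream can subscribe to all of them and keep the topic name as a column. A sketch, where the broker address and the payload schema are assumptions:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("bronze-ads-ingest").getOrCreate()

    # Assumed shared schema across the three ads platforms
    schema = StructType([
        StructField("id", StringType()),
        StructField("campaign", StringType()),
    ])

    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
        .option("subscribe", "tiktok_insights,meta_insights,google_insights")
        .load()
    )

    # Keep the source topic so a silver step can split per platform later;
    # a writeStream sink would be added to actually run the query
    bronze = (
        raw.select(
            F.col("topic").alias("source_topic"),
            F.from_json(F.col("value").cast("string"), schema).alias("payload"),
        )
        .select("source_topic", "payload.*")
    )
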
0 votes · 0 answers · 70 views

I’m reading various file types in Databricks using Spark — including PDF, DOCX, PPTX, XLSX, and CSV. Some inputs are ZIP archives that contain multiple files, including SPSS .sav files. My workflow is:...
asked by BarzanHayati
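
One workable pattern for archive members Spark cannot read natively is to extract them on the driver and read the .sav files with pandas before converting to a Spark DataFrame. A sketch assuming the pyreadstat package is installed; paths are placeholders:

    import zipfile
    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    zip_path = "/dbfs/tmp/input/archive.zip"   # placeholder path
    extract_dir = "/tmp/extracted"

    # Extract on the driver, then read each .sav via pandas/pyreadstat
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_dir)
        sav_members = [m for m in zf.namelist() if m.lower().endswith(".sav")]

    for member in sav_members:
        # convert_categoricals=False keeps plain values, which map more
        # predictably onto Spark types
        pdf = pd.read_spss(f"{extract_dir}/{member}", convert_categoricals=False)
        spark.createDataFrame(pdf).show(5)
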
1 vote · 0 answers · 120 views

Why do I get multiple "WARN delta_kernel::engine::default::json] read_json receiver end of channel dropped before sending completed" warnings when scanning, with pl.scan_delta(temp_path), a delta table that ...
asked by gaut
-1 votes · 0 answers · 54 views

My team is new to PySpark and we are specifically using Azure Databricks. We have a piece of code where we are essentially: displaying a dataframe, saving it to a table, displaying the output of the ...
asked by chilly8063
2 votes · 1 answer · 90 views

Summary: I want to be able to recreate my SQL code via Python so that I don't have to manually type out each join for situations when the combinations get too large to handle. I have one table import ...
asked by qwerty12
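
Joins over many combinations can be generated rather than typed out, for example by folding a list of DataFrames onto a base frame with functools.reduce. A sketch with placeholder tables and a placeholder join key:

    from functools import reduce
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Placeholder frames standing in for the real tables
    base = spark.createDataFrame([(1, "x"), (2, "y")], ["key", "base_val"])
    lookups = [
        spark.createDataFrame([(1, "a1")], ["key", "a_val"]),
        spark.createDataFrame([(2, "b1")], ["key", "b_val"]),
    ]

    # Fold every lookup onto the base frame instead of writing each JOIN by hand
    joined = reduce(lambda acc, df: acc.join(df, on="key", how="left"), lookups, base)
    joined.show()
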
-1 votes · 2 answers · 65 views

I need to join two RDDs as part of my programming assignment. The problem is that the first RDD is nested, while the other is flat. I tried different things, but nothing seemed to work. Is there any ...
asked by Ahmed Sohail Aslam PhDCS 2025
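
RDD joins require both sides to be (key, value) pairs, so the nested side usually needs a flatMap to pull the key out first. A sketch; the shapes of both RDDs are assumptions, since the actual nesting is not shown:

    from pyspark.sql import SparkSession

    sc = SparkSession.builder.getOrCreate().sparkContext

    # Assumed shapes: nested holds (key, [values]); flat holds (key, value)
    nested = sc.parallelize([(1, ["a", "b"]), (2, ["c"])])
    flat = sc.parallelize([(1, 10), (2, 20)])

    # Turn the nested side into plain (key, element) pairs, then join on the key
    flattened = nested.flatMap(lambda kv: [(kv[0], v) for v in kv[1]])
    joined = flattened.join(flat)   # -> (key, (element, value)) pairs
    print(joined.collect())
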
0 votes · 1 answer · 76 views

I have MongoDB collections forms and submissions where forms define dynamic UI components (textfield, checkbox, radio, selectboxes, columns, tables, datagrids) and submissions contain the user data in ...
asked by Aniruth N
2 votes · 0 answers · 64 views

I have the following setup: a Kubernetes cluster with Spark Connect 4.0.1 and an MLflow tracking server 3.5.0. The MLflow tracking server should serve all artifacts and is configured this way: --backend-store-...
asked by hage
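
For a tracking server that proxies artifacts, the client side usually only needs the tracking URI. A minimal connectivity check; the in-cluster service URL and experiment name are placeholders:

    import mlflow

    # In-cluster tracking service; URL is a placeholder
    mlflow.set_tracking_uri("http://mlflow-tracking.mlflow.svc:5000")
    mlflow.set_experiment("spark-connect-smoke-test")

    # If the server is started with --serve-artifacts, uploads are proxied
    # through the tracking server rather than written directly to the store
    with mlflow.start_run():
        mlflow.log_param("client", "spark-connect")
        mlflow.log_metric("dummy_metric", 1.0)
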
0 votes · 1 answer · 74 views

I have a Spark job that runs daily to load data from S3. The data consist of thousands of gzip files. However, in some cases, there are one or two corrupted files in S3, and that causes the whole ...
asked by Nakeuh
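
One way to keep a daily load from failing on a handful of bad objects is to let Spark skip unreadable files. A sketch; the paths and format are placeholders, and whether a truncated gzip is caught depends on where the corruption surfaces during the read:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("daily-s3-load").getOrCreate()

    # Skip files that raise read errors instead of failing the whole job
    spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

    df = spark.read.json("s3://my-bucket/daily/2024-01-01/")          # placeholder path
    df.write.mode("overwrite").parquet("s3://my-bucket/clean/2024-01-01/")
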
0 votes · 0 answers · 35 views

I am writing a SharePoint list to Delta file format and I get this error: list index out of range. I have included all the required columns to be fetched from SharePoint and checked the datatype when writing ...
asked by Sruthi Gopalakrishnan
-1 votes · 2 answers · 63 views

In an Azure VM, I have installed standalone Spark 4.0. On the same VM I have Python 3.11 with Jupyter deployed. In my notebook I submitted the following program: from pyspark.sql import SparkSession ...
asked by Ziggy
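
A minimal notebook cell for checking that the notebook's Python and the installed Spark agree, since driver/worker interpreter mismatches are a frequent cause of failures in this kind of setup; the master URL and interpreter name are assumptions:

    import os
    from pyspark.sql import SparkSession

    # Pin executors to the same Python 3.11 that runs the notebook
    os.environ.setdefault("PYSPARK_PYTHON", "python3.11")

    spark = (
        SparkSession.builder
        .master("local[*]")          # assumption: running only on the VM itself
        .appName("jupyter-smoke-test")
        .getOrCreate()
    )

    print(spark.version)
    spark.range(10).show()
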
1 vote · 1 answer · 144 views

I am very new to Spark (specifically, I have just started learning), and I have encountered a recursion error in a very simple piece of code. Background: Spark version 3.5.7, Java version 11.0.29 (Eclipse ...
asked by GINzzZ100
