Spark Jobs
I need a filter similar to the attached one (Coby's Brain on IG). Should be an easy gig for someone who is already comfortable with Meta Spark. Assets are already available.
Need someone with hands-on experience in Spark and Python, with a minimum of 2 years of experience. Must be able to code in PySpark.
I built an ETL pipeline to process terabytes of data. To achieve that goal, I set up a Spark cluster (Scala) and a MinIO server for object storage. I can process and save 200 gigabytes in roughly 30 minutes using 10 virtual machines for Spark processing. The issue is that I am not able to scale that processing: if I double the number of Spark virtual machines, processing time does not change. I need a Data Architect with enough expertise to help me identify the bottleneck and fix the issue.
ARCHITECTURE SUMMARY
• I use virtual machines set up on-premises using VMware ESXi 6
• Physical machines (which host the VMs) are on a 1 GB network
• There is no overcommitment of vCPU or RAM
• Spark VMs: 16 vCPU, 64 GB RAM
• MinIO (storage): 16 vCPU, ...
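A rough back-of-the-envelope check of the figures in the posting above, as a sketch only. It converts "200 gigabytes in roughly 30 minutes" into an average throughput and compares it with one network link. Two things here are assumptions, not facts from the posting: that "1 GB network" means a 1 Gbit/s link, and that all of the data crosses a single such link (for example the one in front of the storage server). The function and constant names are hypothetical.

```python
def average_throughput_gbit_per_s(data_gb: float, minutes: float) -> float:
    """Average throughput in gigabits per second, using decimal units."""
    seconds = minutes * 60
    return data_gb * 8 / seconds  # 8 bits per byte


# Figures taken from the posting: 200 GB processed and saved in about 30 minutes.
observed = average_throughput_gbit_per_s(data_gb=200, minutes=30)

# ASSUMPTION: "1 GB network" is read as 1 Gbit/s; the posting does not say so.
ASSUMED_LINK_GBIT_PER_S = 1.0

# Share of one assumed link that the observed average rate would occupy.
utilisation = observed / ASSUMED_LINK_GBIT_PER_S

print(f"{observed:.2f} Gbit/s, {utilisation:.0%} of one assumed 1 Gbit/s link")
# prints "0.89 Gbit/s, 89% of one assumed 1 Gbit/s link"
```

If both assumptions hold, the observed rate is already close to what one such link can carry, which would be consistent with extra Spark VMs not reducing processing time. That is one hypothesis to test against real network and storage metrics, not a diagnosis.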
Need to collect streaming data from S3 using Kafka, run some analytics using Spark SQL, and load the results into a database.