Spark SQL is used to process structured data. I hit a problem when I wanted to do operations per partition (connect to a web service etc.) and add fields to the original data when I read the data from the new DataFrame…
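The usual pattern for this is `mapPartitions`, which lets you set up one expensive resource (such as a web-service client) per partition instead of per row. A minimal sketch of the shape of such a function, using plain Scala iterators to stand in for a partition and a hypothetical `Client` class in place of a real web-service connection:

```scala
object PerPartitionSketch {
  // Stand-in for an expensive per-partition resource, e.g. a web-service client.
  final class Client {
    def lookup(id: Int): String = s"name-$id" // hypothetical remote call
    def close(): Unit = ()
  }

  // Same shape as RDD.mapPartitions expects: Iterator[In] => Iterator[Out].
  def enrichPartition(rows: Iterator[Int]): Iterator[(Int, String)] = {
    val client = new Client                                   // one client per partition
    val out = rows.map(id => (id, client.lookup(id))).toList  // materialize before closing
    client.close()
    out.iterator
  }

  def main(args: Array[String]): Unit =
    enrichPartition(Iterator(1, 2, 3)).foreach(println)
}
```

Materializing the iterator before closing the client matters: `map` on an iterator is lazy, so closing first would hand Spark an iterator over a dead connection.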
If you use Scala, be aware of short-circuit evaluation. Since C/C++/Java are not functional in that sense, you can't get into this kind of trouble there: function arguments are always evaluated exactly once, before the call. In Scala, a by-name parameter may be evaluated several times, or not at all.
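A small self-contained demo of both effects (all names here are mine, for illustration): a by-name argument with a side effect runs once per use inside the body, and not at all when short-circuiting skips it.

```scala
object ByNameDemo {
  var evaluations = 0
  def expensive(): Boolean = { evaluations += 1; true }

  // `cond` is a by-name parameter: the argument expression is re-evaluated
  // every time `cond` is referenced in the body -- here twice.
  def both(cond: => Boolean): Boolean = cond && cond

  // Short-circuiting: the left side is false, `cond` is never referenced,
  // so the argument expression is never evaluated at all.
  def skipped(cond: => Boolean): Boolean = false && cond

  def main(args: Array[String]): Unit = {
    both(expensive())
    println(evaluations) // 2
    skipped(expensive())
    println(evaluations) // still 2
  }
}
```

If `expensive()` were a network call or logged something, this difference is exactly the surprise the tip warns about.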
After some months of dealing with Apache Spark 1.6, I want to write down some tips which I really would have liked to read beforehand.
Maven can download sources and javadoc (if available) by default; you only need to change the …/home/.m2/settings.xml. Add the profile:
Add it to the active profiles:
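A sketch of what such a `settings.xml` fragment could look like. The profile id is my own choice, and note that the `downloadSources`/`downloadJavadocs` properties are, to my knowledge, honored by IDE integrations such as m2e rather than by the Maven CLI itself:

```xml
<settings>
  <profiles>
    <profile>
      <id>downloadSources</id> <!-- hypothetical profile name -->
      <properties>
        <downloadSources>true</downloadSources>
        <downloadJavadocs>true</downloadJavadocs>
      </properties>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>downloadSources</activeProfile>
  </activeProfiles>
</settings>
```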
While using Ehcache to cache RESTful calls to web services, I ran into the problem that Ehcache was sometimes not able to cache the Java classes xjc generated from my XSDs, because by default they do not implement the Serializable interface. To…
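One common fix, assuming the post goes on to describe something similar, is a JAXB bindings customization that makes every generated class implement `Serializable` with a fixed `serialVersionUID`, passed to xjc as a bindings file:

```xml
<jaxb:bindings version="2.1"
    xmlns:jaxb="http://java.sun.com/xml/ns/jaxb">
  <jaxb:globalBindings>
    <!-- All generated classes implement java.io.Serializable with this uid. -->
    <jaxb:serializable uid="1"/>
  </jaxb:globalBindings>
</jaxb:bindings>
```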