When I started to use Scala with Apache Spark, the first thing I thought about the language was “WTF is this?”. I just want to give you a very simple example:
val map = list.map(item => (item.getId(), item))
  .filter(_._1 % 2 == 0)
  .sortBy(_._1).reverse
  .toMap
For a C/C++/Java programmer this piece of code is confusing and looks horrible, but after digging into it, you will never want to give up all the syntactic sugar and the functional style. Many people think that functional programming makes your code ugly and is completely useless. Maybe this is true for some languages, but I am happy that Java 8 adopted the Scala-like Stream API.
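To make the one-liner above easier to try out, here is a minimal self-contained sketch that splits the chain into named steps. The `Item` class, its fields and the sample data are assumptions made up for illustration; the original snippet only implies that items have a `getId()` method.

```scala
// Hypothetical item type; only getId() is implied by the original snippet.
final case class Item(id: Int, name: String) {
  def getId(): Int = id
}

object ChainExample {
  // Sample data, invented for this sketch.
  val list: List[Item] =
    List(Item(1, "a"), Item(2, "b"), Item(3, "c"), Item(4, "d"), Item(6, "e"))

  // Build (id, item) tuples.
  val pairs: List[(Int, Item)] = list.map(item => (item.getId(), item))

  // Keep tuples with an even id; _._1 is the first element of the tuple.
  val evens: List[(Int, Item)] = pairs.filter(_._1 % 2 == 0)

  // Sort by id, then reverse to get descending order.
  val descending: List[(Int, Item)] = evens.sortBy(_._1).reverse

  // Turn the list of tuples into a Map from id to item.
  val map: Map[Int, Item] = descending.toMap

  def main(args: Array[String]): Unit = {
    println(descending.map(_._1))
    println(map.keySet)
  }
}
```

One thing worth knowing: a general `Map` does not guarantee iteration order, so the sorting is only visible in the intermediate list, not reliably in the final map.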
Anyway, at the same time I also had to learn Python for machine learning with Pandas and scikit-learn, and now I have to use it with TensorFlow. Python is a very science-driven language and easy to use; much of the work is copying and pasting code from newly published papers and from the state-of-the-art libraries released with them. There are some very cool things about Python. You can hack almost anything together in minutes, there is a library for everything, and there are very powerful tools for visualization and learning (Jupyter Notebook is great).
Python is a quick-and-dirty language
But to be honest, Python is a hacky language. There are so many Python projects that suffer from bad code quality and bad design. Most of the time Python code is used for proofs of concept, just to show whether ideas work, mostly in a scientific context. Documentation? Forget it. In itself this is no point against Python, but there are some reasons why this happens more often in Python projects than in C++ or Java projects.
- missing interfaces, usage of global variables.
- No standard code documentation.
- variable declaration and usage.
- missing constants.
- tuples, dicts, lists, list-likes, ndarrays, arrays: accessing them is trial and error.
- Missing auto completion in IDEs leads to useless names for variables and methods (a, b, c).
- spaghetti is popular, no one needs OOP or even structure.
- lambda, which is much worse than it should be.
- concurrency and parallelism in Python are horrible.
- Python is really slow, even with numpy or Cython and all the other ideas to make an interpreted, dynamically typed language faster.
- Thousands of packages, no standards and the pip package hell. Just type “pip install”, what a joke.
- Python 3 vs. Python 2.
So what happens is this: if you want to use Python code that is not implemented in one of the big core libraries like numpy, matplotlib, scikit-learn… you most likely need to re-implement everything yourself in a compiled and statically typed language if you want to use it in production applications.
Scala the better Python
It's not that switching languages makes you a better programmer, but some languages tempt users to write bad code, and Python is one of them. In my opinion, this is a situation that could be improved with Scala. Scala has everything Python has, but it is fast (JVM), statically typed and compiled, while still offering a REPL. Dynamically typed languages are misleading, because they feign something that is not true.
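As a small illustration of how static typing addresses a few points from the list (interfaces, constants, declared types), consider the following sketch. All names in it (`Shape`, `Circle`, `Square`, `Geometry`) are invented for this example.

```scala
// Hypothetical names, for illustration only.

// A trait works as an interface: every Shape has to provide an area.
trait Shape {
  def area: Double
}

final case class Circle(radius: Double) extends Shape {
  def area: Double = Geometry.Pi * radius * radius
}

final case class Square(side: Double) extends Shape {
  def area: Double = side * side
}

object Geometry {
  // A val cannot be reassigned, so this is a real constant.
  val Pi: Double = 3.14159

  // The signature documents and enforces what goes in and what comes out.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  def main(args: Array[String]): Unit = {
    println(totalArea(List(Square(2.0), Square(3.0))))
    // totalArea(List("not a shape")) // rejected by the compiler, not at runtime
  }
}
```

The commented-out call at the end is the interesting part: the mistake is caught at compile time instead of surfacing somewhere in a running job.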
Moreover, Scala can be used together with Java, which opens the door to every Java library. You can simply switch between the two languages. If there is an ecosystem that is often well designed, extensible and documented, it's the Java ecosystem. Another advantage is the concurrency model and the powerful DSL support. When talking about data science or big data, there are several frameworks built on Scala (Spark, Kafka).
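A minimal sketch of both points, Java interop and concurrency, might look like this. It assumes Scala 2.13 or later (for `scala.jdk.CollectionConverters`), and the object and method names are hypothetical.

```scala
import java.time.LocalDate                    // plain Java class
import java.util.{ArrayList => JArrayList}    // Java collection
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.jdk.CollectionConverters._       // Scala 2.13+ / Scala 3

object InteropExample {
  // Call Java's LocalDate directly from Scala.
  def daysInMonth(year: Int, month: Int): Int =
    LocalDate.of(year, month, 1).lengthOfMonth()

  // Fill a Java list, then continue with Scala collection methods.
  def squares(n: Int): List[Int] = {
    val javaList = new JArrayList[Int]()
    (1 to n).foreach(i => javaList.add(i * i))
    javaList.asScala.toList
  }

  // Sum each chunk in its own Future, then combine the partial results.
  def parallelSum(chunks: List[List[Int]]): Int = {
    val partials: List[Future[Int]] = chunks.map(chunk => Future(chunk.sum))
    Await.result(Future.sequence(partials), 10.seconds).sum
  }

  def main(args: Array[String]): Unit = {
    println(daysInMonth(2024, 2))
    println(squares(4))
    println(parallelSum(List(List(1, 2, 3), List(4, 5), List(6))))
  }
}
```

Blocking with `Await.result` keeps the sketch short; in a real application you would more likely keep composing the `Future` values instead of waiting for them.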
Sure, the JVM has its pitfalls and sometimes sucks too, and yes, you would have to put some more effort into learning the whole ecosystem, but in the end it would be worth it.