Preprocessing and data transformation are the most important parts of any machine learning pipeline. No matter which type of model you use, if the preprocessing pipeline is buggy, your model will deliver wrong predictions. This remains true even if…
In this How-To series, I want to share my experience with machine learning models in production environments. It starts with the general differences from typical software projects and how to acquire and deal with data sets in such projects, goes…
In this post I will show how to report JMX metrics to Logstash via TCP in a push-based way, without changing the Java code of an existing application.
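A minimal sketch of the idea: a small standalone reporter connects to the application's remote JMX port and pushes one JSON line per sample to a Logstash TCP input. The host names, ports, and sampled MBean here are assumptions, not the exact setup from the post:

```java
import java.io.PrintWriter;
import java.net.Socket;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxToLogstash {
    public static void main(String[] args) throws Exception {
        // Remote JMX endpoint of the existing application (assumed host/port).
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://app-host:9010/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url);
             // Logstash TCP input (assumed to listen on 5000 with a json_lines codec).
             Socket socket = new Socket("logstash-host", 5000);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            while (true) {
                // Read heap usage from the standard java.lang memory MBean.
                CompositeData heap = (CompositeData) mbsc.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
                // One JSON document per line, matching the json_lines codec.
                out.println(String.format(
                    "{\"metric\":\"heap_used\",\"value\":%s}", heap.get("used")));
                Thread.sleep(10_000);
            }
        }
    }
}
```

Because the reporter runs as its own process and talks to the existing JMX port, the monitored application stays untouched.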
This post is about data structures and how their performance compares when the CPU prefetch mechanism and other cache effects are taken into account. It shows that LinkedLists are a bad choice in most cases and how to deal with variable sharing.
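As a rough illustration (a naive sketch, not a proper JMH benchmark), sequentially summing an ArrayList is typically much faster than summing a LinkedList of the same size, because the backing array is prefetch-friendly while the list nodes are scattered across the heap:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListTraversal {
    static long sum(List<Integer> list) {
        long s = 0;
        for (int v : list) s += v;   // purely sequential traversal
        return s;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < 1_000_000; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }

        // Naive timing; a real measurement should use JMH with warm-up rounds.
        long t0 = System.nanoTime();
        long a = sum(arrayList);
        long t1 = System.nanoTime();
        long b = sum(linkedList);
        long t2 = System.nanoTime();
        System.out.printf("ArrayList:  %d ms (sum=%d)%n", (t1 - t0) / 1_000_000, a);
        System.out.printf("LinkedList: %d ms (sum=%d)%n", (t2 - t1) / 1_000_000, b);
    }
}
```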
Spark 2.0 DataFrames offer a very powerful, SQL-like way of accessing structured data. But sometimes it is hard to find the right built-in expression, so I would like to show some things I was dealing…
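For example (a self-contained sketch using the Java API; the column names and values are made up), built-ins like `coalesce` and `when` cover many cases where one might otherwise reach for a UDF:

```java
import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.*;

public class BuiltInExpressions {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("built-in-expressions").master("local[*]").getOrCreate();

        StructType schema = new StructType(new StructField[]{
            DataTypes.createStructField("name", DataTypes.StringType, false),
            DataTypes.createStructField("age", DataTypes.IntegerType, true)});
        Dataset<Row> df = spark.createDataFrame(Arrays.asList(
            RowFactory.create("alice", 30),
            RowFactory.create("bob", null)), schema);

        df.withColumn("age", coalesce(col("age"), lit(-1)))              // default for nulls
          .withColumn("adult", when(col("age").geq(18), true).otherwise(false))
          .show();

        spark.stop();
    }
}
```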
Spark 2.0 is now released. Time to move forward. After spending some time migrating the Titanic project to the new version, it seems that Spark 2.0 does not change too much. As said at the Spark Summit in San Francisco in 2016…
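One visible change is the unified entry point: `SparkSession` replaces the separate `SparkContext`/`SQLContext` pair. A minimal sketch (the CSV path is just a placeholder):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkTwoEntryPoint {
    public static void main(String[] args) {
        // Spark 2.0: one builder instead of SparkContext + SQLContext.
        SparkSession spark = SparkSession.builder()
            .appName("titanic").master("local[*]").getOrCreate();

        // DataFrame is now just an alias for Dataset<Row>.
        Dataset<Row> df = spark.read()
            .option("header", "true")
            .csv("titanic.csv");   // path is an assumption
        df.printSchema();
        spark.stop();
    }
}
```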
When running Spark 1.6 on YARN clusters, I ran into problems when YARN preempted Spark containers and the Spark job then failed. This happened only occasionally, when YARN used a fair scheduler and other queues with a higher priority submitted…
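The post describes the details; as a rough sketch, one relevant knob in this situation is how many failures Spark tolerates before giving up, since in Spark 1.6 preempted containers counted as ordinary executor failures. The concrete values below are assumptions, not recommendations:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class PreemptionTolerantConf {
    public static void main(String[] args) {
        // Raise the failure limits so a few preempted containers do not
        // immediately kill the whole job; values are illustrative only.
        SparkConf conf = new SparkConf()
            .setAppName("preemption-tolerant")
            .set("spark.task.maxFailures", "16")
            .set("spark.yarn.max.executor.failures", "100");
        // --master yarn is supplied via spark-submit in this scenario.
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job logic ...
        sc.stop();
    }
}
```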
After getting good results with the Random Forest algorithm in the last post, we will take a look at feed-forward networks, which are a type of artificial neural network. Artificial neural networks consist of many artificial neurons, which are based on the…
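Spark ML ships a feed-forward network as `MultilayerPerceptronClassifier`. A minimal sketch (the input file, layer sizes, and iteration count are assumptions):

```java
import org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel;
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class FeedForwardExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("feed-forward").master("local[*]").getOrCreate();

        // Assumed input: a label/features DataFrame in libsvm format.
        Dataset<Row> data = spark.read().format("libsvm").load("titanic.libsvm");

        // One hidden layer; the first entry must match the feature vector length,
        // the last one the number of classes.
        int[] layers = new int[]{10, 5, 2};
        MultilayerPerceptronClassifier trainer = new MultilayerPerceptronClassifier()
            .setLayers(layers)
            .setMaxIter(100)
            .setSeed(1234L);
        MultilayerPerceptronClassificationModel model = trainer.fit(data);
        model.transform(data).select("prediction", "label").show();

        spark.stop();
    }
}
```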
In the previous post I showed how to use a Support Vector Machine in Spark and apply PCA to the features. In this post I will show how to use decision trees on the Titanic data and why it is better…
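A minimal sketch with the Spark ML decision tree (the input file and the preceding feature assembly, e.g. via StringIndexer and VectorAssembler, are assumptions):

```java
import org.apache.spark.ml.classification.DecisionTreeClassificationModel;
import org.apache.spark.ml.classification.DecisionTreeClassifier;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TitanicDecisionTree {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("titanic-tree").master("local[*]").getOrCreate();

        // Assumed input: label/features columns prepared beforehand.
        Dataset<Row> data = spark.read().format("libsvm").load("titanic.libsvm");
        Dataset<Row>[] splits = data.randomSplit(new double[]{0.8, 0.2}, 42L);

        DecisionTreeClassifier dt = new DecisionTreeClassifier()
            .setLabelCol("label")
            .setFeaturesCol("features")
            .setMaxDepth(5);
        DecisionTreeClassificationModel model = dt.fit(splits[0]);
        model.transform(splits[1]).select("prediction", "label").show();

        spark.stop();
    }
}
```

Unlike the SVM, the tree needs no feature scaling, which is one reason it is attractive on mixed tabular data like the Titanic set.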
In my previous post I showed how to increase the parallelism of Spark processing by increasing the number of executors on the cluster. In this post I will try to show how to distribute the data in a way that the cluster…
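One way to influence the distribution is to hash-partition the data by the key that later operations group or join on, so all records with the same key land in the same partition. A small sketch (the keys and the partition count are example values):

```java
import java.util.Arrays;
import org.apache.spark.HashPartitioner;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class DataDistribution {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("distribution").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
            new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("a", 3)));

        // Hash-partition by key so equal keys end up on the same executor;
        // 8 partitions is just an illustrative value.
        JavaPairRDD<String, Integer> partitioned =
            pairs.partitionBy(new HashPartitioner(8));
        System.out.println("partitions: " + partitioned.getNumPartitions());

        sc.stop();
    }
}
```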