Is Data Important

Today, many systems may generate huge amounts of data such as system logs, financial transactions, customer profiles, security incidents, and so on. It is encouraged by the advancement of some technologies like IoT, mobile devices, and cloud computing. There are also fields that specifically learn to manage and process a lot of data like data science and machine learning.

A set of data can be processed to produce certain results like detecting anomalies, predicting the future, or describing the state of a system. To generate such a result, the typical phases are collecting data, data preparation, visualization, and data analysis or generating results.

In collecting data, we have to take some considerations including the location where the data will be stored, the type of stored data, and the retrieval method or how other systems can consume the data. When we want to select a location, we should consider whether the storage is available in the cloud or on-premise infrastructure, whether it will be deployed in a single instance or in a cluster, or whether it uses a document or relational database. After we decide on the location, we should think about how the data is stored, what is the form, and the data type. Data pipelines may become a topic in this step to tackle issues in scalability, data source integration, and automating the collecting process.

In data preparation, we may execute several tasks including tidying up data, removing duplication, correcting data types, and handling missing values. When we find out some missing values in a record, first we need to think about the possible cause of it, then we can choose between inputting appropriate values or completely dropping the record. The appropriate value can be the mean, median, or maximum/minimum value depending on the case.

Then, visualization is needed so that the prepared data can be easily understood by representing it in a suitable format or helping analysts gain insight or describing the condition of something. Things to be considered in preparing visualization like accessibility and readability.

The final phase is result generation which can be in various forms depending on the initial intention. We may perform simple analytical procedures or advance machine learning techniques to generate complex results such as to make predictions or object clustering. We may run an A/B test when we want to understand the impact of changes in certain aspects of a system. We may run a supervised machine-learning technique to make a prediction based on predefined labels and available features. When we are not sure what information can be retrieved from the set of data, an unsupervised machine learning technique may be performed to provide clustering of data so that we can be helped in making the conclusion.

Based on the phases explained above, there are several roles that focus on a specific phase in data processing. A data engineer focuses on creating a data pipeline and preparing data so that data can be stored and consumed by any parties in the process. A data analyst focus on creating the visualization and data preparation for describing the retrieved information. An analyst may utilize tools such as Power BI or spreadsheets. In gaining insight or making predictions, a data scientist comes in. Programming skills and knowledge of statistics are necessary in this case. When it comes to generating prediction, reasoning, or classification, a machine learning scientist is required.

Comments

Deploying a Web Server on UpCloud using Terraform Modules

In my earlier post , I shared an example of deploying UpCloud infrastructure using Terraform from scratch. In this post, I want to share how to deploy the infrastructure using available Terraform modules to speed up the set-up process, especially for common use cases like preparing a web server. For instance, our need is to deploy a website with some conditions as follows. The website can be accessed through HTTPS. If the request is HTTP, it will be redirected to HTTPS. There are 2 domains, web1.yourdomain.com and web2.yourdomain.com . But, users should be redirected to "web2" if they are visiting "web1". There are 4 main modules that we need to set up the environment. Private network. It allows the load balancer to connect with the server and pass the traffic. Server. It is used to host the website. Load balancer. It includes backend and frontend configuration. Dynamic certificate. It is requ...

What's Good About Strapi, a Headless CMS

Recently, I've been revisiting Strapi as a solution for building backend systems. I still think this headless CMS can be quite useful in certain cases, especially for faster prototyping or creating common websites like company profiles or e-commerce platforms . It might even have the potential to handle more complex systems. With the release of version 5, I'm curious to know what updates it brings. Strapi has launched a new documentation page, and it already feels like an improvement in navigation and content structure compared to the previous version. That said, there's still room for improvement, particularly when it comes to use cases and best practices for working with Strapi. In my opinion, Strapi stands out with some compelling features that could catch developers' attention. I believe three key aspects of Strapi offer notable advantages. First, the content-type builder feature lets us design the data structure of an entity or database model , including ...

Installing VSCode Server Manually on Ubuntu

I've ever gotten stuck on updating the VSCode server on my remote server because of an unstable connection between my remote server and visualstudio.com that host the updated server source codes. The download and update process failed over and over so I couldn't remotely access my remote files through VSCode. The solution is by downloading the server source codes through a host with a stable connection which in my case I downloaded from a cloud VPS server. Then I transfer the downloaded source codes as a compressed file to my remote server through SCP. Once the file had been on my remote sever, I extracted them and align the configuration. The more detailed steps are as follows. First, we should get the commit ID of our current VSCode application by clicking on the About option on the Help menu. The commit ID is a hexadecimal number like 92da9481c0904c6adfe372c12da3b7748d74bdcb . Then we can download the compressed server source codes as a single file from the host. ...

Increase of Malicious Activities and Implementation of reCaptcha

In recent time, I've seen the increase of malicious activities such as login attempts or phishing emails to some accounts I manage. Let me list some of them and the actions taken. SSH Access Attempts This happened on a server that host a Gitlab server. Because of this case, I started to limit the incoming traffic to the server using internal and cloud firewall provided by the cloud provider. I limit the exposed ports, connected network interfaces, and allowed protocols. Phishing Attempts This typically happened through email and messaging platform such as Whatsapp and Facebook Page messaging. The malicious actors tried to share a suspicious link lured as invoice, support ticket, or something else. Malicious links shared Spammy Bot The actors leverage one of public endpoint on my website to send emails. Actually, the emails won't be forwarded anywhere except to my own email so this just full my inbox. This bot is quite active, but I'm still not sure what...

Configuring Swap Memory on Ubuntu Using Ansible

If we maintain a Linux machine with a low memory capacity while we are required to run an application with high memory consumption, enabling swap memory is an option. Ansible can be utilized as a helper tool to automate the creation of swap memory. A swap file can be allocated in the available storage of the machine. The swap file then can be assigned as a swap memory. Firstly, we should prepare the inventory file. The following snippet is an example, you must provide your own configuration. [server] 192.168.1.2 [server:vars] ansible_user=root ansible_ssh_private_key_file=~/.ssh/id_rsa Secondly, we need to prepare the task file that contains not only the tasks but also some variables and connection information. For instance, we set /swapfile as the name of our swap file. We also set the swap memory size to 2GB and the swappiness level to 60. - hosts: server become: true vars: swap_vars: size: 2G swappiness: 60 For simplicity, we only check the...

Rangkaian Sensor Infrared dengan Photo Dioda

Keunggulan photodioda dibandingkan LDR adalah photodioda lebih tidak rentan terhadap noise karena hanya menerima sinar infrared, sedangkan LDR menerima seluruh cahaya yang ada termasuk infrared. Rangkaian yang akan kita gunakan adalah seperti gambar di bawah ini. Pada saat intensitas Infrared yang diterima Photodiode besar maka tahanan Photodiode menjadi kecil, sedangkan jika intensitas Infrared yang diterima Photodiode kecil maka tahanan yang dimiliki photodiode besar. Jika tahanan photodiode kecil maka tegangan V- akan kecil . Misal tahanan photodiode mengecil menjadi 10kOhm. Maka dengan teorema pembagi tegangan: V- = Rrx/(Rrx + R2) x Vcc V- = 10 / (10+10) x Vcc V- = (1/2) x 5 Volt V- = 2.5 Volt Sedangkan jika tahanan photodiode besar maka tegangan V- akan besar (mendekati nilai Vcc). Misal tahanan photodiode menjadi 150kOhm. Maka dengan teorema pembagi tegangan: V- = Rrx/(Rrx + R2) x Vcc V- = 150 / (1...

LUKI NOTES

Search This Blog