
Is Data Important?

Today, many systems generate huge amounts of data, such as system logs, financial transactions, customer profiles, and security incidents. This growth is driven by advances in technologies such as IoT, mobile devices, and cloud computing. Entire fields, such as data science and machine learning, are devoted to managing and processing large volumes of data.

A data set can be processed to produce results such as detecting anomalies, predicting future outcomes, or describing the state of a system. Producing such a result typically involves four phases: data collection, data preparation, visualization, and analysis or result generation.

When collecting data, we have to consider where the data will be stored, what type of data is stored, and how other systems will retrieve or consume it. When selecting a location, we should consider whether the storage lives in the cloud or on on-premises infrastructure, whether it is deployed as a single instance or as a cluster, and whether it uses a document or a relational database. Once the location is decided, we should think about how the data is stored, what form it takes, and which data types it uses. Data pipelines often become a topic at this step because they address scalability, data source integration, and automation of the collection process.

In data preparation, we may perform several tasks, including tidying up the data, removing duplicates, correcting data types, and handling missing values. When we find missing values in a record, we should first think about the possible cause, and then choose between filling in (imputing) an appropriate value and dropping the record entirely. The appropriate value can be the mean, median, maximum, or minimum, depending on the case.
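
The preparation tasks above can be illustrated with a minimal sketch that uses only the Python standard library. The records, the field names (`id`, `temperature`), and the choice of the median as the fill value are all assumptions made for illustration; a real data set may call for a different strategy.

```python
from statistics import median

# Hypothetical raw records: values arrive as strings, one row is duplicated,
# and one reading is missing (None).
raw_records = [
    {"id": "1", "temperature": "21.5"},
    {"id": "2", "temperature": None},
    {"id": "3", "temperature": "23.0"},
    {"id": "3", "temperature": "23.0"},  # exact duplicate of the previous row
    {"id": "4", "temperature": "22.0"},
]


def prepare(records):
    """Remove duplicates, correct data types, and impute missing values."""
    seen = set()
    cleaned = []
    for row in records:
        key = (row["id"], row["temperature"])
        if key in seen:  # skip exact duplicates
            continue
        seen.add(key)
        temp = row["temperature"]
        cleaned.append({
            "id": int(row["id"]),  # string -> int
            "temperature": float(temp) if temp is not None else None,
        })

    # Impute missing readings with the median of the known readings.
    known = [r["temperature"] for r in cleaned if r["temperature"] is not None]
    fill_value = median(known)
    for r in cleaned:
        if r["temperature"] is None:
            r["temperature"] = fill_value
    return cleaned


prepared = prepare(raw_records)
for row in prepared:
    print(row)
```

Dropping the incomplete record instead of imputing it would be the alternative mentioned above; which one is appropriate depends on why the value is missing.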

Next, visualization presents the prepared data in a suitable format so that it is easy to understand, helps analysts gain insight, or describes the condition of something. Accessibility and readability should be considered when preparing a visualization.

The final phase is result generation, which can take various forms depending on the initial intention. We may perform simple analytical procedures or apply advanced machine learning techniques to generate more complex results, such as predictions or clusters of objects. We may run an A/B test when we want to understand the impact of a change to a certain aspect of a system. We may use a supervised machine learning technique to make predictions based on predefined labels and available features. When we are not sure what information can be extracted from the data set, an unsupervised machine learning technique can cluster the data and help us draw a conclusion.
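
As one illustration of the A/B test mentioned above, the following sketch compares the conversion rates of two variants with a two-proportion z-test. This is only one common way to evaluate an A/B test, and the visitor and conversion counts are invented for the example.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumes both groups are large enough for the normal approximation and
    that the pooled rate is neither 0 nor 1 (otherwise the standard error
    is zero and the division below fails).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical counts: variant A converts 120 of 1000 visitors,
# variant B converts 150 of 1000.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value suggests that the observed difference is unlikely to be due to chance alone; the threshold for "small" (often 0.05) is a choice that should be made before the test is run.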

Based on the phases explained above, several roles each focus on a specific phase of data processing. A data engineer focuses on building data pipelines and preparing data so that it can be stored and consumed by any party in the process. A data analyst focuses on data preparation and on creating visualizations that describe the retrieved information, and may use tools such as Power BI or spreadsheets. A data scientist comes in when the goal is to gain insight or make predictions, which requires programming skills and knowledge of statistics. When it comes to generating predictions, reasoning, or classification, a machine learning scientist is required.

