How Search Engines Work

A search engine has three basic parts: crawling, indexing, and ranking.

1. Crawling

Before it can show any search results, a search engine first has to find out where web pages are located. For this it needs software called a Web Crawler, also known as a spider or robot. Crawling usually starts from popular web pages. Once a page is found, the Crawler records the content and attributes of that page and identifies it. Each time it finds a link on a page, the Crawler follows the link to the target page and repeats the same recording and identification there. Web pages and documents can be pictured as points, with links as the connections between those points. The Web Crawler travels from point to point along the network that connects them.
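The point-to-point traversal described above can be sketched as a breadth-first walk over a link graph. This is only a minimal illustration: the `LINKS` dictionary and its page names are made up, and a real crawler would fetch pages over HTTP instead of reading a dictionary.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2", "home"],
    "post-1": ["blog"],
    "post-2": ["post-1"],
    "orphan": ["home"],  # nothing links here, so the crawler never finds it
}

def crawl(seed, links):
    """Return every page reachable from seed, in breadth-first visiting order."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:  # skip pages that were already discovered
                seen.add(target)
                visited.append(target)
                queue.append(target)
    return visited

print(crawl("home", LINKS))
```

Starting from "home", the walk reaches every page that some chain of links leads to. The "orphan" page is never visited because no other page links to it, which is why a page without inbound links is hard for a crawler to discover.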

A crawled page is identified by recognizing its parts and content. Words in the page title, subheadings, meta tags, URL, and other parts considered important for determining the page's category and keywords are indexed. The only objects a Web Crawler can index are text. As a result, the content of other objects such as images, frames, or Flash is not recognized by the search engine. That is why it is important to build a website around text rather than Flash or other non-text formats.
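As a rough sketch of this identification step, the snippet below pulls the title, headings, and meta tags out of an HTML string with Python's standard `html.parser` module. The `PageFields` class, the `extract_fields` helper, and the sample page are assumptions made for illustration, not how any particular search engine does it.

```python
from html.parser import HTMLParser

class PageFields(HTMLParser):
    """Collect text from the title, headings, and named meta tags of one page."""
    TEXT_TAGS = {"title", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "headings": [], "meta": {}}
        self._current = None  # tag whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.TEXT_TAGS:
            self._current = tag
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.fields["meta"][attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return  # ignore text outside the tags we care about
        key = "title" if self._current == "title" else "headings"
        self.fields[key].append(text)

def extract_fields(html):
    parser = PageFields()
    parser.feed(html)
    parser.close()
    return parser.fields

SAMPLE = """
<html><head><title>Elektro Notes</title>
<meta name="keywords" content="elektro, circuits">
</head><body><h1>Basic Circuits</h1><img src="diagram.png">
<p>Body text.</p></body></html>
"""
print(extract_fields(SAMPLE))
```

The `<img>` tag contributes nothing to the extracted fields, which mirrors the point above: whatever a diagram shows is invisible to a text-based indexer unless it is also described in text.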

2. Indexing

The information gathered during crawling has to be stored before it can be used. This is where indexing comes in. The process starts when the Web Crawler hands its findings over to the part of the program responsible for indexing. The index holds more than page-level information such as content, title, meta tags, and URL, because storing only that would limit what the search engine can do. The search engine also stores how many times a page has appeared in search results, along with information used by the weighting system for each page, which determines its position in the results. A lean index combined with a good indexing method determines how quickly the search engine finds what the user is looking for.
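One common way to store crawled text so it can be searched quickly is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure; the source does not say which structure any particular search engine uses, and the `PAGES` data and function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def lookup(index, word):
    """Return the pages containing word, sorted by URL for stable output."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical crawled pages: URL -> extracted text.
PAGES = {
    "example.com/a": "Elektro basics and circuits",
    "example.com/b": "Cooking recipes",
    "example.com/c": "Advanced elektro projects",
}

index = build_index(PAGES)
print(lookup(index, "elektro"))
```

Looking up a word is a single dictionary access instead of a scan over every page, which is the reason the indexing method matters so much for speed. The extra data mentioned above, such as page weights, would be stored alongside each entry in a fuller implementation.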

3. Ranking System

Suppose a user searches for "elektro". The search engine looks in its index for pages that contain the word and displays them. Sounds simple? Perhaps, but imagine that a search engine like Google has to look for that word among 25 billion pages in its index, and the matching pages may number in the thousands or millions. If the results were listed in arbitrary or alphabetical order, the user might have to spend a long time digging through millions of results.

That is why a ranking system is essential for returning relevant results. Each page is assigned a weight, and every search engine has its own weighting method. Google, for example, uses PageRank as one of its weighting signals. PageRank is derived from the links pointing to a page (inbound links), and each linking page passes on a share of its own rank divided among the links leaving it (outbound links). The reputation of the pages at the other end of our inbound and outbound links also affects the weight: if a page we link to has a poor reputation, our page's reputation can suffer too. The topic of the linked page should also be relevant to our own page. For example, if our page is about 'elektro', good links are those that point to other pages about 'elektro'.
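To make the link-based weighting more concrete, here is a minimal sketch of the simplified, textbook form of PageRank computed by repeated iteration. It is not Google's production ranking, which combines many other signals. The four-page graph is invented, and the damping factor of 0.85 is the commonly cited default rather than a value taken from this article.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; every link target must also be a key in links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal weight
    for _ in range(iterations):
        # Every page gets a small base share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank to all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical graph: page -> pages it links to.
GRAPH = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(GRAPH)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph page "c" ends up with the highest rank because three pages link to it, while page "d" ends up lowest because nothing links to it. Page "a" ranks second even though it has only one inbound link, because that link comes from the highly ranked "c", which illustrates how the weight of the linking page matters and not just the number of links.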

Reference:

Jurus SEO Gaet Pengunjung Situs - Adnan H.P.
