
How Search Engines Work

A search engine has three basic parts: crawling, indexing, and ranking.

1. Crawling
Before it can display search results, a search engine must first discover where web pages are located. For this it uses software called a web crawler, also known as a spider or robot. Crawling usually starts from popular web pages. Once a page is found, the crawler records its content and attributes for indexing and identifies the page. Whenever it finds a link on a page, the crawler follows it to the target page and repeats the process there. Web pages and documents can be thought of as nodes, and links as the connections between those nodes. The crawler moves from node to node along the network that connects them.
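The node-and-link picture above can be sketched as a breadth-first traversal. This is only an illustration: the `LINKS` dictionary and the `.example` page names are hypothetical stand-ins for real pages, and no network access is involved. A real crawler would fetch each page and extract its links instead of reading them from a dictionary.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
LINKS = {
    "popular.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["popular.example"],
    "c.example": [],
}

def crawl(seed, links):
    """Visit pages breadth-first from the seed, following each link once."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        visited.append(page)  # a real crawler would fetch the page here
        for target in links.get(page, []):
            if target not in seen:  # skip pages already queued or visited
                seen.add(target)
                queue.append(target)
    return visited

order = crawl("popular.example", LINKS)
print(order)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other, as `b.example` does here.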
A crawled page is identified by recognizing its parts and content. Words in the page title, subheadings, meta tags, URL, and other parts considered important for determining the page's category and keywords are indexed. The crawler indexes text only, so content locked inside other objects such as images, frames, or Flash is not recognized by the search engine. It is therefore important to build a website around text rather than Flash or other non-text content.
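As a rough sketch of how such fields might be pulled out of a page, the example below uses Python's standard `html.parser` module. The `PageFields` class, the choice of heading tags, and the sample HTML are assumptions made for illustration, not a description of any real search engine's parser.

```python
from html.parser import HTMLParser

class PageFields(HTMLParser):
    """Collect text from the title, headings, and meta tags of a page."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "headings": [], "meta": {}}
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.fields["meta"][attrs["name"]] = attrs.get("content", "")
        elif tag in ("title", "h1", "h2", "h3"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return  # ignore text outside the fields of interest
        if self._current == "title":
            self.fields["title"] += text
        else:
            self.fields["headings"].append(text)

# Hypothetical sample page.
SAMPLE = """<html><head><title>Elektro Basics</title>
<meta name="keywords" content="elektro, circuits"></head>
<body><h1>Intro to Elektro</h1><img src="diagram.png"><p>Body text.</p></body></html>"""

parser = PageFields()
parser.feed(SAMPLE)
fields = parser.fields
print(fields)
```

Note that the `<img>` element contributes nothing to `fields`: whatever the image shows is invisible to a text-only extractor like this one.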

2. Indexing
The information gathered during crawling has to be stored before it can be used. This is where indexing comes in. The process begins when the web crawler passes its results to the part of the program responsible for indexing. The index does not hold only page information such as content, title, meta tags, and URL, because that alone would limit what the search engine can do. The search engine also stores how many times a page has appeared in search results, along with the weighting information for each page that determines its position in the results. A compact index combined with an efficient indexing method determines how quickly the search engine finds what the user is looking for.
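One common way to make lookups fast is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure. The `PAGES` data, the function names, and the simple word-splitting rule are hypothetical, and a real index would also carry the extra per-page information described above.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> extracted text.
PAGES = {
    "a.example": "Elektro circuits for beginners",
    "b.example": "Cooking recipes",
    "c.example": "Advanced elektro design and circuits",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

def search(index, word):
    """Return the URLs containing the word, sorted for stable output."""
    return sorted(index.get(word.lower(), set()))

index = build_index(PAGES)
results = search(index, "elektro")
print(results)
```

A query becomes a single dictionary lookup instead of a scan over every stored page, which is why the structure of the index matters so much for speed.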

3. Ranking System
Suppose a user searches for "elektro". The search engine looks in its index for pages that contain the word and displays them. Does that sound simple? Perhaps, but consider that a search engine like Google has to look for the word among 25 billion pages in its index, and the matching pages may number in the thousands or millions. If the results were listed in arbitrary or alphabetical order, the user might have to spend a long time digging through millions of results.
That is why ranking is essential for getting relevant results. Each page has to be given a weight, and every search engine has its own weighting method. One of the methods Google uses is PageRank. According to the reference, PageRank is determined by the number of links pointing to a page (inbound links) and the links leaving it (outbound links). The reputation of the pages at the other end of those inbound and outbound links also affects the weight: if a linked page has a bad reputation, our page's reputation suffers too. The topical relevance between the linked page and our own page should also be good. For example, if our page is about 'elektro', good links are those pointing to pages that are also about 'elektro'.
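To make the idea of link-based weighting concrete, here is a minimal sketch of the simplified PageRank iteration as originally published, not of Google's production ranking. In that formulation a page's rank comes from its inbound links, and each linking page splits its own rank evenly across its outbound links. The damping factor of 0.85, the iteration count, and the toy `LINKS` graph are assumptions chosen for illustration, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute a rank for each page from its inbound links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page shares its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(LINKS)
best = max(ranks, key=ranks.get)
print(best)
```

In this toy graph `c.example` receives links from three pages and ends up with the highest rank, while `d.example`, which nothing links to, keeps only the base rank.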


Reference:
Jurus SEO Gaet Pengunjung Situs - Adnan H.P.
