The TF-IDF value calculation formula
provided by TEXTOM is as follows.
TF : Frequency of the word
ln : Natural logarithm
D : Total number of documents
DF : Number of documents containing that word
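Given only the symbols listed above, the formula most likely takes the common form below. This is an assumption based on the legend (TF, ln, D, DF), not a confirmed reproduction of the TEXTOM expression, which may differ in detail (for example, in smoothing terms).

```latex
\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \ln\!\left(\frac{D}{\mathrm{DF}}\right)
```

Under this form, a word that appears in every document has DF = D, so ln(D / DF) = ln(1) = 0 and its TF-IDF value is 0 regardless of how often it occurs.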
TF-IDF (Term Frequency – Inverse Document Frequency) : A statistical measure that indicates how important a word is to a particular document within a collection of documents. It is often used together with morpheme analysis.
- It is an algorithm that scores every word used in a document. The TF-IDF value is higher the more frequently the word appears within a particular document, and the fewer documents in the collection contain that word.
This value allows you to filter out the words that appear frequently in all documents and extract the keywords of the document.
- TF (Term Frequency) : Indicates how many times the word appears within a given document.
- IDF (Inverse Document Frequency) : This is based on the reciprocal of DF, that is, the total number of documents divided by the number of documents in which the word appears (D / DF). It indicates how rare a word is across the document set: the fewer documents contain the word, the higher its IDF.
Conversely, if a word is used in many documents in the set, it is a common word: its DF is high, so its IDF, and therefore its TF-IDF value, is low.
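The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not TEXTOM's implementation: it assumes the score is TF multiplied by ln(D / DF), and that each document has already been split into words (for example, by morpheme analysis). The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document.

    Assumes score = TF * ln(D / DF); the exact TEXTOM formula may differ.
    """
    total_docs = len(documents)  # D: total number of documents

    # DF: number of documents that contain each word at least once
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        term_freq = Counter(tokens)  # TF: occurrences within this document
        scores.append({
            word: count * math.log(total_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return scores


# Hypothetical, already-tokenized documents
docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["the", "bird", "sang"],
]
scores = tf_idf(docs)

# Rank the words of the first document from most to least distinctive
ranked = sorted(scores[0].items(), key=lambda item: item[1], reverse=True)
```

In this sample, "the" appears in all three documents, so its score is 0 even though it is the most frequent word, while "mat", which appears in only one document, receives the highest score a single occurrence can get here. That is the filtering effect described above: common words drop out and document-specific keywords rise to the top.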