{"id":1540,"date":"2018-12-16T19:12:19","date_gmt":"2018-12-16T19:12:19","guid":{"rendered":"https:\/\/www.codeastar.com\/?p=1540"},"modified":"2018-12-22T15:37:48","modified_gmt":"2018-12-22T15:37:48","slug":"word-cloud-easy-python-job-seekers","status":"publish","type":"post","link":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/","title":{"rendered":"Word Cloud for Job Seekers in Python"},"content":{"rendered":"\n

We tried a Python web scraping project using Scrapy and a text mining project using TFIDF in the past. This time, we are going to raise the bar by combining the two projects. The result is our text mining web scraper. As with earlier posts on the CodeAStar blog, it is always good to build something useful with fewer lines of code. We will use our text mining web scraper to make a Word Cloud for Job Seekers.

### “Cloud” funding

The cloud we are going to build is a word cloud containing key information for job seekers, so they can make better strategies based on what they see in it. We will use web scraping to get job information from indeed.com. Indeed was chosen for its popularity, its clean layout and its wide range of supported countries (we will use the default US site, indeed.com, in this project).

So we are going to:

1. Gather the user's job search query
2. Scrape job info from indeed.com using the inputs from (1)
3. Use TFIDF to weight the importance of words
4. Generate a word cloud from the outputs of (3)
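The four steps above can be sketched as a tiny pipeline. This is only an illustrative stub, not the script we build below; the function names and stub stages are made up:

```python
# Illustrative sketch of the four-step flow; each stage is passed in
# as a callable so the steps stay swappable.
def run_pipeline(query, location, pages, scrape, weigh, render):
    raw_docs = scrape(query, location, pages)   # step 2: scrape job posts
    weights = weigh(raw_docs)                   # step 3: TFIDF weighting
    return render(weights)                      # step 4: draw the word cloud

# a dry run with stub stages
result = run_pipeline(
    "programmer", "California", 3,
    scrape=lambda q, l, p: ["java developer wanted", "python programmer"],
    weigh=lambda docs: {"java": 0.7, "python": 0.9},
    render=lambda w: max(w, key=w.get),
)
print(result)  # -> python
```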

And our output should look like this: *(word cloud screenshot)*

This project is straightforward and easy. What are we waiting for? Let's code it!

### Code A Cloud

Now we not only code a star, we code a cloud as well :]] . As we did in the EZ Weather Flask app project, we use pipenv to set up our development environment.

```shell
$ pipenv --three
$ pipenv shell
```

Get the package list file, Pipfile, from here and put it in your development folder. Then we can install all the required modules with a single command.

```shell
$ pipenv install
```

We name our file "indeedminer.py", and the file name says it all. Inside our indeedminer file, we first import the required packages and code the "gather job search query" part.

```python
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.parse import urlencode
from tqdm import tqdm
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import sys, re, string, datetime

if (len(sys.argv) < 3):
    print("\n\tUsage: indeedminer.py [search keywords] [location] [optional: search page count]")
    print('\te.g. $pythonw indeedminer.py "HR Manager" "New York"\n')
    exit()

search_page = 1
if (len(sys.argv) > 3):
    search_page = int(sys.argv[3])

search_keyword = sys.argv[1]
location = sys.argv[2]
params = {
    'q': search_keyword,
    'l': location
}
```

The above code snippet is pretty straightforward. We accept three arguments from the command line: search keywords, a location and a page count. The search keywords can be a job title, an industry or a company name, and the page count is the number of search result pages used to build the word cloud. So a usage example can be:

```shell
$ pythonw indeedminer.py "property manager" "Phoenix, AZ" 3
```

i.e. we build a word cloud for "property manager" jobs in "Phoenix, AZ" using 3 search result pages.

Please note that "pythonw" is used instead of "python" in the command above. Since a word cloud is a graphical component, we need a windowed terminal to display it.

### Scraping with “Soup”

In a past project, we used Scrapy to build a scraping platform and scrape daily deals from eBay. Since we are building a simple scraper this time, not a scraping platform, Beautiful Soup is the right tool for us.

```python
url_prefix = "https://www.indeed.com"   #replace url_prefix with your favorite country from https://www.indeed.com/worldwide
url = url_prefix + "/jobs?" + urlencode(params)

def getJobInfoLinks(url, next_page_count, url_prefix):
    job_links_arr = []
    while True:
        if (next_page_count < 1):
            break
        next_page_count -= 1
        html = urlopen(url)
        soup = BeautifulSoup(html, 'lxml')
        job_links_arr += getJobLinksFromIndexPage(soup)
        pagination = soup.find('div', {'class':'pagination'})
        next_link = ""

        for page_link in reversed(pagination.find_all('a')):
            next_link_idx = page_link.get_text().find("Next")
            if (next_link_idx >= 0):
                next_link = page_link.get('href')
                break
        if (next_link == ""):
            break
        url = url_prefix + next_link
    return job_links_arr

def getJobLinksFromIndexPage(soup):
    jobcards = soup.find_all('div', {'class':'jobsearch-SerpJobCard row result'})
    job_links_arr = []
    for jobcard in tqdm(jobcards):
        job_title_obj = jobcard.find('a', {'class':'turnstileLink'})
        job_title_link = job_title_obj.get('href')
        job_links_arr.append(job_title_link)
    return job_links_arr

current_datetime = datetime.datetime.today().strftime('%Y-%m-%d %H:%M:%S')
print("Getting job links in {} page(s)...".format(search_page))
job_links_arr = getJobInfoLinks(url, search_page, url_prefix)
```

As mentioned above, we are using the default US Indeed job search website. You may change the url_prefix value to your favorite country using an address from Indeed's worldwide page (e.g. url_prefix = "http://www.indeed.co.uk" for the UK, url_prefix = "http://www.indeed.co.in" for India, etc.).
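For reference, the search URL is simply the country prefix plus the urlencoded query parameters. A quick stdlib-only check of how that composition behaves:

```python
from urllib.parse import urlencode

# build the same kind of search URL the script composes
params = {"q": "HR Manager", "l": "New York"}
url = "https://www.indeed.com/jobs?" + urlencode(params)
print(url)  # -> https://www.indeed.com/jobs?q=HR+Manager&l=New+York
```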

We use the three arguments to get one or more result pages (depending on the page count parameter) from the Indeed website. The result pages contain a list of jobs matching our inputs. Then we use Beautiful Soup to parse them. Actually, we are not scraping all the content on the result pages; we only scrape the job detail links on each page.

Now we have an array, job_links_arr, storing all the job detail links. The next step is to get the details from those links.

Before we get the job details, we need to replace punctuation with a " " (blank) character and remove job ad meta information, like salary and detailed job location. Once those things have been done, we can use Beautiful Soup again to scrape the job content.
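The punctuation swap relies on str.translate with a table mapping every punctuation character to a space. A small standalone example (the sample sentence is made up):

```python
import string

punctuation = string.punctuation
sample = "5+ years' experience (required): Python/SQL."
# build a translation table: every punctuation char -> a single space
table = sample.maketrans(punctuation, " " * len(punctuation))
cleaned = sample.translate(table)
print(cleaned)  # all punctuation replaced by blanks, words kept intact
```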

```python
punctuation = string.punctuation
job_desc_arr = []
print("Getting job details in {} post(s)...".format(len(job_links_arr)))
for job_link in tqdm(job_links_arr):
    job_link = url_prefix + job_link
    job_html = urlopen(job_link)
    job_soup = BeautifulSoup(job_html, 'lxml')
    job_desc = job_soup.find('div', {'class':'jobsearch-JobComponent-description'})
    job_meta = job_desc.find('div', {'class':'jobsearch-JobMetadataHeader-item'})
    #remove job meta
    if (job_meta is not None):
        job_meta.decompose()
    for li_tag in job_desc.findAll('li'):
        li_tag.insert(0, " ")      #add a space before each list item
    job_desc = job_desc.get_text()
    job_desc = re.sub('https?://.*[\r\n]*', '', job_desc, flags=re.MULTILINE)
    job_desc = job_desc.translate(job_desc.maketrans(punctuation, ' ' * len(punctuation)))
    job_desc_arr.append(job_desc)
```

### TFIDF the job content

We have the job content scraped from indeed.com; now it is time to do the text mining with TFIDF. In our past Avito project, we imported Russian stop words to avoid counting meaningless words. We do the same here, but this time we import English stop words plus common words found in job details like "may", "must", "position", etc.

```python
try:
    nltk.data.find('tokenizers/punkt')   #if nltk data is not initialized, go download it
except LookupError:
    nltk.download('punkt')

stop_words = stopwords.words('english')
extra_stop_words = ["experience", "position", "work", "please", "click", "must", "may", "required", "preferred",
                    "type", "including", "strong", "ability", "needs", "apply", "skills", "requirements", "company",
                    "knowledge", "job", "responsibilities", location.lower()] + location.lower().split()
stop_words += extra_stop_words
print("Generating Word Cloud...")
tfidf_para = {
    "stop_words": stop_words,
    "analyzer": 'word',            #tokenize on 'word' (vs 'char') units
    "token_pattern": r'\w{1,}',    #match any word of length 1 or more
    "sublinear_tf": False,         #no sublinear tf scaling (1 + log(tf))
    "dtype": int,                  #returned data type
    "norm": 'l2',                  #apply l2 normalization
    "smooth_idf": False,           #no smoothing of document frequencies
    "ngram_range": (1, 2),         #the min and max size of tokenized terms
    "max_features": 500            #keep only the top 500 weighted features
}
tfidf_vect = TfidfVectorizer(**tfidf_para)
transformed_job_desc = tfidf_vect.fit_transform(job_desc_arr)
```

Next, we transform the job details through the TFIDF model and keep the top 500 weighted words in transformed_job_desc.
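The word cloud step will need a single weight per term, so per-document TFIDF scores are summed term by term. A hypothetical pure-Python miniature of that aggregation (the scores below are made-up numbers; the real script sums columns of the sparse TFIDF matrix instead):

```python
# Made-up per-document term scores, standing in for rows of the
# TFIDF matrix.
doc_scores = [
    {"python": 0.8, "sql": 0.3},
    {"python": 0.5, "java": 0.6},
]
# sum each term's score across all documents
freqs_dict = {}
for scores in doc_scores:
    for term, weight in scores.items():
        freqs_dict[term] = freqs_dict.get(term, 0.0) + weight
print(freqs_dict)  # "python" ends with the highest corpus-level weight (~1.3)
```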

### Visualize the Word Cloud

Finally, we have all the required data and are ready to generate a word cloud. You can modify the width, height and figsize variables to adjust the word cloud's display size.

```python
freqs_dict = dict([(word, transformed_job_desc.getcol(idx).sum()) for word, idx in tfidf_vect.vocabulary_.items()])
w = WordCloud(width=800, height=600, mode='RGBA', background_color='white', max_words=500).fit_words(freqs_dict)
plt.figure(figsize=(12, 9))
plt.title("Keywords:[{}] Location:[{}] {}".format(search_keyword, location, current_datetime))
plt.imshow(w)
plt.axis("off")
plt.show()
```

A picture is worth a thousand words, so let's run the program and see what we get. We are going to search for "programmer" jobs in "California" using 10 result pages.

```shell
$ pythonw indeedminer.py "programmer" "California" 10
```

And here we go:

*(word cloud for "programmer" jobs in California)*

We can find some important words there, like "C", "Java", "php", "database", etc. Job seekers looking for a "programmer" job in "California" should pay more attention to those words.

The program is about 100 lines of code. It gives you a taste of Beautiful Soup, TFIDF and word clouds combined, and of course it also gives you a visualized view of the current job market trend.


### What have we learnt in this post?
1. Usage of Beautiful Soup to scrape Indeed.com
2. Usage of TFIDF to get weighted words
3. Generation of a word cloud to visualize our findings

(The complete source can be found at https://github.com/codeastar/webminer_indeed or https://gitlab.com/codeastar/webminer_indeed)

      We tried Python web scraping project using scrapy and text mining project using TFIDF in past. This time, we are going to raise the bar, by combing two projects together. So we have our — text mining web scraper. Like our early post in the CodeAStar blog, it is always good to build something useful with […]<\/p>\n","protected":false},"author":1,"featured_media":1574,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"site-sidebar-layout":"default","site-content-layout":"default","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"jetpack_publicize_message":"Build a Job info Word Cloud for Job Seekers in Easy 
Python","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[2],"tags":[43,119,123,122,8,120,89,45,42,121],"jetpack_publicize_connections":[],"yoast_head":"\nWord Cloud for Job Seekers in Python ⋆ Code A Star<\/title>\n<meta name=\"description\" content=\"For job seekers to make better decisions, let's build a word cloud from indeed.com using Beautiful Soup on web scraping and TFIDF on text mining.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Word Cloud for Job Seekers in Python ⋆ Code A Star\" \/>\n<meta property=\"og:description\" content=\"For job seekers to make better decisions, let's build a word cloud from indeed.com using Beautiful Soup on web scraping and TFIDF on text mining.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/\" \/>\n<meta property=\"og:site_name\" content=\"Code A Star\" \/>\n<meta property=\"article:publisher\" content=\"codeastar\" \/>\n<meta property=\"article:author\" content=\"codeastar\" \/>\n<meta property=\"article:published_time\" content=\"2018-12-16T19:12:19+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-12-22T15:37:48+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codeastar.com\/wp-content\/uploads\/2018\/12\/cloudmaker1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1239\" \/>\n\t<meta property=\"og:image:height\" content=\"623\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Raven 
Hon\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@codeastar\" \/>\n<meta name=\"twitter:site\" content=\"@codeastar\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Raven Hon\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/\"},\"author\":{\"name\":\"Raven Hon\",\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd\"},\"headline\":\"Word Cloud for Job Seekers in Python\",\"datePublished\":\"2018-12-16T19:12:19+00:00\",\"dateModified\":\"2018-12-22T15:37:48+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/\"},\"wordCount\":937,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd\"},\"keywords\":[\"Beautiful Soup\",\"easy\",\"indeed\",\"job search\",\"Python\",\"text mining\",\"TFIDF\",\"tutorial\",\"web scraping\",\"word cloud\"],\"articleSection\":[\"We code therefore we are\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/\",\"url\":\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/\",\"name\":\"Word Cloud for Job Seekers in Python ⋆ Code A 
Star\",\"isPartOf\":{\"@id\":\"https:\/\/www.codeastar.com\/#website\"},\"datePublished\":\"2018-12-16T19:12:19+00:00\",\"dateModified\":\"2018-12-22T15:37:48+00:00\",\"description\":\"For job seekers to make better decisions, let's build a word cloud from indeed.com using Beautiful Soup on web scraping and TFIDF on text mining.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.codeastar.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Word Cloud for Job Seekers in Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.codeastar.com\/#website\",\"url\":\"https:\/\/www.codeastar.com\/\",\"name\":\"Code A Star\",\"description\":\"We don't wish upon a star, we code a star\",\"publisher\":{\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.codeastar.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd\",\"name\":\"Raven 
Hon\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/08\/logo70.png?fit=70%2C70&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/08\/logo70.png?fit=70%2C70&ssl=1\",\"width\":70,\"height\":70,\"caption\":\"Raven Hon\"},\"logo\":{\"@id\":\"https:\/\/www.codeastar.com\/#\/schema\/person\/image\/\"},\"description\":\"Raven Hon is\u00a0a 20 years+ veteran in information technology industry who has worked on various projects from console, web, game, banking and mobile applications in different sized companies.\",\"sameAs\":[\"https:\/\/www.codeastar.com\",\"codeastar\",\"https:\/\/twitter.com\/codeastar\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Word Cloud for Job Seekers in Python ⋆ Code A Star","description":"For job seekers to make better decisions, let's build a word cloud from indeed.com using Beautiful Soup on web scraping and TFIDF on text mining.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/","og_locale":"en_US","og_type":"article","og_title":"Word Cloud for Job Seekers in Python ⋆ Code A Star","og_description":"For job seekers to make better decisions, let's build a word cloud from indeed.com using Beautiful Soup on web scraping and TFIDF on text mining.","og_url":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/","og_site_name":"Code A 
Star","article_publisher":"codeastar","article_author":"codeastar","article_published_time":"2018-12-16T19:12:19+00:00","article_modified_time":"2018-12-22T15:37:48+00:00","og_image":[{"width":1239,"height":623,"url":"https:\/\/www.codeastar.com\/wp-content\/uploads\/2018\/12\/cloudmaker1.png","type":"image\/png"}],"author":"Raven Hon","twitter_card":"summary_large_image","twitter_creator":"@codeastar","twitter_site":"@codeastar","twitter_misc":{"Written by":"Raven Hon","Est. reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/#article","isPartOf":{"@id":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/"},"author":{"name":"Raven Hon","@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd"},"headline":"Word Cloud for Job Seekers in Python","datePublished":"2018-12-16T19:12:19+00:00","dateModified":"2018-12-22T15:37:48+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/"},"wordCount":937,"commentCount":1,"publisher":{"@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd"},"keywords":["Beautiful Soup","easy","indeed","job search","Python","text mining","TFIDF","tutorial","web scraping","word cloud"],"articleSection":["We code therefore we are"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/","url":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/","name":"Word Cloud for Job Seekers in Python ⋆ Code A Star","isPartOf":{"@id":"https:\/\/www.codeastar.com\/#website"},"datePublished":"2018-12-16T19:12:19+00:00","dateModified":"2018-12-22T15:37:48+00:00","description":"For job seekers to make 
better decisions, let's build a word cloud from indeed.com using Beautiful Soup on web scraping and TFIDF on text mining.","breadcrumb":{"@id":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.codeastar.com\/word-cloud-easy-python-job-seekers\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codeastar.com\/"},{"@type":"ListItem","position":2,"name":"Word Cloud for Job Seekers in Python"}]},{"@type":"WebSite","@id":"https:\/\/www.codeastar.com\/#website","url":"https:\/\/www.codeastar.com\/","name":"Code A Star","description":"We don't wish upon a star, we code a star","publisher":{"@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codeastar.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/832d202eb92a3d430097e88c6d0550bd","name":"Raven Hon","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/image\/","url":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/08\/logo70.png?fit=70%2C70&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/08\/logo70.png?fit=70%2C70&ssl=1","width":70,"height":70,"caption":"Raven Hon"},"logo":{"@id":"https:\/\/www.codeastar.com\/#\/schema\/person\/image\/"},"description":"Raven Hon is\u00a0a 20 years+ veteran in information technology industry who has worked on various projects from console, web, game, banking and mobile applications in different sized 
companies.","sameAs":["https:\/\/www.codeastar.com","codeastar","https:\/\/twitter.com\/codeastar"]}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/12\/cloudmaker1.png?fit=1239%2C623&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8PcRO-oQ","jetpack-related-posts":[{"id":1593,"url":"https:\/\/www.codeastar.com\/hong-kong-python-word-cloud-job-seekers\/","url_meta":{"origin":1540,"position":0},"title":"Hong Kong Edition: Python Word Cloud for Job Seekers","author":"Raven Hon","date":"December 25, 2018","format":false,"excerpt":"Last time, we coded a Python Word Cloud Generator for Indeed.com users. This time, since Christmas is here, I would like to code a job seeking word cloud for my hometown --- Hong Kong! So no matter you are living in Hong Kong or looking for jobs in Hong Kong,\u2026","rel":"","context":"In "We code therefore we are"","block_context":{"text":"We code therefore we are","link":"https:\/\/www.codeastar.com\/category\/we-code-therefore-we-are\/"},"img":{"alt_text":"Hong Kong Edition","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/12\/cloudmaker3_hk.png?fit=1200%2C603&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/12\/cloudmaker3_hk.png?fit=1200%2C603&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/12\/cloudmaker3_hk.png?fit=1200%2C603&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/12\/cloudmaker3_hk.png?fit=1200%2C603&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/12\/cloudmaker3_hk.png?fit=1200%2C603&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":612,"url":"https:\/\/www.codeastar.com\/web-scraping-python\/","url_meta":{"origin":1540,"position":1},"title":"Tutorial: How to do web scraping in Python?","author":"Raven 
Hon","date":"December 30, 2017","format":false,"excerpt":"When we go for data science projects, like the Titanic Survivors and Iowa House Prices\u00a0projects, we need data sets to process our predictions. In above cases, those data sets have already been collected and prepared. We only need to download the data set files then start our projects. But when\u2026","rel":"","context":"In "We code therefore we are"","block_context":{"text":"We code therefore we are","link":"https:\/\/www.codeastar.com\/category\/we-code-therefore-we-are\/"},"img":{"alt_text":"How to do web scraping?","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/12\/web_scraper.png?fit=1115%2C694&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/12\/web_scraper.png?fit=1115%2C694&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/12\/web_scraper.png?fit=1115%2C694&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/12\/web_scraper.png?fit=1115%2C694&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2017\/12\/web_scraper.png?fit=1115%2C694&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":1682,"url":"https:\/\/www.codeastar.com\/2019-top-programming-languages\/","url_meta":{"origin":1540,"position":2},"title":"2019 Top Programming Languages to code","author":"Raven Hon","date":"January 30, 2019","format":false,"excerpt":"Like last time in 2018, we do the Top Programming Languages (TPL) to code again, this time, in 2019! 
After reviewing the past TPL posts, we simplify the TPL criteria into 2 categories: Popularity - how popular the language isCareer Value - how does the language help you develop your\u2026","rel":"","context":"In "We code therefore we are"","block_context":{"text":"We code therefore we are","link":"https:\/\/www.codeastar.com\/category\/we-code-therefore-we-are\/"},"img":{"alt_text":"2019 Languages to Code","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/01\/2019_l2c.png?fit=1200%2C473&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/01\/2019_l2c.png?fit=1200%2C473&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/01\/2019_l2c.png?fit=1200%2C473&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/01\/2019_l2c.png?fit=1200%2C473&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/01\/2019_l2c.png?fit=1200%2C473&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":670,"url":"https:\/\/www.codeastar.com\/top-programming-languages-2018\/","url_meta":{"origin":1540,"position":3},"title":"[2018] Top Programming Languages to code in 2018","author":"Raven Hon","date":"January 21, 2018","format":false,"excerpt":"We posted a post on top programming languages in 2017, six months ago. It's 2018, it's a New Year, yes it is! So we are going to do the same thing again. 
Last year, we found our top programming languages based on following criteria: popularity - how popular the language\u2026","rel":"","context":"In "We code therefore we are"","block_context":{"text":"We code therefore we are","link":"https:\/\/www.codeastar.com\/category\/we-code-therefore-we-are\/"},"img":{"alt_text":"Top programming languages in 2018","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/01\/star2018.png?fit=1000%2C833&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/01\/star2018.png?fit=1000%2C833&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/01\/star2018.png?fit=1000%2C833&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2018\/01\/star2018.png?fit=1000%2C833&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":1835,"url":"https:\/\/www.codeastar.com\/elastic-beanstalk-with-react-frontend-and-flask-backend-part-3\/","url_meta":{"origin":1540,"position":4},"title":"Elastic Beanstalk with React Frontend and Flask Backend \u2013 Part 3","author":"Raven Hon","date":"March 25, 2019","format":false,"excerpt":"We have our Flask backend at the back to handle magic (weather forecast logic) and we have our React frontend dealing with user interfaces. So, what are we missing here? Every great team needs a leader to bring teammates working together. 
Thus we need a service to align both backend\u2026","rel":"","context":"In "We code therefore we are"","block_context":{"text":"We code therefore we are","link":"https:\/\/www.codeastar.com\/category\/we-code-therefore-we-are\/"},"img":{"alt_text":"AWS Integration","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/03\/wq_hayley.png?fit=1000%2C500&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/03\/wq_hayley.png?fit=1000%2C500&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/03\/wq_hayley.png?fit=1000%2C500&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/03\/wq_hayley.png?fit=1000%2C500&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":1895,"url":"https:\/\/www.codeastar.com\/word-embedding-in-nlp-and-python-part-1\/","url_meta":{"origin":1540,"position":5},"title":"Word Embedding in NLP and Python – Part 1","author":"Raven Hon","date":"April 30, 2019","format":false,"excerpt":"We have handled text in machine learning using TFIDF. And we can use it to build a word cloud for analytic purposes. But is that all a machine can do with text? Definitely not, as we just haven't let the machine \"learn\" about text yet. 
TFIDF is a statistics\u2026","rel":"","context":"In "Learn Machine Learning"","block_context":{"text":"Learn Machine Learning","link":"https:\/\/www.codeastar.com\/category\/machine-learning\/"},"img":{"alt_text":"Happy word embedding","src":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/04\/happy_embedding.png?fit=800%2C779&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/04\/happy_embedding.png?fit=800%2C779&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/04\/happy_embedding.png?fit=800%2C779&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/www.codeastar.com\/wp-content\/uploads\/2019\/04\/happy_embedding.png?fit=800%2C779&ssl=1&resize=700%2C400 2x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/posts\/1540"}],"collection":[{"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/comments?post=1540"}],"version-history":[{"count":37,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/posts\/1540\/revisions"}],"predecessor-version":[{"id":1592,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/posts\/1540\/revisions\/1592"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/media\/1574"}],"wp:attachment":[{"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/media?parent=1540"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/categories?post=1540"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codeastar.com\/wp-json\/wp\/v2\/tags?post=1540"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}",
"templated":true}]}}