Generate C# Corner Statistics Data Based On Available Information On Website

Introduction
In this series of articles, we will walk through the complete process of preparing the C# Corner Statistics Dashboard. This is the first part of the series. The series is organized as follows:
  1. Generate C# Corner statistics data from the information available on the community website by web scraping with Cheerio.js. The data is generated in JSON format.
  2. Use the generated C# Corner statistics data to prepare charts.
In this article, we will learn how to generate C# Corner statistics data using web scraping with the help of Cheerio.js. Most of you may have these questions in mind:
  • Why should we prepare or generate C# Corner Statistics?
  • Does C# Corner have any public API for developers to use site statistics?
I had the same questions and reached out to the C# Corner team. Because of other priorities and current goals, they do not offer a Web API or a data set that I, or other developers like me, could use to build charts from C# Corner statistics.
C# Corner is a large tech community, and millions of developers from all over the globe visit the website every month to read and share technical news, articles, blogs, videos, forum questions, and more on cutting-edge technologies and the latest developments. Members can also connect with each other to gain knowledge and share their own. So, we can say that C# Corner is a "Social Network Tech Community for Developers".
Since no API is available, I explored the C# Corner website to find statistics that are useful as a starting point, and then collected related information as well. Together, this is enough to prepare a C# Corner charts dashboard.
Which data will we prepare from the C# Corner site?
  • The total number of technology categories served by the website (approximately 152 as of 05-Jan-2019, covering a diverse range of technologies)
  • Each technology category's contribution count per content type (articles, videos, news, blogs, etc.)
Based on this prepared JSON data, we can create lots of charts and some statistics as well.
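To make the target data concrete, here is a minimal sketch of the JSON shape the scraper in this article produces, together with one statistic derived from it. The category names, URLs, and counts are made up for illustration, and totalsByContentType is a hypothetical helper, not part of the original app. Counts are strings because the scraper stores them exactly as they appear on the page.

```javascript
// Hypothetical sample in the shape the scraper produces.
// Category names, URLs, and counts are invented for illustration.
const sampleStatistics = [
  {
    category: "Example Category A",
    url: "https://www.c-sharpcorner.com/technologies/example-a",
    categoryData: [
      { categoryType: "Articles", count: "120" },
      { categoryType: "Blogs", count: "45" },
    ],
  },
  {
    category: "Example Category B",
    url: "https://www.c-sharpcorner.com/technologies/example-b",
    categoryData: [
      { categoryType: "Articles", count: "30" },
      { categoryType: "Videos", count: "7" },
    ],
  },
];

// Hypothetical helper: sum the counts per content type across all categories.
function totalsByContentType(statistics) {
  const totals = {};
  for (const entry of statistics) {
    for (const { categoryType, count } of entry.categoryData) {
      const value = Number.parseInt(count, 10);
      if (Number.isNaN(value)) continue; // skip entries without a numeric count
      totals[categoryType] = (totals[categoryType] || 0) + value;
    }
  }
  return totals;
}

console.log(totalsByContentType(sampleStatistics));
// → { Articles: 150, Blogs: 45, Videos: 7 }
```

A totals object like this could feed a single bar or donut chart directly, with one slice per content type.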
Prerequisites
I used the following tools and technologies to prepare the C# Corner site data.
  • Node.js (as a simple back-end server)
  • Cheerio.js (the core part, used as the web scraping utility library)
  • Fetch API, via the node-fetch package (for async HTTP requests to web pages from the server side)
Now, let's come to the main part of the article.
Step 1
From the Node.js app, request the C# Corner technologies page to fetch all the technology categories.
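One detail worth checking in this step: node-fetch only accepts absolute URLs. The app in this article passes each category's href straight to fetch, so if the links on the page turn out to be relative (an assumption, not something the article confirms), they need to be resolved first. A minimal sketch, where toAbsoluteUrl is a hypothetical helper and the href values in the usage lines are made up:

```javascript
// Base page taken from the article.
const technologiesPage = "https://www.c-sharpcorner.com/technologies";

// Hypothetical helper: turn an href from the page into an absolute URL.
function toAbsoluteUrl(href, base = technologiesPage) {
  if (!href) return null; // anchor without an href attribute
  try {
    // The URL constructor resolves relative references against `base`
    // and leaves already-absolute URLs unchanged.
    return new URL(href, base).href;
  } catch {
    return null; // href could not be parsed as a URL
  }
}

// Made-up href values for illustration.
console.log(toAbsoluteUrl("/technologies/example-a"));
console.log(toAbsoluteUrl("https://www.c-sharpcorner.com/technologies/example-b"));
```

Returning null for unusable links lets the caller skip them instead of starting a request that is certain to fail.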
Step 2
Once you get the response to that request, make a further async request for each technology category to fetch its data, and store the results in a JSON object.
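The per-category data in this article comes from a title attribute that is split on spaces into a count and a content type, which suggests text of the form "120 Articles". The sketch below shows a slightly more defensive way to parse such a string. parseCountTitle is a hypothetical helper, and the handling of thousands separators and multi-word type names is an assumption about what the page might contain.

```javascript
// Hypothetical helper: parse a title such as "120 Articles" into
// { categoryType, count }. Returns null when the title is missing
// or does not start with a number.
function parseCountTitle(title) {
  if (typeof title !== "string") return null;
  // Group 1: digits, optionally with thousands separators (assumed format).
  // Group 2: everything after the first run of whitespace.
  const match = title.trim().match(/^(\d[\d,]*)\s+(.+)$/);
  if (!match) return null;
  return {
    categoryType: match[2],
    count: Number.parseInt(match[1].replace(/,/g, ""), 10),
  };
}

console.log(parseCountTitle("120 Articles"));
// → { categoryType: 'Articles', count: 120 }
console.log(parseCountTitle(undefined));
// → null
```

Unlike a plain split on a space, this keeps a multi-word type name intact and stores the count as a number, which saves a conversion step when the data is later charted.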
Step 3
Once all the requests are completed, save the final result to a JSON file. The three steps above are shown in the code listing below, with comments.
The code uses async and await (standardized in ES2017) for the web requests. The per-category requests are started inside a loop and run concurrently, so a counter of completed requests is used to detect when the last one has finished.
App.js 
// Import prerequisites
var fs = require('fs');
var cheerio = require('cheerio');
// Note: require() works with node-fetch v2; v3 is published as an ES module only.
var fetch = require("node-fetch");

// Attach a new function toShortFormat() to every instance of Date.
// Format: d-m-yyyy (day and month are not zero-padded)
Date.prototype.toShortFormat = function() {
    var day = this.getDate();
    var month = this.getMonth(); // zero-based, hence the + 1 below
    var year = this.getFullYear();

    return "" + day + "-" + (month + 1) + "-" + year;
};

// URL to request all c-sharpcorner technologies
var csharpURL = "https://www.c-sharpcorner.com/technologies";
var categoryResult = [];

// Utility function: fetch a page and return its HTML together with the article object passed in
const getArticleCountByCategory = async (url, article) => {
    const response = await fetch(url);
    const responseHtml = await response.text();

    return {
        html: responseHtml,
        article: article
    };
};

// Step 1: Request the C# Corner technologies page
getArticleCountByCategory(csharpURL, {}).then((html) => {
    // Response received from Step 1
    const $ = cheerio.load(html.html);
    var executedCount = 0;
    // Find the total category count
    var totalCategory = $(".LinkNormalGray").length;
    // Fetch the category name and hyperlink for Step 2
    $(".LinkNormalGray").each((index, item) => {
        var article = {
            category: $(item).text(),
            url: $(item).attr("href")
        };
        // Step 2: Once the Step 1 response is in, request each category's data (requests run concurrently)
        getArticleCountByCategory(article.url, article).then((data) => {
            // Increment the counter of completed requests
            executedCount++;
            const $$ = cheerio.load(data.html);
            // Prepare JSON data based on the response
            data.article.categoryData = [];
            $$('.count').each((index1, item1) => {
                // The parent's title attribute is expected to look like "<count> <type>"
                var title = $$(item1).parent().attr('title');
                if (!title) {
                    return; // skip elements without a title
                }
                var categoryWithCount = title.split(" ");
                // Push category-wise data to the JSON object
                data.article.categoryData.push({
                    "categoryType": categoryWithCount[1],
                    "count": categoryWithCount[0]
                });
            });
            categoryResult.push(data.article);
            // Do not use the index of the outer loop here: requests complete asynchronously, so their order is not guaranteed.
            console.log(totalCategory, executedCount);
            if (totalCategory == executedCount) {
                // Step 3: Once all requests are completed, save the final result to a JSON file
                var fileName = 'cSharpcornerStatistics--' + new Date().toShortFormat() + '.json';
                fs.writeFile(fileName, JSON.stringify(categoryResult), "utf8", function(err) {
                    if (err) {
                        console.log(err);
                        return;
                    }
                    console.log('Saved! ' + fileName);
                });
            }
        });
    });
});
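One limitation of counting completed requests is that a single failed request means the counter never reaches the total, so the file is never written. As an alternative sketch (not the approach used in the app above), Promise.allSettled, available in Node.js 12.9 and later, waits for every request and reports failures separately. Here collectAll and fakeFetch are hypothetical names; fakeFetch stands in for the real per-category request so the example runs without network access.

```javascript
// Hypothetical helper: run one request per category and wait for all of them.
// `fetchCategory` is any function (category) => Promise<result>.
async function collectAll(categories, fetchCategory) {
  const settled = await Promise.allSettled(
    categories.map((category) => fetchCategory(category))
  );
  const results = [];
  const failures = [];
  settled.forEach((outcome, index) => {
    if (outcome.status === "fulfilled") {
      results.push(outcome.value);
    } else {
      failures.push({ category: categories[index], reason: String(outcome.reason) });
    }
  });
  return { results, failures };
}

// Stand-in for the real request; "broken" simulates a failed category.
const fakeFetch = async (category) => {
  if (category === "broken") throw new Error("request failed");
  return { category, categoryData: [] };
};

collectAll(["a", "broken", "b"], fakeFetch).then(({ results, failures }) => {
  console.log(results.length, failures.length); // prints "2 1"
});
```

With this shape, the JSON file can be written once after collectAll resolves, and the failures list shows which categories to retry. Results also come back in the same order as the input categories, regardless of which request finishes first.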
The full source code is also available in the GitHub repository here. Clone the repository and run the app with the commands below, as an administrator on Windows or as a sudo user on Linux.
  • npm install (To install the dependency package)
  • node app.js
The demo application that presents the C# Corner statistics dashboard built from this data is available here.
I used the Toast UI Chart library to prepare the different bar and donut charts.

Conclusion
In this article, we learned how to prepare C# Corner statistics data by web scraping with Cheerio.js, and we looked at the problems involved and the tools used to obtain the desired data.
