Ryan Harrison - My blog, portfolio and technology related ramblings

A Better Alternative to Google Authenticator

2-Factor authentication is, for very good reasons, becoming increasingly popular as a way to further protect yourself online. The sole use of passwords has long been inadequate for secure authentication and so has been augmented by additional systems. A lot of online services provide SMS messages as a main method for 2-factor authentication, whereby a code will be sent to your phone. This solves part of the problem, but is still susceptible to the inherent insecurity of SMS as a whole, let alone SIM cloning and number spoofing issues.

As a better alternative, many providers have been offering the use of TOTP (Time-based One-Time Passwords) to generate such codes. The protocol behind this is open, however the most popular implementation is by far the Google Authenticator app, which allows you to scan QR codes to add accounts and will constantly generate one-time-use codes as needed. Its popularity has also meant that most online services directly link to the app and include it in their usage instructions for 2FA.
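Under the hood the protocol (RFC 6238) is refreshingly simple: the provider and your phone share a secret (delivered via the QR code), and each code is just a truncated HMAC of the current 30-second time window. A minimal sketch in Python - the secret here is a made-up example, not a real one:

import base64, hmac, hashlib, struct, time

def totp(secret_b32, digits=6, period=30):
    # decode the base32 secret embedded in the QR code
    key = base64.b32decode(secret_b32.upper())
    # number of complete periods since the Unix epoch
    counter = struct.pack(">Q", int(time.time()) // period)
    mac = hmac.new(key, counter, hashlib.sha1).digest()
    # dynamic truncation as per RFC 4226
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

print(totp('JBSWY3DPEHPK3PXP'))  # example secret - any authenticator app given it shows the same code

Any app implementing this produces identical codes given the same secret - which is exactly why alternatives to Google Authenticator work everywhere it does.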

Google Authenticator app

The Problem

The Google Authenticator app is all well and good - it works well and is very easy to use. It does however open up another problem - what do you do when you lose your phone? It’s pretty plausible that for a significant number of users, their phone will either be lost, broken or stolen whilst they are using it to generate 2FA codes. What can you do when you can no longer log in to many of your accounts because you aren’t able to generate the TOTP codes?

Many websites will also give you a separate recovery code when you enable 2-factor authentication, which you can use in this exact case. But isn’t that kind of defeating the whole point? Where are people going to store this code? You’re pretty screwed if you lose this recovery code, so you might end up writing it down somewhere insecure or storing it online somewhere equally insecure. In my opinion, this is solving a problem by creating a new one.

And that’s only taking into account those sites which do offer you a recovery code. For the no doubt significant number which do not, you are locked out of your account if you lose your phone. It will be on a case-by-case basis whether providers let you back in if you contact them, but I’m not sure how they are going to know it’s you. For any site that stores sensitive data, I don’t see this as an option.

A Solution - Authy

Maybe a lot of users will be put off enabling 2FA for this reason, or more likely a lot of people have never really thought about the potential consequences. Either way, just like your main data, you need to also have a solid backup solution for your 2FA codes.

I mentioned before that the TOTP protocol is not proprietary - so it can be implemented by anyone. I think many believe that this technology is something Google have magicked up, but in reality there are a number of alternative apps out there.

One such app is called Authy, which aims to solve the problem mentioned above. In the basic sense, it is very similar to Google Authenticator, whereby you scan the same QR codes and it generates TOTP codes for you. The difference however, is that it provides a method of automatic backup of your accounts. In a similar manner to conventional password managers, such as LastPass which you should definitely be using, Authy will encrypt and upload your account secrets to their servers when you add them to the app. This is tied to a password you specify, which they don’t ever know - so if you trust password managers then this should be no different.

Your account itself is tied to your phone number, so when you lose your physical device, you can recover all your accounts as long as you move over your number. There are also features which allow sharing of your accounts to your other devices in a similar manner.

Authy app

Yes, I know you can just screenshot the QR codes which are generated, or add them to your other devices at the same time, but this is putting all the pressure of the backup on the user. Where are you meant to store the QR codes (how do you back up the backup?), will you encrypt them, how are you going to keep them in sync, etc.? Again, in this case you are solving a problem by generating another problem - for yourself.

It’s not perfect

The app isn’t perfect. For such a simple set of use cases, I have no idea why the app misses some key features that would make it more user friendly (and more approachable than the Google offering).

  • You can tie your accounts to a predefined set of providers that the Authy developers maintain (e.g. Facebook, Google, Amazon etc). By doing so you get a nice looking logo and some customised colours for your troubles. This does make the app look a lot nicer, but you rely on the site being in the set that the developers give you. Why the hell can I not provide my own logo? Why the hell can other users not upload their own customisations? Why the hell isn’t the existing set bigger? I mean seriously, the look and feel of the app is one of the main selling points given by the devs themselves - this should be so easy to add and it contributes to one of your main features. The Google Authenticator app does look bland in comparison - but only when I don’t have to use the crappy ‘other account’ template.
  • You can rename your accounts to what you like, but this name doesn’t seem to be used when you choose the grid view. Why? Do you think I changed the name just for fun? If I changed it then it’s because I want to see it. The changed names are even used in the list view!
  • The QR scanner isn’t great. I mean, it’s definitely functional for sure, but it’s nowhere near as good as the one used in the Google Authenticator app. You have to really line up the code in the camera and get it into focus for it to work. In the Google app I can just point it somewhere close and it picks it up immediately.

For sure I am nitpicking with these annoyances, but if you want to draw people away from an app provided by Google, then you’re going to have to get it completely right. Hopefully the devs can get on top of this, because for me the main selling point - automated backups - works very well. For most users I would still definitely recommend the Authy app (or others which offer similar features) over the Google Authenticator app.


New PC Build

Late last year I finally got around to buying and building my new computer after many months of research, waiting for releases and price monitoring on PCPartPicker. It was definitely massively overdue as I was running an AMD Phenom II X4 955 (3.2GHz) on an AM3 board for the preceding 7 years! Its age was definitely beginning to show, whereby new AAA games would be massively CPU bottlenecked and Battlefield 1 would hardly run at all due to missing some modern instructions. Not to mention how a YouTube video running in the background would intermittently freeze when doing basic work in IntelliJ.

My overall plan and thought process was to go all out on the new parts - which should hopefully last for the next few years easily. I could also make use of one of the best things about building your own desktop computers - reusing old parts to save money. What follows is an explanation of each of the parts I chose and how they have been performing ~1 month after the build:

CPU - Intel Core i7 8700k @ 4.8GHz

Intel Core i7 8700k

Starting with perhaps the main part, which ultimately determines the platform you will need - this one changed significantly over the course of last year. With the very successful launch of the Ryzen series of processors by AMD last year, I was initially planning on getting a Ryzen 1700 (8 cores/16 threads) as for raw price/performance you couldn’t (and still can’t really) do much better. The launch didn’t go without a few hiccups, mainly around memory compatibility on a new platform, but it seems to have gotten a lot better. With a very significant ~50%+ improvement to IPC, AMD are finally able to compete with Intel again in the CPU space. The problem however is that the clock speeds are still very limited compared to their Intel counterparts. They provide a staggering number of cores for the price point, but the vast majority of users are never going to utilise them all unless you are a hardcore video editor etc. For me personally, having 16 threads isn’t all that important when compared to clock speed - which undeniably still causes the biggest performance difference in today’s (mainly single threaded) applications. The low powered 14nm process used in the Ryzen processors can barely break 4GHz even with an overclock, although hopefully this will improve with the next generation on the 12nm high powered process.

Regardless of the low clocks, I was still planning on a Ryzen build until Intel, obviously feeling threatened by AMD, significantly brought forward the release of their 8th generation Coffee Lake CPUs. The rumours were that they were adding cores whilst maintaining their high clock speeds, so I decided to wait until they were (paper) released and see how they performed in the reviews. And man am I glad that I did, because the high end parts in particular destroy Ryzen in most workloads. I was always focused only on the 8700k, which has 12 threads (more than enough for me), clocks in at 3.7GHz at stock, but turbos all the way up to 4.7GHz on one core. According to Reddit, most Ryzen 1700/1700x chips can overclock to around 3.8GHz max. That ~1GHz+ delta in clock speeds is very substantial and means the 8700k can still keep up in multithreaded workloads even when it lacks 2 full cores.

The release of Coffee Lake wasn’t without its problems either though. Global stock of the 8700k in particular was extremely short, probably because Intel hadn’t had enough time to manufacture them after bringing forward the release date. As such, prices were massively inflated initially due to poor supply. After waiting out the initial rush I did manage to get my unit for a reasonable price - even if I did have to order it from the Czech Republic. I did pay more for it, even compared to current pricing, but I still think it was worth it and I was pretty desperate to upgrade last year!

The only thing I can really say about the 8700k is that it’s a complete beast. It doesn’t take a lot to be a significant improvement over my last system, but the 8700k chews through any workload that I can throw at it without breaking a sweat. Games are completely GPU bottlenecked again (as they should be) and overall performance is excellent. I’ve currently dialled in a 4.8GHz overclock across all cores, which is pretty mad really. I think I also got a golden chip, because a 5GHz overclock was also possible at reasonable voltages/temperatures. I’m still playing around with the overclock though, so more to come on that front. If I can get a good 5GHz overclock, that’s a mad amount of performance on a 6 core chip. We will see what happens around the whole Meltdown and Spectre thing, which looks like it might impact performance by a couple of percent, but overall I definitely recommend the 8700k. Hopefully AMD can once again catch up with Pinnacle Ridge and then Zen 2, which should promise much higher clocks. For the meantime though, the performance crown still belongs to Intel.

CPU Cooler - Noctua NH-D15S

Noctua NH-D15S

The 8700k runs hot and that isn’t an overstatement. There has been a fair amount of controversy online about the poor TIM (Thermal Interface Material) that Intel uses between the CPU die and the heatspreader, and there are also mentions of air gaps between the two causing issues. Why they choose not to solder like AMD have with Ryzen I don’t know (although I’m sure there are reasons beyond just cost saving), but the result is that the 8700k runs hot and requires top end cooling to keep under control - especially if you want to overclock, and as it’s a K-series unlocked chip, you should want to (P.S. the i7 8700 is pretty great price/performance if you don’t want to overclock).

Most people with the 8700k are using all-in-one liquid CPU coolers or even custom loops, but using water in a computer still seems strange to me and I don’t particularly like the idea of the pump suddenly dying or the increased maintenance required. Luckily however, there are now air coolers available which, although they might look worse, offer similar performance to water cooling whilst being cheaper and very quiet.

It didn’t take much research to find that Noctua is the clear winner in this department. Their coolers are very well manufactured, perform brilliantly and use their own fans, which are already some of the best on the market. Put all that together and you get something that can easily tame even the 8700k. I eventually chose the NH-D15S dual tower cooler over the very similar NH-D15, which, although it only includes one fan, still performs very similarly and is slightly smaller.

The Noctua coolers aren’t cheap by any means, but I think they’re definitely worth the price. The packaging is great and their unique mounting system is probably the best of any manufacturer. At idle, my 8700k barely breaks 30°C (it downclocks to 800MHz) and at full load doesn’t go much above 80°C even when running Prime95 (which is a worst case scenario that won’t be met in everyday use). The best thing however, is just how quiet it is. At idle I can’t really hear it at all and even at full load it remains surprisingly quiet considering how much heat it manages to dissipate. Again, I would definitely recommend these Noctua coolers, just make sure you have enough room in your case to accommodate them.

Motherboard - Gigabyte Z370 Gaming 5

Gigabyte Z370 Gaming 5

The Z370 chipset is the flavour of choice for Coffee Lake at the moment pending the lower end chipset releases early this year (although for an 8700k you really want a pretty high end Z370). I ended up with the Gaming 5 as it had some good reviews and has a well rounded feature set for the price point. I also got £20 worth of Steam vouchers through a promotion offered by Gigabyte and am about to get another £20 through another promotion to leave a review. In real terms that makes this board excellent value for money.

It’s a very solid board and I have no complaints so far after a month of good use. The VRMs are some of the best at this price range and easily support my 8700k running at 4.8GHz (and even at 5GHz) with good temperatures - something which cannot be said of some other cheaper Z370 motherboards. No issues setting up an NVMe drive either (which can also be placed above the graphics card, rather than only below it, for better thermals).

Overall connectivity is definitely a strong point, as some competitors seem to be lacking in USB ports on the back panel. The inclusion of a USB Type-C port on the back, plus a header, is also nice to have to future-proof yourself. AC WiFi is also definitely a good selling point (and is interestingly missing from the Gaming 7 model) and works as expected for those of us who are unable to have wired connections.

The BIOS is nothing outstanding, but has pretty much all of the settings you could ever need. The XMP profile on my RAM kit was easy to enable and runs at 3200MHz without issue; overclocking is also straightforward, with multiple guides available if you need pointers. It’s good to see Multi-Core Enhancement (MCE, which auto overclocks K-series processors to max turbo across all cores) turned off by default as it should be - something which cannot be said of the Asus boards. Fan control is very easy and the board has plenty of hybrid fan ports - which is great to see for complex watercooling setups.

Build quality is good and the reinforced PCI-e slots are nice for heavy graphics cards. My favourite feature has to be the ALC1220 audio though which (coming admittedly from poor onboard audio) sounds fantastic in comparison.

I’m not that into the whole RGB lighting game, but this board definitely suits those who are, as there are plenty of lights scattered all over. There are also options for adding additional lighting strips if that’s your thing. Everything can be configured both in the BIOS and in the extra software (including turning it all off if needed) but my case doesn’t have a window so I don’t see it anyway.

Overall, at this price point this board is a very solid all rounder and I would recommend it to any prospective Coffee Lake buyers. The more expensive Gaming 7 is also an option, which includes very beefy VRMs and better onboard audio. For me though, these features weren’t worth the extra money and the loss of WiFi/Bluetooth.

Memory - Corsair Vengeance LPX 16GB @ 3200MHz

Corsair Vengeance LPX 16GB

RAM prices at the moment are crazy. Monitoring the pricing of this kit via PCPartPicker showed multiple price hikes over the course of last year, which now put this kit at over £200. I got it for a bit cheaper than that, but it definitely hurt a bit. Hopefully the situation improves as the new NAND factories open this year (and maybe there will be some investigation into possible price fixing).

The kit itself is pretty standard and nothing really to write home about. It has a plain black look and wide support across many motherboards. I’m not interested in fancy RGB memory or large heatspreaders, so it fits my build well.

16GB is the sweet spot at the moment, with 32GB being incredibly expensive and unnecessary for most workloads. Meanwhile, 8GB is starting to become too little in some modern games and applications. I don’t expect to need additional memory in the near future. The speed however is something I was willing to pay more for. The difference between stock DDR4 2133MHz and 3200MHz can be quite substantial - even more so on Ryzen due to Infinity Fabric, but it also makes a difference in Intel systems. I think 3200MHz is currently the max I would recommend whilst staying reasonably priced and easy to apply as an XMP profile in your motherboard. I had no issues enabling it and run at the rated speed in my system. Moving forward I would definitely stay above 3000MHz for any new builds, and ideally settle at 3200MHz+ to get some future-proofing.

Graphics Card - MSI GeForce GTX 960 2G

I’m currently reusing the GPU from my old machine and yes, I know this card is massively underpowered considering I am pairing it with an overclocked 8700k. It definitely starts to struggle a bit in some modern games, but I only play at 1080p 60hz anyway so it does the job for the time being. Nevertheless, I can still maintain reasonable frame rates in most games at high settings. The fact that the fans only start spinning when load is applied also means that the build is virtually silent at idle.

I was planning on upgrading the GPU at the same time (to a 1070, maybe a 1080), but I didn’t see the point as these cards have already been out for multiple years now and at the rate the industry is moving, will be obsolete when the next generation gets released. On that note, I expect Nvidia will be releasing their new Volta (or Ampere?) cards at some point this year, so I will likely upgrade to one of them. Hopefully crypto mining doesn’t inflate the pricing too much. We have already been teased about Volta with the new Titan V, so we should expect a decent performance bump with the new models.

Storage - Samsung 960 EVO 250GB NVMe Drive + Crucial 256GB SATA SSD & 3TB Western Digital HDD

Samsung 960 EVO 250GB

I really wanted to get a good M.2 NVMe drive for this new build and I settled on the popular Samsung EVO lineup. They are very expensive so I only got the 250GB model, but this drive just holds the OS and applications so it’s more than enough. This drive is blisteringly fast. It took barely 1.5 minutes to install Windows and, to be honest, pretty much everything loads extremely quickly. Boot times are also pretty crazy even compared to using a SATA based SSD. It’s almost definitely overkill for me, but I love it nonetheless and would recommend it if you are a speed enthusiast and have the budget.

In addition to the NVMe drive, I also took my old SATA SSD and spinning hard drive from my old system. The SSD holds games and the HDD is the main data storage drive. This configuration works very well I think. It isn’t unreasonably expensive and still gives you a good overall amount of storage and great speeds. The NAND shortage doesn’t seem to have affected the 960 EVO drives too much either, which is good.

Case - Fractal Design Define S

Fractal Design Define S

For some, choosing a case can be one of the trickiest parts. Personally however, I settled on the Define S very early on. In terms of looks it’s a very no-frills case (even more so because I chose the windowless model), but the build quality is great, it’s easy to build in and, best of all, it’s very cheap for what you get.

The packaging the case came in was good, with very little chance of damage during transit and the included manual is nicely detailed to make building within the case very simple. It’s clear that each section has been thought out well and it definitely shows in the generally excellent reviews it gets. There are a number of similar models by Fractal Design as well including the Define C and variants which include windows.

A couple of things to note include the fact that the case does not include space for any 5.25” drives. Initially I thought this to be a downside, but really it doesn’t matter a lot to me as, to be honest, I can’t remember the last time I used the CD/DVD drive in my old machine. The bonus of removing the cage is that the interior of the case is extremely roomy, with plenty of space for large watercooling setups and good ventilation for air coolers. There are plenty of good cable management holes which make a tidy system relatively straightforward. The case comes with two 140mm case fans which are extremely quiet and perform well.

Overall, I think this case definitely lives up to the reviews. It would be nice to have an enclosure around the PSU to hide some of the cables (more of an issue for those with the window), but considering I got it for less than £60, I think it’s great.

Power Supply - EVGA SuperNOVA G2 650W 80+ Gold

EVGA SuperNOVA G2 650W

Nothing too fancy in the power supply department. The EVGA SuperNOVA G2 is an 80+ Gold rated unit with great reviews (especially from JonnyGuru) and should be solid for many years to come. Interestingly, there is an updated G3 variant, but it doesn’t seem to be particularly popular here in the UK, with many retailers preferring to stock the G2 model. The unit also has a mode whereby the fan will only turn on when needed (similar to most modern GPUs), which makes the system quieter. There are a number of other models for those who need more wattage. I did consider the 750W model, but the 650W is cheaper and should be more than enough for me with a single GPU. Being Gold rated also means it should stay efficient even when drawing near the top end of the rated wattage. Remember - never cheap out on your power supply.

Conclusion

Overall, I’m very happy with the build. All in, the parts above (excluding those parts which I am reusing) came to ~£1100 including delivery costs which, when considering the performance it gives, is good value for money. You can easily spend considerably more on a prebuilt with lower quality components and less overall performance.

Building your own system still seems like a hard thing to do to many people, but these days the process is rather simple. The hardest part is selecting your components, but there are so many guides and sources of help online for this that you should end up with compatible components if you have any sense. The actual process of building has become significantly simpler over the years and these days literally just consists of plugging everything together. With the number of YouTube build guides to follow, building your own system should be open to everyone.

As is (excluding a GPU upgrade), this system should remain performant for many years to come. Hopefully this time however I will get around to upgrading it before the 7 year mark.

PCPartPicker for this build


How to use Google DNS Servers

If you are frequently running into the Resolving Host status message in Chrome and/or are generally having slow page loads, it could be because your DNS lookups are taking longer than they should. Unsurprisingly, the DNS servers provided by your ISP can be pretty bad, but you are free to use other open alternatives (the two most common being Google and OpenDNS) which could give you faster responses.

Follow these steps to use the open Google DNS servers within Windows 10 (there are plenty of alternative guides online for other operating systems):

Start -> Settings -> Network & Internet

Click on Change adapter options

Select which network adapter you are using (WiFi/Ethernet depending on your setup). Right click and choose Properties.

In the list of configuration options select Internet Protocol Version 4 (TCP/IPv4). Then click Properties.

Adapter Properties

In the bottom section, select Use the following DNS server addresses. Fill the boxes with the following depending on which provider you wish to use:

Google DNS

Preferred: 8.8.8.8
Alternate: 8.8.4.4

OpenDNS

Preferred: 208.67.222.222
Alternate: 208.67.220.220

For Google DNS, it should look like the following:

Configure DNS Servers

Hit OK and you should be good to go. There are also equivalent IP addresses for IPv6 if you need them. Hopefully your DNS lookups will now be a little more performant. You might have the potential downside of Google knowing even more about your browsing habits, but if you use Chrome then they probably know all that already - so you might as well enjoy a faster experience!
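As a quick sanity check, you can roughly time a lookup through the system resolver before and after the change. A small Python sketch (note that repeat lookups of the same domain may be answered from a local cache, so try a few different domains):

import socket, time

start = time.time()
socket.gethostbyname("example.com")  # resolves via the DNS servers configured above
print("Lookup took %.1f ms" % ((time.time() - start) * 1000))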


Python - RESTful server with Flask

The Flask library for Python is a great microframework for setting up simple web servers. Larger sites or REST interfaces might want to tend towards the Django framework instead, but I’ve found Flask excellent for putting together small sites or a couple of endpoints with next to no effort. The API is very Pythonic, so of course you can get up and running with very few lines of code. I currently use Flask for the backend API services for this site - which power the search page, contact page and automated Jekyll builds using GitHub hooks.

Installation

To install and start using Flask, just use pip:

$ pip install Flask

Simple Example

The most basic endpoint looks like:

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

We imported the main Flask class and created a new instance, passing in the name of the current module as an identifier (so Flask knows where to look for static files and templates). A simple function, which in this case just returns a string, can be decorated with route to define the URL which will trigger the function.

Running on a development server

There are a couple of ways to run the above example. The first is the way recommended by the Flask team:

$ export FLASK_APP=hello.py
$ flask run
* Running on http://127.0.0.1:5000/

This is fine on Linux boxes (you can also use set instead of export on Windows), but setting an environment variable on Windows is a bit of a pain, so instead you can start the server via code. Apparently this might cause issues with live reload, but Flask starts up so quickly it’s not too much of an issue:

if __name__ == '__main__':
    app.run(host='0.0.0.0')

You can then navigate to http://localhost:5000 and you will see the return value of the hello_world function. You can easily return HTML or JSON objects as needed depending on what services you wish to build.

Handling GET Requests

I have focused mainly on using Flask to create basic RESTful web endpoints instead of serving HTML - which Flask can do very well using the Jinja2 templating engine. The below snippet shows how to create a simple endpoint to handle GET requests to retrieve a user by their unique id. The returned object from our dummy service is converted into a JSON response via the built in jsonify function:

from flask import jsonify

# here the user_id parameter is restricted to an int type
@app.route('/user/<int:user_id>', methods=['GET'])
def get_user(user_id):
    # get the user from some service etc
    user = user_service.find_user(user_id)
    return jsonify(user) # return the user as a JSON object
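
The user_service above is just a placeholder for whatever data access layer you have. For experimenting, a hypothetical in-memory stand-in is enough to make the endpoint runnable:

# hypothetical in-memory stand-in for a real data access layer
class UserService:
    def __init__(self):
        self.users = {1: {'id': 1, 'name': 'Alice'}}

    def find_user(self, user_id):
        return self.users.get(user_id)

    def create_user(self, user):
        user['id'] = max(self.users) + 1
        self.users[user['id']] = user
        return user

user_service = UserService()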

Handling POST Requests

The below snippet shows how we can handle a POST request, taking in a JSON object and returning a response from our service:

from flask import request

@app.route('/user/', methods=['POST'])
def save_user():
    # retrieve the json from the request
    new_user = request.get_json(silent=True)
    created_user = user_service.create_user(new_user)
    # return the newly created user as a json object
    return jsonify(created_user)
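
With the server running locally and the dummy service above wired in, both endpoints can be exercised from another Python shell (this assumes the requests library is installed):

import requests

created = requests.post('http://localhost:5000/user/', json={'name': 'Bob'}).json()
fetched = requests.get('http://localhost:5000/user/%d' % created['id']).json()
print(fetched)  # {'id': 2, 'name': 'Bob'} with the in-memory stand-in above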

As you can see, setting up simple endpoints is very quick and easy using Flask. The framework also offers a ton of other useful features including:

  • built-in development server and debugger
  • integrated unit testing support
  • RESTful request dispatching
  • Jinja2 templating
  • support for secure cookies (client side sessions)
  • great documentation

Flask website

Quickstart guide


ElasticSearch for your Jekyll Blog

Search functionality is very helpful to have in pretty much any website, but it’s not particularly easy to do on a static Jekyll site. Fully fledged blog solutions such as Wordpress give you a partial solution (no full text search) for free, however you also have to deal with all the associated bloat and the need for a database running in the background. On statically generated sites, you have to roll your own. Most of the solutions on the internet seem to lean towards doing full text search completely on the client side using a library such as Lunr.js. This will work well, but you end up having to ship your whole site to the client as a JSON blob before you can perform the search. For smaller sites this might be fine, but otherwise that file can get quite large when you have to include all content across your entire site - no thanks.

My, perhaps heavy handed, solution (which won’t work for GitHub Pages) is to use a small ElasticSearch instance on the server side to provide great full text search across your site. It takes a little more work to set up, but once you have it all automated you can just leave it and still take advantage of all the capabilities of ElasticSearch.

I put together elastic-jekyll which is a small Python library that you can use to automatically index and search across your entire Jekyll blog. I’ll cover below how it all fits together and how to use it.

Parsing your Posts

The first step in the process is to find all of your posts within your site and create an in-memory representation of them with all the attributes we require. In this case the library will try to go through ~/blog/_posts unless you pass in another path to main.py. Once all of the markdown files are found, each one is parsed using BeautifulSoup to extract the title and text content (find_posts.py):

def parse_post(path):
    with open(path, encoding="utf8") as f:
        contents = f.read()

        soup = BeautifulSoup(contents, 'html.parser')
        title = soup.find('h1', { "class" : "post-title" }).text.strip()
        
        post_elem = soup.find("div", {"class": "post"})
        post_elem.find(attrs={"class": "post-title"}).decompose()
        post_elem.find(attrs={"class": "post-date"}).decompose()

        paras = post_elem.find_all(text=True)

        body = " ".join(p.strip() for p in paras).replace("  ", " ").strip()
        return (title, body)

    raise IOError("Could not read file: " + path)

The output is passed into create_posts which creates a generator of Post instances. Each contains:

  • Id - A unique identifier to let ElasticSearch keep track of this document (modified version of the post filename)
  • Url - The relative url of this post so we can create links in the search results (again uses the filename and site base directory)
  • Title - The title of the post extracted from the frontmatter of the markdown file
  • Text - The text content of the post. Note that this is still in markdown format so contains all of the associated special characters. A future extension might be to do some sort of sanitization on this text
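
For reference, a Post can be modelled as something as simple as a namedtuple (the actual class used in the library may differ slightly):

from collections import namedtuple

# one record per markdown file: id/url derived from the filename,
# title/body extracted by parse_post above
Post = namedtuple('Post', ['id', 'url', 'title', 'body'])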

Indexing your Posts

Once we have all of the current posts properly parsed, we’re ready to dump them into ElasticSearch so it can perform its indexing magic on them and let us search through them. In Python this is very straightforward to do using the Python ElasticSearch client library.

First we establish a connection to the ElasticSearch server you should already have running on your system. It defaults to port 9200 although you can override it if you want.

from elasticsearch import Elasticsearch

def connect_elastic(host="localhost", port=9200):
    return Elasticsearch([{'host': host, 'port': port}])

For simplicity, the library will currently blow away any blog index that may already exist on the Elastic instance and recreate a new one from scratch. You could of course figure out deltas from the version control history etc, but for a small set of data it’s way easier just to re-index everything each time:

# the index is simply named "blog" (matching the search example later on)
index_name = "blog"

# remove existing blog index and create a new blank one
def refresh_index(es):
    if es.indices.exists(index=index_name):
        es.indices.delete(index=index_name)
    es.indices.create(index=index_name)

Then we just loop through each of the posts we got from the previous step and push them into the index:

# a single arbitrary document type for all posts (assumed name, pre-ES6 style)
doc_type = "post"

for post in posts:
    doc = {
        "title": post.title,
        "url": post.url,
        "body": post.body
    }

    es.index(index=index_name, doc_type=doc_type, id=post.id, body=doc)

At this point we have an index sitting in ElasticSearch that is ready to receive search queries from your users and turn them into a set of search results for relevant posts.
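
As an aside, if the number of posts ever grows large, the client library’s bulk helper can push everything over in one request rather than one HTTP call per post - a sketch, reusing the same index_name and doc_type as above:

from elasticsearch import helpers

# one action per post, sent in a single bulk request
actions = ({
    "_index": index_name,
    "_type": doc_type,
    "_id": post.id,
    "_source": {"title": post.title, "url": post.url, "body": post.body},
} for post in posts)

helpers.bulk(es, actions)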

Searching for your Posts

To actually provide users the ability to search through your index, you will need some kind of web service open and ready to receive Ajax calls. In my case I have a lightweight Flask server running which has an endpoint for searching. It simply passes the query string into ElasticSearch and returns the response as a JSON object. It is of course up to you how you want to do this, so I’ve just provided a generic way of querying your index within searcher.py:

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

user_query = "python"

query = {
    "query": {
        "multi_match": {
            "query": user_query,
            "type": "best_fields",
            "fuzziness": "AUTO",
            "tie_breaker": 0.3,
            "fields": ["title^3", "body"]
        }
    },
    "highlight": {
        "fields": {
            "body": {}
        }
    },
    "_source": ["title", "url"]
}

res = es.search(index="blog", body=query)
print("Found %d Hits:" % res['hits']['total'])

for hit in res['hits']['hits']:
    print(hit["_source"])

This snippet will connect to your ElasticSearch instance running under localhost and query the blog index with a search term of python. The query object uses the Elastic-specific search DSL, which you can read more about in their documentation. ElasticSearch is a complicated and powerful beast with a ton of options at your disposal. In this case we are doing a simple multi_match query on the title and body fields (giving more weight to the title field). We also use fuzziness to resolve any potential spelling mistakes in the user input. ElasticSearch will return us a set of hits which consist of objects containing just the title and url fields, as specified in the _source field. We have no use for the others, so no point in bloating the response. One cool feature is the use of highlighting, which will add <em> tags around matches in the body field within the response. This can then be used to apply styling on the client side to show which sections of text the engine has matched on.
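
The highlight fragments come back alongside each hit, so pulling them out looks something like this (assuming the same res object as above):

for hit in res['hits']['hits']:
    # each fragment is a snippet of the body with matches wrapped in <em> tags
    for fragment in hit.get('highlight', {}).get('body', []):
        print(fragment)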

This search query seems to work well for my use cases and I’ve literally just copied the above into the corresponding Flask endpoint. On the client side in your Jekyll search page, I’ve just used a bit of good old jQuery to perform the Ajax call and fill in a list with the search results. Keep it simple. You can find the JS I use in the search page source.
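
For illustration, that Flask endpoint amounts to little more than this sketch (build_query here is a hypothetical helper that returns the query dict above with the user’s term substituted in):

from flask import Flask, jsonify, request
from elasticsearch import Elasticsearch

app = Flask(__name__)
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

@app.route('/search')
def search():
    # e.g. /search?q=python
    user_query = request.args.get('q', '')
    res = es.search(index="blog", body=build_query(user_query))
    return jsonify(res['hits']['hits'])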

As far as automating the process goes, I have a script which will rebuild my Jekyll blog after a Git push has been made to GitHub (via hooks). After the main site is rebuilt, I just call python main.py and everything is kept up to date. As I said before, it takes a bit of work to set things up, but once you have, it will sync itself every time you make an update.

Full source code can be found in the GitHub repository
