Pre-Parsed Lists For Everyone!

As many are aware, most lists available are not fully compatible with Pi-hole and need to be parsed into a format it can use.

There are several ways to parse these lists.

I parse lists locally in a few different ways.

As a way to give back to the community, I put together a parser that uploads to GitHub, so that others may use the lists I've parsed.
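For anyone wondering what "parsing" means here: the core idea is turning adblock-style rules into plain domains that Pi-hole can consume. A minimal sketch of that idea, with a placeholder URL (the real parser.sh handles many more list formats):

# Sketch: download an adblock-style list, keep the "||domain.com^" rules,
# strip the markers, and dedupe into a plain domain list.
LIST_URL="https://example.com/somelist.txt"   # placeholder

curl --silent -L "$LIST_URL" |
  grep -E '^\|\|[A-Za-z0-9.-]+\^$' |
  sed -e 's/^||//' -e 's/\^$//' |
  sort -u > parsed-list.txt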

Main repository, where you can see how the script works:

https://github.com/deathbybandaid/piholeparser

Pre-parsed lists for anybody to use:

https://github.com/deathbybandaid/piholeparser/tree/master/parsed

All of those lists combined into one (because I know people will ask for it):

https://raw.githubusercontent.com/deathbybandaid/piholeparser/master/parsedall/1111ALLPARSEDLISTS1111.txt
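If you want to try it, one way to add the combined list (assuming a Pi-hole version that still reads /etc/pihole/adlists.list; newer versions manage adlists through the web interface instead):

# Append the combined list, then rebuild gravity.
echo "https://raw.githubusercontent.com/deathbybandaid/piholeparser/master/parsedall/1111ALLPARSEDLISTS1111.txt" | sudo tee -a /etc/pihole/adlists.list
pihole -g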

Additional Info

  • I set this up so that lists can be added very easily, and the end result has (mildly) nice filenames.
  • Unparsed Lists are mirrored in the mirroredlists directory.
  • Lists that have to be extracted first are handled as well.
  • This runs daily!

I am also fairly certain that this collection blocks more domains than any other I have seen around the web.

My wife hasn't complained about websites being blocked, and I haven't had any major issues.

EDIT:

To clarify, adding 1111ALLPARSEDLISTS1111 does NOT give you the 3 million domains.


PSA

I missed a line in my installer. The cronjob will still work; however, if you want it to update correctly, either reinstall or update the cronjob to:

20 0 * * * sudo bash /etc/updaterunpiholeparser.sh
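If you're updating by hand, that means something like this (assuming the installer put the job in the root crontab; adjust if yours lives elsewhere):

sudo crontab -e
# then replace the old piholeparser line with the 20 0 * * * line above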

Look at the comments in the thread; lots of good questions and answers.


Thanks for making them into one list. I'll do some testing before I deploy them at work; I don't want 10k angry clients :slight_smile:

That's... that's a lot of clients.

I'd like to see some stats from a typical day.

What hardware do you use? The Pi?


Hi,

Nice job.

There are two txt files you have parsed from Easylist-GER, but I don't think that the first, bigger one is parsed from the original Easylist-GER file, or am I wrong? The second file has only 59 entries.
Maybe you have some time to take a look.

Thanks, Frank

One is easylistgermany; the other is easylistgermany+easylist.

I pulled most of the lists from filterlists.com

I used to run with 2x Raspberry Pi 3; the SD cards died after about 7 days. We were getting over 400 million DNS queries a day :slight_smile:


The newer versions of Raspbian can actually run off of a flash drive or a USB drive, maybe even an SSD!
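On a Pi 3, for example, USB boot is enabled with a one-time OTP flag; a sketch of the usual procedure (check the official Raspberry Pi documentation for your model before running it):

# Enable USB boot mode on a Pi 3 (sets a one-time OTP bit; cannot be undone).
echo "program_usb_boot_mode=1" | sudo tee -a /boot/config.txt
sudo reboot
# After rebooting, verify the OTP bit is set (expect 17:3020000a);
# the line can then be removed from /boot/config.txt:
vcgencmd otp_dump | grep 17: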

If in an enterprise environment, why not have the logs and lists (the files that take the most write wear) stored on a share or iSCSI target from a corporate NAS?
e.g.

mount -t nfs NAS:/pihole/var/log /var/log
mount -t nfs NAS:/pihole/etc/pihole /etc/pihole

In fstab, of course (see the sketch below). Or use formatted iSCSI targets and mount them similarly.
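Those fstab entries might look roughly like this ("NAS" and the export paths are placeholders for your own server):

# /etc/fstab (sketch; hostname and export paths are placeholders)
NAS:/pihole/var/log     /var/log      nfs  defaults,noatime  0  0
NAS:/pihole/etc/pihole  /etc/pihole   nfs  defaults,noatime  0  0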

Or set up free Xen virtualization, a bare-metal type 1 hypervisor, on two hosts so you can do failover/balancing and no longer depend on SD cards.
The XenServer distro (Citrix-owned) saves you from setting up the entire Xen Project hypervisor environment manually, and I believe it comes with a nice GUI and so on.


Oh, PS: nice job @deathbybandaid :+1:

It's 404 now? Did you change the URL?

I made some changes to the script yesterday, and it apparently took that file down. I am working on fixing it right now, though.


It's fixed now, and I'm now saving a chunk of processing time!

Hi,

deleting all the double/triple entries will reduce the URL count from 269,448 to 48,791 :wink:

I knew I forgot something! This should be fixed now; I'm doing a test run.
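For the curious, the dedupe pass boils down to something like this (a sketch; the exact line in parser.sh may differ):

# Collapse duplicate domains; sort -u reads the whole file before -o writes it back.
sort -u parsed-list.txt -o parsed-list.txt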

I think you also have to add a -L option to the curl call in parser.sh, for files that have been moved on the website and generate a 3XX response code. I ran into that with one of the files, and with the -L option I could download and process it.
For example: sudo curl --silent -L followed by the list URL (in my case, a list hosted on Daniel Apatin's GitHub).

See https://curl.haxx.se/docs/manpage.html#-L

I added that tweak, and I'm doing a test run.

If anybody thinks that there is a better way to parse them, take a look: the file loop is the strong suit of the script.
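Roughly, the shape of that loop is something like this (a sketch with illustrative names; see parser.sh in the repository for the real thing):

# One pass per list URL: mirror the raw file, then parse it.
while read -r LIST_URL; do
  NAME=$(basename "$LIST_URL" | tr -cd '[:alnum:]._-')   # a (mildly) nice filename
  curl --silent -L "$LIST_URL" -o "mirroredlists/$NAME"
  # ...parse mirroredlists/$NAME into parsed/$NAME here...
done < listofurls.txt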

Just stumbled upon this and really like the look of it.

Couple of questions...

I guess I run it daily in the early hours using cron?

What is the purpose of the mirror files?

This script is a work in progress. The mirroredlists directory is there as a reference for what a list looked like before parsing; it also gives credit where it is deserved.
