Learning Topics
About This Page
This page is a general list of everything I might potentially be interested in, so it lacks priority and structure.
See Study Leave for more specifics on overall purpose and guidelines, and for a jumping-off point to topics I'm actually devoting time to.
The topics on this page are geared heavily towards technology/maths rather than "softer" topics such as organisational psychology, motivation etc. This is not a lack of interest in softer topics on my part - it's more that I am already engaged in a programme of reading around such topics, which I can fit into evenings, commutes, and so on. But that approach doesn't work for more technical topics: they require more focused and devoted effort with an unavoidable experiential flavour. That is, you can't (for instance) learn Racket by reading through a textbook on the Underground - you have to actually program in the language and work your way through a structured set of examples - and this requires time set aside in front of a computer.
Fuzz
- www.meteor.com – for when I don’t want to think about integrating a complete web stack (yet again). Security; CRUD; UI; reactive etc.
- http://blog.confluent.io/2015/03/04/turning-the-database-inside-out-with-apache-samza/
- http://kafka.apache.org/
- the design patterns in http://www.datomic.com/
Geert's Comments
Coursera courses:
- Data Scientist course (Johns Hopkins)
- Andrew Ng Machine Learning course
- Mining Massive Datasets (also Stanford)
University of Illinois:
- Pattern discovery in Data Mining
- Text retrieval and Search Engines
Stanford:
- Statistical Learning (R based)
Also SPARQL and Graph databases.
Focus and don't try too much at once.
- Look at machine learning algos (the Andrew Ng course is really good). He also has more mathematical slides on the Stanford website if you want them - Google them. The algos are in Octave; one could usefully reimplement them in R/Python/etc.
- R + packages (depending on what you want to do: there are tons of interesting things)
- Python + pandas + packages to do the same exercises as in R
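As an illustration of what reimplementing a course algorithm outside Octave might look like, here is a minimal sketch of batch gradient descent for one-variable linear regression in plain Python. The function name, learning rate, iteration count and toy data are my own assumptions for illustration, not taken from the course material.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit y = theta0 + theta1 * x by batch gradient descent.

    alpha (learning rate) and iterations are illustrative defaults;
    they would need tuning for real data.
    """
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction error for every training example under the current fit.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error cost (with the usual 1/2 factor).
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


if __name__ == "__main__":
    # Toy data generated from y = 1 + 2x.
    intercept, slope = gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    print(round(intercept, 3), round(slope, 3))
```

The same exercise could then be repeated with pandas or in R to compare how each environment handles the vectorised version.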
Other interesting things:
- Apache Spark (instead of Hadoop), Mahout. You can program Spark via Python (it’s written in Scala).
- Of the languages: Scala, potentially Rust. Go and Julia: interesting but quite niche.
Geert checked out a couple of public APIs: his impression was that the Facebook API is largely closed down now and that the Twitter API is perhaps not worth spending a lot of time on (what is the data really good for?). I think I should check this out (however briefly) myself though.
General Areas of Interest
- R and associated tools
- Machine Learning: main approaches
- Big Data / Scalability: storage, access APIs, sets, visualization tools, container tech
- Algorithms: get some decent in-depth knowledge past
- Algo Trading
- Languages
- iPhone Development
- API Design (REST, non-REST, general design considerations)
Categories
First Of All
- DONE set up github account
- DONE Monitoring time spent: get TOGGL
- DONE Goal statements and re-evaluation points
- DONE Start Journal
- Sort out hardware
- Laptop
- Calc boxen
Hardware (Done)
Dell are a good brand to go for when doing Linux experimentation, as hardware support is good.
I paid 150 GBP on eBay for a Dell Optiplex 760 with 2x3.16GHz Intel Core 2 Duo CPU E8500, 500GB disk, 8GB RAM, DVD-RW, WiFi dongle, 19" monitor, keyboard and mouse.
Statistical Languages
- R
- Brush up on Python
- R/Python interaction
- Check out BeakerNotebook for R/Python
- Julia
Machine learning
- Simple Bayesian classifiers (in python)
- Writeup: Work through standard text (already owned) and survey what I want to explore in more detail
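As a possible starting point for the "simple Bayesian classifiers" item, here is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing, using only the standard library. The class name, method names and the spam/ham toy data are hypothetical choices for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over bags of words, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()                       # all words seen in training

    def train(self, words, label):
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior for the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in words:
                # Add-one smoothing so unseen words do not zero out the class.
                count = self.word_counts[label][w] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train(["cheap", "pills", "offer"], "spam")
    nb.train(["cheap", "offer", "now"], "spam")
    nb.train(["meeting", "notes", "attached"], "ham")
    nb.train(["lunch", "meeting", "tomorrow"], "ham")
    print(nb.predict(["cheap", "offer"]))
    print(nb.predict(["meeting", "tomorrow"]))
```

Working in log space avoids underflow when multiplying many small probabilities, which is the usual reason for summing logs rather than multiplying raw likelihoods.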
Scalable Architectures
- Writeup: Storage/DFS
- Data Grids
- Coherence, Hazelcast, Memcached, EHCache, GigaSpaces
- Writeup: NoSQL history and current state
- Cassandra, MongoDB, HBase, CouchDB, Redis, ...
- http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
- Writeup: A survey of APIs
- Facebook API
- Twitter API: geographic mapping
- CartoDB [1]
- Data-oriented API Aggregators
- Writeup: A survey of large data sets
- Writeup: A survey of data set crawlers: Amazon, etc
- Writeup: Time Series
- KDB & Q Language: http://code.kx.com/wiki/Startingkdbplus/qlanguage
- Writeup: Topological Data Analysis (and other approaches?)
- Cloud Services
- Azure
- AWS
- Rackspace
- DevOps / Virtualisation / Containers
- DevOps myths: https://sethvargo.com/the-ten-myths-of-devops/
- VirtualBox + Vagrant
- Puppet + Chef
- VMWare
- Containers
- Docker
- Kubernetes
- [DONE] Writeup: Messaging Frameworks
- Broker vs. brokerless – take a look at http://zeromq.org/ for inspiration.
- AMQP
- RabbitMQ is very popular and has some great tutorials for learning.
- [DONE] Writeup: Microservices Overview
Algorithms
- Some deep dives into Knuth
- Writeup: Sorting survey
- Writeup: Heaps
- Writeup: the effect of VM on algo selection, eg You're Doing It Wrong (Think you've mastered the art of server performance? Think again.)
- B-heap vs B-tree: accounting for VM pressure on how a heap is laid out
Linux
- DONE Main distribution families
- Kernel trends over the last 5 years
Algo Trading
- Linux kernel tuning
- Writeup: Current thinking on synchronisation techniques
- Fuller survey of classics (including spinlocks etc)
- Lock-free approaches
- Transactional memory
- Writeup: Disruptor pattern and any related patterns
- GDB and Microsoft C++ (MinGW)
Languages
I'll need to think about how to prioritise these. Most will be quick overviews rather than spending any serious time programming.
- Definitely
- Racket / Scheme
- Rust
- Brush up on Haskell
- School of Haskell: https://www.fpcomplete.com/school
- Very Possibly
- Clojure
- Scala
- Erlang
- Maybe
- F# (Visual Studio Community) - similar to OCaml?
- Write own toy Lisp interpreter
- ML (Coursera?)
- OCaml
iPhone Development
- Swift
JavaScript
- ngrok: basic plan for 7 USD per month
- cd /Applications/ngrok/2.0.19
- ./ngrok http -subdomain=arkestra 80
- see website at arkestra.ngrok.io subdomain
- Node.js
- Frameworks and resources: http://www.codeproject.com/Articles/596800/JavaScript-Frameworks-and-Resources
- www.meteor.com – (Fuzz) for when I don’t want to think about integrating a complete web stack (yet again). Security; CRUD; UI; reactive etc.
- Drawing libraries
- Examples: http://modeling-languages.com/javascript-drawing-libraries-diagrams/
- More Examples: http://www.jsgraphs.com
- UK Constituencies
- D3.js: http://d3js.org/
- HTML5
- Development environments: http://www.infoq.com/articles/modern-javascript-toolbox
Other
- Open Source Licensing
- Clang/LLVM C++ refactoring
- Go
- Apache Thrift
(Web) API Design
- Writeup: what makes a good Web API?
- API Aggregation is a thing: http://www.programmableweb.com/news/api-aggregation-why-it-matters-and-eight-different-models/2013/12/13
- SDK vs REST: http://apiux.com/2013/10/18/why-no-one-wants-to-use-your-api/
- Writeup: API Specifications
- apiblueprint
- RAML
- Swagger
- Apiary
- RSpec
- Writeup: API Guidelines in general
Article Writing
- InfoQ
Resources to check out
- Coursera
- Big Data University – Online courses with free certification dedicated to the promotion of Big Data knowledge
- iTunes University – Santa Fe Institute lectures; “Networks, Technology and Innovation – Video”
- Join "Big Data" groups on LinkedIn
Reading List
- http://www.kegel.com/c10k.html#books
- https://scottlocklin.wordpress.com/2013/07/28/ruins-of-forgotten-empires-apl-languages/
- A Distributed Systems Reading List: https://dancres.github.io/Pages/
- Zeta Architecture: http://radar.oreilly.com/2015/04/zeta-architecture-hexagon-is-the-new-circle.html
- APIUX: http://apiux.com/
- InfoQ
NEXT