Practical Coding Interview Questions
Here is a list of practical Python interview questions designed to assess data-processing skills. These questions emphasize reading, parsing, manipulating, and aggregating data from common file formats.
General Instructions for the Candidate:
Focus on writing clean, readable, and reasonably efficient Python code.
Use standard Python libraries (csv, re, datetime, collections, functools, etc.). Avoid external libraries like pandas unless explicitly permitted or requested for a specific role.
Handle potential errors gracefully (e.g., malformed lines, incorrect data types).
Explain your approach and any assumptions you make.
Feel free to ask clarifying questions.
You can see questions of this pattern quite frequently at the following companies:
Trade support engineer and SRE/DevOps roles at hedge funds, e.g. WorldQuant, Hudson River Trading (HRT), Optiver (not Two Sigma, as they are hardcore LeetCoders 🙈), etc.
Meta Production Engineer (PE), Google SRE
Question 1: File Metadata Analysis (Inspired by "Dinosaur Problem" Concept)
Concept: This question involves processing structured data, categorizing items, and performing aggregations, similar to analyzing a large dataset with various attributes (like the conceptual "dinosaur problem" often attributed to Meta interviews, which involves handling large datasets with filtering and grouping).
Problem: You are given a file (files.txt) where each line represents a file entry with its path, size in bytes, and last-modified timestamp (Unix epoch). The format is comma-separated: filepath,size,modified_timestamp.
Example files.txt:
/var/log/app.log,10240,1678886400
/home/user/data.csv,51200,1678886460
/etc/config.xml,1024,1678800000
/var/log/kernel.log,20480,1678890000
/home/user/report.pdf,204800,1678886520
/home/user/archive.zip,1024000,1678790000
/var/log/sys.log,15360,1678890060
Task: Write a Python script that reads this file and calculates the total size of files for each file extension type (e.g., .log, .csv, .xml, .pdf, .zip). Ignore files with no extension.
Output: Print the total size for each extension found. The output should be clear, like:
log: 46080 bytes
csv: 51200 bytes
xml: 1024 bytes
pdf: 204800 bytes
zip: 1024000 bytes
Skills Tested: File reading, string manipulation (extracting extension), data type conversion (string to integer), dictionary usage for aggregation, handling potential errors (lines with incorrect format, files without extensions).
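A minimal sketch of one possible approach, assuming the input file is named files.txt as above (os.path.splitext extracts the extension; malformed lines and extension-less files are skipped):

import os
from collections import defaultdict

totals = defaultdict(int)  # extension -> total size in bytes

with open("files.txt") as f:
    for line in f:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip malformed lines
        filepath, size, _timestamp = parts
        ext = os.path.splitext(filepath)[1]  # ".log", ".csv", ... or "" if none
        if not ext:
            continue  # ignore files with no extension
        try:
            totals[ext.lstrip(".")] += int(size)
        except ValueError:
            continue  # skip non-integer sizes

for ext, total in totals.items():
    print(f"{ext}: {total} bytes")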
Question 2: Trade Log Aggregation
Problem: You are provided with a trade log file (tradelog.csv) in CSV (Comma-Separated Values) format. The columns are: date, process, host, log, bytes. The exchange name (e.g., 'cme', 'lse') is part of the process name string.
Also test your code with a different column order, i.e. modify it so the parser handles (process, date, log, bytes, host) as well as (date, process, host, log, bytes); in other words, don't assume fixed column positions (see the sketch below).
Sample tradelog.csv file content:
date,process,host,log,bytes
20140206,cme_trader_2,ny-host-01,0345-cme_trader_2.log.gz,1500
20140206,lse_orderrouter_1,ln-host-a,1120-lse_orderrouter_1.log.gz,800
20140206,cme_trader_2,ny-host-01,0346-cme_trader_2.log.gz,500
20140207,cme_feedhandler_1,ny-host-02,0900-cme_feedhandler_1.log.gz,2500
20140207,lse_orderrouter_1,ln-host-b,1305-lse_orderrouter_1.log.gz,1200
20140207,cme_trader_1,ny-host-03,1015-cme_trader_1.log.gz,1800
20140207,lse_feedhandler_1,ln-host-a,1400-lse_feedhandler_1.log.gz,100
Task: Write a Python script to process this log file and calculate:
The total number of bytes processed per day, ordered by date ascending
The total number of bytes processed per exchange, per day, ordered by date ascending
Output: Print the results clearly. A nested dictionary or formatted print statements are acceptable.
Example Output Structure (based on the sample data above):
Daily Totals:
20140206: 2800 bytes
20140207: 5600 bytes
Exchange Daily Totals:
20140206,cme: 2000 bytes
20140206,lse: 800 bytes
20140207,cme: 4300 bytes
20140207,lse: 1300 bytes
Hint: Use `defaultdict` to create nested dictionaries:
result = defaultdict(lambda: defaultdict(int))
Skills Tested: File reading (CSV module recommended), string manipulation (parsing CSV lines, inferring and implementing exchange name extraction), date handling (grouping by date string), data type conversion (string to integer), nested dictionary usage for complex aggregation, handling potential errors (malformed lines, missing fields, non-integer bytes).
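One possible sketch, assuming the exchange name is the prefix of the process field before the first underscore. csv.DictReader keys each row by header name, so the reordered-column variant works without code changes:

import csv
from collections import defaultdict

daily = defaultdict(int)                            # date -> total bytes
exch_daily = defaultdict(lambda: defaultdict(int))  # date -> exchange -> bytes

with open("tradelog.csv", newline="") as f:
    for row in csv.DictReader(f):  # keyed by header, so column order doesn't matter
        try:
            date = row["date"]
            exchange = row["process"].split("_")[0]  # assumption: exchange prefixes the process name
            nbytes = int(row["bytes"])
        except (KeyError, TypeError, ValueError, AttributeError):
            continue  # skip malformed or incomplete rows
        daily[date] += nbytes
        exch_daily[date][exchange] += nbytes

print("Daily Totals:")
for date in sorted(daily):
    print(f"{date}: {daily[date]} bytes")

print("Exchange Daily Totals:")
for date in sorted(exch_daily):
    for exchange, total in sorted(exch_daily[date].items()):
        print(f"{date},{exchange}: {total} bytes")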
Question 3: Stock Data File Processing - Daily Range Calculation
Problem: You have a CSV file (stock_data.csv) containing daily stock price information. The columns are: Date, Ticker, Open, High, Low, Close, Volume.
Example stock_data.csv:
Date,Ticker,Open,High,Low,Close,Volume
2025-04-18,AAPL,170.50,172.80,170.10,172.50,55000000
2025-04-18,GOOG,140.10,141.50,139.80,141.20,25000000
2025-04-21,AAPL,172.60,173.50,171.90,173.00,48000000
2025-04-21,GOOG,141.30,142.00,140.50,140.80,22000000
2025-04-22,AAPL,173.10,175.00,172.80,174.90,61000000
2025-04-22,GOOG,140.90,141.20,139.50,139.90,28000000
Task: Write a Python script that reads this file and performs the following for a specific stock ticker (e.g., AAPL):
Calculate the daily price range (High - Low) for each day the ticker appears.
Find the date with the largest price range for that ticker.
Calculate the average trading volume for that ticker over the period present in the file.
Output: Print the date with the largest range and the calculated average volume for the specified ticker.
Example Output (for AAPL):
Ticker: AAPL
Date with largest price range: 2025-04-18 (Range: $2.70)
Average daily volume: 54666666.67
Skills Tested: File reading (CSV module), data filtering (selecting rows for a specific ticker), data type conversion (string to float/integer), arithmetic calculations, tracking maximum values, calculating averages, handling potential errors (missing data, non-numeric values).
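A rough sketch, with the assumption that ties in the price range are broken by first occurrence; csv.DictReader handles the parsing:

import csv

def analyze_ticker(path, ticker):
    best_date, best_range = None, float("-inf")
    total_volume, days = 0, 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("Ticker") != ticker:
                continue  # filter to the requested ticker
            try:
                day_range = float(row["High"]) - float(row["Low"])
                volume = int(row["Volume"])
            except (KeyError, TypeError, ValueError):
                continue  # skip rows with missing or non-numeric fields
            if day_range > best_range:
                best_date, best_range = row["Date"], day_range
            total_volume += volume
            days += 1
    print(f"Ticker: {ticker}")
    if days:
        print(f"Date with largest price range: {best_date} (Range: ${best_range:.2f})")
        print(f"Average daily volume: {total_volume / days:.2f}")

analyze_ticker("stock_data.csv", "AAPL")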
Question 4: Nginx Log File Analysis
Problem: You are given a sample Nginx access log file (access.log) in the standard combined log format.
Example access.log:
192.168.1.101 - - [21/Apr/2025:10:05:15 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
192.168.1.102 - - [21/Apr/2025:10:05:20 +0100] "GET /styles/main.css HTTP/1.1" 200 1024 "http://example.com/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
192.168.1.101 - - [21/Apr/2025:10:05:21 +0100] "GET /images/logo.png HTTP/1.1" 200 2048 "http://example.com/index.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
192.168.1.103 - - [21/Apr/2025:10:06:05 +0100] "GET /products/item1 HTTP/1.1" 200 1800 "-" "Chrome/110.0.0.0"
192.168.1.104 - - [21/Apr/2025:10:06:30 +0100] "GET /nonexistentpage HTTP/1.1" 404 150 "-" "Firefox/109.0"
192.168.1.101 - - [21/Apr/2025:10:07:00 +0100] "POST /api/submit HTTP/1.1" 201 50 "http://example.com/form.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
192.168.1.102 - - [21/Apr/2025:10:07:15 +0100] "GET /index.html HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
Task: Write a Python script to parse this log file and determine:
The count of each HTTP status code (e.g., 200, 404, 304, 201).
The top 3 most requested resource paths (the part between the HTTP method and the HTTP version, e.g., /index.html, /styles/main.css). Ignore query parameters if present.
Output: Print the status code counts and the top 3 requested paths with their counts.
Example Output:
Status Code Counts:
200: 4
404: 1
201: 1
304: 1
Top 3 Requested Paths:
/index.html: 2
/styles/main.css: 1
/images/logo.png: 1
Skills Tested: File reading, string parsing (potentially using regular expressions (the re module) for robustness, or careful splitting), data extraction, dictionary usage (collections.Counter is ideal here), sorting dictionaries by value, handling log format variations (though the sample gives a clear structure).
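A sketch using a single regular expression; the pattern below assumes the request line is the first quoted field in each entry, which holds for the standard combined format:

import re
from collections import Counter

# Matches the quoted request line and the status code of the combined log format.
LOG_RE = re.compile(r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>\S+)" (?P<status>\d{3})')

status_counts = Counter()
path_counts = Counter()

with open("access.log") as f:
    for line in f:
        m = LOG_RE.search(line)
        if not m:
            continue  # skip lines that don't match the expected format
        status_counts[m.group("status")] += 1
        path_counts[m.group("path").split("?")[0]] += 1  # drop query parameters

print("Status Code Counts:")
for status, count in status_counts.most_common():
    print(f"{status}: {count}")

print("Top 3 Requested Paths:")
for path, count in path_counts.most_common(3):
    print(f"{path}: {count}")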
Other questions
10 Real-World SRE Coding Problems You Won’t Find on LeetCode: a good summary of the algorithms you should know.
These questions provide a good mix of file handling, data parsing (CSV, custom formats, and logs), string manipulation, data aggregation, and basic calculations, all within the scope of the practical Python programming skills needed for data-processing tasks.